@MMiller_WMF correct me if I am wrong, but this Phab task covers a general test API that the research team can support until someone takes over our code for productionization. It lets the Growth team start building around it, know what output to expect, give feedback, spot corner cases, etc. We created this task to clearly distinguish it from the work required to port the model into prod.

Aug 13 2020, 10:12 AM · Growth-Structured-Tasks, Growth-Team

Aug 5 2020

DED updated the task description for T236299: Port sock-puppet detection model in-house.

Aug 5 2020, 12:55 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T236299: Port sock-puppet detection model in-house.

@kaldari The current model is ready, at least as a first iteration. I am in the process of handing over the code and having someone test it internally. @Niharika may know more about the specifics of the deployment responsibilities. Is this something you can help with?
Also, we have the same constraints that @Ladsgroup brought up.

Aug 5 2020, 12:55 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

Jun 22 2020

DED added a comment to T236299: Port sock-puppet detection model in-house.

Talk pages are now included in the data.
I generated a new contribution graph: a bipartite graph of users and pages (wiki pages and talk pages), with edit edges weighted by the number of edits.
I tried multiple graph mining algorithms on the contribution graph to detect "sub-communities". So far, these techniques either didn't improve performance or didn't scale to the data.
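
A minimal sketch of the contribution graph described above, assuming edits arrive as (user, page) pairs. The function names and the sample data are hypothetical and are not taken from the actual pipeline.

```python
from collections import Counter, defaultdict


def build_contribution_graph(edits):
    """Build a weighted bipartite graph from (user, page) edit records.

    Returns a dict mapping each user to a Counter of pages, where the
    count is the number of edits the user made to that page (edge weight).
    """
    graph = defaultdict(Counter)
    for user, page in edits:
        graph[user][page] += 1
    return graph


def shared_pages(graph, user_a, user_b):
    """Pages edited by both users, with the smaller of the two edit counts."""
    common = graph[user_a].keys() & graph[user_b].keys()
    return {page: min(graph[user_a][page], graph[user_b][page]) for page in common}


# Hypothetical sample: articles and talk pages are both nodes on the page side.
edits = [
    ("UserA", "Paris"),
    ("UserA", "Paris"),
    ("UserA", "Talk:Paris"),
    ("UserB", "Paris"),
    ("UserB", "Talk:Paris"),
    ("UserB", "Talk:Paris"),
    ("UserC", "Berlin"),
]
graph = build_contribution_graph(edits)
overlap = shared_pages(graph, "UserA", "UserB")
```

The shared-page overlap is only one simple signal that can be read off such a graph; it stands in here for the sub-community detection methods mentioned above, which are not specified in this comment.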

Jun 22 2020, 8:55 AM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T242666: Template recommendation exploratory research work.

no updates

Jun 22 2020, 8:46 AM · Research-Freezer

Jun 19 2020

DED updated the task description for T246837: Research Showcase June 2020.

Jun 19 2020, 6:29 PM · Research (FY2019-20-Research-April-June)

Jun 17 2020

DED updated the task description for T246837: Research Showcase June 2020.

Jun 17 2020, 10:03 PM · Research (FY2019-20-Research-April-June)

Jun 15 2020

DED added a comment to T242666: Template recommendation exploratory research work.

Deployed a model to recommend properties for Wikidata. This could be an idea for template recommendation in wiki-pages.
https://tools.wmflabs.org/wiki2prop/?subject=Q201787&lang=en&n=5

Jun 15 2020, 2:21 PM · Research-Freezer

DED added a comment to T236299: Port sock-puppet detection model in-house.

updates:

Tested a new model by adding concept-vectors and interaction graph.
The model is now slightly more difficult to interpret but achieves a better AUC (75%), using XGBoost.
Refactored the data preparation code in Scala. The code is much more scalable and can regenerate the necessary training data in 1 day on our analytics cluster.
Discussed the API endpoints and the potential deployment environment (ORES?) with the product team.
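
For readers unfamiliar with the AUC figure quoted above: AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a pure-Python sketch of that definition with made-up labels and scores; it is not the evaluation code used for the model.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC by pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of examples, so it suits small checks only; a rank-based computation would be the usual choice at scale.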

Jun 15 2020, 2:15 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

Jun 8 2020

DED added a comment to T242664: Submit work on section alignement .

The paper was submitted to RecSys

Jun 8 2020, 4:24 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242666: Template recommendation exploratory research work.

no updates

Jun 8 2020, 4:23 PM · Research-Freezer

DED added a comment to T236299: Port sock-puppet detection model in-house.

updates:

I was finally able to process a large enough view of Wikipedia history (from 2015 onwards). This period should line up with the SSO rollout, so user_text can be used as a unique ID across wikis.
Transitioned to a new model based on word analysis to accommodate multiple wikis. I'll check what this is capable of. Basically, I gave up on sentiment analysis.

Jun 8 2020, 4:04 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

Jun 5 2020

DED added a comment to T111775: Infoboxes are mistaken for abstracts in page abstract dumps..

Not a problem. Could you possibly point me to the source code?

Jun 5 2020, 6:50 PM · ActiveAbstract, TextExtracts, Research-Freezer, Dumps-Generation

DED updated subscribers of T111775: Infoboxes are mistaken for abstracts in page abstract dumps..

hi @ArielGlenn do you know who maintains this dataset?

Jun 5 2020, 4:14 PM · ActiveAbstract, TextExtracts, Research-Freezer, Dumps-Generation

Jun 2 2020

DED updated the task description for T246837: Research Showcase June 2020.

Jun 2 2020, 4:13 PM · Research (FY2019-20-Research-April-June)

DED updated the task description for T246837: Research Showcase June 2020.

Jun 2 2020, 4:09 PM · Research (FY2019-20-Research-April-June)

Jun 1 2020

DED added a comment to T236299: Port sock-puppet detection model in-house.

Hello @srijan. I didn't compute these metrics. Basically, processing only parts of enwiki creates an incomplete fingerprint for any user. Unless my current effort in making a pass on the full data succeeds, I plan on sampling users to obtain targeted full edit history.
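
The sampling fallback mentioned above could look roughly like this: pick a reproducible random subset of users, then keep every edit by those users so that each sampled fingerprint is complete. The record layout `(user, revision_id)` and all names are assumptions for illustration.

```python
import random


def sample_full_histories(edits, sample_size, seed=0):
    """Keep the complete edit history of a random sample of users.

    edits: list of (user, revision_id) tuples (hypothetical layout).
    sample_size: number of distinct users to keep.
    """
    users = sorted({user for user, _ in edits})
    rng = random.Random(seed)  # seeded so the sample is reproducible
    chosen = set(rng.sample(users, min(sample_size, len(users))))
    return [(user, rev) for user, rev in edits if user in chosen]


# Hypothetical edit records for illustration only.
edits = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5), ("C", 6)]
subset = sample_full_histories(edits, sample_size=2)
```

Sampling by user rather than by edit is the point of the sketch: sampling edits directly would reproduce the incomplete-fingerprint problem.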

Jun 1 2020, 10:24 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T242664: Submit work on section alignement .

Abstract uploaded; work is ongoing to restructure the paper for RecSys.

Jun 1 2020, 9:59 PM · Research (FY2019-20-Research-April-June)

DED updated the task description for T246837: Research Showcase June 2020.

Jun 1 2020, 9:58 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T246837: Research Showcase June 2020.

Theme: Credibility and Verifiability
Two speakers confirmed: Connie Moon Sehat and Tiziano Piccardi

Jun 1 2020, 9:57 PM · Research (FY2019-20-Research-April-June)

DED updated the task description for T246837: Research Showcase June 2020.

Jun 1 2020, 9:55 PM · Research (FY2019-20-Research-April-June)

DED closed T246835: Research Showcase March 2020 as Resolved.

Jun 1 2020, 9:53 PM · Research

DED updated the task description for T246835: Research Showcase March 2020.

Jun 1 2020, 9:53 PM · Research

DED closed T246836: Research Showcase May 2020 as Resolved.

Jun 1 2020, 9:52 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T236299: Port sock-puppet detection model in-house.

First model is ready but with relatively low performance (~60% AUC). It was trained on an English-language subset of the data. Calculating all-time edit diffs remains a challenge for such a large wiki.
- Ongoing work on tuning the model to improve the results.

Jun 1 2020, 9:50 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T242666: Template recommendation exploratory research work.

no updates

Jun 1 2020, 9:44 PM · Research-Freezer

DED updated the task description for T246836: Research Showcase May 2020.

Jun 1 2020, 7:26 PM · Research (FY2019-20-Research-April-June)

May 25 2020

DED added a comment to T242664: Submit work on section alignement .

No updates yet, but I might submit the abstract today.

May 25 2020, 11:32 AM · Research (FY2019-20-Research-April-June)

DED added a comment to T246836: Research Showcase May 2020.

May's showcase is concluded.

May 25 2020, 11:27 AM · Research (FY2019-20-Research-April-June)

DED added a comment to T242666: Template recommendation exploratory research work.

no updates

May 25 2020, 11:24 AM · Research-Freezer

DED updated the task description for T236299: Port sock-puppet detection model in-house.

May 25 2020, 11:24 AM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T236299: Port sock-puppet detection model in-house.

Continued progress in building the model and preparing for the demo.
Meeting with Amir and Niharika: We discussed the potential of integrating his code, ethical considerations, and the features that can be added/hidden.

May 25 2020, 11:23 AM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED changed the status of T253484: Revision_text field of mediawiki_wikitext_current is Not properly mapped from Invalid to Resolved.

To run the query on Hive if some fields contain a newline char:

May 25 2020, 8:40 AM · Analytics

May 24 2020

DED added a comment to T111775: Infoboxes are mistaken for abstracts in page abstract dumps..

I came to report this issue and found that it has existed since 2015.
My estimate is that half of the abstracts dataset does not contain any real information, only a few bytes from the infoboxes. A simple parsing issue, I assume.

May 24 2020, 10:44 PM · ActiveAbstract, TextExtracts, Research-Freezer, Dumps-Generation

DED created T253484: Revision_text field of mediawiki_wikitext_current is Not properly mapped.

May 24 2020, 7:20 PM · Analytics

May 19 2020

DED updated the task description for T246836: Research Showcase May 2020.

May 19 2020, 8:41 AM · Research (FY2019-20-Research-April-June)

DED updated the task description for T246836: Research Showcase May 2020.

May 19 2020, 8:40 AM · Research (FY2019-20-Research-April-June)

May 15 2020

DED added a comment to T236299: Port sock-puppet detection model in-house.

Progress in building the model and updating the code.
Set up a timeline for deployment.

May 15 2020, 4:05 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T242666: Template recommendation exploratory research work.

no updates

May 15 2020, 4:02 PM · Research-Freezer

DED added a comment to T242664: Submit work on section alignement .

No specific updates here. (note: abstract due May 25th)

May 15 2020, 4:01 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T246836: Research Showcase May 2020.

Gathered the abstracts; preparing for the showcase next week.

May 15 2020, 3:59 PM · Research (FY2019-20-Research-April-June)

DED updated the task description for T246836: Research Showcase May 2020.

May 15 2020, 7:52 AM · Research (FY2019-20-Research-April-June)

May 10 2020

QEDK awarded T236299: Port sock-puppet detection model in-house a Like token.

May 10 2020, 9:23 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

May 9 2020

DED added a comment to T246836: Research Showcase May 2020.

I am waiting to receive the abstracts/titles, I'll update here when I receive them.

May 9 2020, 4:45 AM · Research (FY2019-20-Research-April-June)

DED added a comment to T236299: Port sock-puppet detection model in-house.

Building a new ground truth dataset from archived SPI reports

May 9 2020, 4:42 AM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T242666: Template recommendation exploratory research work.

no updates

May 9 2020, 4:40 AM · Research-Freezer

DED added a comment to T242664: Submit work on section alignement .

no updates for last week. I expect to start this work and send an email around with status mid-week.

May 9 2020, 4:40 AM · Research (FY2019-20-Research-April-June)

May 5 2020

DED added a comment to T241798: Research showcase improvements for 2020.

Step 5: Yes, we slightly tailor the communication with the speakers to clarify this. I will add a short paragraph on relevance in the email.
Step 7: Good idea.

May 5 2020, 2:08 PM · Research (FY2019-20-Research-January-March)

May 4 2020

DED added a comment to T236299: Port sock-puppet detection model in-house.

Re-implemented most of the code, but the training data and the "embedding" pipeline for users are still missing.
Gathering recent sock-puppet investigation outcomes for training.

May 4 2020, 3:15 PM · Research (FY2019-20-Research-April-June), Anti-Harassment, artificial-intelligence

DED added a comment to T242666: Template recommendation exploratory research work.

No updates

May 4 2020, 3:12 PM · Research-Freezer

DED added a comment to T242664: Submit work on section alignement .

no updates (RecSys deadline June 1st)

May 4 2020, 3:11 PM · Research (FY2019-20-Research-April-June)

Mar 23 2020

DED added a comment to T242666: Template recommendation exploratory research work.

No news here.

Mar 23 2020, 3:56 PM · Research-Freezer

DED added a comment to T242665: Link recommendation.

Weekly update:
Trying link-graph embedding methods for link recommendation.

Mar 23 2020, 3:55 PM · Research (FY2019-20-Research-April-June)

Mar 16 2020

DED added a comment to T242663: Research showcase take-over transition.

Weekly update:
finalize the details of the March showcase (zoom, team meeting, moderation etc.)

Mar 16 2020, 3:57 PM · Research (FY2019-20-Research-January-March)

DED added a comment to T242664: Submit work on section alignement .

no updates.

Mar 16 2020, 3:55 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242665: Link recommendation.

Weekly update:
Refining the code and the documentation.

Mar 16 2020, 3:55 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242666: Template recommendation exploratory research work.

Weekly update:
I didn't get to discuss a formal collaboration yet. The above is just building a dataset, which is handled by a student, with some ideas from me (this may still lead to a "resource" paper).
Other exploratory work that may impact this OKR is ongoing: I am working on graph embeddings, which is closely related.

Mar 16 2020, 3:53 PM · Research-Freezer

Mar 9 2020

DED added a comment to T242663: Research showcase take-over transition.

Planning a team update during the 3/18 showcase where the theme is topic models

Mar 9 2020, 4:14 PM · Research (FY2019-20-Research-January-March)

DED added a comment to T242666: Template recommendation exploratory research work.

Weekly update: exploring ideas on mapping Wikidata to Wikipedia sections using title fuzzy matching.
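
One way to prototype the fuzzy title matching idea is with Python's standard `difflib`. The labels, section titles, and the 0.6 cutoff below are illustrative assumptions, not values from this work.

```python
from difflib import SequenceMatcher


def best_section_match(label, section_titles, cutoff=0.6):
    """Return (title, score) for the section title most similar to label.

    Similarity is difflib's ratio on lowercased strings; returns None
    when no title reaches the cutoff.
    """
    best = None
    for title in section_titles:
        score = SequenceMatcher(None, label.lower(), title.lower()).ratio()
        if score >= cutoff and (best is None or score > best[1]):
            best = (title, score)
    return best


# Hypothetical section titles and labels for illustration only.
sections = ["Early life", "Career", "Awards and honours", "References"]
match = best_section_match("Careers", sections)
no_match = best_section_match("population", sections)
```

Character-level similarity only catches near-identical wording; labels that are related in meaning but spelled differently would need another signal.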

Mar 9 2020, 3:48 PM · Research-Freezer

DED added a comment to T242664: Submit work on section alignement .

No updates. I need to finish the paper on section alignment in March, but submission might slip to April (RecSys?).

Mar 9 2020, 3:46 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242665: Link recommendation.

Weekly update:
Built the model for English and annotated the provided articles.

Mar 9 2020, 3:46 PM · Research (FY2019-20-Research-April-June)

Mar 2 2020

DED added a comment to T242664: Submit work on section alignement .

no updates.

Mar 2 2020, 4:57 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242663: Research showcase take-over transition.

Weekly update: Nothing to report this week.

Mar 2 2020, 4:52 PM · Research (FY2019-20-Research-January-March)

DED added a comment to T242665: Link recommendation.

Weekly update:
Gave a demo to the Growth team, with some ambassadors present. Training the model on English articles for internal evaluation.

Mar 2 2020, 4:51 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242666: Template recommendation exploratory research work.

Weekly update: no further work on this front yet.

Mar 2 2020, 4:47 PM · Research-Freezer

Feb 24 2020

DED added a comment to T242663: Research showcase take-over transition.

Weekly update: I booked a speaker for March on multilang NLP (Jordan Boyd-Graber).

Feb 24 2020, 5:00 PM · Research (FY2019-20-Research-January-March)

DED added a comment to T242664: Submit work on section alignement .

Weekly update:

Feb 24 2020, 4:58 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242666: Template recommendation exploratory research work.

Weekly update: no further work on this front yet. Discussing with the growth team the potential for article structure recommendation.

Feb 24 2020, 4:57 PM · Research-Freezer

DED added a comment to T242665: Link recommendation.

Finished a new version of the link rec and sent it out for evaluation by ambassadors. Next step: discuss deployment details with the Growth team.

Feb 24 2020, 4:56 PM · Research (FY2019-20-Research-April-June)

Feb 17 2020

DED added a comment to T245330: Add a link: evaluate link recommendation (Feb 14 2020).

@PPham Yes, I understand, this is extremely helpful, thank you so much. I will incorporate this observation.

Feb 17 2020, 5:31 PM · Growth-Structured-Tasks, User-Urbanecm, User-Dyolf77, Growth-Team (Sprint 0 (Growth Team))

Feb 15 2020

DED added a comment to T245330: Add a link: evaluate link recommendation (Feb 14 2020).

@PPham Thanks for the feedback! I am separating words that are formatted or in quotes because I took it that the added style is meant to highlight (or single out) the word. This is more of a rule that I apply to all languages.
More articles will come. Thanks again.
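
A rough sketch of the rule described above, assuming wikitext-style markup where `'''...'''` is bold and `''...''` is italic, plus plain double quotes. The regular expression and function name are hypothetical; the actual link recommendation code may handle markup differently.

```python
import re

# Bold ('''x'''), italic (''x''), or double-quoted ("x") spans.
# Bold is listed first so ''' is not read as '' plus a stray quote.
HIGHLIGHT = re.compile(r"'''(.+?)'''|''(.+?)''|\"(.+?)\"")


def split_highlighted(text):
    """Return (highlighted_spans, remaining_text) for a wikitext snippet."""
    spans = [next(g for g in m.groups() if g) for m in HIGHLIGHT.finditer(text)]
    remaining = HIGHLIGHT.sub(" ", text)
    remaining = re.sub(r"\s+", " ", remaining).strip()
    return spans, remaining


# Hypothetical snippet for illustration only.
snippet = "The '''Eiffel Tower''' is a \"landmark\" in ''Paris'', France."
spans, rest = split_highlighted(snippet)
```

This sketch does not handle nested or combined markup such as bold italic; it only illustrates treating styled or quoted words as separate units.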

Feb 15 2020, 7:35 PM · Growth-Structured-Tasks, User-Urbanecm, User-Dyolf77, Growth-Team (Sprint 0 (Growth Team))

Feb 10 2020

DED added a comment to T242664: Submit work on section alignement .

Weekly update:

Feb 10 2020, 4:53 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242665: Link recommendation.

Working on an updated version of the linkrec; entity detection in text wasn't satisfactory.

Feb 10 2020, 4:48 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242663: Research showcase take-over transition.

Weekly update: booked speaker for April on Human-ML, and looking for others on topic modeling

Feb 10 2020, 4:47 PM · Research (FY2019-20-Research-January-March)

Jan 22 2020

DED added a comment to T242665: Link recommendation.

Weekly update:
Compiled a list of actions from the feedback received.
Generated links for English articles for evaluation.

Jan 22 2020, 5:25 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242663: Research showcase take-over transition.

Weekly update: synced with Jonathan to brainstorm a list of themes for the next 6 months

Jan 22 2020, 5:20 PM · Research (FY2019-20-Research-January-March)

DED added a comment to T242664: Submit work on section alignement .

Weekly update:

I got to a point where I can rerun Diego's code (not PSL though) and reproduce the current numbers. Modifying it to support cross-validation turned out to be more challenging than expected (not enough space, and time-consuming). I am re-engineering the data pipeline.
Postponed the planned submission to February.

Jan 22 2020, 5:16 PM · Research (FY2019-20-Research-April-June)

DED added a comment to T242666: Template recommendation exploratory research work.

Weekly update: reviewing related work for this task, which is also relevant to section alignment. A first attempt is underway, leveraging research on property recommendation for Wikidata items.

Jan 22 2020, 5:09 PM · Research-Freezer