Halfak (Aaron Halfaker, EpochFail, halfak)
Principal Research Scientist

User Details

User Since
Oct 21 2014, 6:05 PM (338 w, 2 d)
Availability
Available
IRC Nick
halfak
LDAP User
Halfak
MediaWiki User
EpochFail

Hi! I'm a socio-technologist. I do science so that I can build new technologies for social systems.

You can find me as:

Recent Activity

Tue, Apr 13

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

This change will look a lot like this work for ptwiki: https://github.com/wikimedia/editquality/pull/225/files

Tue, Apr 13, 5:57 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
Halfak added a comment to T278723: ORES deployment - Spring 2021.

Deploy failed with the following error:

Tue, Apr 13, 4:59 PM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Thu, Apr 8

Halfak added a comment to T278723: ORES deployment - Spring 2021.

Thank you! Will run a test on beta when I get a chance and report back here.

Thu, Apr 8, 6:47 PM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Thu, Apr 1

Halfak added a comment to T277609: Generate dump of scored-revisions from 2018-2020 for English Wikipedia.

I wonder if this is related to: T104004: Can't download large datasets from datasets.wikimedia.org

Thu, Apr 1, 5:14 PM · Data-Services, artificial-intelligence, editquality-modeling, ORES, Machine-Learning-Team, Analytics

Tue, Mar 30

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

I've finally got the deployment of ORES unblocked. That was a surprisingly large amount of work to clean up. We're now blocked on getting this to production before we can get retrained Turkish models out. See T278723: ORES deployment - Spring 2021.

Tue, Mar 30, 4:49 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
Halfak moved T278723: ORES deployment - Spring 2021 from Active to Review on the Machine-Learning-Team (Active Tasks) board.
Tue, Mar 30, 4:36 PM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak claimed T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 4:01 PM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak removed a parent task for T246909: Follow-up cleanup to topic models: T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 12:58 AM · drafttopic-modeling, Machine-Learning-Team
Halfak removed a subtask for T278723: ORES deployment - Spring 2021: T246909: Follow-up cleanup to topic models.
Tue, Mar 30, 12:58 AM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a parent task for T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available): T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 12:57 AM · Machine-Learning-Team (Active Tasks), Serbian-Sites, Growth-Scaling, Growth-Team
Halfak added a subtask for T278723: ORES deployment - Spring 2021: T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).
Tue, Mar 30, 12:57 AM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a parent task for T223782: Build article quality model for Dutch Wikipedia: T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 12:56 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a parent task for T246909: Follow-up cleanup to topic models: T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 12:56 AM · drafttopic-modeling, Machine-Learning-Team
Halfak added a parent task for T249520: Fit more topic models into ORES: T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 12:56 AM · drafttopic-modeling, Machine-Learning-Team
Halfak added subtasks for T278723: ORES deployment - Spring 2021: T223782: Build article quality model for Dutch Wikipedia, T249520: Fit more topic models into ORES, T246909: Follow-up cleanup to topic models.
Tue, Mar 30, 12:56 AM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak created T278723: ORES deployment - Spring 2021.
Tue, Mar 30, 12:53 AM · artificial-intelligence, drafttopic-modeling, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Mon, Mar 29

Halfak committed rOWCd3d777592e5c: Handles flake8 issue with nlwiki features. (authored by Halfak).
Handles flake8 issue with nlwiki features.
Mon, Mar 29, 12:45 AM
Halfak committed rOWCf8efe0f07755: Update nlwiki.py (authored by Psingh07).
Update nlwiki.py
Mon, Mar 29, 12:45 AM
Halfak committed rOWC9ab054bb5311: nlwiki update (authored by Cdrpar07 <shgcdr07@ores-misc-01.ores-staging.eqiad1.wikimedia.cloud>).
nlwiki update
Mon, Mar 29, 12:45 AM
Halfak committed rOWC1b55cc1e5210: Fixes template names for dutch citation needed (authored by Aaron Halfaker <ahalfaker@wikimedia.org>).
Fixes template names for dutch citation needed
Mon, Mar 29, 12:45 AM
Halfak committed rOWC0d73caf59808: Adds nlwiki model with basic features. (authored by Aaron Halfaker <ahalfaker@wikimedia.org>).
Adds nlwiki model with basic features.
Mon, Mar 29, 12:45 AM

Fri, Mar 26

Halfak committed R2300:08b9cebc5e01: Adds vectors for eu, hy, hu, sr, uk, and wikidatawiki (authored by Halfak).
Adds vectors for eu, hy, hu, sr, uk, and wikidatawiki
Fri, Mar 26, 9:41 AM

Thu, Mar 25

Halfak added a hashtag to Machine-Learning-Team: #scoring-platform-team.
Thu, Mar 25, 5:15 PM
Halfak added a hashtag to Machine-Learning-Team: #scoring_platform_team.
Thu, Mar 25, 5:14 PM

Mar 16 2021

Halfak updated the task description for T277609: Generate dump of scored-revisions from 2018-2020 for English Wikipedia.
Mar 16 2021, 10:13 PM · Data-Services, artificial-intelligence, editquality-modeling, ORES, Machine-Learning-Team, Analytics
Halfak created T277609: Generate dump of scored-revisions from 2018-2020 for English Wikipedia.
Mar 16 2021, 10:13 PM · Data-Services, artificial-intelligence, editquality-modeling, ORES, Machine-Learning-Team, Analytics

Mar 5 2021

Halfak added a comment to T276598: Create Draft Model Deployment Guidelines .

Currently there is no set of policies in place that candidate models (internal or external) must meet in order to be deployed. This is highly problematic.

Mar 5 2021, 11:25 PM · AI-Governance, ORES, Lift-Wing, artificial-intelligence, Machine-Learning-Team (Active Tasks)

Feb 12 2021

Halfak added a comment to T135908: Add a possibility to delete a draft.

FWIW, I think there's a big difference between "delete" and "archive". Delete breaks links and hides past activity. Archive gets stuff I don't want to see out of the way. I think "archive" is the right metaphor here. I would hate it if someone could no longer download the results of a query because some user decided to delete it.

Feb 12 2021, 5:37 PM · Quarry
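A minimal Python sketch of the "archive" metaphor, assuming archiving is a flag that hides a query from its owner's listing while leaving it reachable by ID, so links and downloads keep working. `Query`, `QueryStore`, and their fields are hypothetical and do not reflect Quarry's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    # Hypothetical saved-query record; field names are illustrative only.
    query_id: int
    title: str
    results: list = field(default_factory=list)
    archived: bool = False


class QueryStore:
    def __init__(self):
        self._queries = {}

    def add(self, query):
        self._queries[query.query_id] = query

    def archive(self, query_id):
        # Archiving only sets a flag; nothing is removed.
        self._queries[query_id].archived = True

    def listing(self):
        # What the owner sees day to day: archived queries are out of the way.
        return [q for q in self._queries.values() if not q.archived]

    def download(self, query_id):
        # Direct links still resolve, archived or not.
        return self._queries[query_id].results
```

A hard delete would instead drop the entry from `_queries`, which is exactly what breaks existing links and result downloads.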

Feb 10 2021

Halfak changed the status of T117802: WikiData model: Unsupported operand type(s) for /: 'NoneType' and 'float' from Declined to Resolved.

Looks like the issue was actually resolved. I don't see this error in production anymore.

Feb 10 2021, 5:01 PM · wb_vandalism

Feb 2 2021

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Fantastic! I can work with the data you have provided to update the model. I'll try to get that work in soon but as I'm just a volunteer with a new baby, I can't give you any guarantees on when I'll be able to get to it. But a week or two seems likely at this point.

Feb 2 2021, 10:37 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
Fae awarded T214201: Implement NSFW image classifier using Open NSFW a Dislike token.
Feb 2 2021, 5:26 PM · Structured-Data-Backlog, artificial-intelligence

Jan 31 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Thank you, @Mbch331! Very helpful.

Jan 31 2021, 4:09 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I just created an initial model with basic features. See https://github.com/wikimedia/articlequality/pull/162

Jan 31 2021, 3:16 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I've updated https://quarry.wmflabs.org/query/47900 to exclude redirects and limit results to main namespace pages.

Jan 31 2021, 2:16 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
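The same filter, sketched in Python for rows shaped like the MediaWiki `page` table. It assumes each row is a dict with `page_title`, `page_namespace`, and `page_is_redirect`. In SQL the equivalent condition would be `page_namespace = 0 AND page_is_redirect = 0`; this illustrates the condition and is not the text of query 47900.

```python
def main_namespace_articles(rows):
    """Keep titles of non-redirect pages in the main (article) namespace, i.e. namespace 0."""
    return [
        row["page_title"]
        for row in rows
        # Namespace 0 is the article namespace; redirects are flagged with a truthy page_is_redirect.
        if row["page_namespace"] == 0 and not row["page_is_redirect"]
    ]
```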

Jan 24 2021

He7d3r awarded T155541: [Epic] Article importance prediction model a Love token.
Jan 24 2021, 10:34 PM · Research, Machine-Learning-Team, artificial-intelligence

Jan 17 2021

Halfak updated subscribers of T223782: Build article quality model for Dutch Wikipedia.

I just started working with @Psingh07's dataset.

Jan 17 2021, 4:46 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Dec 10 2020

Halfak added a comment to T256887: Enable ORES filters for ukwiki (Ukrainian Wikipedia).

Ping.

Dec 10 2020, 6:11 PM · Growth-Team (Current Sprint), Edit-Review-Improvements-Integrated-Filters, editquality-modeling, Machine-Learning-Team, artificial-intelligence
Halfak added a parent task for T256887: Enable ORES filters for ukwiki (Ukrainian Wikipedia): T130294: Deploy edit quality models for ukwiki.
Dec 10 2020, 6:10 PM · Growth-Team (Current Sprint), Edit-Review-Improvements-Integrated-Filters, editquality-modeling, Machine-Learning-Team, artificial-intelligence
Halfak added a subtask for T130294: Deploy edit quality models for ukwiki: T256887: Enable ORES filters for ukwiki (Ukrainian Wikipedia).
Dec 10 2020, 6:10 PM · artificial-intelligence, Machine-Learning-Team, editquality-modeling

Dec 1 2020

Halfak updated subscribers of T256887: Enable ORES filters for ukwiki (Ukrainian Wikipedia).

I don't think anything is blocking the deployment of the filters. This should be on the Growth-Team backlog. @MMiller_WMF, it looks like the Ukrainian Wikipedians have been waiting a while for the filters to be turned on in recent changes.

Dec 1 2020, 3:01 AM · Growth-Team (Current Sprint), Edit-Review-Improvements-Integrated-Filters, editquality-modeling, Machine-Learning-Team, artificial-intelligence

Oct 22 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Aha! I figured out what is going on. It turns out that that specific edit was reverted. We flag sysop edits for review when they are reverted, just in case something weird was going on (e.g., a damaging mistake that was made in good faith). So all may be fine. Let me know if you see other examples and I'll dig into them.

Oct 22 2020, 4:48 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Whoops! That's definitely not right. I'm guessing there was a step in the script that didn't work as intended. I might need to re-work the sample, so hold off on labeling until we have found the issue, OK?

Oct 22 2020, 4:16 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I created this gist to show how to extract the stub class from XML dumps for nlwiki: https://gist.github.com/halfak/d3d6976dd303575f235d2a0f1e44e141

Oct 22 2020, 3:57 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
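For a single revision's wikitext, the core check might look like the Python sketch below. It assumes nlwiki marks stubs with a template named `Beginnetje` (possibly with parameters). The template set, the regex, and `is_stub` are illustrative and may not match what the gist does. Iterating over the XML dump itself (for example with the mwxml library) is left out.

```python
import re

# Assumption: the Dutch stub template is "Beginnetje", e.g. {{Beginnetje}} or {{Beginnetje|sport}}.
# Names are compared lower-cased with underscores treated as spaces.
STUB_TEMPLATES = {"beginnetje"}

# Captures a template name: the text after "{{" up to the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def is_stub(wikitext):
    """Return True if any template in the wikitext is a known stub template."""
    for match in TEMPLATE_NAME.finditer(wikitext):
        name = match.group(1).replace("_", " ").strip().lower()
        if name in STUB_TEMPLATES:
            return True
    return False
```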

Oct 21 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

As promised, I've loaded the new labeling campaign and I've included a summary of the actions I performed below for documenting this process.

Oct 21 2020, 2:59 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team

Oct 20 2020

Halfak added a comment to T102680: Investigate and remove NFS mounts in the snuggle project.

Still around. Definitely wanting to find a new maintainer though.

Oct 20 2020, 7:04 PM · cloud-services-team (Kanban), Cloud-Services

Oct 14 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Thanks for the ping. I do still have this on my to-do list, and I should be able to give Kevin what he needs to get it done this week.

Oct 14 2020, 6:22 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team

Sep 29 2020

Halfak updated subscribers of T131553: Look into matching images of the same painting.

I think image embeddings could be a really effective strategy for this, and they would also be relevant to a lot of other types of image modeling tasks. So I figure @Miriam might be interested in it. Essentially, the idea (if I understand it) is to dedupe Commons.

Sep 29 2020, 11:30 PM · artificial-intelligence, Machine-Learning-Team (Research)
Halfak reopened T131553: Look into matching images of the same painting as "Open".

Putting this on the main AI board that is designed for interesting ideas that no one is ready to pick up yet like this one.

Sep 29 2020, 11:04 PM · artificial-intelligence, Machine-Learning-Team (Research)

Sep 28 2020

Halfak added a comment to T263910: ORES redis: max number of clients reached....

I'm not familiar with this problem. Did anything change with the deployment recently? Did any overload errors happen during the outage? If not, that would line up with uwsgi-level failures.

Sep 28 2020, 6:35 PM · User-Ladsgroup, Sustainability (Incident Followup), Patch-For-Review, Okapi [Wikimedia Enterprise], serviceops, SRE, ORES, Machine-Learning-Team

Sep 25 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Sorry for the delay. Just drove across a continent and I'm moving into a new house! I should be able to get back to supporting this task next week.

Sep 25 2020, 10:51 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
Halfak added a comment to T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).

FWIW, I believe that @HAKSOAT built these models and that they are basically ready for deployment. The primary concern with doing that deployment was related to memory usage of the models. @HAKSOAT did a lot of work to ensure that the models would fit in memory. In fact, I expect our memory footprint to decrease with the deployment of these new models and their embeddings because they reduce the memory footprint per language by about 90%. In my last conversation with @calbon, he said he wanted to be cautious with new deployment while the team is in transition. I'm happy to make time to advise and support getting these models out the door. Feel free to reach out if/when you're ready to get a new deployment configuration together.

Sep 25 2020, 10:50 PM · Machine-Learning-Team (Active Tasks), Serbian-Sites, Growth-Scaling, Growth-Team
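A back-of-the-envelope sketch of why smaller vectors shrink the footprint so much. A dense embedding matrix costs roughly vocabulary size × dimensions × bytes per value, so cutting the vocabulary tenfold cuts memory by about 90%. The sizes below are hypothetical, not the actual dimensions of the ORES vectors, and the sketch assumes the savings come from a smaller matrix rather than from a different storage format.

```python
def matrix_bytes(vocab_size, dimensions, bytes_per_value=4):
    """Approximate size of a dense word-vector matrix; 4 bytes per value assumes float32."""
    return vocab_size * dimensions * bytes_per_value


def fractional_reduction(before, after):
    """Fraction of memory saved when going from `before` bytes to `after` bytes."""
    return 1 - after / before


# Hypothetical sizes, chosen only to illustrate the arithmetic.
before = matrix_bytes(vocab_size=1_000_000, dimensions=300)
after = matrix_bytes(vocab_size=100_000, dimensions=300)
print(f"{before / 1e9:.1f} GB -> {after / 1e9:.2f} GB ({fractional_reduction(before, after):.0%} smaller)")
```

Because the cost is multiplicative, the same saving could come from fewer dimensions or a smaller value type. Trimming the vocabulary is just one way to get there.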

Sep 23 2020

Halfak added a comment to T258735: Build articlequality model for Hindi wiki.

@Navinsingh133, for what it's worth, most wikis do not have quality scales like English Wikipedia and ORES supports them anyway. There were two wikis that ORES supports that didn't have any quality criteria beforehand. Wikidata developed a quality scale from scratch and we were able to run a "labeling campaign" with Wikidata editors to provide training examples for the model. Basque Wikipedians decided to wholesale translate the English Wikipedia quality scale and then make modifications to it to suit their wiki. We also ran a "labeling campaign" with them to gather training examples. Both of those models are alive and seeing relatively heavy use today.

Sep 23 2020, 3:16 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling

Sep 16 2020

Halfak added a comment to T253038: 'endorsementcomment' is required on jadeproposeorendorse. Shouldn't be..

It might be a good idea to stick these details in a new task that can wait in the backlog until you're ready to spec out Jade v2.

Sep 16 2020, 3:07 PM · Machine-Learning-Team, Jade

Sep 15 2020

Halfak awarded T152434: Add method to Revision to check if it was a Revert, and whether an edit was Reverted a Meh! token.
Sep 15 2020, 2:18 PM · Google-Summer-of-Code (2020), Growth-Team, Platform Team Legacy (Watching / External), Readers-Web-Backlog (Tracking), Product-Infrastructure-Team-Backlog, Trending-Service, Epic, MediaWiki-Page-editing, Contributors-Team, MediaWiki-Interface
Halfak added a comment to T152434: Add method to Revision to check if it was a Revert, and whether an edit was Reverted.

Fantastic! Wonderful work!

Sep 15 2020, 2:18 PM · Google-Summer-of-Code (2020), Growth-Team, Platform Team Legacy (Watching / External), Readers-Web-Backlog (Tracking), Product-Infrastructure-Team-Backlog, Trending-Service, Epic, MediaWiki-Page-editing, Contributors-Team, MediaWiki-Interface
Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

"trusted_groups" are user groups whose edits we don't want to waste your time reviewing. E.g., we can be reasonably sure that admins aren't vandalizing Wikipedia. Is that true for people who are given the Patroller right? Either way, we'll still ask you to review any edits by editors in these "trusted_groups" that were reverted, just in case some unintentional damage was involved.

Sep 15 2020, 2:15 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
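The review rule described above can be sketched in Python: skip edits by members of a trusted group unless they were reverted. `Edit`, `needs_review`, and the `{"sysop"}` group set are hypothetical. The real "trusted_groups" list is a per-wiki configuration choice, for example whether to include patrollers.

```python
from dataclasses import dataclass

# Hypothetical default; whether to add e.g. "patroller" is the open question above.
TRUSTED_GROUPS = frozenset({"sysop"})


@dataclass
class Edit:
    rev_id: int
    user_groups: frozenset
    reverted: bool


def needs_review(edit, trusted_groups=TRUSTED_GROUPS):
    """Ask labelers to review an edit unless it is by a trusted user and was not reverted."""
    is_trusted = bool(edit.user_groups & trusted_groups)
    # Reverted edits by trusted users still get reviewed, in case of good-faith damage.
    return not is_trusted or edit.reverted
```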

Sep 14 2020

Halfak added a comment to T251571: Build article quality model for Ukrainian Wikipedia.

I just checked a couple of those articles, and the rises and falls in predicted quality tend to correspond with additions and removals of content. E.g., it looks like Комптонівське розсіювання goes up and back down in quality around substantial content deletions.

Sep 14 2020, 6:59 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling, Wikilabels
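One way to check that observation across many articles is to measure how often the change in article length and the change in predicted quality point in the same direction between consecutive revisions. The Python sketch below assumes you already have per-revision byte lengths and a numeric quality score for each revision, for example a weighted sum of class probabilities. `agreement_rate` is a hypothetical helper.

```python
def sign(x):
    return (x > 0) - (x < 0)


def agreement_rate(lengths, qualities):
    """Fraction of consecutive revision pairs where length and predicted quality move the same way.

    Pairs where either value is unchanged are ignored. Returns None if nothing moved.
    """
    pairs = [
        (lengths[i + 1] - lengths[i], qualities[i + 1] - qualities[i])
        for i in range(len(lengths) - 1)
    ]
    moving = [(dl, dq) for dl, dq in pairs if dl != 0 and dq != 0]
    if not moving:
        return None
    return sum(sign(dl) == sign(dq) for dl, dq in moving) / len(moving)
```

A rate near 1.0 would support the reading that quality swings mostly track content added or removed.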

Sep 11 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Nice work on the progress @kevinbazira!

Sep 11 2020, 6:53 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team

Sep 10 2020

Halfak added a comment to T261850: compare model accuracy with and without property suggester.

It's not quite fair to compare the old and new feature sets. It does look like the property suggester was having a minor positive effect, but that seems like it was not worth the additional API call. Everything that follows is just me nerding out about the stats.

Sep 10 2020, 4:17 PM · User-Ladsgroup, Item Quality Scoring Improvement (Item Quality Scoring Improvement - Sprint 3), Wikidata

Sep 8 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Here's a query that gathers a random sample of 20k revisions from the last year: https://quarry.wmflabs.org/query/47980

Sep 8 2020, 6:07 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team
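In Python terms, the sampling step amounts to a uniform draw without replacement from the revisions in the window. A minimal sketch, assuming revisions arrive as `(rev_id, timestamp)` pairs. `sample_recent_revisions` is hypothetical, and the linked Quarry query works in SQL rather than Python.

```python
import random
from datetime import datetime, timedelta


def sample_recent_revisions(revisions, now, k=20_000, days=365, seed=0):
    """Uniformly sample up to k revision ids with timestamps in the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    recent = [rev_id for rev_id, ts in revisions if cutoff <= ts <= now]
    # A seeded generator keeps the sample reproducible for documentation purposes.
    rng = random.Random(seed)
    return rng.sample(recent, min(k, len(recent)))
```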
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

This query gets all of the articles in the A-level category: https://quarry.wmflabs.org/query/47900

Sep 8 2020, 1:52 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Sep 3 2020

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I just looked at this with Ciell. In addition to looking for articles in the "Rough Diamonds" page, we can also look for articles that appear in a level 3 header on this page: https://nl.wikipedia.org/wiki/Wikipedia:Etalage/Aanmelding_kandidaten/Aanmeldingen but do not appear in the category of featured articles here: https://nl.wikipedia.org/wiki/Categorie:Wikipedia:Etalage-artikelen

Sep 3 2020, 3:42 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
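The set logic described above could be sketched in Python as follows. It assumes the nominations page marks each candidate with a level-3 heading (`=== Title ===`), optionally wrapped in a wikilink, and that the featured-article titles have already been fetched from the category. `heading_titles` and `nominated_not_featured` are hypothetical helpers, and real titles may also need normalization (underscores, first-letter case).

```python
import re

# A level-3 heading on its own line: exactly three "=" on the left (not four or more).
LEVEL3 = re.compile(r"^===(?!=)[ \t]*(.+?)[ \t]*===[ \t]*$", re.MULTILINE)
# A wikilink such as [[Target]] or [[Target|label]]; we keep the target.
LINK = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")


def heading_titles(wikitext):
    """Collect article titles from level-3 headings, unwrapping wikilinks if present."""
    titles = set()
    for heading in LEVEL3.findall(wikitext):
        link = LINK.search(heading)
        title = link.group(1) if link else heading
        titles.add(title.strip())
    return titles


def nominated_not_featured(nominations_wikitext, featured_titles):
    """Titles nominated on the page that are not (yet) in the featured-article category."""
    return heading_titles(nominations_wikitext) - set(featured_titles)
```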

Sep 1 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

@Evrifaessa, I've moved to a new job, so I'm not managing the backlog for ORES anymore. For now, @calbon is responsible for prioritizing tasks like this one.

Sep 1 2020, 2:59 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team

Aug 18 2020

Halfak added a comment to T223829: Implement citation classifier features for quality models.

Also https://en.wikipedia.org/wiki/MediaWiki:Spam-whitelist

Aug 18 2020, 4:50 PM · articlequality-modeling, editquality-modeling, Machine-Learning-Team, artificial-intelligence
Halfak added a comment to T223829: Implement citation classifier features for quality models.

See also https://en.wikipedia.org/wiki/MediaWiki:Spam-blacklist

Aug 18 2020, 4:49 PM · articlequality-modeling, editquality-modeling, Machine-Learning-Team, artificial-intelligence

Aug 6 2020

Halfak added a parent task for T251571: Build article quality model for Ukrainian Wikipedia: T258435: ORES deployment Late July 2020.
Aug 6 2020, 4:18 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak added a subtask for T258435: ORES deployment Late July 2020: T251571: Build article quality model for Ukrainian Wikipedia.
Aug 6 2020, 4:18 PM · Patch-For-Review, articlequality-modeling, drafttopic-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)

Jul 20 2020

Halfak reopened T256412: Production shell access for Chris Albon as "Open".

Still waiting on deployment-prep access so that @calbon can do a beta deploy of ORES.

Jul 20 2020, 7:34 PM · Patch-For-Review, Release-Engineering-Team, Machine-Learning-Team (Active Tasks), SRE
Halfak created T258435: ORES deployment Late July 2020.
Jul 20 2020, 6:45 PM · Patch-For-Review, articlequality-modeling, drafttopic-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)
Halfak moved T256070: Rebuild drafttopic models with new smaller vectors and compare results from Done to Pending deployment on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:30 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling
Halfak moved T251571: Build article quality model for Ukrainian Wikipedia from Review to Pending deployment on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:30 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak added a comment to T251571: Build article quality model for Ukrainian Wikipedia.

Initial model is merged. https://github.com/wikimedia/articlequality/pull/140

Jul 20 2020, 4:30 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak moved T251571: Build article quality model for Ukrainian Wikipedia from Active to Review on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:29 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak moved T256070: Rebuild drafttopic models with new smaller vectors and compare results from Active to Done on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:28 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling
Halfak moved T256412: Production shell access for Chris Albon from Done to Active on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:27 PM · Patch-For-Review, Release-Engineering-Team, Machine-Learning-Team (Active Tasks), SRE
Halfak moved T256412: Production shell access for Chris Albon from Active to Done on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:26 PM · Patch-For-Review, Release-Engineering-Team, Machine-Learning-Team (Active Tasks), SRE
Halfak moved T256812: The wrong label shows up while performing an undo from Active to Review on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:26 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Machine-Learning-Team (Active Tasks), Jade
Halfak moved T256811: Flesh out mw:Jade/Edit_quality from Review to Done on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:24 PM · Machine-Learning-Team (Active Tasks), Documentation, Jade
Halfak moved T257248: Add articletopic model to testwiki from Active to Review on the Machine-Learning-Team (Active Tasks) board.
Jul 20 2020, 4:21 PM · Growth-Team (Current Sprint), Machine-Learning-Team (Active Tasks), Patch-For-Review, drafttopic-modeling, ORES, Growth-Scaling
Halfak added a comment to T111179: Tokenization of "word" things for CJK.

Given that we are likely trying to use these segmenters to get *signal*, not to translate or do something more exact, I'm a fan of faster, lower-accuracy, easier-to-install methods. It looks like Japanese will be the most difficult.

Jul 20 2020, 3:23 PM · Machine-Learning-Team (Active Tasks), Chinese-Sites, artificial-intelligence, revscoring
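One cheap, dictionary-free option in that spirit is character-bigram tokenization of CJK runs, sketched below in Python. This is a commonly used fallback, not necessarily what revscoring ended up using, and the Unicode ranges cover only the main Han, Hiragana, Katakana, and Hangul blocks.

```python
import re

# Runs of CJK Unified Ideographs, Hiragana, Katakana, or Hangul syllables.
CJK_RUN = re.compile(r"[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af]+")


def cjk_bigrams(text):
    """Split each CJK run into overlapping two-character tokens (single characters stay whole)."""
    grams = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            grams.append(run)
        else:
            grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams
```

Bigrams over-generate tokens, but they need no segmentation model or install step, which fits the "signal, not exactness" goal.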
Halfak updated the task description for T257359: Update Turkish Wikipedia's labeling campaign for 2020.
Jul 20 2020, 1:57 PM · Turkish-Sites, artificial-intelligence, editquality-modeling, Machine-Learning-Team

Jul 17 2020

Halfak committed rOWCc4d09e2abf9e: Rebuilds ukwiki with class order. (authored by Halfak).
Rebuilds ukwiki with class order.
Jul 17 2020, 9:24 PM
Halfak added a comment to T230953: Why is jawiki's goodfaith model so bad?.

@jeena took a look at a bunch of example edits that scored as likely to be badfaith and confirmed that most of them look good. I think the right next step here is to interrogate our labeled data to see if Japanese Wikipedians would confirm or refute the "badfaith" labeled edits.

Jul 17 2020, 7:22 PM · editquality-modeling, artificial-intelligence, Machine-Learning-Team
Halfak created P11945 (An Untitled Masterwork).
Jul 17 2020, 2:39 PM
Halfak added a comment to T256812: The wrong label shows up while performing an undo.

I added a label to 438261 in the Undo interface and then loaded the undo interface for the previous edit (418935) and it showed me the label for 438261 rather than no label.

Jul 17 2020, 1:54 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Machine-Learning-Team (Active Tasks), Jade

Jul 15 2020

Halfak added a comment to T258082: Identify articles that should be de-prod'ed. .

How would we get some good labeled data for this? Is there a log event when an article is Prod'ed that we can look for?

Jul 15 2020, 4:18 PM · Machine-Learning-Team, articlequality-modeling, artificial-intelligence
Halfak created T258082: Identify articles that should be de-prod'ed. .
Jul 15 2020, 4:18 PM · Machine-Learning-Team, articlequality-modeling, artificial-intelligence
Halfak added a comment to T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).

We've managed to compress our vectors and reduce the memory footprint of ORES. That means we have space for these models and @HAKSOAT is going to start work.

Jul 15 2020, 2:43 PM · Machine-Learning-Team (Active Tasks), Serbian-Sites, Growth-Scaling, Growth-Team
Halfak reassigned T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available) from Halfak to HAKSOAT.
Jul 15 2020, 2:42 PM · Machine-Learning-Team (Active Tasks), Serbian-Sites, Growth-Scaling, Growth-Team
Halfak closed T247523: Compress Gensim models as Resolved.

We now have models that are built using the compressed vectors. They seem to give us good fitness.

Jul 15 2020, 2:41 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling
Halfak closed T247523: Compress Gensim models, a subtask of T249520: Fit more topic models into ORES, as Resolved.
Jul 15 2020, 2:41 PM · drafttopic-modeling, Machine-Learning-Team
Halfak edited projects for T247523: Compress Gensim models, added: Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.
Jul 15 2020, 2:40 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling

Jul 13 2020

Halfak moved T256412: Production shell access for Chris Albon from Done to Active on the Machine-Learning-Team (Active Tasks) board.
Jul 13 2020, 6:20 PM · Patch-For-Review, Release-Engineering-Team, Machine-Learning-Team (Active Tasks), SRE
Halfak moved T257341: Add ORES article quality predictions to the WDQS from Unorganized to Lift Wing on the Machine-Learning-Team board.
Jul 13 2020, 4:46 PM · artificial-intelligence, Wikidata, Wikidata-Query-Service, articlequality-modeling, Machine-Learning-Team
Halfak moved T246486: Design Jade pilot deployment plan with the Scoring Platform team from Unorganized to Blocked on team discussion on the Machine-Learning-Team board.
Jul 13 2020, 4:45 PM · Machine-Learning-Team, Jade
Halfak closed T247564: Experiment with Topic modeling in KubeFlow, a subtask of T226193: [Discuss] Future ORES architecture, as Declined.
Jul 13 2020, 4:45 PM · ORES, Machine-Learning-Team
Halfak closed T247564: Experiment with Topic modeling in KubeFlow as Declined.

Declining for now. We're doing a more fundamental exploration of model management frameworks and we might come back to this at some point.

Jul 13 2020, 4:45 PM · Machine-Learning-Team, drafttopic-modeling, ORES
Halfak assigned T254289: Add wikidata to articletopic pipeline to Dibyaaaaax.
Jul 13 2020, 4:43 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research
Halfak added a comment to T254356: [Spike] Implement script-optimized tokenization.

@HAKSOAT can you link to your notes?

Jul 13 2020, 4:42 PM · revscoring, Machine-Learning-Team, artificial-intelligence