Halfak (Aaron Halfaker, EpochFail, halfak)
Principal Research Scientist

User Details

User Since
Oct 21 2014, 6:05 PM (313 w, 4 d)
Availability
Available
IRC Nick
halfak
LDAP User
Halfak
MediaWiki User
EpochFail

Hi! I'm a socio-technologist. I do science so that I can build new technologies for social systems.

You can find me as:

Recent Activity

Thu, Oct 22

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Aha! I figured out what is going on. It turns out that specific edit was reverted. We flag sysop edits for review when they are reverted, just in case something weird was going on (e.g. a damaging mistake that was made in good faith). So all may be fine. Let me know if you see other examples and I'll dig into them.

Thu, Oct 22, 4:48 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Woops! That's definitely not right. I'm guessing there was a step in the script that didn't work as intended. I might need to re-work the sample, so hold off on labeling until we have found the issue, OK?

Thu, Oct 22, 4:16 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I created this gist to show how to extract the stub class from XML dumps for nlwiki: https://gist.github.com/halfak/d3d6976dd303575f235d2a0f1e44e141

Thu, Oct 22, 3:57 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine Learning Platform

Wed, Oct 21

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

As promised, I've loaded the new labeling campaign and I've included a summary of the actions I performed below for documenting this process.

Wed, Oct 21, 2:59 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform

Tue, Oct 20

Halfak added a comment to T102680: Investigate and remove NFS mounts in the snuggle project.

Still around. Definitely wanting to find a new maintainer though.

Tue, Oct 20, 7:04 PM · cloud-services-team (Kanban), Cloud-Services

Wed, Oct 14

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Thanks for the ping. I do still have this on my todo list and I should be able to give Kevin the stuff he needs to get it done this week.

Wed, Oct 14, 6:22 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform

Tue, Sep 29

Halfak updated subscribers of T131553: Look into matching images of the same painting.

I think image embeddings could be a really effective strategy for this and it would also be relevant to a lot of other types of image modeling tasks. So, I figure @Miriam might be interested in it. Essentially, the idea (if I understand it) is to dedupe commons.

Tue, Sep 29, 11:30 PM · artificial-intelligence, Machine Learning Platform (Research)
Halfak reopened T131553: Look into matching images of the same painting as "Open".

Putting this on the main AI board, which is designed for interesting ideas like this one that no one is ready to pick up yet.

Tue, Sep 29, 11:04 PM · artificial-intelligence, Machine Learning Platform (Research)

Mon, Sep 28

Halfak added a comment to T263910: ORES redis: max number of clients reached....

I'm not familiar with this problem. Anything change with the deployment recently? Did any overload errors happen during the outage? If not, that would line up with uwsgi-level failures.

Mon, Sep 28, 6:35 PM · Sustainability (Incident Followup), Patch-For-Review, Okapi, serviceops, Operations, ORES, Machine Learning Platform

Fri, Sep 25

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Sorry for the delay. Just drove across a continent and I'm moving into a new house! I should be able to get back to supporting this task next week.

Fri, Sep 25, 10:51 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak added a comment to T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).

FWIW, I believe that @HAKSOAT built these models and that they are basically ready for deployment. The primary concern with doing that deployment was related to memory usage of the models. @HAKSOAT did a lot of work to ensure that the models would fit in memory. In fact, I expect our memory footprint to decrease with the deployment of these new models and their embeddings because they reduce the memory footprint per language by about 90%. In my last conversation with @calbon, he said he wanted to be cautious with new deployment while the team is in transition. I'm happy to make time to advise and support getting these models out the door. Feel free to reach out if/when you're ready to get a new deployment configuration together.

Fri, Sep 25, 10:50 PM · Machine Learning Platform (Current), Serbian-Sites, Growth-Scaling, Growth-Team

Sep 23 2020

Halfak added a comment to T258735: Build articlequality model for Hindi wiki.

@Navinsingh133, for what it's worth, most wikis do not have quality scales like English Wikipedia and ORES supports them anyway. There were two wikis that ORES supports that didn't have any quality criteria beforehand. Wikidata developed a quality scale from scratch and we were able to run a "labeling campaign" with Wikidata editors to provide training examples for the model. Basque Wikipedians decided to wholesale translate the English Wikipedia quality scale and then make modifications to it to suit their wiki. We also ran a "labeling campaign" with them to gather training examples. Both of those models are alive and seeing relatively heavy use today.

Sep 23 2020, 3:16 PM · Machine Learning Platform (Current), artificial-intelligence, articlequality-modeling

Sep 16 2020

Halfak added a comment to T253038: 'endorsementcomment' is required on jadeproposeorendorse. Shouldn't be..

It might be a good idea to stick these details in a new task that can wait in the backlog until you're ready to spec out Jade v2.

Sep 16 2020, 3:07 PM · Machine Learning Platform, Jade

Sep 15 2020

Halfak awarded T152434: Add method to Revision to check if it was a Revert, and whether an edit was Reverted a Meh! token.
Sep 15 2020, 2:18 PM · Google-Summer-of-Code (2020), Growth-Team, Platform Team Legacy (Watching / External), Readers-Web-Backlog (Tracking), Product-Infrastructure-Team-Backlog, Trending-Service, Epic, MediaWiki-Page-editing, Contributors-Team, MediaWiki-Interface
Halfak added a comment to T152434: Add method to Revision to check if it was a Revert, and whether an edit was Reverted.

Fantastic! Wonderful work!

Sep 15 2020, 2:18 PM · Google-Summer-of-Code (2020), Growth-Team, Platform Team Legacy (Watching / External), Readers-Web-Backlog (Tracking), Product-Infrastructure-Team-Backlog, Trending-Service, Epic, MediaWiki-Page-editing, Contributors-Team, MediaWiki-Interface
Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

"trusted_groups" are user groups of users who we don't want to waste your time asking you to review. E.g., we can be reasonably sure that admins aren't vandalizing Wikipedia. Is that true for people who are given the Patroller right? Either way, we'll be asking you to review any edits by editors in these "trusted_groups" that were reverted just in case there was some unintentional damage involved.
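The rule described above can be sketched roughly as follows. This is only an illustration: the group names, the function name, and the assumption that every edit by a non-trusted editor goes to review are simplifications made here, not the actual sampling script.

```python
# Hypothetical group names; the real "trusted_groups" list is chosen per wiki.
TRUSTED_GROUPS = {"sysop", "bot"}


def needs_review(user_groups, was_reverted):
    """Return True if an edit should be shown to human labelers.

    Assumption for this sketch: edits by editors outside the trusted
    groups are always reviewed, and edits by trusted editors are
    reviewed only when they were reverted.
    """
    is_trusted = bool(TRUSTED_GROUPS & set(user_groups))
    return was_reverted or not is_trusted
```

Under these assumptions, an unreverted sysop edit is skipped, while a reverted sysop edit still lands in the review queue.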

Sep 15 2020, 2:15 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform

Sep 14 2020

Halfak added a comment to T251571: Build article quality model for Ukrainian Wikipedia.

I just checked a couple of those articles and the rises and falls in predicted quality tend to correspond with additions and removals of content. E.g., it looks like Комптонівське розсіювання goes up and back down in quality around substantial content deletions.

Sep 14 2020, 6:59 PM · Machine Learning Platform (Current), artificial-intelligence, articlequality-modeling, Wikilabels

Sep 11 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Nice work on the progress @kevinbazira!

Sep 11 2020, 6:53 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform

Sep 10 2020

Halfak added a comment to T261850: compare model accuracy with and without property suggester.

It's not quite fair to compare the old and new feature sets. It does look like the property suggester was having a minor positive effect, but that seems like it was not worth the additional API call. Everything that follows is just me nerding out about the stats.

Sep 10 2020, 4:17 PM · User-Ladsgroup, Item Quality Scoring Improvement (Item Quality Scoring Improvement - Sprint 3), Wikidata

Sep 8 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

Here's a query that gathers a random sample of 20k revisions from the last year: https://quarry.wmflabs.org/query/47980

Sep 8 2020, 6:07 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

This query gets all of the articles in the A-level category: https://quarry.wmflabs.org/query/47900

Sep 8 2020, 1:52 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine Learning Platform

Sep 3 2020

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I just looked at this with Ciell. In addition to looking for articles in the "Rough Diamonds" page, we can also look for articles that appear in a level 3 header on this page: https://nl.wikipedia.org/wiki/Wikipedia:Etalage/Aanmelding_kandidaten/Aanmeldingen but do not appear in the category of featured articles here: https://nl.wikipedia.org/wiki/Categorie:Wikipedia:Etalage-artikelen

Sep 3 2020, 3:42 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine Learning Platform

Sep 1 2020

Halfak added a comment to T257359: Update Turkish Wikipedia's labeling campaign for 2020.

@Evrifaessa, I've moved to a new job, so I'm not managing the backlog for ORES anymore. For now, @calbon is responsible for prioritizing tasks like this one.

Sep 1 2020, 2:59 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform

Aug 18 2020

Halfak added a comment to T223829: Implement citation classifier features for quality models.

Also https://en.wikipedia.org/wiki/MediaWiki:Spam-whitelist

Aug 18 2020, 4:50 PM · articlequality-modeling, editquality-modeling, Machine Learning Platform, artificial-intelligence
Halfak added a comment to T223829: Implement citation classifier features for quality models.

See also https://en.wikipedia.org/wiki/MediaWiki:Spam-blacklist

Aug 18 2020, 4:49 PM · articlequality-modeling, editquality-modeling, Machine Learning Platform, artificial-intelligence

Aug 6 2020

Halfak added a parent task for T251571: Build article quality model for Ukrainian Wikipedia: T258435: ORES deployment Late July 2020.
Aug 6 2020, 4:18 PM · Machine Learning Platform (Current), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak added a subtask for T258435: ORES deployment Late July 2020: T251571: Build article quality model for Ukrainian Wikipedia.
Aug 6 2020, 4:18 PM · Patch-For-Review, articlequality-modeling, drafttopic-modeling, ORES, artificial-intelligence, Machine Learning Platform (Current)

Jul 20 2020

Halfak reopened T256412: Production shell access for Chris Albon as "Open".

Still waiting on deployment-prep access so that @calbon can do a beta deploy of ORES.

Jul 20 2020, 7:34 PM · Patch-For-Review, Release-Engineering-Team, Machine Learning Platform (Current), Operations
Halfak created T258435: ORES deployment Late July 2020.
Jul 20 2020, 6:45 PM · Patch-For-Review, articlequality-modeling, drafttopic-modeling, ORES, artificial-intelligence, Machine Learning Platform (Current)
Halfak moved T256070: Rebuild drafttopic models with new smaller vectors and compare results from Done to Pending deployment on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:30 PM · Machine Learning Platform (Current), drafttopic-modeling
Halfak moved T251571: Build article quality model for Ukrainian Wikipedia from Review to Pending deployment on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:30 PM · Machine Learning Platform (Current), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak added a comment to T251571: Build article quality model for Ukrainian Wikipedia.

Initial model is merged. https://github.com/wikimedia/articlequality/pull/140

Jul 20 2020, 4:30 PM · Machine Learning Platform (Current), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak moved T251571: Build article quality model for Ukrainian Wikipedia from Active to Review on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:29 PM · Machine Learning Platform (Current), artificial-intelligence, articlequality-modeling, Wikilabels
Halfak moved T256070: Rebuild drafttopic models with new smaller vectors and compare results from Active to Done on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:28 PM · Machine Learning Platform (Current), drafttopic-modeling
Halfak moved T256412: Production shell access for Chris Albon from Done to Active on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:27 PM · Patch-For-Review, Release-Engineering-Team, Machine Learning Platform (Current), Operations
Halfak moved T256412: Production shell access for Chris Albon from Active to Done on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:26 PM · Patch-For-Review, Release-Engineering-Team, Machine Learning Platform (Current), Operations
Halfak moved T256812: The wrong label shows up while performing an undo from Active to Review on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:26 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Machine Learning Platform (Current), Jade
Halfak moved T256811: Flesh out mw:Jade/Edit_quality from Review to Done on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:24 PM · Machine Learning Platform (Current), Documentation, Jade
Halfak moved T257248: Add articletopic model to testwiki from Active to Review on the Machine Learning Platform (Current) board.
Jul 20 2020, 4:21 PM · Growth-Team (Current Sprint), Machine Learning Platform (Current), Patch-For-Review, drafttopic-modeling, ORES, Growth-Scaling
Halfak added a comment to T111179: Tokenization of "word" things for CJK.

Given that we are likely trying to use these segmenters in order to get *signal* and not to translate or do something more exact, I'm a fan of faster, lower accuracy, and easier to install methods. It looks like Japanese will be the most difficult.

Jul 20 2020, 3:23 PM · Machine Learning Platform (Current), Chinese-Sites, artificial-intelligence, revscoring
Halfak updated the task description for T257359: Update Turkish Wikipedia's labeling campaign for 2020.
Jul 20 2020, 1:57 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform

Jul 17 2020

Halfak committed rOWCc4d09e2abf9e: Rebuilds ukwiki with class order. (authored by Halfak).
Jul 17 2020, 9:24 PM
Halfak added a comment to T230953: Why is jawiki's goodfaith model so bad?.

@jeena took a look at a bunch of example edits that scored as likely to be badfaith and confirmed that most of them look good. I think the right next step here is to interrogate our labeled data to see if Japanese Wikipedians would confirm or refute the "badfaith" labeled edits.

Jul 17 2020, 7:22 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak created P11945 (An Untitled Masterwork).
Jul 17 2020, 2:39 PM
Halfak added a comment to T256812: The wrong label shows up while performing an undo.

I added a label to 438261 in the Undo interface and then loaded the undo interface for the previous edit (418935) and it showed me the label for 438261 rather than no label.

Jul 17 2020, 1:54 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Machine Learning Platform (Current), Jade

Jul 15 2020

Halfak added a comment to T258082: Identify articles that should be de-prod'ed. .

How would we get some good labeled data for this? Is there a log event when an article is Prod'ed that we can look for?

Jul 15 2020, 4:18 PM · Machine Learning Platform, articlequality-modeling, artificial-intelligence
Halfak created T258082: Identify articles that should be de-prod'ed. .
Jul 15 2020, 4:18 PM · Machine Learning Platform, articlequality-modeling, artificial-intelligence
Halfak added a comment to T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).

We've managed to compress our vectors and reduce the memory footprint of ORES. That means we have space for these models and @HAKSOAT is going to start work.

Jul 15 2020, 2:43 PM · Machine Learning Platform (Current), Serbian-Sites, Growth-Scaling, Growth-Team
Halfak reassigned T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available) from Halfak to HAKSOAT.
Jul 15 2020, 2:42 PM · Machine Learning Platform (Current), Serbian-Sites, Growth-Scaling, Growth-Team
Halfak closed T247523: Compress Gensim models as Resolved.

We now have models that are built using the compressed vectors. They seem to give us good fitness.

Jul 15 2020, 2:41 PM · Machine Learning Platform (Current), drafttopic-modeling
Halfak closed T247523: Compress Gensim models, a subtask of T249520: Fit more topic models into ORES, as Resolved.
Jul 15 2020, 2:41 PM · drafttopic-modeling, Machine Learning Platform
Halfak edited projects for T247523: Compress Gensim models, added: Machine Learning Platform (Current); removed Machine Learning Platform.
Jul 15 2020, 2:40 PM · Machine Learning Platform (Current), drafttopic-modeling

Jul 13 2020

Halfak moved T256412: Production shell access for Chris Albon from Done to Active on the Machine Learning Platform (Current) board.
Jul 13 2020, 6:20 PM · Patch-For-Review, Release-Engineering-Team, Machine Learning Platform (Current), Operations
Halfak moved T257341: Add ORES article quality predictions to the WDQS from Untriaged to Monitor on the Machine Learning Platform board.
Jul 13 2020, 4:46 PM · artificial-intelligence, Wikidata, Wikidata-Query-Service, articlequality-modeling, Machine Learning Platform
Halfak moved T246486: Design Jade pilot deployment plan with the Scoring Platform team from Untriaged to Blocked on team discussion on the Machine Learning Platform board.
Jul 13 2020, 4:45 PM · CommRel-Specialists-Support (Oct-Dec-2020), Machine Learning Platform, Jade
Halfak closed T247564: Experiment with Topic modeling in KubeFlow, a subtask of T226193: [Discuss] Future ORES architecture, as Declined.
Jul 13 2020, 4:45 PM · ORES, Machine Learning Platform
Halfak closed T247564: Experiment with Topic modeling in KubeFlow as Declined.

Declining for now. We're doing a more fundamental exploration of model management frameworks and we might come back to this at some point.

Jul 13 2020, 4:45 PM · Machine Learning Platform, drafttopic-modeling, ORES
Halfak assigned T254289: Add wikidata to articletopic pipeline to Dibyaaaaax.
Jul 13 2020, 4:43 PM · drafttopic-modeling, Machine Learning Platform (Current), Research
Halfak added a comment to T254356: [Spike] Implement script-optimized tokenization.

@HAKSOAT can you link to your notes?

Jul 13 2020, 4:42 PM · revscoring, artificial-intelligence, Machine Learning Platform
Halfak moved T254785: Missing observations from eswikiquote from Active to Done on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:41 PM · Machine Learning Platform (Current), Spanish-Sites, editquality-modeling, artificial-intelligence
Halfak closed T254785: Missing observations from eswikiquote as Resolved.

Thanks for the information @MarcoAurelio. Given that this is an expected deletion of data, we're going to resolve this.

Jul 13 2020, 4:41 PM · Machine Learning Platform (Current), Spanish-Sites, editquality-modeling, artificial-intelligence
Halfak triaged T231214: Refactor revscoring to handle session-orientation as Lowest priority.
Jul 13 2020, 4:39 PM · Machine Learning Platform, revscoring, artificial-intelligence
Halfak moved T256085: Article Quality tokenizer error from Untriaged to Monitor on the Machine Learning Platform board.
Jul 13 2020, 4:38 PM · Machine Learning Platform
Halfak added a comment to T256085: Article Quality tokenizer error.

I filed an issue against mwparserfromhell. https://github.com/earwig/mwparserfromhell/issues/248

Jul 13 2020, 4:37 PM · Machine Learning Platform
Halfak claimed T257248: Add articletopic model to testwiki.
Jul 13 2020, 4:36 PM · Growth-Team (Current Sprint), Machine Learning Platform (Current), Patch-For-Review, drafttopic-modeling, ORES, Growth-Scaling
Halfak triaged T257359: Update Turkish Wikipedia's labeling campaign for 2020 as Medium priority.

@calbon this is the Wikilabels task we talked about at backlog grooming.

Jul 13 2020, 4:36 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak moved T257681: Add the topic taxonomy to Jade from Untriaged to Maintenance/cleanup on the Machine Learning Platform board.
Jul 13 2020, 4:34 PM · Jade, drafttopic-modeling, Machine Learning Platform
Halfak triaged T257681: Add the topic taxonomy to Jade as Medium priority.
Jul 13 2020, 4:33 PM · Jade, drafttopic-modeling, Machine Learning Platform
Halfak moved T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) from Active to Done on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:29 PM · Patch-For-Review, Machine Learning Platform (Current), Operations, ORES
Halfak placed T246486: Design Jade pilot deployment plan with the Scoring Platform team up for grabs.
Jul 13 2020, 4:29 PM · CommRel-Specialists-Support (Oct-Dec-2020), Machine Learning Platform, Jade
Halfak edited projects for T210268: Build blubber file for ORES, added: Machine Learning Platform; removed Machine Learning Platform (Current).
Jul 13 2020, 4:29 PM · Machine Learning Platform, Release Pipeline (Blubber), Operations, ORES
Halfak moved T256412: Production shell access for Chris Albon from Active to Done on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:28 PM · Patch-For-Review, Release-Engineering-Team, Machine Learning Platform (Current), Operations
Halfak moved T256800: Narrow the Jade box that appears when undoing an edit from Active to Review on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:26 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Machine Learning Platform (Current), Jade
Halfak moved T256060: Jade highlighting doesn't work from Review to Done on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:26 PM · MW-1.35-notes (1.35.0-wmf.40; 2020-07-07), Patch-For-Review, Machine Learning Platform (Current), Jade
Halfak moved T254355: Render wikitext in Jade endorsementcomment and notes fields. from Review to Done on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:26 PM · MW-1.35-notes (1.35.0-wmf.41; 2020-07-14), Machine Learning Platform (Current), Jade
Halfak moved T257528: Update Jade db maintenance script to reflect new terminology from Review to Done on the Machine Learning Platform (Current) board.
Jul 13 2020, 4:23 PM · MW-1.35-notes (1.35.0-wmf.41; 2020-07-14), Jade, Machine Learning Platform (Current)

Jul 10 2020

Halfak reassigned T111179: Tokenization of "word" things for CJK from calbon to Pavol86.

Moving to main workboard because @Pavol86 is actively making progress on this task.

Jul 10 2020, 3:59 PM · Machine Learning Platform (Current), Chinese-Sites, artificial-intelligence, revscoring
Halfak created T257681: Add the topic taxonomy to Jade.
Jul 10 2020, 3:53 PM · Jade, drafttopic-modeling, Machine Learning Platform

Jul 9 2020

Halfak added a comment to T257248: Add articletopic model to testwiki.

It's a pretty simple config change from our end. No big deal.

Jul 9 2020, 3:12 PM · Growth-Team (Current Sprint), Machine Learning Platform (Current), Patch-For-Review, drafttopic-modeling, ORES, Growth-Scaling
Halfak added a comment to T256813: Design mechanism for opting out of Jade's secondary integrations.

An example of simple Korean tokenization based on spaces:
Google Translate of Korean article on Hurricane Andrew
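As a minimal sketch of what space-based tokenization looks like in practice (the sample sentence and function name are made up for illustration and are not taken from the article mentioned above):

```python
def tokenize_on_spaces(text):
    """Split text into tokens on runs of whitespace.

    Korean is written with spaces between word-like units, so this
    gives a rough, dictionary-free tokenization. Particles stay
    attached to the word they follow.
    """
    return text.split()


# Illustrative sentence: "Hurricane Andrew occurred in August 1992."
sample = "허리케인 앤드루는 1992년 8월에 발생했다"
tokens = tokenize_on_spaces(sample)
print(tokens)
```

This trades accuracy for speed and zero install cost; a morphological analyzer would separate the particles, but needs dictionaries.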

Jul 9 2020, 2:42 PM · Machine Learning Platform (Current), Jade
Halfak added a comment to T111179: Tokenization of "word" things for CJK.

Talking with @Pavol86, it looks like we need to be able to install mecab and the related dictionaries in order to process Japanese and Korean.

Jul 9 2020, 2:21 PM · Machine Learning Platform (Current), Chinese-Sites, artificial-intelligence, revscoring
Halfak added a comment to P11820 (An Untitled Masterwork).

The above happens when I try to run "git review" on mediawiki/services/ores/deploy/

Jul 9 2020, 1:55 PM

Jul 8 2020

Halfak created P11820 (An Untitled Masterwork).
Jul 8 2020, 8:31 PM
Halfak added a comment to T256813: Design mechanism for opting out of Jade's secondary integrations.

Need to design the beta features preference panel.

Jul 8 2020, 1:23 PM · Machine Learning Platform (Current), Jade
Halfak triaged T257438: Write to categorylinks and pagelinks table when saving a Jade entity as Medium priority.
Jul 8 2020, 1:10 PM · Jade, Machine Learning Platform
Halfak moved T257438: Write to categorylinks and pagelinks table when saving a Jade entity from Untriaged to Ready to go on the Machine Learning Platform board.
Jul 8 2020, 1:10 PM · Jade, Machine Learning Platform
Halfak created T257438: Write to categorylinks and pagelinks table when saving a Jade entity.
Jul 8 2020, 1:10 PM · Jade, Machine Learning Platform

Jul 7 2020

Halfak added a comment to T257248: Add articletopic model to testwiki.

Aha! We don't have it here, but we could. See https://ores.wikimedia.org/v3/scores/testwiki/
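For anyone who wants to check a wiki's models programmatically, a URL like the one above can be built as in this sketch. The `models` and `revids` query parameters and the revision ID are assumptions for illustration; the helper name is hypothetical.

```python
from urllib.parse import urlencode

ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def scores_url(context, models=None, revids=None):
    """Build an ORES v3 scores URL for a wiki ("context").

    With no models or revids, the URL points at the listing for the
    wiki, which is the form linked in the comment above.
    """
    url = f"{ORES_BASE}/{context}/"
    params = {}
    if models:
        params["models"] = "|".join(models)
    if revids:
        params["revids"] = "|".join(str(r) for r in revids)
    if params:
        url += "?" + urlencode(params)
    return url


print(scores_url("testwiki"))
# 123 is a placeholder revision ID.
print(scores_url("testwiki", models=["articletopic"], revids=[123]))
```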

Jul 7 2020, 8:49 PM · Growth-Team (Current Sprint), Machine Learning Platform (Current), Patch-For-Review, drafttopic-modeling, ORES, Growth-Scaling
Halfak created T257359: Update Turkish Wikipedia's labeling campaign for 2020.
Jul 7 2020, 7:01 PM · artificial-intelligence, editquality-modeling, Machine Learning Platform
Halfak added a comment to T257341: Add ORES article quality predictions to the WDQS.

We already store article quality predictions in the ores_classification table on the wikis where we have support.

Jul 7 2020, 4:31 PM · artificial-intelligence, Wikidata, Wikidata-Query-Service, articlequality-modeling, Machine Learning Platform
Halfak created T257341: Add ORES article quality predictions to the WDQS.
Jul 7 2020, 4:29 PM · artificial-intelligence, Wikidata, Wikidata-Query-Service, articlequality-modeling, Machine Learning Platform
Halfak committed rORES68fe20025ae5: Adds a test to check on our build_event_set method. (authored by Halfak).
Jul 7 2020, 2:30 PM
Halfak raised the priority of T223313: Add legal language to wikilabels from Low to High.
Jul 7 2020, 1:11 PM · WMF-Legal, Wikilabels, Machine Learning Platform
Halfak added a comment to T223313: Add legal language to wikilabels.

Here's the language I got from the legal folks:

Jul 7 2020, 1:10 PM · WMF-Legal, Wikilabels, Machine Learning Platform

Jul 1 2020

Halfak added a comment to T256813: Design mechanism for opting out of Jade's secondary integrations.

Jade invisible

Note the inclusion of a "(label)" link that, when clicked, disappears and adds the Jade label to the diff page.

Jul 1 2020, 7:59 PM · Machine Learning Platform (Current), Jade
Halfak added a comment to T254355: Render wikitext in Jade endorsementcomment and notes fields. .

We can parse arbitrary wikitext with the API too. E.g. https://en.wikipedia.org/w/api.php?action=parse&text=%27%27%27boldtext%27%27%27
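A small sketch of building that kind of request URL for arbitrary wikitext follows. The `contentmodel` and `format` parameters are additions beyond the URL above, and the helper name is hypothetical; the sketch only constructs the URL and does not send the request.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def parse_request_url(wikitext):
    """Build an action=parse URL that asks the API to render wikitext."""
    params = {
        "action": "parse",
        "text": wikitext,
        # Added for this sketch: state the content model explicitly
        # and ask for a JSON response.
        "contentmodel": "wikitext",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(parse_request_url("'''boldtext'''"))
```

`urlencode` percent-encodes the apostrophes as `%27`, matching the encoding in the example URL.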

Jul 1 2020, 7:03 PM · MW-1.35-notes (1.35.0-wmf.41; 2020-07-14), Machine Learning Platform (Current), Jade
Halfak added a comment to T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).

See https://github.com/mediawiki-utilities/python-mwtext for building embeddings

Jul 1 2020, 5:33 PM · Machine Learning Platform (Current), Serbian-Sites, Growth-Scaling, Growth-Team
Halfak added a comment to T246486: Design Jade pilot deployment plan with the Scoring Platform team.

Good question. I'm thinking that I should put together a short blurb on MediaWiki that would be recommended content for such a page. I think we'll want to have a brief explanation of "Proposals", "Endorsements", and "Preferred" status of a label so people know what they are getting into with Jade. I just worked on https://www.mediawiki.org/wiki/Jade/Edit_quality. I think we'll want something like that. I'll try to get something together shortly.

Jul 1 2020, 4:46 PM · CommRel-Specialists-Support (Oct-Dec-2020), Machine Learning Platform, Jade
Halfak awarded T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) a Evil Spooky Haunted Tree token.
Jul 1 2020, 3:41 PM · Patch-For-Review, Machine Learning Platform (Current), Operations, ORES
Halfak added a comment to T242705: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart).

Fantastic :) Thanks for the quick turn-around @akosiaris.

Jul 1 2020, 3:41 PM · Patch-For-Review, Machine Learning Platform (Current), Operations, ORES