Halfak (Aaron Halfaker, EpochFail, halfak)
Principal Research Scientist

User Details

User Since
Oct 21 2014, 6:05 PM (401 w, 6 d)
Availability
Available
IRC Nick
halfak
LDAP User
Halfak
MediaWiki User
EpochFail [ Global Accounts ]

Hi! I'm a socio-technologist. I do science so that I can build new technologies for social systems.

You can find me as:

Recent Activity

Jun 4 2022

Halfak updated the task description for T308013: Assign SPDX headers to puppet.git.
Jun 4 2022, 2:37 AM · Patch-For-Review, Infrastructure-Foundations, SRE

Mar 31 2022

Halfak added a comment to T304973: Articlequality model for nlwiki doesn't seem to track images correctly.

I can't seem to replicate the issue with the current version of the feature in the articlequality repo. When I run this same revision through the feature extractor, I get 5 image links rather than 1.

Mar 31 2022, 3:59 PM · artificial-intelligence, Machine-Learning-Team, articlequality-modeling

Mar 29 2022

Halfak added a comment to T304973: Articlequality model for nlwiki doesn't seem to track images correctly.

I think the right next step is to implement some tests to see if we detect the following image links:

[[Bestand:Stevie Wonder 1967 (1).jpg|thumb|In 1967 tijdens een repetitie voor een optreden in een [[TROS]]-programma]]
[[Bestand:Burt Bacharach - jam session.jpg|thumb|Stevie Wonder tijdens een optreden met [[Burt Bacharach]] in de jaren zestig]]
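A test along these lines could be sketched as follows. This is a minimal illustration, not the articlequality or revscoring feature itself: the regular expression, the `IMAGE_PREFIXES` alias list, and the `count_image_links` helper are all assumptions made for the example.

```python
import re

# Assumed namespace aliases for image links on nlwiki (illustrative only;
# a real feature would take these from the wiki's configuration).
IMAGE_PREFIXES = ("Bestand", "Afbeelding", "File", "Image")

# Match the opening of an image link, e.g. "[[Bestand:".
IMAGE_LINK_RE = re.compile(
    r"\[\[\s*(?:" + "|".join(IMAGE_PREFIXES) + r")\s*:",
    re.IGNORECASE,
)


def count_image_links(wikitext):
    """Count image link openings; links nested in captions are not counted."""
    return len(IMAGE_LINK_RE.findall(wikitext))


# The two image links quoted above, with their nested caption links.
SAMPLE = (
    "[[Bestand:Stevie Wonder 1967 (1).jpg|thumb|In 1967 tijdens een repetitie "
    "voor een optreden in een [[TROS]]-programma]]\n"
    "[[Bestand:Burt Bacharach - jam session.jpg|thumb|Stevie Wonder tijdens een "
    "optreden met [[Burt Bacharach]] in de jaren zestig]]\n"
)

print(count_image_links(SAMPLE))
```

Matching only the link opening sidesteps the nested `[[TROS]]` and `[[Burt Bacharach]]` links in the captions, which is where a naive "match up to the first `]]`" pattern would be most likely to go wrong.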
Mar 29 2022, 4:33 PM · artificial-intelligence, Machine-Learning-Team, articlequality-modeling
Halfak created T304973: Articlequality model for nlwiki doesn't seem to track images correctly.
Mar 29 2022, 4:31 PM · artificial-intelligence, Machine-Learning-Team, articlequality-modeling

Mar 22 2022

Halfak updated subscribers of T303293: Enable ORES in RecentChanges for Hindi Wikipedia.

FWIW, I'm only aware of one community that might not have wanted ORES (German Wikipedia), but to my recollection that was never explored. I don't think there was ever a discussion, and the ORES filters were never made available there through RCFilters, but there was a somewhat substantial cohort who did the labeling work. I've asked my collaborators to link to the discussions they have started on Hindi Wikipedia above. @1997kB is the contributor who notified us when the labeling campaign was finished. They might be aware of more details regarding interest from Hindi Wikipedians.

Mar 22 2022, 3:55 AM · Growth-Team, Edit-Review-Improvements-RC-Page, Growth community maintenance, Hindi-Sites, editquality-modeling, artificial-intelligence, Machine-Learning-Team

Mar 21 2022

Halfak updated subscribers of T303293: Enable ORES in RecentChanges for Hindi Wikipedia.

@Psingh07 & @Nikhil1194, can you link to the posts you made on Hindi Wikipedia discussing ORES?

Mar 21 2022, 8:49 PM · Growth-Team, Edit-Review-Improvements-RC-Page, Growth community maintenance, Hindi-Sites, editquality-modeling, artificial-intelligence, Machine-Learning-Team

Mar 14 2022

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Done!

Mar 14 2022, 5:36 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak closed T173122: Add language support for Hindi as Resolved.

Looks like this is done as part of T252581: Train and test editquality models for Hindi Wikipedia

Mar 14 2022, 5:20 PM · Hindi-Sites, Bad-Words-Detection-System, revscoring, Machine-Learning-Team, artificial-intelligence

Mar 9 2022

Halfak added a comment to T302851: revscoring feature extraction error for wikitext papes in Wikidata.

Been overloaded recently. Don't wait for me, but I'll put something together if I get inspired.

Mar 9 2022, 11:23 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks), ORES
Halfak added a comment to T302851: revscoring feature extraction error for wikitext papes in Wikidata.

Agreed that the error message could be better. The feature extraction should work on anything that is an entity (Properties, Items, and Lexemes), but will fail for wikitext.
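One way a clearer error could look is sketched below. This is only an illustration under stated assumptions: `check_entity_content`, `UnsupportedContentModel`, and the set of content model names are hypothetical and are not part of revscoring or ORES.

```python
# Assumed content model identifiers for Wikidata entities (illustrative).
ENTITY_CONTENT_MODELS = {"wikibase-item", "wikibase-property", "wikibase-lexeme"}


class UnsupportedContentModel(ValueError):
    """Raised when a revision is not an entity and cannot be scored."""


def check_entity_content(content_model, rev_id):
    """Return the content model if it is an entity type, else raise a clear error."""
    if content_model not in ENTITY_CONTENT_MODELS:
        raise UnsupportedContentModel(
            "Revision {0} has content model {1!r}; feature extraction only "
            "supports entities ({2}).".format(
                rev_id, content_model, ", ".join(sorted(ENTITY_CONTENT_MODELS))
            )
        )
    return content_model
```

Checking the content model before extraction would let the caller report "this is a wikitext page" directly instead of surfacing a parsing failure from deeper in the pipeline.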

Mar 9 2022, 6:02 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks), ORES
Halfak renamed T302851: revscoring feature extraction error for wikitext papes in Wikidata from revscoring feature extraction error for Wikidata to revscoring feature extraction error for wikitext papes in Wikidata.
Mar 9 2022, 5:52 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks), ORES

Mar 8 2022

Halfak added a project to T303293: Enable ORES in RecentChanges for Hindi Wikipedia: Edit-Review-Improvements-RC-Page.
Mar 8 2022, 4:21 PM · Growth-Team, Edit-Review-Improvements-RC-Page, Growth community maintenance, Hindi-Sites, editquality-modeling, artificial-intelligence, Machine-Learning-Team
Halfak created T303293: Enable ORES in RecentChanges for Hindi Wikipedia.
Mar 8 2022, 4:20 PM · Growth-Team, Edit-Review-Improvements-RC-Page, Growth community maintenance, Hindi-Sites, editquality-modeling, artificial-intelligence, Machine-Learning-Team
Halfak closed T252581: Train and test editquality models for Hindi Wikipedia as Resolved.

Looks like this is resolved.

Mar 8 2022, 4:12 PM · Patch-For-Review, Hindi-Sites, editquality-modeling, Machine-Learning-Team, artificial-intelligence

Feb 25 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

My spot checking looks good on Beta. I see performance improvements where expected and consistency elsewhere.

Feb 25 2022, 5:22 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T265163: Create a system to encode best practices into editing experiences.

One of the problems with building effective automated content flaw detection to help Wikipedians is the lack of precise information around historical edits (like what exactly was improved in this edit?)

Feb 25 2022, 12:03 AM · Editing-team, VisualEditor

Feb 24 2022

Halfak added a comment to T265163: Create a system to encode best practices into editing experiences.

[...]two quick questions:

  1. Would you be okay with me uploading a PDF copy of the paper in case anyone else happening upon this ticket would like to read it as well? I thankfully have access to it via The Wikipedia Library.
  2. Do you recall how you came to learn about this ticket? No worries if you don't remember... I ask because I'm curious to learn about spaces where this kind of thing is being discussed.
Feb 24 2022, 5:03 PM · Editing-team, VisualEditor
Halfak updated subscribers of T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

This is where that warning is coming from. The "lock_manager" is used to rate limit requests based on the incoming IP address. I don't recognize this code. It looks like @awight and @Ladsgroup originally committed it. I'm not sure if they are available to comment. See https://github.com/wikimedia/ores/pull/260 for the original pull request.

Feb 24 2022, 4:47 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

It looks like we're still not getting back JSON from the local proxy API endpoint; we're getting HTML instead. Is this because we're still pointing to deployment-prep's mediawiki and it's returning an HTML page as an error?

Feb 24 2022, 12:58 AM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 23 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Aha! What are the chances!

Feb 23 2022, 4:59 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Hey folks! Just got back from some vacation and taking a look at this. It seems like the deployment on beta is broken. The redis server is complaining about passwords:

Feb 23 2022, 4:55 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 16 2022

Halfak committed rORESWHEELS85c0dccefacb: Removes old unused packages. (authored by Halfak).
Feb 16 2022, 8:04 AM

Feb 15 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I included an update to the frozen-requirements.txt that matches the updated wheels. I generated that set of requirements by running make deployment_wheels in the repo (which is the normal process). But I needed to manually adjust the requirements.txt for each submodule to make sure it referenced yamlconf==0.2.4 because yamlconf==0.2.5 (the latest) requires PyYAML==5.4 which is not compatible with python 3.5.
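The manual adjustment described above could be scripted roughly as follows. This is a hedged sketch: `pin_requirement` is a hypothetical helper, not part of the ORES wheels repo or its Makefile, and it only handles simple one-requirement-per-line files.

```python
import re


def pin_requirement(lines, package, version):
    """Return requirement lines with `package` pinned to exactly `version`.

    Lines for other packages, comments, and blank lines pass through unchanged.
    """
    # Match the bare package name, optionally followed by a version specifier.
    pattern = re.compile(
        r"^" + re.escape(package) + r"\s*([=<>!~].*)?$", re.IGNORECASE
    )
    pinned = []
    for line in lines:
        if pattern.match(line.strip()):
            pinned.append("{0}=={1}".format(package, version))
        else:
            pinned.append(line)
    return pinned


# Example input mirroring the versions mentioned in the comment.
requirements = ["revscoring==2.11.1", "yamlconf==0.2.5", "# build tools"]
print(pin_requirement(requirements, "yamlconf", "0.2.4"))
```

Applying the same function to each submodule's requirements.txt would keep the pins consistent with the frozen requirements, instead of relying on repeated hand edits.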

Feb 15 2022, 6:48 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I was able to start the server and load all of the models after removing those packages. It looks like a mixture of packages that were used in testing (e.g. pytest) and packages that got picked up for python 3.7.

Feb 15 2022, 6:40 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I'm trying to work through these now. It turns out sticking with python 3.5 is a pain point because many libraries have dropped support for it in recent versions. I needed to identify these issues and manually set some versions for our libraries.

Feb 15 2022, 6:39 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak committed rORESWHEELS4afd76b841c5: Removes importlib_resources that was picked up in python 3.7 (authored by Halfak).
Feb 15 2022, 5:10 PM
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I'm not sure this change alone will unblock deployment to beta. I'm running some tests.

Feb 15 2022, 5:09 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 14 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I really am trying to be constructive. I'm sorry but I think I came off badly.

Feb 14 2022, 7:36 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I think that there were little errors in procedures from multiple people

Feb 14 2022, 7:05 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I understand your hesitation and the goal of minimizing changes to ORES deployments. I think the real issue is that ORES gets a deployment once every 6 months and I'm blowing out cobwebs every time I ask for one to go out! I don't have the resources to fix any of our workarounds, so I've been just adding and documenting new workarounds every time we turn the deployment crank.

Feb 14 2022, 4:41 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 11 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Also, confirmed that I now have the package on stat1007. Continuing the broad model rebuild with revscoring 2.11.1. I should have a set of PRs to review by EOD (which can obviously wait until Monday).

Feb 11 2022, 7:54 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

@elukey thanks for asking. It would be great to get that on beta for the weekend. I'll be able to blow some smoke through it in the meantime.

Feb 11 2022, 7:52 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Thanks @Dzahn! Can you also run puppet on stat1007? I'm using that server to rebuild one of the models that needs this package. I believe it draws from the same puppet config.

Feb 11 2022, 7:19 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Aha! I caught something while rebuilding the models. Cheers @elukey for your insightful comment. See https://gerrit.wikimedia.org/r/761974. It looks like we somehow had the Hindi dictionary (aspell-hi) installed on our model building server (ores-misc-01), but it wasn't included in the puppet config, so it won't be in production. That would have made the new hiwiki editquality models unusable. This patch should fix that.

Feb 11 2022, 7:08 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak committed rORESWHEELSa5de707d2ff2: Reverts back to python 3.5 wheels and includes mwparserfromhell 0.6.3 (authored by Halfak).
Feb 11 2022, 6:51 AM

Feb 10 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

2.11.1 has some useful improvements. I've tested the loading of models. But you're right, there can always be issues that pop up with any difference in versions.

Feb 10 2022, 10:39 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 9 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

OK updates made.

Feb 9 2022, 6:09 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 8 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

OK I think that patchset is good for review. We end up rolling back a lot of versions, but a quick spot check suggests these versions were present before we switched to 3.7.

Feb 8 2022, 6:20 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

It looks like the mwparserfromhell version was changed manually without changing any of the requirements for revscoring. That's going to be an issue any time we try to rebuild the wheels.

Feb 8 2022, 6:17 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

It shouldn't cause problems for unpickling. But it is a good idea to stick to the versions in the prod environment regardless. We'll want a new version of the wheels built with python 3.5 anyway, so I don't think reverting will get us much. I'll start the process now. Luckily, it's pretty easy. I should have a new patchset ready in a few hours.

Feb 8 2022, 4:59 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak updated subscribers of T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Oh! The move to 3.7 is kind of old. It was a (I think) two-year-old request from @akosiaris that we move to 3.7. We can go back though. I'll take a look at that and at getting the most recent mwparserfromhell today.

Feb 8 2022, 4:13 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 7 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Just checking -- will this be going to beta first? I'd like to poke the system in a prod-like environment a little bit before the actual deployment goes out.

Feb 7 2022, 7:47 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 5 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Config updated. "(WIP)" removed. I think we're good to go.

Feb 5 2022, 9:51 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Feb 1 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I have rebuilt the English Wikipedia model. It now loads fine with revscoring 2.11.1. https://github.com/wikimedia/articlequality/pull/171

Feb 1 2022, 7:33 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Thanks @ACraze! I've been testing the deployment configuration and ran into a surprising compatibility issue with the current enwiki articlequality model (built with revscoring 2.8.2). I'm digging to figure out what might have caused the issue and will be submitting some rebuilt model PRs using the new revscoring 2.11.1 as they finish. Sorry for the delay, folks.

Feb 1 2022, 4:57 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Jan 28 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I've pushed the changes that I can to the relevant patchset. Once editquality and ores repos are mirrored, I can finish it off.

Jan 28 2022, 8:27 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

It looks like the mirroring failure is affecting the ORES repo as well now (no LFS there, just regular git commits). So that will also need to be manually mirrored.

Jan 28 2022, 8:23 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a subtask for T252581: Train and test editquality models for Hindi Wikipedia: T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.
Jan 28 2022, 8:20 PM · Patch-For-Review, Hindi-Sites, editquality-modeling, Machine-Learning-Team, artificial-intelligence
Halfak added a subtask for T299137: Improve ORES observability: T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.
Jan 28 2022, 8:20 PM · Patch-For-Review, Epic, Machine-Learning-Team (Active Tasks), observability
Halfak added parent tasks for T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability: T252581: Train and test editquality models for Hindi Wikipedia, T299137: Improve ORES observability.
Jan 28 2022, 8:20 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak updated the task description for T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.
Jan 28 2022, 8:19 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak renamed T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability from ORES deployment - Winter 2022 - nlwiki articlequality model to ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.
Jan 28 2022, 8:19 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

I should also note that this change includes hiwiki editquality models and the ores logging, so I'll go update the title/description.

Jan 28 2022, 8:18 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Looks like something went wrong with the editquality repo. I got a smudge error. This usually means that the lfs didn't get pushed completely.

Jan 28 2022, 8:18 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)
Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

OK I'll pull it in.

Jan 28 2022, 8:15 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Jan 27 2022

Halfak added a comment to T300195: ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability.

Great! I'll get everything updated in the deploy patchset and ready for you tomorrow.

Jan 27 2022, 10:18 PM · Patch-For-Review, artificial-intelligence, articlequality-modeling, ORES, Machine-Learning-Team (Active Tasks)

Jan 20 2022

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

FYI, here is the config change. https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/755731 It is still a (WIP) while we wait on the model repo code to be manually mirrored.

Jan 20 2022, 4:59 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak created T299664: ORES deployment repos not mirroring regular git changes anymore.
Jan 20 2022, 4:51 PM · Machine-Learning-Team, ORES
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Thanks @ACraze! It looks like I no longer have permission to manually mirror changes into the gerrit model repos. See https://wikitech.wikimedia.org/wiki/ORES/Deployment#Updating_model_repositories

Jan 20 2022, 4:48 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Jan 14 2022

Halfak added a comment to T299137: Improve ORES observability.

These are edits to Wikidata that are erroring. They might be edits to regular wiki pages. The damage and item quality models were made to assess edits to entities (items, properties), so they error when trying to process wikitext.

Jan 14 2022, 9:56 PM · Patch-For-Review, Epic, Machine-Learning-Team (Active Tasks), observability
Halfak updated subscribers of T265163: Create a system to encode best practices into editing experiences.

@Sumit recently wrote a paper (https://dl.acm.org/doi/abs/10.1145/3479503) about building AIs that learn how to highlight content that is likely to need specific types of clean-up by learning directly from past edits. E.g. sentences that get edited for NPOV reasons tend to have a specific set of issues. The model learns those issues and then can be used to flag the same types of issues in new sentences. In effect, this encodes policy directly into the context in which someone is editing. It could be handy, so I'm bringing it up and pinging Sumit for comment :)

Jan 14 2022, 7:05 PM · Editing-team, VisualEditor

Jan 13 2022

Halfak committed rORESWHEELS6bc6015da0d7: Updated wheels for revscoring-2.11 and python 3.7 (authored by Halfak).
Jan 13 2022, 5:07 PM
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

@ACraze, thanks for your review of the model repo updates. Can you also look at the patchset linked above in T223782#7579930?

Jan 13 2022, 4:15 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Dec 20 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Whoops! Almost forgot that I'd need to update the packages for the deployment as well. See also https://gerrit.wikimedia.org/r/c/research/ores/wheels/+/748390

Dec 20 2021, 3:09 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I have 3 pull requests open that add version compatibility with revscoring 2.11 in prep for a deployment patchset.

Dec 20 2021, 2:15 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Dec 16 2021

Halfak updated subscribers of T223782: Build article quality model for Dutch Wikipedia.

See discussion here about a new iteration of the model. https://nl.wikipedia.org/w/index.php?title=Overleg_gebruiker:EpochFail/Kladblok&oldid=60538637#Hodge_podge_of_data_and_building_a_new_ORES_model

Dec 16 2021, 4:29 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Dec 13 2021

Halfak committed rOWCb20cb7469cda: Extends fetch_labels utility and nlwiki pipeline. (authored by Halfak).
Dec 13 2021, 2:26 AM
Halfak committed rOWC32ac92697532: Adding new manual sample for nlwiki (authored by Halfak).
Dec 13 2021, 2:26 AM

Dec 3 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Fantastic! I'll work to get something together before Thursday so we might be able to review then.

Dec 3 2021, 7:14 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Nov 24 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

If folks aren't interested in doing more labeling, it sounds like the best approach would be to just take the max label from the set for each article and see how well we can do with that.
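Taking the max label from a set of disagreeing labels could be sketched like this. The class ordering in `SCALE` (E lowest, A highest) is an assumption for illustration and is not confirmed here; `max_label` is a hypothetical helper, not part of the articlequality pipeline.

```python
# Assumed ordinal scale, from lowest to highest quality (illustrative only).
SCALE = ["E", "D", "C", "B", "A"]


def max_label(labels, scale=SCALE):
    """Return the highest-ranked label among those assigned by the labelers."""
    rank = {label: position for position, label in enumerate(scale)}
    return max(labels, key=rank.__getitem__)


# Three labelers disagreed on one article.
print(max_label(["D", "B", "C"]))
```

Ranking by position in an explicit scale avoids relying on alphabetical order, which would sort the letters in the wrong direction under this assumed ordering.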

Nov 24 2021, 8:48 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Ahh. That was more of an ask to Dutch Wikipedians to help choose what label those articles should ultimately have.

Nov 24 2021, 7:04 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Do you think we could apply the new criteria to list of articles I have in my Sandbox? https://nl.wikipedia.org/wiki/Gebruiker:EpochFail/Kladblok

Nov 24 2021, 5:59 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Nov 15 2021

Halfak added a comment to T252581: Train and test editquality models for Hindi Wikipedia.

@Nikhil1194 and I are working on an iteration. So I don't think we should resolve this quite yet.

Nov 15 2021, 9:34 PM · Patch-For-Review, Hindi-Sites, editquality-modeling, Machine-Learning-Team, artificial-intelligence

Nov 12 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Great! I think once we settle this, the next steps will be obvious and (hopefully) will require less investment from Dutch Wikipedians.

Nov 12 2021, 5:44 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Nov 11 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

We completed the labeling campaign and I produced a report of articles where the labelers disagreed here: https://nl.wikipedia.org/wiki/Gebruiker:EpochFail/Kladblok

Nov 11 2021, 4:50 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Oct 28 2021

Halfak added a comment to T252581: Train and test editquality models for Hindi Wikipedia.

@calbon, waiting on this one for a couple of weeks. Any chance y'all can take a look?

Oct 28 2021, 8:13 PM · Patch-For-Review, Hindi-Sites, editquality-modeling, Machine-Learning-Team, artificial-intelligence

Oct 14 2021

Halfak updated subscribers of T252581: Train and test editquality models for Hindi Wikipedia.

Thanks to @Nikhil1194's work. We have an initial pair of editquality models ready for Hindi Wikipedia. See https://github.com/wikimedia/editquality/pull/235

Oct 14 2021, 5:19 PM · Patch-For-Review, Hindi-Sites, editquality-modeling, Machine-Learning-Team, artificial-intelligence

Oct 12 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Those last 4 must have been checked out to someone in a workset. I think they were returned in the meantime because I was just able to check them out in a workset. I skipped them all so they should be available again.

Oct 12 2021, 8:48 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Oct 1 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I was able to get the campaign loaded! See https://labels.wmflabs.org/ui/nlwiki/

Oct 1 2021, 10:07 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I'm running into some issues with the wikilabels updates. Looks like some of our deployment code has gotten old and crusty (versions have changed and backwards compatibility dropped). So I'm working on that.

Oct 1 2021, 9:11 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Sep 29 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Adds the nlwiki article quality scale form to Wikilabels: https://github.com/wikimedia/wikilabels-wmflabs-deploy/pull/53

Sep 29 2021, 11:25 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Sep 23 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

^ New version of the model using updated features and manually extracted labels.

Sep 23 2021, 4:42 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

https://github.com/wikimedia/articlequality/pull/168

Sep 23 2021, 4:34 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak committed rOWCed260ae53796: nlwiki features for infobox and list_items (authored by Halfak).
nlwiki features for infobox and list_items
Sep 23 2021, 4:01 PM
Halfak committed rOWC355af1a42c1a: Adds manually extracted nlwiki labels. (authored by Halfak).
Adds manually extracted nlwiki labels.
Sep 23 2021, 4:01 PM

Sep 20 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I was able to gather 64 new labels from the wiki. Most of them were E class, but we did get some B, C and D -- which are hard to differentiate.

Sep 20 2021, 5:52 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Sep 9 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Still waiting on a review/merge. In the meantime, @Psingh07 is working on gathering new labeled data from the reviewing work folks did on the wiki pages.

Sep 9 2021, 4:29 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Aug 26 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Once this is merged, I'll use this and other improvements to re-generate the models. Then we can use those models to consider a new labeling campaign based on the new quality criteria.

Aug 26 2021, 5:39 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I added a wikitext.revision.list_items feature to revscoring for tracking articles that are in outline form (as opposed to prose). See https://github.com/wikimedia/revscoring/pull/506
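
As a rough illustration of what a list-item feature measures, here is a minimal sketch that counts list-item lines in raw wikitext. It is not the revscoring implementation from the pull request; the function name `count_list_items` and the regular expression are assumptions made for this example.

```python
import re

# Assumption: a "list item" is any line that starts with one or more
# '*' (bulleted) or '#' (numbered) markers. This is a simplification;
# for example, a '#REDIRECT' line would also match.
LIST_ITEM_RE = re.compile(r"^[*#]+", re.MULTILINE)


def count_list_items(wikitext: str) -> int:
    """Count the lines in `wikitext` that look like list items."""
    return len(LIST_ITEM_RE.findall(wikitext))


sample = "Intro paragraph.\n* one\n* two\n## nested\nProse line."
print(count_list_items(sample))
```

A high count relative to article length would suggest an article in outline form rather than prose.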

Aug 26 2021, 5:38 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Jul 30 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Sorry. One final thought. We could make the quality classes non-ordinal. E.g. call the lowest class Beginnetje and the highest class Etalage, and develop common-sense names for the classes in between. That way, the order may be plainly apparent, and in-between classes would require a common-sense name as well--rather than something like "B-" or "C+".

Jul 30 2021, 5:15 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I should say, this pattern of retraining also works for in-between classes.

Jul 30 2021, 5:11 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

It should be OK to change the meaning of the current classes over time too. One nice thing about using an ML model to supplement quality assessment is that it is easy to propagate changes like that. E.g. if we adjust the definition of a quality class, we just need to review our training data (50-75 articles per quality class) to fix the labels and retrain.

Jul 30 2021, 5:10 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Jul 29 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

We're unblocked with new work. We have new code ready for modeling/testing that improves unsourced content detection.

Jul 29 2021, 6:04 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Jul 20 2021

Halfak added a comment to T287021: Move CJK segmentation features to a branch and revert revscoring.

https://github.com/wikimedia/revscoring/pull/505

Jul 20 2021, 3:46 PM · Patch-For-Review, artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks)
Halfak created T287021: Move CJK segmentation features to a branch and revert revscoring.
Jul 20 2021, 3:46 PM · Patch-For-Review, artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks)

Jun 23 2021

Halfak added a comment to T284687: Resource allocation request for the wikicommunityhealth project.

I suggest referencing https://pythonhosted.org/mwxml/map.html#mwxml.map

Jun 23 2021, 6:15 PM · Cloud-VPS (Quota-requests)

Jun 8 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

I see. You're asking to include the "weighted sum" measure in the JSON output?
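
For context, a "weighted sum" over ordinal quality classes can be sketched as follows: each class probability is multiplied by a numeric weight for that class and the products are added up. The class names (E through A) and the weights 0 through 4 are assumptions for illustration, not values confirmed in this thread.

```python
# Assumption: five ordinal classes, lowest (E) to highest (A),
# with evenly spaced weights. The real weighting may differ.
WEIGHTS = {"E": 0, "D": 1, "C": 2, "B": 3, "A": 4}


def weighted_sum(probability: dict, weights: dict = WEIGHTS) -> float:
    """Collapse per-class probabilities into one continuous score."""
    return sum(probability.get(label, 0.0) * weight
               for label, weight in weights.items())


# Hypothetical probabilities, split evenly between the two lowest classes.
example = {"E": 0.5, "D": 0.5, "C": 0.0, "B": 0.0, "A": 0.0}
print(weighted_sum(example))
```

The result is a single number on the 0 to 4 scale, which can be easier to sort and compare than a discrete class label.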

Jun 8 2021, 3:17 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team
Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

The output of https://ores.wikimedia.org/v3/scores/nlwiki/123125/articlequality is pure JSON and links are not possible in this data format.
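
Since the response is plain JSON, a client has to pull the fields out itself. The sketch below parses a response body of the shape ORES v3 returns for this kind of URL; the probability values and the prediction are made up for illustration, real responses carry additional metadata, and the helper name `extract_score` is hypothetical.

```python
import json

# Illustrative response body only: the numbers below are invented,
# not real output for revision 123125.
SAMPLE_BODY = """
{
  "nlwiki": {
    "scores": {
      "123125": {
        "articlequality": {
          "score": {
            "prediction": "C",
            "probability": {"A": 0.05, "B": 0.2, "C": 0.5, "D": 0.2, "E": 0.05}
          }
        }
      }
    }
  }
}
"""


def extract_score(response: dict, wiki: str, rev_id: int, model: str) -> dict:
    """Return the score object for one revision and one model."""
    return response[wiki]["scores"][str(rev_id)][model]["score"]


score = extract_score(json.loads(SAMPLE_BODY), "nlwiki", 123125, "articlequality")
print(score["prediction"], score["probability"])
```

Any links to documentation would have to be added by the consuming tool, for example by mapping each class label to a wiki page.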

Jun 8 2021, 3:00 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Jun 3 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Here's the importance table. The higher the importance score, the more important the value is to the prediction. It turns out that the count of category links is the least important feature of the set. Overall length of the article, the amount of content with references, and the proportion of content that is referenced are the dominant features.

Jun 3 2021, 2:45 AM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team

Jun 2 2021

Halfak added a comment to T223782: Build article quality model for Dutch Wikipedia.

Sorry for the late response. The holiday weekend in the US (Memorial Day) had me out of my usual flow.

Jun 2 2021, 3:59 PM · artificial-intelligence, articlequality-modeling, Wikilabels, Machine-Learning-Team