Hi! I'm a socio-technologist. I do science so that I can build new technologies for social systems.
You can find me as:
I can't seem to replicate the issue with the current version of the feature in the articlequality repo. When I run this same revision through the feature extractor, I get 5 image links rather than 1.
I think the right next step is to implement some tests to see if we detect the following image links:
[[Bestand:Stevie Wonder 1967 (1).jpg|thumb|In 1967 tijdens een repetitie voor een optreden in een [[TROS]]-programma]] [[Bestand:Burt Bacharach - jam session.jpg|thumb|Stevie Wonder tijdens een optreden met [[Burt Bacharach]] in de jaren zestig]]
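To make that concrete, here's a minimal sketch of the kind of test I have in mind. It uses mwparserfromhell directly as a stand-in for the articlequality feature extractor (the real test should go through our extractor instead), and it assumes "Bestand" is the file-namespace prefix we need to match on nlwiki:

```python
import mwparserfromhell

# The nlwiki wikitext from above: two image links, each with a nested plain link.
WIKITEXT = (
    "[[Bestand:Stevie Wonder 1967 (1).jpg|thumb|In 1967 tijdens een repetitie "
    "voor een optreden in een [[TROS]]-programma]] "
    "[[Bestand:Burt Bacharach - jam session.jpg|thumb|Stevie Wonder tijdens een "
    "optreden met [[Burt Bacharach]] in de jaren zestig]]"
)

def count_image_links(text, prefixes=("Bestand", "File")):
    """Count wikilinks whose namespace prefix marks them as image/file links."""
    code = mwparserfromhell.parse(text)
    return sum(
        1 for link in code.filter_wikilinks()
        if str(link.title).split(":", 1)[0].strip() in prefixes
    )

def test_detects_both_image_links():
    # The nested [[TROS]] and [[Burt Bacharach]] links should not be counted.
    assert count_image_links(WIKITEXT) == 2
```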
Looks like this is done as part of T252581: Train and test editquality models for Hindi Wikipedia
Been overloaded recently. Don't wait for me, but I'll put something together if I get inspired.
Agreed that the error message could be better. The feature extraction should work on anything that is an entity (Properties, Items, and Lexemes), but will fail for wikitext.
Looks like this is resolved.
My spot checking looks good on Beta. I see performance improvements where expected and consistency elsewhere.
One of the problems with building effective automated content flaw detection to help Wikipedians is the lack of precise information around historical edits (like what exactly was improved in this edit?).
[...]two quick questions:
- Would you be okay with me uploading a PDF copy of the paper in case anyone else happening upon this ticket would like to read it as well? I thankfully have access to it via The Wikipedia Library.
- Do you recall how you came to learn about this ticket? No worries if you don't remember...I ask because I'm curious to learn about the spaces where this kind of thing is being discussed.
This is where that warning is coming from. The "lock_manager" is used to rate limit requests based on the incoming IP address. I don't recognize this code. It looks like @awight and @Ladsgroup originally committed it. I'm not sure if they are available to comment. See https://github.com/wikimedia/ores/pull/260 for the original pull request.
It looks like we're still not getting JSON back from the local proxy API endpoint; instead, we're getting HTML. Is this because we're still pointing to deployment-prep's mediawiki and it's returning an HTML page as an error?
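A quick way to confirm that theory (the endpoint address here is a placeholder for wherever the local proxy is listening):

```python
import requests

# Hypothetical local proxy address; substitute the real endpoint.
resp = requests.get(
    "http://localhost:8080/w/api.php",
    params={"action": "query", "format": "json"},
)
print(resp.status_code, resp.headers.get("Content-Type"))
print(resp.text[:200])  # "text/html" and a "<!DOCTYPE html>" body would confirm it
```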
Aha! What are the chances!
Hey folks! Just got back from some vacation and taking a look at this. It seems like the deployment on beta is broken. The redis server is complaining about passwords:
I included an update to the frozen-requirements.txt that matches the updated wheels. I generated that set of requirements by running make deployment_wheels in the repo (which is the normal process). But I needed to manually adjust the requirements.txt for each submodule to make sure it referenced yamlconf==0.2.4, because yamlconf==0.2.5 (the latest) requires PyYAML==5.4, which is not compatible with Python 3.5.
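For concreteness, the kind of pin I mean in each submodule's requirements.txt (an illustrative line, not the exact file contents):

```
yamlconf==0.2.4  # 0.2.5 requires PyYAML==5.4, which is not compatible with Python 3.5
```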
I was able to start the server and load all of the models after removing those packages. It looks like a mixture of packages that were used in testing (e.g. pytest) and packages that got picked up for Python 3.7.
I'm trying to work through these now. It turns out sticking with Python 3.5 is a pain point because many libraries have dropped support for it in recent versions. I needed to identify these issues and manually pin some versions for our libraries.
I'm not sure this change alone will unblock deployment to beta. I'm running some tests.
I really am trying to be constructive. I'm sorry but I think I came off badly.
I think that there were small procedural errors from multiple people.
I understand your preference for minimizing changes to ORES deployments. I think the real issue is that ORES gets a deployment once every 6 months and I'm blowing out cobwebs every time I ask for one to go out! I don't have the resources to fix any of our workarounds, so I've been just adding and documenting new workarounds every time we turn the deployment crank.
Also, confirmed that I now have the package on stat1007. Continuing the broad model rebuild with revscoring 2.11.1. I should have a set of PRs to review by EOD (which can obviously wait until Monday).
@elukey thanks for asking. It would be great to get that on beta for the weekend. I'll be able to blow some smoke through it in the meantime.
Thanks @Dzahn! Can you also run puppet on stat1007? I'm using that server to rebuild one of the models that needs this package. I believe it draws from the same puppet config.
Aha! I caught something rebuilding the models. Cheers @elukey for your insightful comment. See https://gerrit.wikimedia.org/r/761974. It looks like we somehow had the Hindi dictionary (aspell-hi) installed on our model building server (ores-misc-01) but it wasn't included in the puppet config, so it won't be in production. That would have made the new hiwiki editquality models unusable. This patch should fix that.
2.11.1 has some useful improvements. I've tested the loading of models. But you're right, there can always be issues that pop up with any difference in versions.
OK updates made.
OK I think that patchset is good for review. We end up rolling back a lot of versions, but a quick spot check suggests these versions were present before we switched to 3.7.
It looks like the mwparserfromhell version was manually bumped without updating any of the requirements for revscoring. That's going to be an issue any time we try to rebuild the wheels.
It shouldn't cause problems for unpickling. But it is a good idea to stick to the versions in the prod environment regardless. We'll want new versions of the wheels built with Python 3.5 anyway, so I don't think reverting will get us much. I'll start the process now. Luckily, it's pretty easy. I should have a new patchset ready in a few hours.
Oh! The move to 3.7 is kind of old. It was a (I think) two year old request from @akosiaris that we move to 3.7. We can go back though. I'll take a look at that and at getting the most recent mwparserfromhell today.
Just checking -- will this be going to beta first? I'd like to poke the system in a prod-like environment a little bit before the actual deployment goes out.
Config updated. "(WIP)" removed. I think we're good to go
I have rebuilt the English Wikipedia model. It now loads fine with revscoring 2.11.1. https://github.com/wikimedia/articlequality/pull/171
Thanks @ACraze! I've been testing the deployment configuration and ran into a surprising compatibility issue with the current enwiki articlequality model (built with revscoring 2.8.2). I'm digging to figure out what might have caused the issue and will be submitting some rebuilt model PRs using the new revscoring 2.11.1 as they finish. Sorry for the delay, folks.
I've pushed the changes that I can to the relevant patchset. Once editquality and ores repos are mirrored, I can finish it off.
It looks like the mirroring failure is affecting the ORES repo as well now (no LFS there, just regular git commits). So that will also need to be manually mirrored.
I should also note that this change includes the hiwiki editquality models and the ORES logging changes, so I'll go update the title/description.
Looks like something went wrong with the editquality repo. I got a smudge error. This usually means that the LFS objects didn't get pushed completely.
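For whoever re-runs the mirror: in my experience, re-pushing from a clone that has the full objects (something like `git lfs push --all origin master`, then `git lfs fetch --all && git lfs checkout` on the receiving side) clears this kind of smudge error, though treat that as a starting point rather than a recipe.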
OK I'll pull it in.
Great! I'll get everything updated in the deploy patchset and ready for you tomorrow.
FYI, here is the config change. https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/755731 It is still a (WIP) while we wait on the model repo code to be manually mirrored.
Thanks @ACraze! It looks like I no longer have permission to manually mirror changes into the gerrit model repos. See https://wikitech.wikimedia.org/wiki/ORES/Deployment#Updating_model_repositories
These erroring edits are on Wikidata, but they might be edits to regular wiki pages rather than entities. The damage and item quality models were made to assess edits to entities (items, properties), so they error when trying to process wikitext.
@Sumit recently wrote a paper (https://dl.acm.org/doi/abs/10.1145/3479503) about building AIs that learn how to highlight content that is likely to need specific types of clean-up by learning directly from past edits. E.g. sentences that get edited for NPOV reasons tend to have a specific set of issues. The model learns those issues and then can be used to flag the same types of issues in new sentences. In effect, this encodes policy directly into the context in which someone is editing. It could be handy, so I'm bringing it up and pinging Sumit for comment :)
Woops! Almost forgot that I'd need to update the packages for the deployment as well. See also https://gerrit.wikimedia.org/r/c/research/ores/wheels/+/748390
I have 3 pull requests open that add version compatibility with revscoring 2.11 in prep for a deployment patchset.
See discussion here about a new iteration of the model. https://nl.wikipedia.org/w/index.php?title=Overleg_gebruiker:EpochFail/Kladblok&oldid=60538637#Hodge_podge_of_data_and_building_a_new_ORES_model
Fantastic! I'll work to get something together before Thursday so we might be able to review then.
If folks aren't interested in doing more labeling, it sounds like the best approach would be to just take the max label from the set and see how well we can do with that.
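To be explicit about what I mean by "max label", here's a rough sketch. The class ordering is an assumption (E lowest, A highest); the point is just to collapse each article's set of labels to the highest quality class any labeler assigned:

```python
# Assumed ordering of the nlwiki quality scale, lowest to highest.
ORDER = {"E": 0, "D": 1, "C": 2, "B": 3, "A": 4}

def max_label(labels):
    """Collapse multiple labelers' judgements into the highest class assigned."""
    return max(labels, key=ORDER.get)

assert max_label(["E", "C", "B"]) == "B"
```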
Ahh. That was more of an ask to Dutch Wikipedians to help choose what label those articles should ultimately have.
Do you think we could apply the new criteria to the list of articles I have in my Sandbox? https://nl.wikipedia.org/wiki/Gebruiker:EpochFail/Kladblok
@Nikhil1194 and I are working on an iteration. So I don't think we should resolve this quite yet.
Great! I think once we settle this, the next steps will be obvious and (hopefully) will require less investment from Dutch Wikipedians.
We completed the labeling campaign and I produced a report of articles where the labelers disagreed here: https://nl.wikipedia.org/wiki/Gebruiker:EpochFail/Kladblok
@calbon, waiting on this one for a couple of weeks. Any chance y'all can take a look?
Those last 4 must be checked out to someone in a workset. I think they were returned in the meantime, because I was just able to check them out in a workset. I skipped them all so they should be available again.
I was able to get the campaign loaded! See https://labels.wmflabs.org/ui/nlwiki/
I'm running into some issues with the wikilabels updates. Looks like some of our deployment code has gotten old and crusty (versions have changed and backwards compatibility dropped). So I'm working on that.
Adds the nlwiki article quality scale form to Wikilabels: https://github.com/wikimedia/wikilabels-wmflabs-deploy/pull/53
^ New version of the model using updated features and manually extracted labels.
I was able to gather 64 new labels from the wiki. Most of them were E class, but we did get some B, C and D -- which are hard to differentiate.
Still waiting on a review/merge. In the meantime, @Psingh07 is working on gathering new labeled data from the reviewing work folks did on the wiki pages.
Once this is merged, I'll use this and other improvements to re-generate the models. Then we can use those models to consider a new labeling campaign based on the new quality criteria.
I added a wikitext.revision.list_items feature to revscoring for tracking articles that are in outline form (as opposed to prose). See https://github.com/wikimedia/revscoring/pull/506
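For anyone who wants to poke at it, here's a sketch of trying the new feature against a live revision via revscoring's API extractor (the wiki and revision ID are arbitrary picks for the example):

```python
import mwapi
from revscoring.extractors import api
from revscoring.features import wikitext

session = mwapi.Session("https://nl.wikipedia.org", user_agent="list_items demo")
extractor = api.Extractor(session)

# Extract the new feature for a single revision.
(list_items,) = extractor.extract(123125, [wikitext.revision.list_items])
print(list_items)  # count of list items in the revision's wikitext
```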
Sorry. One final thought. We could make the quality classes non-ordinal. E.g. call the lowest class Beginnetje and the highest class Etalage, and develop common sense names for the classes in between. That way, the order would still be plainly apparent, and in-between classes would get common sense names as well -- rather than something like "B-" or "C+".
I should say, this pattern of retraining works for in-between classes too.
It should be OK to change the meaning of the current classes over time too. One nice thing about using an ML model to supplement quality assessment is that it is easy to propagate changes like that. E.g. if we adjust the definition of a quality class, we just need to review our training data (50-75 articles per quality class) to fix the labels and retrain.
We're unblocked with new work. We have new code ready for modeling/testing that improves unsourced content detection.
I suggest referencing https://pythonhosted.org/mwxml/map.html#mwxml.map
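Roughly, per those docs, mwxml.map() runs a processing function over a set of dump files in parallel and yields the results back to the main process:

```python
import mwxml

def process_dump(dump, path):
    # Runs in a worker process for each dump file.
    for page in dump:
        for revision in page:
            yield page.id, revision.id

paths = ["dump1.xml.bz2", "dump2.xml.bz2"]  # placeholder dump file paths

for page_id, rev_id in mwxml.map(process_dump, paths):
    print(page_id, rev_id)
```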
I see. You're asking to include the "weighted sum" measure in the JSON output?
The output of https://ores.wikimedia.org/v3/scores/nlwiki/123125/articlequality is pure JSON and links are not possible in this data format.
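That said, a "weighted sum" could be computed client-side from the probabilities ORES already returns. A sketch, where the class weights are an assumption (E=1 through A=5) and the dict mirrors the probability field of an articlequality score:

```python
WEIGHTS = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}  # assumed class weights

def weighted_sum(probability):
    """Collapse a class-probability distribution into a single quality score."""
    return sum(WEIGHTS[cls] * p for cls, p in probability.items())

print(weighted_sum({"E": 0.1, "D": 0.2, "C": 0.4, "B": 0.2, "A": 0.1}))  # 3.0
```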
Here's the importance table. The higher the importance score, the more important the value is to the prediction. It turns out that the count of category links is the least important feature of the set. Overall length of the article, the amount of content with references, and the proportion of content that is referenced are the dominant features.