label_reverted and fetch_labels will need to operate on and produce json-lines.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T144636 [Epic] Implement PCFG features for editquality and draftquality
Resolved | | Halfak | T146410 Update editquality for revscoring 1.3.0
Resolved | BUG REPORT | Halfak | T146482 json-lines output format doesn't have line breaks
Event Timeline
This is needed for T144636: [Epic] Implement PCFG features for editquality and draftquality because we're relying on revscoring 1.3.0 (which uses json-lines) in order to fit the PCFGs (like the TF-IDF selector in T132580).
https://github.com/wiki-ai/revscoring/pull/290 -- Fixes some issues I found in the tune utility
I'm working on https://github.com/wiki-ai/editquality/tree/new_revscoring
Mostly, I needed to update the Makefile and utilities to work purely with JSON-lines. There are a few other simplifications.
E.g., I've merged label_reverted and prelabel into a new utility called autolabel that does both.
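For anyone unfamiliar with the format, here is a minimal sketch of what "working purely with JSON-lines" looks like for a utility: one JSON document per line in, one per line out. This is not the actual autolabel code. The function names, the `rev_id` and `autolabel` fields, and the labeling rule are all hypothetical and only there to illustrate the shape of the pipeline.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed observation per non-empty input line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_json_lines(observations, stream):
    """Write each observation as a single-line JSON document."""
    for ob in observations:
        # json.dumps without `indent` never emits a newline, so the
        # explicit "\n" is the only thing separating records.
        stream.write(json.dumps(ob) + "\n")


def label_all(observations, label_fn):
    """Hypothetical labeling step: copy each observation and attach a label."""
    for ob in observations:
        labeled = dict(ob)
        labeled["autolabel"] = label_fn(ob)  # hypothetical field name
        yield labeled


# Usage with in-memory streams; a real utility would likely use
# sys.stdin and sys.stdout so it can be chained in a Makefile.
source = io.StringIO('{"rev_id": 1}\n{"rev_id": 2}\n')
sink = io.StringIO()
is_even = lambda ob: ob["rev_id"] % 2 == 0  # stand-in labeling rule
write_json_lines(label_all(read_json_lines(source), is_even), sink)
```

Because every stage is a generator, observations stream through one at a time, which keeps memory use flat on large datasets. The trailing newline on each record is what makes the output valid input for the next utility.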
Currently, I'm re-running the entire Makefile and regenerating models with cross-validated test statistics. The runs keep getting interrupted by typos in the Makefile. I figure that, by the time I've finished running this code on all of the models, we'll be ready to cut revscoring 1.3.0 and deploy it along with the updated models I'm producing now.
I've processed up to plwiki, so I'm a little more than halfway.
- arwiki
- cswiki
- dewiki
- enwiki
- enwiktionary
- eswiki
- etwiki
- fawiki
- frwiki
- hewiki
- huwiki
- idwiki
- itwiki
- nlwiki
- nowiki
- plwiki <-- [Process is here]
- ptwiki
- ruwiki
- svwiki
- trwiki
- ukwiki
- viwiki
- wikidatawiki
Process is on wikidatawiki! Almost there! I'm really hoping that this will be settled and a pull request can be submitted tomorrow.
OK. All done, but I found an issue in the ruwiki datasets so I'm regenerating those models. After that, I think we're all done!