label_reverted and fetch_labels will need to operate on and produce json-lines.
|Open||None||T144636 [Epic] Implement PCFG features for editquality and draftquality|
|Resolved||Halfak||T146410 Update editquality for revscoring 1.3.0|
|Resolved||Halfak||T146482 json-lines output format doesn't have line breaks|
This is needed for T144636: [Epic] Implement PCFG features for editquality and draftquality, because we're relying on revscoring 1.3.0 (which uses json-lines) in order to fit the PCFGs (like the TF-IDF selector in T132580).
I'm working on https://github.com/wiki-ai/editquality/tree/new_revscoring
Mostly, I needed to update the Makefile and utilities to work purely with JSON-lines. There are a few other simplifications.
E.g. I've merged label_reverted and prelabel into a new utility called autolabel that does both.
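For anyone unfamiliar with the format, here is a minimal sketch of what a JSON-lines labeling step looks like: one JSON object per line in, one JSON object per line out, each terminated by a newline. This is not the code in the new_revscoring branch. The function names, the `rev_id` and `reverted_for_damage` fields, and the set of reverted IDs are all assumptions made up for illustration.

```python
import io
import json


def read_json_lines(f):
    """Yield one parsed object per non-empty line of a file-like object."""
    for line in f:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_json_lines(observations, f):
    """Write one JSON object per line, each terminated by a newline."""
    for ob in observations:
        f.write(json.dumps(ob) + "\n")


def label_observations(observations, reverted_ids):
    """Hypothetical labeling step (not the real autolabel utility).

    Copies each observation and adds a boolean field saying whether its
    revision ID is in the given set of reverted revision IDs.
    """
    for ob in observations:
        labeled = dict(ob)
        labeled["reverted_for_damage"] = ob["rev_id"] in reverted_ids
        yield labeled


# In-memory streams stand in for the files or pipes a Makefile would use.
source = io.StringIO('{"rev_id": 101}\n{"rev_id": 102}\n')
sink = io.StringIO()
write_json_lines(label_observations(read_json_lines(source), {102}), sink)
output_text = sink.getvalue()
```

Because every stage reads and writes the same line-oriented format, stages like this can be chained with pipes in a Makefile and the output can be processed as a stream without loading it all into memory.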
Currently, I'm re-running the entire Makefile and regenerating models with cross-validated test statistics. The runs keep getting interrupted by typos in the Makefile. I figure that, by the time I've finished running this code on all of the models, we'll be ready to cut revscoring 1.3.0 and deploy it along with the updated models I'm producing now.
I've processed up to plwiki. So, I'm a little bit more than half-way.
- plwiki <-- [Process is here]