
Update editquality for revscoring 1.3.0
Closed, Resolved, Public


label_reverted and fetch_labels will need to operate on and produce json-lines.
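For readers unfamiliar with the format, here is a minimal sketch of what "operate on and produce json-lines" can look like: one JSON document per line in, one per line out. The helper names (`read_json_lines`, `write_json_lines`, `label_observations`) and the field names (`rev_id`, `reverted`) are assumptions made for illustration; they are not the actual interfaces of label_reverted, fetch_labels, or revscoring.

```python
import io
import json


def read_json_lines(f):
    """Yield one decoded JSON document per non-empty line of `f`."""
    for line in f:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_json_lines(docs, f):
    """Write each document to `f` as a single JSON line."""
    for doc in docs:
        f.write(json.dumps(doc) + "\n")


def label_observations(docs, label_fn):
    """Hypothetical labeling step: copy each observation and merge in
    the fields returned by `label_fn` (a stand-in for real labeling logic)."""
    for doc in docs:
        labeled = dict(doc)
        labeled.update(label_fn(doc))
        yield labeled


if __name__ == "__main__":
    # Illustrative input only; the field names are assumptions.
    source = io.StringIO('{"rev_id": 1}\n{"rev_id": 2}\n')
    target = io.StringIO()
    # Placeholder rule: mark even revision IDs as reverted.
    labeled = label_observations(
        read_json_lines(source),
        lambda doc: {"reverted": doc["rev_id"] % 2 == 0})
    write_json_lines(labeled, target)
    print(target.getvalue(), end="")
```

Because every step is a generator over lines, utilities written this way can be chained with pipes in a Makefile without loading a whole dataset into memory.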

Event Timeline

This is needed for T144636: [Epic] Implement PCFG features for editquality and draftquality, because we're relying on revscoring 1.3.0 (which uses json-lines) in order to fit the PCFGs (like the TF-IDF selector in T132580). This also fixes some issues I found in the tune utility.

I'm working on

Mostly, I needed to update the Makefile and utilities to work purely with JSON-lines. There are a few other simplifications.

E.g. I've merged label_reverted and prelabel into a new utility called autolabel that does both.

Currently, I'm re-running the entire Makefile and regenerating models with cross-validated test statistics. The runs keep getting interrupted by typos in the Makefile. I figure that, by the time I've finished running this code on all of the models, we'll be ready to cut revscoring 1.3.0 and deploy it along with the updated models I'm producing now.

I've processed up to plwiki. So, I'm a little bit more than half-way.

  1. arwiki
  2. cswiki
  3. dewiki
  4. enwiki
  5. enwiktionary
  6. eswiki
  7. etwiki
  8. fawiki
  9. frwiki
  10. hewiki
  11. huwiki
  12. idwiki
  13. itwiki
  14. nlwiki
  15. nowiki
  16. plwiki <-- [Process is here]
  17. ptwiki
  18. ruwiki
  19. svwiki
  20. trwiki
  21. ukwiki
  22. viwiki
  23. wikidatawiki

Process is on wikidatawiki! Almost there! I'm really hoping that this will be settled and a pull request can be submitted tomorrow.

OK. All done, but I found an issue in the ruwiki datasets so I'm regenerating those models. After that, I think we're all done!

Halfak renamed this task from Implement new json-lines pattern in editquality to Update editquality for revscoring 1.3.0. Oct 3 2016, 3:04 PM