Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?
Open, Needs Triage, Public

Description

From the survey on the format of the mediawiki history dumps:

"Going on a bit of a tangent, it would be helpful to have the Parsoid and ORES data (for all models available at the time of the extraction, optimized sufficiently for querying even if just via JSON key-value mapping convention only) available in the pertinent files. I find myself often wanting to have this data readily available instead of having to make API calls."

Parsoid data is available via dumps, but ORES data is not currently available anywhere. It seems it would not be hard to make a dump of ORES scores per revision id per wiki, and those could be released on a daily schedule so that API calls are not needed.
cc @Halfak in case his team is interested in grabbing this item
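
As a rough illustration of the idea, here is a minimal sketch of how a per-wiki, per-revision score dump could be consumed without API calls. Nothing here is an existing format: the JSON-lines layout, the field names (`wiki`, `rev_id`, `scores`), the sample values, and the `load_scores` helper are all assumptions made for the example.

```python
import json

# Hypothetical dump content: one JSON object per revision, per wiki.
# Field names, model names, and score values are illustrative assumptions,
# not an actual ORES dump format.
SAMPLE_DUMP = """\
{"wiki": "enwiki", "rev_id": 1001, "scores": {"damaging": 0.03, "goodfaith": 0.98}}
{"wiki": "enwiki", "rev_id": 1002, "scores": {"damaging": 0.81, "goodfaith": 0.22}}
"""


def load_scores(lines):
    """Index scores by (wiki, rev_id) so lookups need no API call."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        index[(record["wiki"], record["rev_id"])] = record["scores"]
    return index


scores_by_revision = load_scores(SAMPLE_DUMP.splitlines())
print(scores_by_revision[("enwiki", 1002)]["damaging"])
```

Keying on the pair (wiki, revision id) matters because revision ids are only unique within a wiki; a real dump would presumably also need to record the model version used for each score.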

Event Timeline

I do not think it would be a wise decision (even if we could do it technically) to make mediawiki history the monolith of all the things. With revision ids you should be able to easily retrieve ORES scores and content.

I mildly disagree with @Nuria on ORES scores - I think it could be very cool to have them (some models only, one model version only). Maybe in a separate table, or in different dumps. As for Parsoid, I don't really understand what that is.
Edited for typos.

I do not disagree that ORES scores would be useful in a table accessible by revision, +1 to that. I just do not think that the process that retrieves and maintains them should be tied to the mediawiki history one. I think the use case here is to be able to retrieve them in an easier format than is possible now, which makes sense.

Nuria renamed this task from Can we add Parsoid and ORES data to mediawiki history? to Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?. Sep 16 2019, 5:45 PM
Nuria moved this task from Machine Learning Platform to Radar on the Analytics board.
Nuria added a subscriber: Halfak.