Consolidate articlequality and itemquality models into a "model family"
Open, Lowest, Public

Description

We realized that article and item quality are special cases of a more general "content quality" concept.

UPDATE

We're not going to merge the existing models. Instead, we're introducing a new "model family" property, which can be used for autodiscovery of content quality models. Here's an outline of how clients should access the models:

  • Most clients will consume specific models, and these model names should be set in configuration. When making a request to the ORES API, always include the models parameter, e.g. models=editquality|articlequality.
  • Some clients, like an explorer UI, may be able to take advantage of new models without having their names hardcoded. These clients should make an initial request to list all models, e.g. https://ores.wikimedia.org/v3/scores/?model_info . A model_family field included in responses will help the client determine how each model can be used. Some experimental or deprecated models may be marked is_hidden.
  • Other clients will request "all" models in order to pass the data through, e.g. to a generalized ORES cache. These clients can continue to do so without any changes; the server will omit hidden models as needed. Clients shouldn't make any assumptions about which models are available in the "all" list.
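A minimal client-side sketch of the first two access patterns, assuming the model_info response maps each wiki context to a `models` dictionary keyed by model name, with the proposed `model_family` and `is_hidden` fields inside each entry. That response shape, the `revids` query parameter, the per-wiki URL path, the `damaging` model's family, and the helper names `scores_url` and `discover` are assumptions for illustration, not the documented ORES API.

```python
from urllib.parse import urlencode

ORES_SCORES = "https://ores.wikimedia.org/v3/scores/"


def scores_url(context: str, rev_id: int, models: list[str]) -> str:
    """Build an explicit scoring request; `models` should come from configuration."""
    # Keep "|" unescaped: the models parameter is a pipe-separated list of names.
    query = urlencode({"models": "|".join(models), "revids": rev_id}, safe="|")
    return f"{ORES_SCORES}{context}/?{query}"


def discover(model_info: dict, context: str, family: str) -> list[str]:
    """Return the visible models of one family for one wiki (autodiscovery)."""
    models = model_info.get(context, {}).get("models", {})
    return sorted(
        name
        for name, info in models.items()
        # Skip other families and anything marked is_hidden (e.g. deprecated names).
        if info.get("model_family") == family and not info.get("is_hidden", False)
    )


# Hypothetical model_info excerpt using the proposed fields.
sample = {
    "enwiki": {
        "models": {
            "articlequality": {"model_family": "contentquality", "is_hidden": False},
            "wp10": {"model_family": "contentquality", "is_hidden": True},
            "damaging": {"model_family": "editquality"},
        }
    }
}

print(scores_url("enwiki", 123, ["editquality", "articlequality"]))
print(discover(sample, "enwiki", "contentquality"))  # ['articlequality']
```

Hidden models such as the deprecated wp10 drop out of discovery automatically, so an explorer UI can pick up renamed models without code changes.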

Event Timeline

awight created this task. Oct 2 2018, 9:06 PM
Restricted Application added a subscriber: Aklapper. Oct 2 2018, 9:06 PM

Change 464049 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/JADE@master] Rename articlequality to contentquality

https://gerrit.wikimedia.org/r/464049

Restricted Application added a project: Growth-Team. Oct 2 2018, 9:13 PM
JTannerWMF moved this task from Inbox to FY 2019-20 on the Growth-Team board. Oct 3 2018, 6:18 PM

@awight -- our team just finished renaming the model from wp10 to articlequality in T202998. When will this additional change go into effect? And by when would we have to do the rename?

CC @SBisson

Change 464049 merged by jenkins-bot:
[mediawiki/extensions/JADE@master] Rename articlequality to contentquality

https://gerrit.wikimedia.org/r/464049

It's unfortunate that we didn't catch this before deprecating "wp10"; apologies for the extra work. There's no schedule yet, but I think the sooner the better, before "articlequality" gains traction. We'll use the same strategy as before, keeping the old names available in the ORES service as aliases until a deprecation period elapses.

Another alternative would be to rename the "itemquality" model to "articlequality", but we think that the generalized "contentquality" name is more correct.

Harej moved this task from Inbox to In Progress on the Jade board. Oct 4 2018, 7:40 PM
awight removed a project: Jade. Oct 4 2018, 7:47 PM
awight renamed this task from Rename articlequality and itemquality to "contentquality" to Merge articlequality and itemquality. Oct 10 2018, 6:37 PM
awight claimed this task.
awight triaged this task as High priority.

I'm going to push this forward, but would first like to reopen discussion of the name. "contentquality" is something we invented among our team, and has no inherent meaning to Wikimedians. Perhaps we should merge "itemquality" into "articlequality" instead, which would keep its approximate original meaning, and has easily discoverable documentation.

Additionally, renaming itemquality -> articlequality is much easier: it only affects Wikidata clients and doesn't introduce a new key.

Halfak added a subscriber: Halfak. Oct 10 2018, 6:49 PM

"Content" is commonly used among Wikipedians. "Content pages" is an official definition used in Wiki stats.

If we ever estimate talk page content quality, this overloading could become confusing.

"talk quality"? Generally we don't refer to discussions as "content". "Content pages" does not include talk pages.

Are you saying we would have two models "contentquality" and "talkquality" in that case? If we're keeping some granularity, maybe we want to keep "article", "item", and future "talk" quality models separate?

articlequality and itemquality have a lot in common. Both are based on similar scales and refer to the completeness of a content page. A talkquality model doesn't exist yet and probably wouldn't use a scale anything like those of articlequality or itemquality.

@awight I see that the patch has been merged, but should we update our usages in PageCuration or wait for this discussion to conclude?

@kostajh Sorry for the delay, we haven't merged any patches that will affect ORES yet. I'll post here once we begin the migration, thanks!

Let's explore an alternative to merging and renaming models: the conceptual similarity between itemquality and articlequality can be represented by a new field, model_info.model_family = 'contentquality'. Models can then have any arbitrary name, and clients are either configured to use a specific model name for each wiki, or can auto-detect models by iterating through all model information and matching by model family.

Combined with another new field, model_info.is_hidden, this gives us a graceful path for migrating between model names in the future. We can surface a new model name without disrupting clients configured to fetch specific models under their deprecated names, and the deprecated names can be hidden, which avoids serving redundant results under multiple keys.

awight added a subscriber: Harej. Oct 30 2018, 5:07 PM

@Harej I'd love feedback on ^ the "model family" concept above.

JTannerWMF moved this task from FY 2019-20 to External on the Growth-Team board. Nov 14 2018, 7:02 PM

I'd like to finish up this discussion. To summarize my current preference: we should introduce a "content quality" model family, but let articlequality and itemquality keep their respective identities. These two models are divergent, and I think any benefit from merging them at the level of model name is a false economy. I haven't seen any argument that merging would simplify client configuration parsing logic or improve anything on the server.

Practically, I would deploy the model family concept in two phases:

  • Add model_family to model_info API responses.
    • Return "content quality" for the model family of item and article quality.
    • Decide on model families for the other models.
    • Introduce the model_info.is_hidden flag, initially false for all models.
    • No client changes required.
  • Announce and deprecate the wp10 model for requests where no model_name is specified.
    • Set is_hidden to true for the model.
    • Explicit requests for wp10 will continue to return articlequality results, however.
    • Only clients that request all models and depend on wp10 being present will be broken.
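The phase-two behavior can be sketched on the server side as follows. This is a minimal sketch, not ORES code: the `ModelInfo` record, its `alias_of` field, the `ENWIKI` registry, the `damaging` model's family, and `resolve_request` are all hypothetical. An "all" request (no model names given) skips hidden models, while an explicit request for wp10 is still served by articlequality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    model_family: str
    is_hidden: bool = False
    # Hypothetical: a deprecated name that is served by another model.
    alias_of: Optional[str] = None


# Hypothetical enwiki registry after phase two.
ENWIKI = {
    "articlequality": ModelInfo("contentquality"),
    "wp10": ModelInfo("contentquality", is_hidden=True, alias_of="articlequality"),
    "damaging": ModelInfo("editquality"),
}


def resolve_request(registry, requested=None):
    """Map each model name in the response to the model that produces its scores.

    requested=None stands for an "all" request, which omits hidden models.
    Explicitly requested names are served even if they are hidden.
    """
    if requested is None:
        names = [name for name, info in registry.items() if not info.is_hidden]
    else:
        names = list(requested)
    resolved = {}
    for name in names:
        if name not in registry:
            raise KeyError(f"unknown model: {name}")
        resolved[name] = registry[name].alias_of or name
    return resolved


print(resolve_request(ENWIKI))
# {'articlequality': 'articlequality', 'damaging': 'damaging'}
print(resolve_request(ENWIKI, ["wp10"]))
# {'wp10': 'articlequality'}
```

Keeping the alias in the registry entry, rather than registering a second copy of the model, means a hidden name never produces duplicate results in an "all" response.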

+1. Note that this will affect ChangeProp/precaching too. We need to make sure that hidden models aren't duplicated in precaching as well. Also anyone who is listening to ChangeProp scores (just @Ottomata right now) will need to adjust to the new model name.

Since change-prop is responsible for emitting the revision-score event, we'll have to make sure that these fields are in the event schema and that change-prop sets them properly. Also pinging @Pchelolo.

Thanks for the note. Downstream clients probably don't need to know about anything new; the only difference is that "all" endpoints will stop returning deprecated models. The model names will not change in changeprop: Wikidata will have an itemquality model, and other wikis may have an articlequality model.

Downstream clients must not assume that wp10 will be present. Other than that, no changes are required.

awight renamed this task from Merge articlequality and itemquality to Consolidate articlequality and itemquality models into a "model family". Nov 19 2018, 5:48 PM
awight updated the task description.
Harej removed awight as the assignee of this task. Mar 25 2019, 4:51 PM
Harej lowered the priority of this task from High to Lowest.
Harej removed a subscriber: Harej. Jul 4 2019, 9:26 AM