
Newcomer tasks: ORES ontology mapping and score thresholds
Closed, Resolved · Public

Description

In T236713: Improve drafttopic training data pipeline, @Halfak, @Isaac, and @MMiller_WMF worked on the ontology for the new articletopic model. It is shown below, and includes 64 leaf nodes in a tree that has up to four levels. In order to use them in the UI for newcomer tasks, we have to do two things:

  • Mapping: map the topics in the ontology to the buttons that users will actually be able to click.
  • Thresholds: decide how to set score thresholds for whether an article "counts" as belonging to a topic.

Mapping

Given that the new ontology has 64 leaf nodes, we will likely want to roll that up to something like 30 or fewer, because we believe that any more will be overwhelming to the user. This means that we may combine some, like "Biology", "Chemistry", "Physics", and "Space" all into a label we call "Science".

In the interface, we may want to put topics under hard-coded headings, like putting "Science", "Technology", "Engineering" under a "Math and Science" header (as shown in this mockup). So we would need to decide those.

One other consideration is whether local-language geographies should be exposed as a "special topic". In other words, we plan to expose "Geography" as a topic in all languages, but perhaps it would also be nice to expose "Eastern European geography" for Czech Wikipedia or "Southeast Asian geography" for Vietnamese Wikipedia, etc.

Thresholds

We have to decide when an article "counts" as being part of a topic.

At one point, we attempted to count an article as part of a topic if its highest topic score was that topic. This will likely not work going forward, as the ORES models can score Geography very accurately, and so most local language topics would end up with a highest score of geography.

It may be simpler not to worry about thresholds and just return the articles sorted by score. The downside, though, is that we don't want all newcomers to receive the same articles, so we would need to sort randomly among the articles above some threshold cutoff to give each newcomer their own unique set.
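The "random order above a cutoff" idea can be sketched roughly as follows. This is only an illustration: the function name, the 0.5 cutoff, and the scores are hypothetical and are not taken from ORES or from the thresholds task.

```python
import random

def pick_articles(scores, threshold, k, rng=None):
    """Return up to k article titles whose topic score meets the threshold,
    in random order so that different newcomers see different sets."""
    rng = rng or random.Random()
    eligible = [title for title, score in scores.items() if score >= threshold]
    rng.shuffle(eligible)
    return eligible[:k]

# Hypothetical scores for a single topic (not real ORES output).
scores = {"A": 0.95, "B": 0.80, "C": 0.40, "D": 0.65}
sample = pick_articles(scores, threshold=0.5, k=2, rng=random.Random(0))
```

Article "C" can never be returned here because it falls below the cutoff, while the remaining three are equally likely regardless of their exact score.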

Ontology

Below is the articletopic ontology. Here's how it works:

  • Each row is a "leaf node", and every article gets a separate independent score for each one.
  • The leaves range from level 2 to level 4 in the ontology.
  • The asterisked topics, e.g. Culture.Media.Media*, are "catch-all" topics. The best way to think about these is that they would be leaf nodes, except there are a couple sub-topics we wanted to break out specifically, which may not actually cover the full breadth of the asterisked topic.
Culture.Biography.Biography*
Culture.Biography.Women
Culture.Food and drink
Culture.Internet culture
Culture.Linguistics
Culture.Literature
Culture.Media.Books
Culture.Media.Entertainment
Culture.Media.Films
Culture.Media.Media*
Culture.Media.Music
Culture.Media.Radio
Culture.Media.Software
Culture.Media.Television
Culture.Media.Video games
Culture.Performing arts
Culture.Philosophy and religion
Culture.Sports
Culture.Visual arts.Architecture
Culture.Visual arts.Comics and Anime
Culture.Visual arts.Fashion
Culture.Visual arts.Visual arts*
Geography.Geographical
Geography.Regions.Africa.Africa*
Geography.Regions.Africa.Central Africa
Geography.Regions.Africa.Eastern Africa
Geography.Regions.Africa.Northern Africa
Geography.Regions.Africa.Southern Africa
Geography.Regions.Africa.Western Africa
Geography.Regions.Americas.Central America
Geography.Regions.Americas.North America
Geography.Regions.Americas.South America
Geography.Regions.Asia.Asia*
Geography.Regions.Asia.Central Asia
Geography.Regions.Asia.East Asia
Geography.Regions.Asia.North Asia
Geography.Regions.Asia.South Asia
Geography.Regions.Asia.Southeast Asia
Geography.Regions.Asia.West Asia
Geography.Regions.Europe.Eastern Europe
Geography.Regions.Europe.Europe*
Geography.Regions.Europe.Northern Europe
Geography.Regions.Europe.Southern Europe
Geography.Regions.Europe.Western Europe
Geography.Regions.Oceania
History and Society.Business and economics
History and Society.Education
History and Society.History
History and Society.Military and warfare
History and Society.Politics and government
History and Society.Society
History and Society.Transportation
STEM.Biology
STEM.Chemistry
STEM.Computing
STEM.Earth and environment
STEM.Engineering
STEM.Libraries & Information
STEM.Mathematics
STEM.Medicine & Health
STEM.Physics
STEM.STEM*
STEM.Space
STEM.Technology
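For reference, the label format described above (dot-separated levels, with a trailing asterisk marking a catch-all) could be parsed along these lines. This is a minimal sketch; `parse_leaf` and its return shape are hypothetical and not part of ORES.

```python
def parse_leaf(label):
    """Split an ORES leaf label into its levels and flag catch-all topics."""
    is_catch_all = label.endswith("*")
    levels = label.rstrip("*").split(".")
    return {"levels": levels, "depth": len(levels), "catch_all": is_catch_all}

# Three labels copied from the list above.
examples = [
    "Culture.Media.Media*",
    "Geography.Regions.Asia.Southeast Asia",
    "STEM.Biology",
]
parsed = {label: parse_leaf(label) for label in examples}
```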

Event Timeline

Working with @RHo and @Halfak today, I made a mapping of the new ORES ontology to elements that would be in our UX. I ended up with 39 topic buttons under four headers. What I came up with is in this spreadsheet, which can be read in this way:

  • ORES to UX mapping: this tab has one row for each of the 64 topic nodes that come out of the ORES models.
    • The first column shows them the way they come out of the API.
    • The subsequent four columns parse them into their different levels in the ontology.
    • UX header: this is the header that the topic would be mapped under in the UX. The capitalization and punctuation in this column is deliberate.
    • UX button: this is the button label that the topic would be mapped to in the UX (with multiple rows mapped to the same label). In many cases, I have come up with new button labels that are phrases that don't occur in the original ontology, e.g. "Computers and internet". The capitalization and punctuation in this column is deliberate.
  • List and order of topics: this is a de-duplicated list of the headers and topic buttons that would go in the UX. It also represents the ordering we will want to show the users. The rule is that the order of the headers is consistent across languages: Culture; History and Society; Science, Technology, and Math; Geography. But the topics themselves should be displayed alphabetically in the local language. Engineers, please let us know if that doesn't look like a good approach.
  • WikiProject mapping: the link on this tab shows which English WikiProjects were used to train each topic model, which helps us understand the kind of articles we'll see in them.
  • morelike topics: the list of topics being used for the morelike version of topic modeling, for our reference.
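The ordering rule from the "List and order of topics" tab (fixed header order, topics alphabetical in the local language) might look like the sketch below. The button identifiers and the Spanish labels are placeholders rather than the spreadsheet's contents, and `casefold()` is a simplification: real local-language sorting would need locale-aware collation.

```python
# Header order as stated in the comment above; it is the same in every language.
HEADER_ORDER = [
    "Culture",
    "History and Society",
    "Science, Technology, and Math",
    "Geography",
]

def order_topics(buttons_by_header, localized):
    """Keep headers in the fixed order; sort each header's buttons by localized label."""
    ordered = []
    for header in HEADER_ORDER:
        buttons = buttons_by_header.get(header, [])
        ordered.append((header, sorted(buttons, key=lambda b: localized[b].casefold())))
    return ordered

# Hypothetical button ids and labels, for illustration only.
buttons_by_header = {
    "Geography": ["europe", "asia"],
    "Culture": ["sports", "literature", "food-and-drink"],
}
localized = {
    "sports": "Deportes",
    "literature": "Literatura",
    "food-and-drink": "Comida y bebida",
    "europe": "Europa",
    "asia": "Asia",
}
result = order_topics(buttons_by_header, localized)
```

Note that "sports" sorts before "literature" here because the comparison uses the localized labels, not the English identifiers.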

Here's some of the thinking behind how we did this:

  • We consolidated some very similar topics to reduce buttons on the page. Examples are:
    • Books, Literature --> Literature
    • Geographical, Earth and environment --> Earth and environment
  • We omitted a couple topics that were too granular to include anywhere, like Linguistics.
  • Although the model can distinguish between different parts of continents, we have just one geography topic per continent. (Note: "North Asia" is included in Europe because the "North Asia" articles are almost entirely about Russia.)
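As a rough picture of the consolidation described above, the mapping can be thought of as a lookup from ORES leaf to UX button. The entries below only cover the examples mentioned in this comment. The "Europe" button label and the Eastern Europe row are assumptions inferred from the "one geography topic per continent" rule, and the spreadsheet remains the authoritative source.

```python
# Partial, illustrative mapping. None means the leaf is omitted from the UX.
LEAF_TO_BUTTON = {
    "Culture.Media.Books": "Literature",
    "Culture.Literature": "Literature",
    "Geography.Geographical": "Earth and environment",
    "STEM.Earth and environment": "Earth and environment",
    "Culture.Linguistics": None,
    "Geography.Regions.Asia.North Asia": "Europe",        # assumed button label
    "Geography.Regions.Europe.Eastern Europe": "Europe",  # assumed button label
}

def buttons_for(leaves):
    """Return the set of UX buttons that the given ORES leaves roll up to."""
    return {LEAF_TO_BUTTON[leaf] for leaf in leaves if LEAF_TO_BUTTON.get(leaf)}

matched = buttons_for([
    "Culture.Media.Books",
    "Culture.Literature",
    "Culture.Linguistics",
    "Geography.Regions.Asia.North Asia",
])
```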

Here are the next steps:

  • @Halfak is working on determining model score cutoffs in T244297: Newcomer tasks: set initial thresholds for ORES articletopic.
  • @RHo -- could you please take the topics in the "List and order of topics" tab and put them into a little prototype of the topic overlay? I would like us to get a sense of what it's like to use with that many topics. One of my concerns is that topics below the fold won't get seen.
  • @kostajh @Tgr @Catrope -- could you please comment on what you think of this design, which necessitates scrolling in the topic overlay and topic filter, and which adds this element of headers? And could you comment on whether the topic mapping as we've done it looks right to you and is something we can work with?

But the topics themselves should be displayed alphabetically in the local language. Engineers, please let us know if that doesn't look like a good approach.

This shouldn't be an issue.

could you please comment on what you think of this design, which necessitates scrolling in the topic overlay and topic filter, and which adds this element of headers?

Implementation-wise, it wouldn't be much work. We'd have four instances of the "group of pills" widget that we currently have, and we'd have to gather/distribute the topic settings across them. We would presumably disable the "show more" functionality, because it doesn't make as much sense in this arrangement.

And could you comment on whether the topic mapping as we've done it looks right to you and is something we can work with?

It looks reasonable to me, but it would be helpful to have the reversed view (i.e. a mapping from UX headers to lists of ORES topics), both for review purposes and because we'll likely need it for implementation.

Thanks, @Catrope.

Regarding the "reversed view", I added this sheet to the workbook, which shows the correspondence between UX buttons and ORES topics the other way around. Let me know if you meant something different.
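A reversed view like this can be derived mechanically from the forward mapping, so the two never drift apart. The sketch below assumes the forward mapping is a leaf-to-button dictionary. The three entries are illustrative, and `invert` is a hypothetical helper, not code from GrowthExperiments.

```python
from collections import defaultdict

def invert(leaf_to_button):
    """Turn a leaf -> button mapping into button -> list of leaves, skipping omitted leaves."""
    button_to_leaves = defaultdict(list)
    for leaf, button in leaf_to_button.items():
        if button is not None:
            button_to_leaves[button].append(leaf)
    return dict(button_to_leaves)

# Illustrative subset of the forward mapping (None = not shown in the UX).
leaf_to_button = {
    "Culture.Media.Books": "Literature",
    "Culture.Literature": "Literature",
    "Culture.Linguistics": None,
}
reversed_view = invert(leaf_to_button)
```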

I also created T244421: Newcomer tasks: UX changes for ORES topics to specify the UX changes to buttons and headers.

MMiller_WMF renamed this task from Newcomer tasks: determine how to use ORES ontology to Newcomer tasks: ORES ontology mapping and score thresholds. Feb 5 2020, 10:55 PM

Looks good to me. @MMiller_WMF can you please make this sheet publicly viewable unless there's reason not to yet?

In theory there are two mappings:

  1. map the ORES taxonomy to topic search keywords
  2. map topic search keywords to Suggested Edits topic buttons

We could have #1 as a no-op, in which case manual search queries would look like articletopic:"Culture.Visual arts.Visual arts\*" (not 100% sure I got the escaping right) - which is not nice, to put it mildly.
Or we could make #2 a no-op, so users doing a manual search would be limited to the same topic groups we use for Suggested Edits checkboxes. That seems like an artificial limitation that we should not force on people.
Or we could make #1 a 1:1 mapping, just to get saner keywords, so e.g. map Culture.Biography.Biography* to biography, Culture.Food and drink to food-and-drink etc. And then define Suggested Edits topics as groups of search keywords so e.g. the asia button in Suggested Edits would be translated into the articletopic:asia|central-asia|east-asia|south-asia|southeast-asia|west-asia query. (Not quite sure about the semantics and naming of catch-all topics. Is Geography.Regions.Asia.Asia* everything that's Asia-related, or only those things which are not covered by the other leaf nodes? In that case it should probably be called something like asia-misc.) That seems like the nice solution to me, although slightly more effort. The first mapping would ideally live in ORES, although we could also put it into MediaWiki (the CirrusSearch config, maybe); the second in GrowthExperiments config. @EBernhardson, @Gehel, @Halfak any thoughts?
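To make the two-step idea concrete, here is a minimal sketch of both mappings. The helper names are hypothetical, and the rule of deriving the keyword from the last label component is an assumption that merely reproduces the examples given above. The actual keywords would be whatever list is agreed on.

```python
import re

def to_keyword(label):
    """Mapping #1 (assumed rule): last label component, lowercased, non-alphanumerics to hyphens."""
    leaf = label.rstrip("*").split(".")[-1]
    return re.sub(r"[^a-z0-9]+", "-", leaf.lower()).strip("-")

def articletopic_query(keywords):
    """Mapping #2: a topic button becomes an OR-query over its search keywords."""
    return "articletopic:" + "|".join(keywords)

# The leaves grouped under the asia button in the example above (North Asia is not among them).
asia_leaves = [
    "Geography.Regions.Asia.Asia*",
    "Geography.Regions.Asia.Central Asia",
    "Geography.Regions.Asia.East Asia",
    "Geography.Regions.Asia.South Asia",
    "Geography.Regions.Asia.Southeast Asia",
    "Geography.Regions.Asia.West Asia",
]
query = articletopic_query([to_keyword(leaf) for leaf in asia_leaves])
```

With this rule the catch-all leaf simply becomes asia. Whether it should instead be named something like asia-misc depends on the open question about catch-all semantics.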

Also, just to confirm, the mapping (and ORES taxonomy) will be identical across all wikis, right?

@Tgr -- yes, the taxonomy and mapping will be the same across all wikis.

So I'd probably put the ORES label -> search keyword mapping in mediawiki-config, and the search keywords -> topic mapping on a configuration page on mediawiki.org (and use i18n messages for defining / translating topic names).

Halfak moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.

Looks like this is done from the Machine-Learning-Team side. Let me know if you need anything else from us.

What I came up with is in this spreadsheet

Added a new tab with the proposed keywords. Not much to see there, they are just the same words written in a consistent ASCII-lowercase-and-hyphen-only format. The ones in the Search keyword column would be user-visible though (that's what you need to type into the search bar) so you might want to review if they make sense, @MMiller_WMF and @Halfak. (The other two columns are internal to GrowthExperiments and don't really matter.)

@Tgr -- I think they look fine. Thanks.

@Halfak what do you think about applying P10407 within ORES and already outputting predictions in that form? We'll definitely need identifier-like names for the search syntax, and it would be nice not to have two sets of public identifiers for the topic names.

I think since we are now about to go to production with a mapping and score thresholds, this is done. We may change the mapping or thresholds in the future based on results from users. Thank you!