Newcomer tasks: ambassadors test morelike
Closed, ResolvedPublic

Description

It is time to test out the results from implementing the "morelike backend" -- our ability to search for suggested edits by topic. It's not yet part of the frontend, so we need to test it via the API sandbox. Below are instructions for how to do this.

  • Czech
  • Korean
  • Vietnamese
  • Arabic

Note that this is not as thorough an evaluation as we did on T234272: Newcomer tasks: evaluate topic matching prototypes. Instead of scoring every article for every topic, we just need a quick scan of each topic to make sure we mapped your article lists correctly.

Here's how to try it.

  1. Go to the following URL, filling in your wiki's language code: https://[WIKI].wikipedia.org/wiki/Special:ApiSandbox#action=query&format=xml&prop=&list=growthtasks&gtlimit=10
  2. Click "list=growthtasks" in the menu on the left side

  3. Next to "gttopics", select a topic.

  4. Click "Make request"

  5. Look through the resulting articles, and note whether the articles generally are or are not good matches for the topic.
  6. Repeat for each topic, and when finished, comment on this task with what you noticed, if you saw any major issues or concerns, or if the algorithm seems to be performing worse than when you previously tested via the prototype.
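
For anyone who prefers to script the check rather than click through the sandbox, here is a minimal sketch that builds the equivalent request URL. It assumes the standard `/w/api.php` endpoint and reuses only the parameters named in the steps above; the function name `growthtasks_url` is hypothetical, and the topic identifiers (`arts`, `food-drink`) are taken from results posted in this task rather than from any official list.

```python
from urllib.parse import urlencode


def growthtasks_url(wiki: str, topic: str, limit: int = 10) -> str:
    """Build an api.php URL matching the ApiSandbox request in the steps above."""
    params = {
        "action": "query",
        "format": "xml",
        "list": "growthtasks",
        "gttopics": topic,   # topic identifier; the exact values are an assumption
        "gtlimit": limit,
    }
    return f"https://{wiki}.wikipedia.org/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    # Print one URL per wiki under test; paste them into a browser to inspect results.
    for code in ("cs", "ko", "vi", "ar"):
        print(growthtasks_url(code, "arts"))
```

The script only prints URLs; it does not send any requests, so reviewing the articles is still a manual step.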

Please also scan over the configuration pages that contain the article lists, to check for any typos or copy/paste errors:

Event Timeline

MMiller_WMF updated the task description.
MMiller_WMF set Due Date to Jan 13 2020, 8:00 AM.

@Urbanecm @revi @Dyolf77_WMF @PPham -- this is ready for you to test. Please finish by Monday, Jan 13.

Hi Marshall,

It just occurred to me that I have to go on a trip this weekend and I
can't bring my laptop with me. I won't be back until Monday morning my time
(which means Sunday evening your time). You set the due date to be Monday
15:00, which is in GMT, right? If so, I'm afraid I can't make it in time.
Can I do it on Monday when I get back?

Thank you Marshall.

That's fine, the algorithm is not as I expected but it is a good start. My detailed comments are here

@PPham -- thanks for letting me know. It is okay for you to do it on Monday. Have a good trip!

@Dyolf77_WMF -- thank you for the thorough review. We'll see if anything can be improved on our side. @Catrope @kostajh @Tgr -- the Google Doc that @Dyolf77_WMF made indicates some topics in Arabic that got 0 of 10 good results. Could we maybe check the mapping/pasting of the seed articles? Or maybe @Dyolf77_WMF should look them over on the config page.

Thanks, @Catrope.

Yes, all ambassadors (@Dyolf77_WMF @revi @PPham @Urbanecm), please check over the config pages that @Catrope listed, because any typos or copy/paste errors could cause morelike to give bad results.

Checked, they look fine for Arabic.

I'm thinking we should have some kind of debug mode that returns more details about why those tasks got selected (for the morelike backend, that would probably be in the form of links to the CirrusSearch debug dumps for the component queries - those dumps show what words were used for the morelike match, with what weight), plus the English Wikidata title/description of the article if available, to make it easier for non-speakers of the language to debug results.

+1

Selected arts and got the following:

  • Mír (peace) - why?
  • Sanskrt (Sanskrit) - why?
  • Obchodní firma (business name) - why?
  • Rumunsko (Romania) - why?
  • Zdenka Marie Nováková - LGTM, a painter
  • Adamov (okres Blansko) (a city) - why?
  • Frans Hals - a painter, LGTM
  • Mladý obal (contest about designing cases) - LGTM
  • Jed (poison) - why?

Do we think three out of nine (33%) articles looking good is an acceptable score, @MMiller_WMF?

@Tgr @Catrope @kostajh Is there any way to see why an article got in, so I can alter the lists if needed?

Selected arts and got the following:

  • Cao nguyên Nam Cameroon: a highland in Cameroon, off-topic
  • Fidarestat: a chemical substance, off-topic
  • Nam Anh: a village, off-topic
  • Các nước thành viên Liên minh châu Âu: EU's countries, off-topic
  • Hệ thống nông nghiệp: agricultural system, off-topic
  • Khutulun: a Mongolian woman, off-topic (oh, I see why this article is selected: this woman shows up in a lot of books, dramas, soap operas and Morelike picks up the words...)
  • Phenoxyethanol: a chemical substance, off-topic
  • Robert Allen Pease: an analog integrated circuit design expert, off-topic
  • Khu đô thị Trung Hòa - Nhân Chính: an urban area in Hanoi, totally off-topic

So it's 0% then? And arts isn't the only topic this bad. All the other topics are like this in Vietnamese...

Change 563766 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] ApiQueryGrowthTasks: remove logged-in requirement

https://gerrit.wikimedia.org/r/563766

Change 563773 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: Add debug mode

https://gerrit.wikimedia.org/r/563773

Change 563766 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] ApiQueryGrowthTasks: remove logged-in requirement

https://gerrit.wikimedia.org/r/563766

Per @MMiller_WMF's request, I'm adding some other topics:

  • History
    • Anonymous (skupina) - how are anonymous relevant?
    • GRU - well, part of the article is history-related, so probably okay?
    • Administrativní dělení Švýcarska (cantons of Switzerland) - irrelevant
    • Osvojení (adoption) - irrelevant; apart from a short part of the article, it doesn't even describe the history of adoption
    • Helikoptérové peníze (Helicopter money) - more related to economy, but perhaps okay?
  • Economics
    • Slezané (Silesians) - isn't that more related to history?
    • Sociální stratifikace (Social stratification) - LGTM
    • Martin Dvořák (manažer) (executive director of major companies, like a TV company or Prague city transport) - LGTM
    • Albánie (Albania) - why?
    • Ekonomika měst a obchod ve středověké Anglii (Economics of English towns and trade in the Middle Ages) - more history related I guess, but LGTM
  • Science
    • Věda (science) - LGTM
    • Suchoj Su-25 (Sukhoi Su-25) - partially relevant, let's take it
    • Turecko (Turkey) - why?
    • Otrokářství (slavery) - why?
    • Ekonomický sektor (economic sector) - LGTM

Seems it's all relatively bad.

Change 563773 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: Add debug mode

https://gerrit.wikimedia.org/r/563773

Change 563983 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/GrowthExperiments@master] Suggested Edits: Pass debug flag to API query

https://gerrit.wikimedia.org/r/563983

Hi, I discovered that this might simply be caused by poorly chosen seed articles. To test that theory, I played with Special:Search and changed the list of articles for arts in https://cs.wikipedia.org/w/index.php?title=MediaWiki:NewcomerTopics.json&diff=prev&oldid=18036160. It looks much better: out of the first 10 articles, 9 are okay.

Yeah, I think specific articles are likely to give much better results, but we need to try to remove bias in the seed list, or at least randomize the bias (for example, in the updated list 3 out of 4 articles are about male artists).

Maybe we could do something in the API to use Special:WhatLinksHere, so rather than a morelike query which uses Umění directly, we would instead use a random set of 5 articles taken from https://cs.wikipedia.org/w/index.php?title=Speciální:Co_odkazuje_na/Umění, and those would be fed to morelike. There will still be bias, but it will be randomized. The downside is that the tool is no longer deterministic because you'll get different results each time you use the same topic filter; but that might be acceptable per the original design/product intentions of this feature.

To provide a more specific example, if a user selected "copyedit" as the task type and "games" as a topic, our backend would find the topic list for games, which is "Hra" and "Počítačová hra". It would then perform a query to select some number of articles (5? 10?) that link to "Hra" (the same query used to generate the list at Special:WhatLinksHere/Hra). Then it would take another N articles (5? 10?) from Special:WhatLinksHere/Počítačová hra, shuffle the combined list of 10-20 articles, and take 5 at random. Those would then be passed on to the morelike query.

This seems like it might work better than what we currently do but we'd have to assess it in production to see for sure.
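
The sampling idea above can be sketched roughly as follows. This is only an illustration of the proposal, not existing GrowthExperiments code: `pick_seed_articles` and the `get_backlinks` callback are hypothetical names, the backlink lookup is injected so the sketch runs without network access, and the placeholder titles in the usage example are not real query results.

```python
import random
from typing import Callable, Iterable, List, Optional


def pick_seed_articles(
    topic_articles: Iterable[str],
    get_backlinks: Callable[[str, int], List[str]],
    per_article: int = 10,
    sample_size: int = 5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pool up to `per_article` backlinks for each topic article, shuffle, take a few."""
    rng = rng or random.Random()
    pool: List[str] = []
    for title in topic_articles:
        for linked in get_backlinks(title, per_article):
            if linked not in pool:  # an article may link to several topic articles
                pool.append(linked)
    rng.shuffle(pool)
    return pool[:sample_size]


if __name__ == "__main__":
    # Placeholder backlink data standing in for Special:WhatLinksHere queries.
    fake_backlinks = {
        "Hra": [f"Placeholder A{i}" for i in range(10)],
        "Počítačová hra": [f"Placeholder B{i}" for i in range(10)],
    }
    seeds = pick_seed_articles(
        ["Hra", "Počítačová hra"],
        lambda title, n: fake_backlinks[title][:n],
    )
    print(seeds)  # five titles, different on each run unless a seeded rng is passed
```

Passing a seeded `random.Random` makes the selection reproducible, which could help when comparing results between test runs.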

Config LGTM after a small fix. Topics are generally in bad shape: only Popular Culture and Entertainment had more than 5 good matches; the rest (of those analyzed so far) generally had 0-2 good matches.

Change 563983 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Suggested Edits: Pass debug flag to API query

https://gerrit.wikimedia.org/r/563983

Change 564161 had a related patch set uploaded (by Catrope; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.14] Newcomer tasks: Add debug mode

https://gerrit.wikimedia.org/r/564161

Change 564161 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.14] Newcomer tasks: Add debug mode

https://gerrit.wikimedia.org/r/564161

@revi @Urbanecm @Dyolf77_WMF @PPham -- thank you so much for testing this so quickly! Your work has shown us that morelike is not performing as well as we expected -- it is doing substantially worse than when we originally tested it in the prototype in T234272: Newcomer tasks: evaluate topic matching prototypes. We need to improve it before we can deploy it for users.

Therefore, today we changed some of the settings on the algorithm so that they better match what the prototype used ("classic_noboostlinks"). We think this will help the results. There are two things we would like you to do as soon as possible (hopefully in the next day, so we can continue to move forward):

  1. Try out the topics again. I know this takes a long time, but this is the only way we can know whether the algorithm is working well. I think a fast way to do this would be to try out some of the topics that performed the worst when you looked before, and see if they have improved. Please post your results here.

Please do #1 above before moving on to #2.

  2. Optional: you can try to change some of the seed articles in your config files to get better results. Our current thinking is that articles that are shorter and more specific to the topics we're looking for will get better results than long, general articles. For example, for the topic "Arts", the article for "Art" is long and talks about many things. We think that there will be better results if the seed articles are a list of specific works of art or artists, like "Vincent Van Gogh", "Georgia O'Keeffe", "Swan Lake", "The Thinker". You can edit the config files directly (linked in the task description), and changes should be reflected within about 15 minutes via the API. I think you are all able to make edits to your config files except @PPham. @PPham, you can list yours separately, and we can make the edits for you.

Picked art again and here are the results:

  • Tượng sống: Living statue, good
  • Nghệ thuật Hy Lạp cổ: Ancient Greek art, good
  • Bố cục tạo hình: err, don't know the English word but it's art-related, good
  • Kiến trúc Tân cổ điển: Neoclassical architecture, good
  • Nghệ thuật Gothic: Gothic art, good
  • Hoàng Hải (nghệ sĩ cải lương): Hải is a singer but of a Vietnamese modern folk opera genre, so I think this one's half good? Still art-related if you consider singing is art ("art" in Vietnamese is "nghệ thuật", which does include singing, but in my personal impression "art" itself in English doesn't...)
  • Amanda Strydom: a singer too
  • Chủ nghĩa biểu hiện: expressionism, good
  • Du lịch Paris: Tourism in Paris, off-topic
  • Doraemon (hoạt hình): it's... an anime, can you call it art? Half good.

Art: 8 points out of 10.

Okay so I don't know what you guys did but it's definitely improved so much. So I picked some other topics but I'll just mark it instead of doing a full review (if you need it pls tell me)

  • Politics: 8/10, the weird thing is one article shows up twice ("Hệ thống xã hội chủ nghĩa", although it's related), and the "Tourism in Paris" shows up again (see "art" review above)
  • Education: 9/10, "Tourism in Paris" shows up again...
  • Religion: 8/10, off-topic articles are the planet "Mercury" (but I can guess why it shows up) and "Doraemon".
  • Crafts+Hobbies: 5.5/10, it is so hard for me to find seed articles for this topic because there aren't many of them, so 5.5 is fair.
  • Health: 9/10, "Tourism in Paris", what are you doing here??

@MMiller_WMF, do you still think we need to change the seed articles? I still wanna do it though.

Updated the documentation of my comments.
There is a very big improvement in the algorithm. I found only one suggestion with a 5/10 score; most of the rest scored 10/10. Thanks to all the team.

I think you are all able to make edits to your config files except @PPham. @PPham, you can list yours separately, and we can make the edits for you.

I can't make edits! I'd like to change the following for Crafts-hobbies: cc @Catrope

Existing | New value
"حرفة" | "حرفة"
"هواية" | "هواية"
"ذات المراوح الأربع" | "هواية جمع الطوابع"
"صيد" | "الغزل اليدوي"
"(تجميع (هواية)" | "(تجميع (هواية"

Probably the easiest way is if ambassadors copy the page into their personal namespace (User:XXX/NewcomerTopics.json) and make changes there, and then either us or one of the administrators of the wiki can copy the new content back to the real page. That's less error prone than people unfamiliar with the script and RTL browser behavior trying to copy over text from tables.

(Relatedly, this is something I've been meaning to do but never gotten around to, so let's have a task to keep track of it: T242789: When using the local configuration loader for newcomer tasks, edits that bring the configuration page to an invalid state should be prevented)
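
Until something like that exists, a draft copy could be sanity-checked before it is pasted back. The sketch below is only an illustration: it assumes the config is a JSON object mapping each topic ID to a list of seed article titles, which is a guess at the page structure and not confirmed in this task, and `check_topic_config` is a hypothetical helper name.

```python
import json
from typing import List


def check_topic_config(text: str) -> List[str]:
    """Return a list of human-readable problems found in a draft topic config."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems: List[str] = []
    for topic, titles in config.items():
        if not isinstance(titles, list):
            problems.append(f"{topic}: expected a list of titles")
            continue
        seen = set()
        for title in titles:
            if not isinstance(title, str) or not title.strip():
                problems.append(f"{topic}: empty or non-string title")
                continue
            if title != title.strip():
                problems.append(f"{topic}: stray whitespace around {title!r}")
            if title.count("(") != title.count(")"):
                # Typical copy/paste damage, especially with RTL text.
                problems.append(f"{topic}: unbalanced parentheses in {title!r}")
            if title in seen:
                problems.append(f"{topic}: duplicate title {title!r}")
            seen.add(title)
    return problems


if __name__ == "__main__":
    draft = '{"games": ["Hra", "Počítačová hra"]}'
    print(check_topic_config(draft) or "no problems found")
```

It cannot tell whether a title exists on the wiki or is a good seed; it only catches mechanical errors of the kind that copying from tables tends to introduce.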

Just to close the loop here and repeat what was said on various closed channels: the poor quality of results was probably caused by the task suggestion backend using the "random sorting" mode of CirrusSearch (fixed in rEGRE31bdede2dba2: Newcomer tasks: Don't randomize for morelike search). Random sorting is incompatible with morelike (which is also kind of a sorting mode) and replaces it. In our earlier testing of new patches we saw the exact opposite, morelike overriding randomization (that's why we did T242057). Probably the local task suggester (which searches via internal PHP calls and is currently only used in production) and the remote task suggester (which searches via action API calls to another wiki, and is used in the beta cluster wikis and local development) differ in which of the two conflicting parameters takes precedence, which is why it took us a while to catch it. Sorry about the extra work this has generated.
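
To make the conflict concrete, here is a minimal sketch using the public search API (`list=search` with a `morelike:` query and the `srsort` parameter). It is an assumption-laden illustration, not the code path GrowthExperiments uses internally: `morelike_search_params` is a hypothetical helper, and the guard simply refuses the combination that this comment identifies as the likely cause.

```python
from typing import Dict, List, Optional, Union


def morelike_search_params(
    seed_titles: List[str],
    limit: int = 10,
    sort: Optional[str] = None,
) -> Dict[str, Union[str, int]]:
    """Build list=search parameters for a morelike query over the seed titles."""
    if sort == "random":
        # morelike ranks by similarity; a random sort would discard that ranking.
        raise ValueError("random sorting cannot be combined with a morelike query")
    params: Dict[str, Union[str, int]] = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": "morelike:" + "|".join(seed_titles),
        "srlimit": limit,
    }
    if sort is not None:
        params["srsort"] = sort
    return params


if __name__ == "__main__":
    print(morelike_search_params(["Hra", "Počítačová hra"]))
```

Failing loudly on the conflicting combination is one possible design; silently dropping the sort would also work but would hide the kind of mismatch that took a while to notice here.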

I created the page and made the changes as suggested.

During some tests of morelike on the Arabic beta wiki using the SE module, I found that the results score about 1 to 3 points lower than in the production tests done as explained in the task. For example, when selecting the topic arts, the SE module suggests only 2 to 5 relevant articles (tested many times). cc @MMiller_WMF

@Catrope @Tgr @kostajh -- can you see any reason why @Dyolf77_WMF would see worse recommendations in the module in beta than in the API in production?

Probably some default search option gets set differently when you use CirrusSearch via the API vs. direct PHP call. If you switch the gtdebug flag on in the API, it will return some score debug URLs, we can try looking at those. Although the way those URLs are built currently for local search is a bit fragile.

Did a quick skim: Geography had 3~4/10 (depending on how generous you are), craft-hobbies had 5/10, but otherwise 7~10/10.

Quick Q: Is it normal to show one article multiple times?

<suggestion title="한국 요리" tasktype="expand" difficulty="hard" order="2">
  <topics>
    <_v>food-drink</_v>
  </topics>
</suggestion>
<suggestion title="한국 요리" tasktype="references" difficulty="medium" order="3">
  <topics>
    <_v>food-drink</_v>
  </topics>
  <maintenancetemplates>
    <_v>출처 필요</_v>
  </maintenancetemplates>
</suggestion>
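
A quick way to spot such repeats in a response is to group suggestions by title. The sketch below reuses the two entries above; the enclosing `<suggestions>` element is an assumption added only so the fragment parses as one document, and `task_types_by_title` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# The two entries quoted above, wrapped in an assumed root element.
SAMPLE = """<suggestions>
<suggestion title="한국 요리" tasktype="expand" difficulty="hard" order="2">
  <topics>
    <_v>food-drink</_v>
  </topics>
</suggestion>
<suggestion title="한국 요리" tasktype="references" difficulty="medium" order="3">
  <topics>
    <_v>food-drink</_v>
  </topics>
  <maintenancetemplates>
    <_v>출처 필요</_v>
  </maintenancetemplates>
</suggestion>
</suggestions>"""


def task_types_by_title(xml_text: str) -> Dict[str, List[str]]:
    """Map each suggested article title to the task types it was returned for."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for suggestion in ET.fromstring(xml_text).iter("suggestion"):
        grouped[suggestion.get("title")].append(suggestion.get("tasktype"))
    return dict(grouped)


if __name__ == "__main__":
    repeated = {
        title: types
        for title, types in task_types_by_title(SAMPLE).items()
        if len(types) > 1
    }
    print(repeated)  # {'한국 요리': ['expand', 'references']}
```

In this sample the article appears once per task type (expand and references), which suggests the repeats come from one article matching several task types rather than from the topic matching itself.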

I tested a few random topics from T242400#5796914 to compare, and it seems to work fairly well, except for Geography (the articles are related to the universe rather than to geography) and a few other topics. It seems that's due to my seed articles being poor there; I'll experiment a bit and change them.

I created the page and made the changes as suggested.

I've copied your changes to the config page.

@Catrope, could you please update the file? The Art topic keeps giving bad suggestions, so I changed the keywords to see if that will refine the results. Thanks!

@MMiller_WMF I redid my seed article list here, please help me update it

@Dyolf77_WMF @PPham -- thank you for updating your lists. Hopefully it helps! Thanks for making the change, @Tgr.

Is there anything else to do?

Up to @MMiller_WMF to close it.

This is finished. Thank you!