
Adjust Wikidata search to surface more relevant results in Article Guidance
Closed, Resolved | Public | 1 Estimated Story Points

Description

As part of Article Guidance (T396029), for the initial step of "Title entry and topic matching" users can search for Wikidata items.
The search is capable of finding relevant Wikidata items (and their associated outlines). However, in some cases where multiple results are available, they don't seem to be sorted by relevance, and the main expected result is left out of the list (even for exact matches). For example, searching for "Marie Curie" shows as the most prominent results the "Pierre and Marie Curie University", the movie "Marie Curie: The Courage of Knowledge", and the plant "Rosa 'Marie Curie'", leaving no space in the results for Marie Curie (the person) despite her being an exact match.

In the following examples, I'll compare search results for queries using Article Guidance on beta, and on the wikidata.org website (using both the search bar and advanced search):

| Query | Article guidance | Wikidata.org search bar | Wikidata.org search results page |
| --- | --- | --- | --- |
| Marie Curie | en.wikipedia.beta.wmcloud.org_wiki_Special_NewArticle(Wiki Mobile).png | Screenshot 2026-04-27 at 16.12.32.png | Screenshot 2026-04-27 at 16.15.04.png |

Event Timeline

After some investigation, I may have a clue about what is going on. When searching for "Marie Curie person" to try to make the Marie Curie result appear, the result finally shows up, but it is labelled as "Q7186".

en.wikipedia.beta.wmcloud.org_wiki_Special_NewArticle(Wiki Mobile) (1).png (568×320 px, 68 KB)

Looking at the Wikidata item, I noticed that it does not have an explicit label in English, because it is using the "default for all languages" value instead.

Screenshot 2026-04-28 at 12.16.01.png (1×1 px, 265 KB)

This is a recommended practice from the Wikidata community to avoid repeating the exact same label across many languages (especially relevant for people and companies). We may want to make sure we support the multilingual label unless it is overridden by a local one. For searching, we could use both. For displaying, we may want to use the local version and fall back to the multilingual one.
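To make the proposed behaviour concrete, here is a minimal sketch of the two rules (search on both labels, display the local one with a fallback). It is not the ArticleGuidance code: the function names are hypothetical, and it assumes labels arrive in the usual Wikidata entity shape, keyed by language code, with "mul" as the code for the "default for all languages" value.

```python
from typing import Dict, List, Optional

# Assumed shape, mirroring the "labels" map of a Wikidata entity:
# {"<language code>": {"language": "...", "value": "..."}}
Labels = Dict[str, Dict[str, str]]

# Assumption: "mul" is the code carrying the "default for all languages" label.
MUL = "mul"


def display_label(labels: Labels, lang: str) -> Optional[str]:
    """Return the local label if present, else the multilingual one, else None."""
    for code in (lang, MUL):
        entry = labels.get(code)
        if entry and entry.get("value"):
            return entry["value"]
    return None


def searchable_labels(labels: Labels, lang: str) -> List[str]:
    """Collect the local and the multilingual label (deduplicated) for matching."""
    found: List[str] = []
    for code in (lang, MUL):
        value = labels.get(code, {}).get("value")
        if value and value not in found:
            found.append(value)
    return found
```

With this sketch, an item that only has a multilingual label would be displayed with that label rather than with its bare Q-id, and a caller would only fall back to the Q-id when `display_label` returns `None`.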

@Pginer-WMF switching to a different search API here would allow getting "Marie Curie" (the scientist) as the first result in the search but it would mean undoing the work done for T422854: Make Wikidata search more flexible in Article Guidance.

We could technically use both search APIs and combine the results but there is already a lot going on in the search flow so I would prefer to avoid that if we can.

Correct handling of the language fallback, OTOH, is a simple fix.

Change #1279446 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/ArticleGuidance@master] wbgetentities: use languagefallback=1

https://gerrit.wikimedia.org/r/1279446

Change #1279446 merged by jenkins-bot:

[mediawiki/extensions/ArticleGuidance@master] wbgetentities: use languagefallback=1

https://gerrit.wikimedia.org/r/1279446
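For readers unfamiliar with the parameter named in the patch title, the following is a rough sketch of what a `wbgetentities` request with `languagefallback=1` might look like. It is not taken from the patch: the helper name, the endpoint constant, and the choice of `props` are assumptions for illustration only.

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumption: the public Wikidata endpoint; the extension may be configured differently.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_wbgetentities_url(ids: Iterable[str], lang: str, api: str = WIKIDATA_API) -> str:
    """Build a wbgetentities URL that asks the API to apply language fallback."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),            # e.g. "Q7186"
        "props": "labels|descriptions",  # illustrative choice of props
        "languages": lang,
        "languagefallback": 1,           # the flag named in the patch title
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


url = build_wbgetentities_url(["Q7186"], "en")
```

With the flag set, an item lacking an explicit English label should come back with a label resolved through the fallback chain instead of no label at all, which is what lets the result display a name rather than "Q7186".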

Currently on Test Wikipedia, the search results seem to work well for the examples from both cases (this ticket and T422854):

| Marie Curie | Titanic | Titanic film | Titanic (film) |
| --- | --- | --- | --- |
| test.wikipedia.org_wiki_Special_NewArticle(Wiki Mobile).png | test.wikipedia.org_wiki_Special_NewArticle(Wiki Mobile) (1).png | test.wikipedia.org_wiki_Special_NewArticle(Wiki Mobile) (2).png | test.wikipedia.org_wiki_Special_NewArticle(Wiki Mobile) (3).png |