
dewiki: "Search results from Polish Wikipedia" lists search results from dewiki
Closed, ResolvedPublic

Description

This problem was reported here on dewiki.

  • Search for Jan Nepomucen Umiński on dewiki.
  • The article doesn't exist yet.
  • Therefore the search page apparently shows results from the Polish Wikipedia, indicated by the headline "Suchergebnisse von der polnischen Wikipedia" (i.e. "Search results from Polish Wikipedia")
  • However: The search results which are shown are not from Polish Wikipedia, they are actually articles on German Wikipedia.
  • Strangely, the Polish article Jan Nepomucen Umiński is not displayed, although it has existed since 2004.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 17 2019, 3:45 PM

Hello. Can I work on this?

@Der_Keks Yes. I'm an Outreachy applicant trying to find my way around making a contribution.

Please try to use this task only for task-specific information. You can reply to me on your (linked) user talk page.
If you want to code, see here (you can change the language) and here.
We use Gerrit to review code, the git origin is also on gerrit: https://www.mediawiki.org/wiki/Gerrit/Tutorial

Don't let the wall of text overwhelm you. If you need help around the wiki itself, just write to me at DerKeks@wikipedia.de. For coding-specific things you can ask @matmarex or Aklapper (deliberately not pinged here). They only rarely bite :)

EBernhardson triaged this task as High priority. Oct 17 2019, 7:37 PM
EBernhardson added a subscriber: EBernhardson.

However: The search results which are shown are not from Polish Wikipedia, they are actually articles on German Wikipedia.
Strangely, the Polish article Jan Nepomucen Umiński is not displayed, although it has existed since 2004.

I looked into this, it seems:

  • The search results are from plwiki, as evidenced by search snippets containing text that doesn't exist on the dewiki pages.
  • Something must be interpreting the plwiki results as local, which has two knock-on effects. First, the URLs are constructed incorrectly, so they point at dewiki. Second, all results that are not real pages on dewiki are discarded, giving the impression that the search found dewiki pages. This is why the exact match from plwiki is not displayed.
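
To make the two effects above concrete, here is a small Python sketch. It is an illustrative model only, not CirrusSearch code: the `Hit` class, the `render` function, the `PAGES` and `HOSTS` tables, the extra title "Umiński", and the snippet strings are all made up for the example. It assumes that a hit resolved against the wrong wiki is dropped whenever its title is not a page there.

```python
# Illustrative model only: none of these names come from CirrusSearch.
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    wiki: str     # wiki the hit was actually found on, e.g. "plwiki"
    title: str
    snippet: str


# Hypothetical page inventory per wiki. "Umiński" is an invented title that
# happens to exist on both wikis; the full name exists only on plwiki.
PAGES = {
    "dewiki": {"Umiński"},
    "plwiki": {"Jan Nepomucen Umiński", "Umiński"},
}
HOSTS = {"dewiki": "de.wikipedia.org", "plwiki": "pl.wikipedia.org"}


def render(hits, resolve_as):
    """Build (url, snippet) pairs, resolving each title on resolve_as(hit)."""
    rendered = []
    for hit in hits:
        wiki = resolve_as(hit)
        if hit.title not in PAGES[wiki]:
            continue  # not a real page on that wiki, so the hit is dropped
        url = f"https://{HOSTS[wiki]}/wiki/{hit.title.replace(' ', '_')}"
        rendered.append((url, hit.snippet))
    return rendered


plwiki_hits = [
    Hit("plwiki", "Jan Nepomucen Umiński", "polish snippet A"),
    Hit("plwiki", "Umiński", "polish snippet B"),
]

# Suspected behaviour: every hit is treated as if it were local to dewiki.
buggy = render(plwiki_hits, lambda hit: "dewiki")
# Expected behaviour: every hit is resolved on the wiki it came from.
expected = render(plwiki_hits, lambda hit: hit.wiki)
```

In this model `buggy` holds a single dewiki URL paired with a Polish snippet, and the exact match is gone, which matches what the search page shows. `expected` keeps both hits with plwiki URLs.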

An example that shows this bug a little more obviously is searching for Russian text on eswiki: https://es.wikipedia.org/wiki/?search=различные%20формы%20жизни

Clearly the snippet is Russian. We get only a single result because most Russian pages have Cyrillic titles, which do not exist on eswiki; this one result happened to have a Latin title.

I will need to dig into this a bit. Unfortunately, testing or debugging anything to do with cross-wiki integration is pretty painful, and since the recent switch from HHVM to PHP 7 I am not yet able to attach a debugger to code running in production to step through and see where things go wrong. This looks to affect sister search across the entire cluster and needs to be looked into.

TJones added a subscriber: TJones. Oct 17 2019, 8:14 PM

Another odd side effect is that if there are no results with matching titles on the local wiki, no results are displayed, though the message "Showing Results from <other> Wikipedia" still shows up. For example here.

Wait.. it gets weirder. When I went to check the link above, I got one result, "София" (which is also a redirect to "Sofia" on enwiki). The results shown are inconsistent as I reload the page multiple times—София shows up for 5 out of 20 reloads. (The inconsistent results may be unrelated to the wrong-wiki links, so ignore it for now and I'll retest when the main problem gets sorted out.)

Change 545559 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Make sure to use host wiki components when issuing a sister wiki search

https://gerrit.wikimedia.org/r/545559

dcausse claimed this task. Oct 23 2019, 1:08 PM
dcausse moved this task from In Progress to Needs review on the Discovery-Search (Current work) board.

Change 545559 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Make sure to use host wiki components when issuing a sister wiki search

https://gerrit.wikimedia.org/r/545559

Gehel closed this task as Resolved. Oct 29 2019, 5:51 PM

@Gehel Will this change be deployed with the train next week?

@JStrodt_WMDE It will be deployed as part of this train, 1.35.0-wmf.4. It should reach dewiki tomorrow evening (Thursday the 31st) if the train rolls forward as usual.

Wonderful, thanks!