
Repeated patterns and <nowiki> output in search result descriptions
Open, Low, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

I see these repeated patterns.

What should have happened instead?:

I should see no patterns.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

When I click the items to edit them, the patterns aren't present in the wikitext, so they must not exist. Or, if they do exist, they look like a bug.

20230307T171703.jpg (337×250 px, 22 KB)

Event Timeline

Aklapper changed the task status from Open to Stalled. Edited Mar 7 2023, 9:45 AM

Hi, I'm not sure what a "pattern" is. Please elaborate on what you expect and why, using specific input rather than paraphrasing, and on what happens instead. Also, which "item" do you click? Please be specific.

<nowiki>Wet Wet Wet; Wet Wet Wet; Wet Wet Wet; Wet Wet Wet; Wet Wet Wet; Wet
Wet Wet; Wet Wet Wet; Wet Wet Wet; Ует Ует Ует; Wet Wet Wet; Wet Wet Wet; Wet
Wet

Ah, thanks! Not sure if this is still MediaSearch or already WikibaseMediaInfo...

Aklapper renamed this task from "Repeated patterns in search results, that don't exist" to "Repeated patterns and <nowiki> output in search result descriptions". Mar 7 2023, 11:32 AM
Aklapper changed the task status from Stalled to Open.

That looks to be the "File usage on Commons" and "File usage on other wikis" sections shining through, at least for the Wet Wet Wet category (I haven't looked at the other two), as the only image in there is https://commons.wikimedia.org/wiki/File:Wetwetwet-montreux.jpg

TheDJ added subscribers: Mike_Peel, TheDJ.

You can get the same descriptions with normal search
https://commons.wikimedia.org/w/index.php?search=wet+wet+wet&title=Special:Search&profile=advanced&fulltext=1&ns14=1

If you throw Category:Wet Wet Wet's contents through Special:ExpandTemplates, you get the same content and it is generated by {{Wikidata Infobox}} (the only content of the category wikipage) and simply a list of ALL the wikidata labels of the category.
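For anyone who wants to repeat that check without the Special:ExpandTemplates form, here is a minimal sketch that only builds a request URL for the MediaWiki Action API's `expandtemplates` module. The helper name `expandtemplates_url` is made up for illustration, and the sketch assumes the category page contains nothing but `{{Wikidata Infobox}}`, as described above. It does not send the request.

```python
from urllib.parse import urlencode

# Commons Action API endpoint.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def expandtemplates_url(title: str, wikitext: str) -> str:
    """Build an action=expandtemplates URL (hypothetical helper, no network call)."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "title": title,      # page context used while expanding the templates
        "text": wikitext,    # wikitext to expand
        "prop": "wikitext",  # ask for the expanded wikitext
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = expandtemplates_url("Category:Wet Wet Wet", "{{Wikidata Infobox}}")
print(url)
```

Opening the printed URL in a browser should return JSON whose expanded wikitext can be compared with the search result description.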

This probably helped search engine matches in the old Search, but I'm not sure if this is still needed with the recent optimizations that the Discovery-Search team has made... (worse, I hope it didn't affect their measurements and lead to a suboptimal solution...)

CC @Mike_Peel who I'm sure has worked on this :)

Thanks @TheDJ. I'm happy to claim responsibility for this: the Wikidata Infobox includes the Wikidata labels in all languages, which improves multilingual search engine optimisation for the Commons categories it is used in. I don't think the changes to search helped things here (from my personal point of view, they made things worse by hiding category search results). I've been talking about this for several years now as an improvement to multilingual search on Commons, so I hope Discovery-Search have seen it and that it hasn't unexpectedly affected their research. I would be very reluctant to turn this functionality off before knowing that multilingual search has improved enough to make it unnecessary.

MediaSearch (which is the default search on Commons) does some pre-processing, using Structured Data on Commons and Wikidata. This helps a bit with multilingual search.

CirrusSearch itself only indexes the wikitext of the article (and some peripheral data), but does not do anything fancy around multilingual indexing or integration with Wikidata. Doing so would be a major project (it might well be worth it, but it's unlikely to happen soon).

In the current context, it's likely that Wikidata Infobox is helping with retrieval of more articles. It might also introduce some issues with precision. It does have the added advantage of being potentially useful for external search engines.

Another limitation of the on-wiki search is that we don't honour <nowiki/> to hide those generated keywords in the search result page (SRP). Using different text for selecting pages and for displaying them might be possible, but introduces a lot of questions on how that should work. The snippets shown on the SRP are based on keywords used in the search. If a search returns a page but none of those keywords are displayable, we need to review how those snippets are generated. That's a solvable problem, but it does bring added complexity.

In short: Search Platform is aware of the issue, but unlikely to do anything about it in the medium term.
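To make the snippet problem concrete, here is a rough sketch of what "using different text for displaying" could look like. It is an illustration only, not how CirrusSearch or MediaSearch generate snippets, and the function name `displayable_snippet` is hypothetical. It drops anything inside a `<nowiki>` block, including a block that the snippet cuts off before its closing tag.

```python
import re

# A <nowiki> block, closed or truncated at the end of the snippet.
NOWIKI_BLOCK = re.compile(r"<nowiki>.*?(?:</nowiki>|$)", re.DOTALL | re.IGNORECASE)
# The self-closing form, <nowiki/>.
NOWIKI_EMPTY = re.compile(r"<nowiki\s*/>", re.IGNORECASE)


def displayable_snippet(text: str) -> str:
    """Return the snippet text with nowiki-wrapped content removed (hypothetical)."""
    without_blocks = NOWIKI_BLOCK.sub(" ", text)
    without_tags = NOWIKI_EMPTY.sub(" ", without_blocks)
    # Collapse the whitespace left behind by the removals.
    return " ".join(without_tags.split())


print(displayable_snippet("Band <nowiki>Wet Wet Wet; Ует Ует Ует</nowiki> category"))
print(repr(displayable_snippet("<nowiki>Wet Wet Wet; Wet Wet Wet; Wet")))
```

The second call returns an empty string, which is the case raised above: when every matching keyword sits in the hidden text, nothing is left to show in the snippet.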

Gehel moved this task from needs triage to Feature Requests on the Discovery-Search board.

Copying over the report from the other ticket, in case that helps resolve this:

The Wikidata Infobox (which I maintain) that is used in Commons categories and pages embeds the labels from all languages from the Wikidata item in the returned code to enable multilingual search on Commons. It is wrapped in nowiki to avoid it being shown on the page. However, search results sometimes show the multilingual labels, and the nowiki tag, for example at:
https://commons.wikimedia.org/w/index.php?title=Special:MediaSearch&search=Solar+eclipse+of+2024+April+8&type=page

This is also present in the older search:
https://commons.wikimedia.org/w/index.php?search=Solar+eclipse+of+2024+April+8&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1&ns6=1&ns12=1&ns14=1&ns100=1&ns106=1

The presence of the nowiki tag in particular is puzzling, since code doesn't normally show up in search results? Is there a way to avoid that?

The display of the other language content is unsurprising, though. If there's a better way of coding this in the Lua infobox code while still returning multilingual results, I'm happy to explore it.

Originally reported at https://commons.wikimedia.org/wiki/Template_talk:Wikidata_Infobox#Issue_with_Search_Feature_on_Wikimedia
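One idea that might reduce the visible repetition, sketched here in Python rather than the infobox's actual Lua: emit each distinct label only once before joining. The function name `unique_labels` and the language codes in the sample data are assumptions for illustration. Whether dropping duplicates changes how search ranks these pages has not been tested.

```python
def unique_labels(labels_by_language: dict[str, str]) -> list[str]:
    """Return labels in their original order, skipping case-insensitive repeats."""
    seen = set()
    unique = []
    for label in labels_by_language.values():
        key = label.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(label)
    return unique


# Sample data: the labels come from the snippet quoted in this task,
# the language codes are assumed.
labels = {
    "en": "Wet Wet Wet",
    "de": "Wet Wet Wet",
    "fr": "Wet Wet Wet",
    "ru": "Ует Ует Ует",
}
hidden_text = "; ".join(unique_labels(labels))
print(hidden_text)
```

With the sample data this yields two entries instead of four, so a snippet drawn from this text would show far less repetition while still containing every distinct spelling.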