Beta Commons - File Caption language name and text values smooshed together in search results
Closed, Resolved · Public

Description

This is an old bug that can currently be seen on production Commons with Description text as well, but the introduction of multilingual captions makes it more confusing. The simplest fix would probably be some kind of delimiter between the language name and the text.

Ex: English | This is English - Spanish | Esto es espanol
(perhaps @PDrouin-WMF has some ideas?)

beta-search-text-mashed.PNG (172×804 px, 25 KB)

Event Timeline

Restricted Application added a subscriber: Aklapper.
Ramsey-WMF moved this task from Untriaged to Next up on the Multimedia board.

Would it be possible to do something like this?:

English: This is English | Spanish: Esto es espanol

Ramsey-WMF raised the priority of this task from Low to High. (Nov 6 2018, 7:40 PM)
Ramsey-WMF added a subscriber: Jdforrester-WMF.

Update on this bug: it is more serious than a simple display problem. It seems the language name is actually being prepended to the caption string.

Example file:

https://commons.wikimedia.beta.wmflabs.org/wiki/File:Cmbrowser.png

This file has 4 file captions on it, all added hours (or days) ago:

Its English caption is "Screenshot bunny"

  • Searching for "bunny" (which is in the English caption) is a hit in the search results
  • Searching for "screenshot bunny" returns no hit
  • Searching for "Englishscreenshot" returns a match in the search results
  • Searching for "knabino", which is the Esperanto caption, produces no matches
  • Searching for "Esperantoknabino" returns a match.
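
The search behaviour listed above is consistent with the language name being joined directly onto the first word of each caption before the text is tokenized. Here is a minimal sketch of that hypothesis in Python. It is a toy model, not the actual CirrusSearch or WikibaseMediaInfo code: `build_index_text`, `tokenize`, and `matches` are hypothetical helpers, whitespace tokenization is an assumption, and only the two captions quoted above are included.

```python
def build_index_text(captions, separator=""):
    """Join each language label to its caption text, then join captions with spaces.

    With separator="" the label is glued onto the caption's first word,
    which is the suspected cause of the behaviour described above.
    """
    return " ".join(f"{label}{separator}{text}" for label, text in captions)


def tokenize(text):
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return {token.lower() for token in text.split()}


def matches(query, index_text):
    """A query matches only if every query term is an indexed token."""
    indexed = tokenize(index_text)
    return all(term in indexed for term in tokenize(query))


# Only the two captions quoted in this task; the file has four in total.
captions = [("English", "Screenshot bunny"), ("Esperanto", "knabino")]

smooshed = build_index_text(captions)  # "EnglishScreenshot bunny Esperantoknabino"
for query in ["bunny", "screenshot bunny", "Englishscreenshot",
              "knabino", "Esperantoknabino"]:
    print(f"{query!r}: {matches(query, smooshed)}")
```

In this model, calling `build_index_text(captions, separator=" ")` makes both "screenshot bunny" and "knabino" match, which is why keeping the label and the caption text apart in the index matters for search as well as for display.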

Additional note: this behavior seems to be inconsistent across files.

Example file:

https://commons.wikimedia.beta.wmflabs.org/wiki/File:Crystal-1657.stl

Two captions here, English and Spanish:

  • All permutations of the English caption, without the "English" prefix, work fine (example: search for "procedurally")
  • Same goes for the Spanish caption, El Guapo. Comes up just fine.

Change 472641 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Ensure captions are separated in auxiliary_text in search index

https://gerrit.wikimedia.org/r/472641

Change 472641 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Ensure captions are separated in auxiliary_text in search index

https://gerrit.wikimedia.org/r/472641

OK, so they now get spaces:

image.png (479×729 px, 69 KB)

Adding |s or whatever as separators would be hard without building a full custom result system, because this is how Cirrus is built to respond to the contents of tables. I understand that such an Epic is wanted, but it is beyond the scope of T187438 and the initial launch.

As another point in favour of the wider work, which we will want to do anyway: we're hard-coding the language values (but not the content) into the search result output in English. This is what it looks like in Chinese, as an example:

Screenshot 2018-11-09 at 11.35.50.png (472×721 px, 73 KB)

Instead of "English", "Spanish", and "Esperanto" it should really show "英文", "西班牙文", and "世界文" respectively. Maybe push these concerns into a new epic post-deployment task for better search experiences ahead of the "Semantic Search results" concept?
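
To make the localization ask concrete, here is a small sketch of looking up the language label in the viewer's interface language rather than hard-coding English. The `LANGUAGE_NAMES` table and the `language_label` helper are hypothetical and exist only for illustration. The Chinese names are the ones quoted above, and a real implementation would presumably use MediaWiki's own language-name data instead of a hand-written table.

```python
# Hypothetical lookup table: interface language -> caption language code -> label.
LANGUAGE_NAMES = {
    "en": {"en": "English", "es": "Spanish", "eo": "Esperanto"},
    "zh": {"en": "英文", "es": "西班牙文", "eo": "世界文"},
}


def language_label(caption_lang, ui_lang, fallback_ui_lang="en"):
    """Return the caption language's name in the viewer's interface language.

    Falls back to the English table when the interface language is unknown,
    and to the raw language code when the caption language is unknown.
    """
    names = LANGUAGE_NAMES.get(ui_lang) or LANGUAGE_NAMES[fallback_ui_lang]
    return names.get(caption_lang, caption_lang)


for code in ["en", "es", "eo"]:
    print(code, language_label(code, "zh"))
```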

Thanks for the update and additional points, @Jdforrester-WMF. For the question of the larger tasks and changes to search, I'm gonna ping @EBjune. Fodder for our usual meetup next Wednesday?

To put it into specific product asks, here are some potential user stories that we can't do right now:

  • "As a user making a search for files, I want search results to lay out the different languages in which captions are available to me in a clean way"
  • "As a user making a search for files by looking for terms in caption values, I want search results to label the languages of captions in my own language"
  • "As a user making a search for files by looking for terms in caption values, I want to be able to search for particular values each in a particular language, and see the results highlighted"
    • e.g. 'gift' in a German caption means poison, which is a very different thing from in English ;-)

This particular bug seems to be fixed. Will create a new Epic based on the other issues mentioned above.