
Index Wikidata entities in web search engines in all the available languages
Open, Needs Triage, Public

Description

As a user of the most popular web search engines, I want Wikidata entities to be indexed in all the available languages so that I can find their data when I type keywords or clues in my native language.

Problem: Unless something links to a Wikidata entity with ?uselang=xx, web search engines index each Wikidata entity only in English. Wikidata is multilingual, but because users search in their native languages, they cannot find entities that actually contain the keywords they type and the data they need.

Example: Currently, there are 9000-10000 entities indexed in Google in French, 200-300 in Polish and 0 in Aragonese.
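Put another way, each language rendering of an entity page lives at its own `?uselang=xx` URL, so a crawler only sees it if something links to that exact URL. The sketch below enumerates those URLs for one entity, for example as input to a sitemap or a set of cross-links. It is a minimal illustration under stated assumptions: `uselang_urls` is a hypothetical helper, and it assumes item pages live at `https://www.wikidata.org/wiki/<QID>` (properties and lexemes use other namespaces and are not handled). It says nothing about how search engines will actually treat these URLs.

```python
from urllib.parse import urlencode

# Assumption: items are served at https://www.wikidata.org/wiki/<QID>;
# properties and lexemes live in other namespaces and are not handled here.
WIKIDATA_WIKI = "https://www.wikidata.org/wiki/"


def uselang_urls(entity_id: str, languages: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each language code to a ?uselang=xx URL for the entity."""
    return {
        lang: f"{WIKIDATA_WIKI}{entity_id}?{urlencode({'uselang': lang})}"
        for lang in languages
    }


if __name__ == "__main__":
    # French, Polish and Aragonese, the languages from the example above.
    for lang, url in uselang_urls("Q42", ["fr", "pl", "an"]).items():
        print(lang, url)
```

Generating the URLs is the easy part; the open question in this task is how to get search engines to discover and index them.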

Acceptance criteria:

  • Most Wikidata entities (preferably, all) are indexed in most available languages (preferably, all) in the most popular web search engines

Event Timeline

abian created this task. Jul 31 2018, 8:02 PM
Restricted Application added a subscriber: Aklapper. Jul 31 2018, 8:02 PM
abian added a comment (edited). Aug 8 2018, 9:18 AM

Apparently, 20-30% of Wikidata entities aren't indexed by Google in English (with no uselang defined) either. I haven't found any good reason for this; some of them have a good number of sitelinks and are linked from several infoboxes.

I just want to add my support for this, especially for structured data on Commons images. When searching for images on Google in languages other than English, the structured data is effectively nonexistent. I love the move to structured data and its multilingual capabilities, but if no one can find anything in languages other than English, it's not very useful. Here's an example:

The image with structured data:
https://commons.wikimedia.org/wiki/File:Red_flowers,_in_Constantine.jpg

Google search in English using words from the structured data, which returns the correct image near the top:
https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fcommons.wikimedia.org%2F+%22flower%22+%22forest%22+%22green%22+%22algeria%22&tbm=isch

Google search in Spanish using the Spanish equivalents of the same words from the structured data, which returns no images:
https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fcommons.wikimedia.org%2F+%22flor%22+%22bosque%22+%22verde%22+%22argelia%22&tbm=isch
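The two searches above differ only in the quoted terms: both restrict results to `site:https://commons.wikimedia.org/` and use `tbm=isch`, exactly as they appear in the URLs. The sketch below rebuilds both URLs from the English and Spanish word lists, which makes it easy to repeat the comparison for other languages. `image_search_url` is a hypothetical helper; only the query shape is taken from the URLs above, and nothing here is an official Google API.

```python
from urllib.parse import urlencode

COMMONS_SITE = "https://commons.wikimedia.org/"


def image_search_url(terms: list[str]) -> str:
    """Hypothetical helper: Google Images query limited to Commons, each term quoted."""
    quoted = " ".join(f'"{t}"' for t in terms)
    query = f"site:{COMMONS_SITE} {quoted}"
    # urlencode quotes with quote_plus: spaces become '+', and ':', '/', '"' are percent-encoded.
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})


if __name__ == "__main__":
    print(image_search_url(["flower", "forest", "green", "algeria"]))  # English words
    print(image_search_url(["flor", "bosque", "verde", "argelia"]))  # Spanish words
```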