Yoruba Language Wikipedia not being indexed by search engines
Closed, Declined (Public)

Description

Yoruba language Wikipedia (https://yo.wikipedia.org/) is not indexed by Google and other search engines. This is negatively impacting the project. We'd be glad if this problem could be fixed.

Event Timeline

Reedy renamed this task from Yoruba Language Wikipedia to Yoruba Language Wikipedia not being indexed by search engines. Oct 23 2019, 11:59 AM
colewhite triaged this task as Medium priority. Oct 23 2019, 7:15 PM

@Krinkle Is the Reading Web team associated with this task? Just wondering why you added the Web-Team-Backlog tag in the duplicate.

Yes. In the past, the team most active as stakeholder around the topic of Wikipedia being indexed by search engines was Reading. Operations might be able to help out once a more specific issue is known to need their help.

That article was only created today and has likely not been picked up by search crawlers yet. Google's algorithm prioritises what gets indexed and we have no control over that priority.

The bug suggests that no pages are getting indexed, but they clearly are. Are there any pages that were created over a month ago and are not in Google?

@Jdlrobson articles created from October till date are not indexed but older articles are. But I don't think that should be a problem. What do you think?

I would hope not, but really this is up to Google. There is nothing wrong on our side and nothing we can do other than find a contact in Google to get their search engines to prioritise this content. I'd hope that would happen over time. In the meantime, ensuring new pages are linked to English Wikipedia via language links and are shared on external sites is probably the best thing you can do to increase discovery.

Can I resolve this task or is there more you'd like to do?

Please, feel free to resolve this task

Jdlrobson claimed this task.

I'm sorry I can't be more help @Wikicology :(

In my experience, Google's index of Wikipedia content tends to be updated near real-time, in the order of mere seconds for global search results to already reflect updated summaries and statements. This is made possible in part through RCStream/EventStreams which I believe Google subscribes to in some form, as well as through daily indexing of dynamic lists such as Special:Log, Special:Recentchanges that provide organic discovery of new pages.
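To make the discovery path described above concrete, here is a minimal sketch of how a consumer could pick newly created pages out of a recent-changes event feed. It assumes events arrive as server-sent-event lines whose JSON payload carries `wiki`, `type` and `title` fields; the helper name `new_pages` and the sample lines are made up for illustration and are not real stream output. Nothing here reflects how Google actually consumes the feed.

```python
import json
from typing import Iterable, Iterator


def new_pages(sse_lines: Iterable[str], wiki: str = "yowiki") -> Iterator[str]:
    """Yield titles of newly created pages on `wiki` from raw SSE lines."""
    for line in sse_lines:
        # Server-sent events carry the JSON payload on lines starting with "data:".
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # skip partial or malformed payloads
        title = event.get("title")
        # Assumption: page creations are marked with type "new".
        if event.get("wiki") == wiki and event.get("type") == "new" and title:
            yield title


# Illustrative, hand-written sample lines (not real stream output).
sample = [
    "event: message",
    'data: {"wiki": "yowiki", "type": "new", "title": "Ọjà Balógun"}',
    'data: {"wiki": "yowiki", "type": "edit", "title": "Fàrándà"}',
    'data: {"wiki": "enwiki", "type": "new", "title": "Example"}',
]

for page_title in new_pages(sample):
    print(page_title)
```

Comparing such a list of new titles against what a search engine returns a few days later would show whether the delay affects all new pages or only some.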

New articles taking a week to index is unheard of. If it takes that long to be indexed (not to be confused with being ranked highly) I would consider that a problem that could seriously damage a smaller wiki's ability to succeed in our current ecosystem.

It is almost certainly the result of a bug and not Google's choice to intentionally not index. This should imho be investigated for possible causes on our end, including with escalation paths to get data through our Google Search Console, and to any internal contacts we have.
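One possible cause "on our end" that is cheap to rule out is a page telling crawlers not to index it. The sketch below checks already-fetched HTML for a robots meta tag containing `noindex`; it is an assumption that this is a relevant signal for the affected pages, and the class and function names (`RobotsMetaParser`, `is_noindexed`) and sample HTML are hypothetical. It does not cover `robots.txt` rules or an `X-Robots-Tag` response header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def is_noindexed(html: str) -> bool:
    """Return True if the HTML asks crawlers not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Hand-written sample documents for illustration only.
blocked = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
indexable = '<html><head><title>Example</title></head><body></body></html>'

print(is_noindexed(blocked))
print(is_noindexed(indexable))
```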

The fact that Dẹ̀jọ Túnfúlù is still not picked up after a month does seem a bit suspect, but this could be down to article size or lack of incoming links.

Looking through Special:RecentChanges, newly created pages are still not being indexed
e.g.

All of these lack incoming links and are relatively short articles.

Olga, can you:

  1. Check Google Search Console to see if these pages are being indexed and/or have problems.
  2. Contact Google to see if there is any reason such pages would be skipped by Google's indexing (e.g. size or lack of incoming links).
  3. Check if any of the accented characters could cause problems with Google indexing.

    I should however note that using Special:Random, 10 out of 10 pages I landed on were indexed by Google, so I suspect the impact is limited to newer pages.
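On the accented-characters question, one concrete thing to look at is Unicode normalization: a title such as "Dẹ̀jọ Túnfúlù" can be written with precomposed characters (NFC) or with base letters plus combining marks (NFD), and the two percent-encode to different URLs. The sketch below only demonstrates that byte-level difference. The helper `article_url` is hypothetical, and whether any crawler actually treats the two forms as different pages is not established here.

```python
import unicodedata
from urllib.parse import quote


def article_url(title: str, form: str = "NFC", host: str = "yo.wikipedia.org") -> str:
    """Build a percent-encoded article URL after normalizing the title."""
    normalized = unicodedata.normalize(form, title)
    # Wiki URLs use underscores in place of spaces.
    path = quote(normalized.replace(" ", "_"), safe="")
    return f"https://{host}/wiki/{path}"


title = "Dẹ̀jọ Túnfúlù"
nfc_url = article_url(title, "NFC")
nfd_url = article_url(title, "NFD")

print(nfc_url)
print(nfd_url)
print("same URL:", nfc_url == nfd_url)
```

If links to new articles are shared externally in a different normalization form than the one the wiki serves, that would be worth noting in the report to Google.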

@ovasileva: Could you please check the last comment? (You were not CC'ed so you could not see it.) Thanks! :)

  • Searching on google.com for "Dẹ̀jọ Túnfúlù" now lists https://yo.wikipedia.org/wiki/Dẹ̀jọ_Túnfúlù . Also finds Ambrose Olútáyọ̀ Ṣómidé‎, Fàrándà.
  • However, "Eré Òṣùpá" site:yo.wikipedia.org and "Ọjà Balógun" site:yo.wikipedia.org still have no results on google.com here.
  • (yowp page Àwọn ojọ́ìbí ní 1928‎ does not exist anymore.)

Will look into this.

nshahquinn-wmf subscribed.

@ovasileva, I'm not certain what you'd like Product-Analytics to do here (maybe check Google Search Console to see what it says about the missing pages?).

Can you create a subtask and clarify the request there?

For me, Google shows only one result for the link mentioned above, and it is the discussion about this problem, https://de.wiktionary.org/wiki/Wiktionary:Fragen_zum_Wiktionary/Archiv/2020, not the article itself, which would be https://de.wiktionary.org/wiki/Darweisung. The article was created on 7 Jan 2020 and Google still doesn't show a link to it, yet it shows a link to a discussion about this phenomenon that was started several months later, on 23 May 2020.

In T236241#6062190, nshahquinn-wmf wrote:

ovasileva, I'm not certain what you'd like Product-Analytics to do here (maybe check Google Search Console to see what it says about the missing pages?).

Removing tag.

In T236241#5619630, Krinkle wrote:

Operations might be able to help out once a more specific issue is known to need their help.

Removing tag.

Cannot reproduce with any of the given test cases that still exist, hence closing this ticket.

Screenshot from 2020-11-27 13-48-40.png (485×940 px, 91 KB)

Screenshot from 2020-11-27 13-48-49.png (485×938 px, 89 KB)

If this still happens, then please file a new ticket including URLs to the articles in question. Thanks!