UkWiki article not indexed on Google
Open, Needs Triage, Public

Description

Ukrainian Wikipedia article https://uk.wikipedia.org/wiki/Сиротенко_Григорій_Тимофійович seems not to be indexed in Google search results (tried by multiple people from multiple countries).

Example search queries include "Сиротенко Григорій Тимофійович" (the article's exact title) and even "Сиротенко Григорій Тимофійович українська вікіпедія" (the article's exact title + "ukrainian wikipedia").

There don't seem to be any obvious reasons that would cause this (e.g. the article isn't recently created).

Its counterpart on Russian Wikipedia does show up in search results.

Could you look into what might cause the issue?

Event Timeline

Most likely because Googlebot spider-crawler bot hasn't got round to indexing it yet. It's certainly indexing other pages from https://uk.wikipedia.org/

@Bugreporter2 note that the article was created almost 20 years ago, in 2005.

I'm afraid this needs to be reported to Google per last comment.

Searching on google.com for "Сиротенко Григорій Тимофійович" site:uk.wikipedia.org lists Сиротенко and other ukwiki pages but not this particular one

For future inquiries into this problem, here's one more example of a Ukrainian Wikipedia article seemingly not indexed by Google: https://uk.wikipedia.org/w/index.php?curid=3225413

One more example of a Ukrainian Wikipedia article not indexed in Google search: https://uk.wikipedia.org/?curid=383672

Krinkle claimed this task.
Krinkle added subscribers: SCherukuwada, Krinkle.

I'm looking into this as follow-up to T214998: RFC: Serve mobile and desktop variants through the same URL (unified mobile routing) and T400022: 2025 Commons SEO review, because based on what we learned from the Commons case study, I'd like to see if some of those issues are or were playing on other wikis.

Before I start my own research, first a timeline of what we know so far:

My first assumption is that, if uk.wikipedia is indeed notably downranked or otherwise missing for whatever reason from Google Search results in Ukraine, with ru.wikipedia possibly presented instead, this should have resulted in a clear decline in Wikimedia pageviews for that wiki/country. We have data on this and so can verify that.

Let's have a look at pageviews stats to quantify the impact and to perhaps spot some notable dates.

Pageviews from April 2023 to Nov 2025 for uk.wikipedia.org

Turnilo query: https://w.wiki/G3BG (restricted)

All users:
pageviews-ukwiki-user-Android.png (2,622×1,472 px, 253 KB) pageviews-ukwiki-user.png (2,610×1,470 px, 123 KB)
Google-referred:
pageviews-ukwiki-Google_refer.png (2,856×1,474 px, 139 KB) pageviews-ukwiki-Google_refer-Android.png (2,856×1,478 px, 136 KB)

Observations:

  • May 2023: Sudden 40-50% decline in Google-referred pageviews, or 30% overall.
  • Sep 2023: Full recovery?
  • May 2024: Again a sudden 50% decline in Google-referred pageviews, and also ~50% overall.
  • Sep 2024: Partial recovery?
  • Dec 2024: More partial recovery?
  • Jan-Apr 2025: Slow additional 20% decline.
  • May 2025: Rapid additional 40% decline.
  • May 2023 - May 2025: If we ignore all the ups and downs, this two year period sees about a 50% net-decline overall.
    • All user: 25M/week to 14M/week.
    • Android: 12M/week to 6M/week.
    • Google-referred: 14M/week to 8M/week.
    • Google-referred and Android: 8M/week to 3.5M/week

Note: I use the "OS is Android" filter as a stable baseline to ignore unrelated spikes. There isn't anything special about Android per-se, I picked it for being a large enough subset to generally be resistant against rapid trend changes specific to itself, and large enough to be representative of external changes (i.e. in Google behavior) yet specific enough to be immune to most random spikes in unclassified bot traffic and other distractions.

Note: When I say "decline" and "recovery", those are just relative observations and not absolute judgements. There is a natural yearly and monthly seasonality to Wikipedia readership, and various factors and changes are in play at our scale at all times, including changes in measurements and improvements in classification of user/bot traffic. None of what I say is meant to imply that these lines "should" have stayed straight or that a "decline" is per-se evidence of a problem. It is however a reason to take a closer look and when a drop is very sharp, it is more likely than not to be a problem. The question is then, what kind of problem.

50% is huge, but it also still leaves a lot of traffic, which suggests something more nuanced than Google completely not crawling or indexing ukwiki pages, or otherwise not selecting/ranking ukwiki pages within results for Ukraine.

Let's take a closer look at a per-wiki and country breakdown. First, let's look at the same four plots with ru.wikipedia.org included:

Pageviews from April 2023 to Nov 2025 for ru.wikipedia and uk.wikipedia

Turnilo query: https://w.wiki/G3Nc

All users:
pageviews-ruwiki_and_ukwiki-user.png (2,453×1,500 px, 143 KB) pageviews-ruwiki_and_ukwiki-user-Android.png (2,447×1,498 px, 154 KB)
Google-referred:
pageviews-ruwiki_and_ukwiki-Google_refer.png (2,444×1,505 px, 149 KB) pageviews-ruwiki_and_ukwiki-Google_refer-Android.png (2,453×1,511 px, 158 KB)

If the "large" trend changes we're seeing are related, and if these large changes are the mystery problem we're looking for in this task, then our mystery problem is not specific to uk.wikipedia. It is impacting ru.wikipedia in much the same way. It hasn't been hit as hard, but a number of reasons could explain why a large wiki is hit less severely than a small wiki (e.g. number of pages, domain/page rank and its potential influence on delisting thresholds or crawl budgets, more about that later).

Observations for ru.wikipedia:

  • May 2023: Sudden 17% decline in Google-referrals, or 10% overall.
  • Sep 2023: Full recovery?
  • May 2024: Sudden 25% decline in Google-referrals, or 17% overall.
  • Sep 2024: Partial recovery?
  • Jan-Apr 2025: Slow additional decline
  • May 2025: Rapid additional 25% decline
  • May 2023 - May 2025: Perhaps a 30% net-decline in referrals overall.
    • All user: 210M/week to 180M/week (-14%)
    • Android: 101M to 90M (-10%)
    • Google-referred: 73M to 51M (-30%)
    • Google-referred/Android: 38M to 25M (-34%)
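
The net-decline figures in both lists follow from the before/after weekly pageview numbers. As a minimal sketch (the function name and dictionary labels are mine, the numbers are the rounded M/week values quoted above), the arithmetic can be reproduced like this:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Rounded weekly pageviews in millions (May 2023 vs May 2025), as quoted above.
series = {
    "ukwiki all users": (25.0, 14.0),
    "ukwiki Android": (12.0, 6.0),
    "ukwiki Google-referred": (14.0, 8.0),
    "ukwiki Google-referred + Android": (8.0, 3.5),
    "ruwiki all users": (210.0, 180.0),
    "ruwiki Android": (101.0, 90.0),
    "ruwiki Google-referred": (73.0, 51.0),
    "ruwiki Google-referred + Android": (38.0, 25.0),
}

for label, (before, after) in series.items():
    print(f"{label}: {pct_change(before, after):+.1f}%")
```

Because the inputs are eyeballed from the plots and rounded, the results should only be read as agreeing with the quoted percentages to within a point or so.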

Next, let's look at Ukraine specifically:

Pageviews in Ukraine from April 2023 to Nov 2025

All users (blue is ru.wikipedia)
pageviews-Ukraine_ruwiki_and_ukwiki-user.png (2,447×1,517 px, 301 KB)

Google-referred (blue is uk.wikipedia):
pageviews-Ukraine_ruwiki_and_ukwiki-Google_refer.png (2,450×1,511 px, 198 KB) pageviews-Ukraine_ruwiki_and_ukwiki-Google_refer-Android.png (2,867×1,511 px, 202 KB)

This suggests that traffic has been divided 50/50 between ukwiki and ruwiki inside Ukraine since at least 2023.

We don't collect browser language or Google-localised domain in anonymised data, so we can't easily tell whether this divide is "expected" (i.e. based on browser language and which Google domain people use in Ukraine) or whether it is due to Google serving ruwiki results even when everything is/was working correctly. However, for the purposes of this task I'm going to take 2023 as baseline and leave pre-existing issues for another time.

I'll also zoom out to 2017 and include en.wikipedia to see if there are larger trends that may be of interest.

(This switches from referer_name=Google to referer_class=external-search-engine, because referer_name index is absent before April 2023. Based on recent data, about 80% of external-search referrals are Google with Yandex holding a notable 15% in Ukraine.)

I have a few theories that I'll explain later. In a nutshell, I suspect that Google's transition to a mobile crawler has delisted millions of pages across the wikis. We saw this on Commons (T400022), and I suspect all wikis are affected, but due to various factors: 1) the impact has been less severe on English Wikipedia and other large wikis, 2) it has been most severe on smaller Wikipedias like UkWiki and sister projects like Commons and Wikibooks, which we don't monitor as closely, so it went undiscovered until now, and 3) while Commons quickly recovered after turning off the mobile redirect, these others have not yet recovered.

Pageviews in Ukraine from 2017 to 2025:

Turnilo query: https://w.wiki/G4d9

All users:
pageviews-Ukraine_ruwiki_and_ukwiki_2017_user.png (2,860×1,514 px, 354 KB) pageviews-Ukraine_ruwiki_and_ukwiki_2017_user-Android.png (2,862×1,530 px, 309 KB)

Search-engine referred:
pageviews-Ukraine_ruwiki_and_ukwiki_2017_searchengine.png (2,860×1,522 px, 314 KB) pageviews-Ukraine_ruwiki_and_ukwiki_2017_searchengine-Android.png (2,862×1,524 px, 321 KB)

For the overlapping window where we have data in search console, do the Google-referrals in the data lake line up with what search console reports?

Good question. It does! But it only goes back 16 months. During this period both the WMF pageviews (Google-referred) data and GCS clicks data start and end at 1 million per day. The drop from ~2 million per day to ~1 million per day in mid-2024 is beyond the GCS cut-off.

pageviews-ukwiki-Google_refer (16mo_daily, GCS-mixin).png (2,824×1,606 px, 352 KB)

WMF pageviews (last 2y, weekly; as previous comment): pageviews-ukwiki-Google_refer.png (2) (2,856×1,474 px, 139 KB)
Reformatted to match GCS (last 16 months, daily): pageviews-ukwiki-Google_refer (16mo-daily).png (2,824×1,606 px, 218 KB)
GCS clicks to uk.wikipedia: GCS-ukwiki-clicks-Apr2024-Nov2025.png (2,020×1,139 px, 95 KB)
Annotated to show weird gap: GCS-ukwiki-clicks-Apr2024-Nov2025-annotated.png (2,020×1,139 px, 104 KB)

Google Search Console data stopped for uk.wikipedia in Sept 2024 and resumed March 2025. I guess it is lazily computed and there weren't any WMF accounts subscribed to it then. Google's plot jumps six months between two data points without obvious indication of this.

Change #1211131 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/MobileFrontend@master] Update SpecialMobileLanguages to use canonical instead forced index.php

https://gerrit.wikimedia.org/r/1211131

My theory going into this task is that (some) of the decline in Wikimedia pageviews from Google SERP impressions, is due to the mobile domain. More specifically, Google's new mobile-first crawler being confused by the mobile domain, as we saw on Commons in T400022.

At a glance, there are four issues with this theory:

  • No improvement: We disabled the mobile redirect over a month ago, but while Commons recovered very fast, others have not (such as Ukrainian Wikipedia).
  • Missing GCS data: Google Search Console (GCS) only keeps data about the index size for 3 months. We happen to have older data for Commons, because we collected it as part of a separate project (not related to mobile domain work). The current data about the index size and index errors in GCS shows us we still have millions of delisted pages with "redirect" as the reason. But we can't drill into this reason in more detail. Google only shows a detailed inspection for the 1000 most recent samples in a given category of index errors. One thousand is not much on a site with millions of pages like Wikipedia. We have at least 1000 unrelated "genuine" redirects (such as "Einstein" > "Albert Einstein") and they are regularly crawled, so these samples are all about those. I suspect the majority are still delisted due to the mobile redirect and haven't been recrawled yet, but GCS does not provide access to examples of the larger backlog, so we can neither confirm nor reject this theory based on this data alone.
  • Missing transition window: We know when referrals dropped, but if we don't know when the crawler transition happened, we can't attribute it as a cause.
  • Delayed impact: The size of our wikis, and the relationship between search keywords and articles, means that any such transition is not immediate across pages of a wiki or across wikis. It may happen slowly across several months. There may appear to be no impact for several weeks and then a sudden deep drop at different points based on certain pages reaching a threshold (retry count? time since last successful crawl?), possibly weighted by page-specific factors like ranking, incoming links, impressions/clicks. There is a long tail.

These issues don't rule out the mobile domain as a cause, but they mean we need more information to make a stronger case for it, and that in turn may tell us how we can fix it. Let's look for data!

Crawler gradually transitioned

Our page load performance data (T405429#11287913, fetchStart going from 0 to 200ms) shows clearly that the Google Search frontend changed Wikipedia domains in May 2024 from the old behaviour (mobile results link to mobile domain) to the new behaviour (one link for everyone).

We know the referral change was immediate and worldwide. But, it takes months to crawl large sites like Wikipedia. Therefore, I don't think this was actually due to a crawler change. Google's blog explains the link change as part of the transition to a new mobile-first crawler, but I believe these were two separate things internally.

Wikimedia pageview data tells us Googlebot kept a fairly constant crawl rate over the decade from 2015 to 2025. It did not simply double to build a potential secondary index, and did not halve after May 2024 to stop building such an index. What we see instead is a gradual transition, spanning many years, from one crawler to the other. This is consistent with Google's public messaging about there being only one index.

I suspect both the old and new crawler collected mobile-link metadata, and that they had a configuration setting to switch sites on a whole-domain basis to the new "one link" behavior (as we saw for Wikipedia in May 2024).

Pageview crawling by Googlebot from 2015 to 2025

Turnilo: https://w.wiki/GGwr

en.wikipedia:

pageviews-enwiki-Googlebot_2015-2025_total.png (2,404×1,008 px, 160 KB)

en.wikipedia by access_method:
{F70624953 height=350, layout=inline} {F70305667 height=150, layout=inline}

uk.wikipedia by access_method:
pageviews-ukwiki-Googlebot_2015-2025.png (2,403×1,524 px, 262 KB) {F70305779 height=150, layout=inline}

Note: Although we try our best to filter this out, the above may include imposter traffic since the query is for UA, not ASN/IP. In the last 4 weeks we recorded 40.0M Googlebot pageviews on ukwiki, and 39.6M webrequests (=309.3K samples * 128) to uk.wikipedia.org/wiki/ from the Googlebot IP-space. So this isn't an issue in recent history, but some of the historical spikes could be fake.
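
The cross-check in the note above scales the sampled webrequest count by the sampling factor and compares it with the pageview count. A minimal sketch of that arithmetic, assuming a uniform 1:128 sample (implied by the "* 128" above; variable names are mine):

```python
SAMPLING_FACTOR = 128            # assumed 1:128 sampling of webrequests
sampled_webrequests = 309_300    # samples from the Googlebot IP space, last 4 weeks
googlebot_pageviews = 40_000_000 # pageviews with a Googlebot user agent, last 4 weeks

# Scale the sample back up to an estimate of the real request count.
estimated_webrequests = sampled_webrequests * SAMPLING_FACTOR

# Share of UA-identified Googlebot pageviews that is backed by requests
# from Google's own IP space. A value close to 1.0 suggests little imposter traffic.
verified_share = estimated_webrequests / googlebot_pageviews

print(f"estimated webrequests: {estimated_webrequests / 1e6:.1f}M")
print(f"verified share: {verified_share:.1%}")
```

The two datasets count slightly different things (pageviews vs. requests to /wiki/), so a ratio near 1 is only a rough sanity check, not an exact reconciliation.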

pageviews-ukwiki-Googlebot_Oct2025.png (2,511×1,493 px, 84 KB) webrequests-ukwiki-Googlebot-Oct2025.png (2,289×1,481 px, 79 KB)

Crawler is recovering after domain unification

The mobile redirect was disabled for uk.wikipedia.org on 7 Oct 2025 (Mobile domain sunsetting § Timeline).

What we see immediately after that date:

Crawl stats for uk.wikipedia.org from Google Search Console:

GCS-ukwiki-crawl-all.png (1,466×684 px, 61 KB) GCS-ukwiki-crawl-desktop.png (1,476×705 px, 70 KB) GCS-ukwiki-crawl-type_smartphone-req_size.png (1,615×871 px, 77 KB) GCS-enwiki-crawl-type_smartphone-req_size.png (1,570×870 px, 79 KB) GCS-enmwiki-crawl-type_smartphone-req_size.png (1,542×835 px, 78 KB) GCS-ukwiki-crawl-smartphone.png (1,448×690 px, 67 KB) GCS-ukwiki-crawl-refresh.png (1,470×712 px, 65 KB) GCS-ukwiki-crawl-req_302.png (1,457×713 px, 67 KB) GCS-ukwiki-crawl-req_200.png (1,467×697 px, 66 KB) GCS-ukwiki-crawl-resp_html.png (1,374×839 px, 65 KB) GCS-ukwiki-index-error_redirect (3.33M unchanged).png (1,736×1,076 px, 72 KB)

  • 7-9 Oct: GCS total crawling requests double from 450K to 850K/day and stays there.
    • 7-9 Oct: GCS crawling with "Smartphone" type increases from 250K to ~575K/day and stays. However, note the daily download size going from ~0 billion bytes (9MB/day) to 10 billion bytes (10GB/day). This could mean that the mobile crawler wasn't working at all prior to the switch (250K/day requests that didn't even follow the redirect?). However, I don't think that's the case. This is an artefact of Google Search Console restricting telemetry by domain. The download size data for the destination of the redirect will be shown under the GCS account for uk.m.wikipedia.org rather than uk.wikipedia.org. I've included two graphs from en.wikipedia.org and en.m.wikipedia.org to demonstrate this.
    • 8-9 Oct: GCS crawling for "Refresh" purpose increases from 500K to 750K/day and stays.
    • 7-9 Oct: GCS crawling requests with 200 OK response increases from 300K to 750K/day and stays.
    • 7-9 Oct: GCS crawling requests with 302 Redirect response drop from 300K to ~5K/day and stays.
    • 7-9 Oct: GCS crawling requests with HTML response increases from 250K to 700K/day and stays.
    • 20 Oct: GCS crawling with "Desktop" type winds down from 200K to ~90K/day and stays.
  • 7 Oct-today: GCS index errors shrinking from 4M (Oct 6) to 3.3M (Nov 4), consuming about 100K every 3 days. This is on-going and has yet to reach a bottom.
  • 7 Oct-today: GCS indexed page count is growing from 2.6M (Oct 6) to 2.9M (Nov 4), adding about 5K every 3 days.

This sequence of events started the same day as the mobile domain sunsetting. Afaik we did nothing to request this, which suggests this is purely from the crawler no longer hitting "Page with redirect" errors, and instead behaving normally in terms of refreshing/discovering pages. (We last changed the Googlebot rate limit in the Wikimedia CDN several months ago in July 2025, ref T400022.)

This supports the idea that Googlebot was not utilising its crawl budget, with most pages in the index in a state of error. It's understandable that Googlebot does not prioritise retrying millions of delisted URLs when there is no reason to believe they have changed and it can spend its finite crawl budget on other things. Delistings are usually deterministic and unlikely to change. For example, a "404 Not Found", or a "Blocked by robots.txt" for /w/index.php URLs it encounters in the wild. Putting aside the now-solved mobile domain issue, it is actually quite normal for wikis to have delisted URLs due to "Page with redirect". For example:

  • https://en.wikipedia.org/wiki/Einstein wiki page redirect to Albert Einstein.
  • https://en.wikipedia.org/w/index.php?title=Banana redirecting to canonical /wiki/Banana
  • http://uk.wikipedia.org/wiki/Sciurus_variegatoides redirecting to HTTPS https://uk.wikipedia.org/wiki/Sciurus_variegatoides
  • https://uk.wikipedia.org/wiki/моряк redirecting to normalized uppercase M https://uk.wikipedia.org/wiki/Моряк

Such non-canonical aliases are not meant to be indexed as their own piece of content, and expected to be delisted. Naturally these are a low priority to recrawl/refresh.
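
To illustrate the purely syntactic aliases in the list above (HTTP vs HTTPS, /w/index.php?title= vs /wiki/, lowercase first letter), here is a rough sketch. It is a simplification for illustration and not MediaWiki's actual routing logic; the function names are hypothetical. Content redirects such as "Einstein" to "Albert Einstein" cannot be detected from the URL alone and need a lookup on the wiki.

```python
from urllib.parse import parse_qs, urlsplit


def canonical_url(url: str) -> str:
    """Best-effort canonical /wiki/ form of an article URL (syntactic rules only)."""
    parts = urlsplit(url)
    if parts.path == "/w/index.php":
        query = parse_qs(parts.query)
        # Only treat plain ?title=X views as aliases; leave action URLs alone.
        if set(query) != {"title"}:
            return url
        title = query["title"][0]
    elif parts.path.startswith("/wiki/"):
        title = parts.path[len("/wiki/"):]
    else:
        return url
    # Assumption: the wiki capitalises the first letter of titles,
    # as in the "моряк" example above.
    title = title[:1].upper() + title[1:]
    return f"https://{parts.netloc}/wiki/{title}"


def is_alias(url: str) -> bool:
    """True if the URL differs from its canonical form under the rules above."""
    return canonical_url(url) != url


for example in [
    "https://en.wikipedia.org/w/index.php?title=Banana",
    "http://uk.wikipedia.org/wiki/Sciurus_variegatoides",
    "https://uk.wikipedia.org/wiki/моряк",
    "https://en.wikipedia.org/wiki/Einstein",
]:
    print(example, "->", canonical_url(example), "| alias:", is_alias(example))
```

The last example is reported as not being an alias, which is exactly the limitation noted above: a wiki-level redirect looks canonical until the page itself is fetched.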

This would explain why recovery is so damn slow. However, why was recovery so quick for Commons? It's hard to tell for sure. My guess would be that there are very few "popular" pages on Commons that Googlebot would prioritise in recrawl, and Commons has grown by many millions of pages since the issue started, so there is almost always a long list of completely new pages Google has found but so far failed to crawl even once. Such new pages are probably more important and so when we disabled the mobile redirect on Commons, that immediately flooded inward, and had good keywords to start ranking for. On Wikipedia, there is probably a much more stable core of popular high-value pages, with new pages being smaller in number and lower in value. Maybe the legacy Desktop crawler could keep up with keeping those high-value pages alive at a lower interval, whilst keeping stale content in the index. Maybe the legacy Desktop crawler even kept up with new Wikipedia pages, given they're smaller in numbers. Or maybe new pages there don't factor as much into overall traffic quantity (impressions/clicks).

Intervention: Error validation

On 6 November 2025, I requested validation via the "Index errors / Page with redirect" page in GCS for uk.wikipedia.org, hoping it would speed things up during the next recrawl.

GCS-ukwiki-index-error-redirect 2025-11-06 (3.33M).png (1,851×1,312 px, 88 KB) GCS-ukwiki-index-error_redirect 2025-11-06 (validating-notif).png (810×784 px, 68 KB)

However, it doesn't seem to do much. It replaces the "Validate fix" button to make way for a "Validation details" button, showing the progress of what was presumably happening already, and continues to happen the same way regardless, except now I can see the part it has and hasn't yet re-crawled within some unspecified larger cycle (quarterly? yearly?). The actual speed or effect of the re-validation does not seem to have changed. Note how the header says that the validation I requested "Failed" after the second day, on 8 Nov 2025, after processing only a tiny portion of the backlog. Fortunately, the validation did in fact continue regardless, including with the new "Validation details" tab continuing past the apparent point where it "Failed" with new data on-going to this day.

  • 8 Nov: GCS-ukwiki-index-error_redirect_validating 2025-11-08 (0).png (1,953×872 px, 66 KB)
  • 11 Nov: GCS-ukwiki-index-error_redirect_validating 2025-11-11 (30K).png (1,805×881 px, 71 KB)
  • 19 Nov: GCS-ukwiki-index-error_redirect_validating 2025-11-19 (370K).png (1,969×792 px, 64 KB)
  • 26 Nov: GCS-ukwiki-index-error_redirect_validating 2025-11-27 (420K).png (1,756×891 px, 70 KB)

Conclusion: Error validation

As detailed in T380573#11413575, after the mobile domain sunsetting, the "Discovery" process of Googlebot seemed to recover and finally grew the index page count for uk.wikipedia.org on a regular basis. It grew from 2.6M (Oct 6) to 2.9M (Nov 4), and continues to grow every few days.

I requested validation of the "Page with redirect" backlog containing 5.4M delisted URLs of which I expect at least 1M are valid and canonical pages missing from the index (because we have 4.2M indexable pages on ukwiki and only 2.6M were indexed on Oct 6). This hasn't sped up recovery, but, recovery is on-going either way. We're up to 2.9M as of 11 Nov.
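
The "at least 1M missing" estimate above comes from comparing the indexable page count with the indexed page count. A small sketch of that comparison, using the rounded figures quoted in this comment (helper names are mine):

```python
INDEXABLE_PAGES = 4_200_000  # indexable pages on ukwiki, as stated above


def missing_pages(indexed: int, indexable: int = INDEXABLE_PAGES) -> int:
    """Indexable pages that are not (yet) in Google's index."""
    return indexable - indexed


def coverage(indexed: int, indexable: int = INDEXABLE_PAGES) -> float:
    """Fraction of indexable pages that Google reports as indexed."""
    return indexed / indexable


snapshots = {
    "6 Oct 2025": 2_600_000,
    "11 Nov 2025": 2_900_000,
}

for date, indexed in snapshots.items():
    print(f"{date}: {coverage(indexed):.0%} indexed, "
          f"{missing_pages(indexed) / 1e6:.1f}M missing")
```

The gap computed this way is an upper bound on what recrawling can recover, since it assumes every indexable page is one Google would choose to index.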

GCS-ukwiki-index-size 2025-11-11 (2.96M).png (1,178×587 px, 49 KB)

So far this hasn't yet notably increased Wikimedia pageviews, GCS Clicks, or GCS Impressions for uk.wikipedia.org.

Change #1211131 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Update SpecialMobileLanguages to use canonical instead forced index.php

https://gerrit.wikimedia.org/r/1211131

Intervention: Sitemap

I submitted a sitemap for uk.wikipedia.org to GCS on 12 Nov. This added 800K new pages to the index within a day (2.9M to 3.71M), with 1.05M additional new and previously undiscovered pages placed in a new "Discovered but not yet crawled" queue, which it is processing over the days/weeks that follow.

GCS-ukwiki-sitemap 2025-11-27.png (1,498×484 px, 34 KB) GCS-ukwiki-index-size 2025-11-18 (3.71M).png (1,265×441 px, 31 KB) GCS-ukwiki-index-error_discovered 2025-11-28 (1.05M).png (1,409×1,533 px, 110 KB)

On our end, we can clearly see the extra 0.5M crawls above baseline on 12 Nov, with subsequent days also each elevated by 0.6M crawls above baseline.

pageviews-ukwiki-Googlebot_2025-11.png (1,935×1,298 px, 164 KB)

Most GCS reports are only generated once a week, so the "Discovered" panel is last updated Nov 18 and doesn't yet show the decrease from the 1.05M discovered new pages it has been crawling. For example, it lists https://uk.wikipedia.org/wiki/Залесе_(Влощовський_повіт) which was crawled and indexed as of yesterday, Nov 26.

GCS-ukwiki-inspect-Zalesie 2025-11-28.png (1,460×1,413 px, 124 KB)

GCS-ukwiki-index-size 2025-11-18 (3.76M).png (1,114×583 px, 38 KB)

Impact: What does it look like in practice?

When I first heard of this issue in Sept 2024 (timeline at T380573#11363731), we took it at face value: Russian Wikipedia appeared to be selected instead of Ukrainian Wikipedia. By selected I mean that Google is presumably indexing, crawling, and parsing our sites correctly, but when it comes to selecting the URL to show for a particular Wikipedia subject, it seems to favor Russian over Ukrainian, for a Ukrainian browser.

It's easy to forget that such a selection mechanism must exist. If it didn't, you'd likely get lots of high-quality foreign results in Google Search when browsing in English. You generally don't, because content language is a filter applied regardless of page rank. So Google does not flatly rank all pages on an equal field from an end-user perspective. No matter the popularity or quality of a German Wikipedia article, if you're browsing Google in French and there is a French Wikipedia article about the same subject, you'll generally see that in results instead. Likewise when browsing in English, you're not likely to see results from any Wikipedia other than en.wikipedia.org. With that context, when we heard last year that Ukrainian users see Russian Wikipedia in their results, this seemed likely to be one of two things:

  • Something about <link rel=alternate hreflang> pointers is broken and Google isn't linking ukwiki to other Wikipedias. This would mean it treats ukwiki as Ukrainian-language content detached from the Wikipedia network and thus losing out on the collective ranking, with English or Russian versions easily outranking it on a given query.
  • Something about <html lang=uk> where Google isn't understanding that uk.wikipedia.org is Ukrainian and thus isn't even considering it, in the same way that it won't consider French Wikipedia when browsing Google in Spanish.

More recently, my theory has been that due to crawling issues around the mobile domain, many pages are missing from the Google Search index. ruwiki/enwiki have a much larger corpus, so they decline more slowly, are less likely to get delisted, and are more likely to get attention from the legacy "Desktop" Googlebot before a page reaches the delisting threshold.

With three interventions in place (mobile domain sunsetting, error validation, and sitemaps) we now have some recently indexed URLs that we can test Google behavior around.

On 14 Nov, the following article was crawled: https://uk.wikipedia.org/wiki/Jagdgeschwader_3. I confirmed that it is a canonical URL for a regular page (not a redirect or other non-article), and that there are English and Russian Wikipedia articles about the same subject, with interlanguage pointers between them.

Example: Jagd

GCS-ukwiki-crawl-mode_discovery Jagd 2025-11-14.png (1,589×772 px, 61 KB) ukwiki-Jagd (2025-11-15).png (2,331×823 px, 182 KB)

  • <html … lang="uk" dir="ltr">
  • <a href="https://en.wikipedia.org/wiki/Jagdgeschwader_3" title="Jagdgeschwader 3 – English" lang="en" hreflang="en" …>
  • <a href="https://ru.wikipedia.org/wiki/Jagdgeschwader_3" title="Jagdgeschwader 3 – Russian" lang="ru" hreflang="ru" …>
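
A check like the one above (page language plus interlanguage pointers) can be scripted. The following is a minimal sketch using only the Python standard library; the class name is mine, the sample markup is reduced to the three elements listed above with the elided attributes dropped, and the link text is a placeholder.

```python
from html.parser import HTMLParser


class LangLinkParser(HTMLParser):
    """Collects the <html lang> value and all <a hreflang> targets of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.page_lang = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html":
            self.page_lang = attr.get("lang")
        elif tag == "a" and "hreflang" in attr and "href" in attr:
            self.alternates[attr["hreflang"]] = attr["href"]


# Reduced sample based on the markup quoted above (link text is a placeholder).
SAMPLE = """<html lang="uk" dir="ltr">
<a href="https://en.wikipedia.org/wiki/Jagdgeschwader_3" title="Jagdgeschwader 3 – English" lang="en" hreflang="en">English</a>
<a href="https://ru.wikipedia.org/wiki/Jagdgeschwader_3" title="Jagdgeschwader 3 – Russian" lang="ru" hreflang="ru">Russian</a>
</html>"""

parser = LangLinkParser()
parser.feed(SAMPLE)
print("page language:", parser.page_lang)
for lang, href in sorted(parser.alternates.items()):
    print(f"alternate [{lang}]: {href}")
```

Feeding it the full HTML of a live article would work the same way, though a real page contains many more language links than the two shown here.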

Setup:

  • ProtonVPN - Ukraine
  • Google Chrome, new profile, with language preference set to Ukrainian, English.
  • https://google.ua

The Google SERP for this term looks "fine" at a glance in that both the top result and knowledge graph point to uk.wikipedia.org. However, note the odd duplicate results from ruwiki and enwiki.

GSERP Ukraine Jugd 2025-11-15 uk-ru-en.png (1,954×1,611 px, 629 KB)

While it is common to see multiple Wikipedia pages in the same result set, this is usually because they are about different subjects. For example, a search for "Europe" may return the continent and the band, a search for "Psycho" may return the 1960 film, but also many other results, and a search for "Banana" may return the fruit or the tree (and depending on the language that may be one article or two separate articles).

And in a situation where your preferred Wikipedia language doesn't have one of the articles, you could even imagine there being multiple Wikipedia languages for a user. For example, a Dutch person may get "Europe" the continent in Dutch, followed by "Europe (band)" in English if there were no Dutch article about it.

However, what we're seeing here is something different entirely. Google is returning multiple (three!) language editions of the same subject in response to a single query. I've never seen it do that before.

Google Search appears to be self-aware about this:

GSERP Ukraine Jugd 2025-11-15 uk-ru-en about.png (1,631×1,290 px, 452 KB)

It correctly recognises the uk.wikipedia result as being in the Ukrainian language (lang=UK) and as relevant for the country of Ukraine (country=UA). Likewise it recognises the English and Russian results as such, noting that the English result "typically appears outside of the following country: Ukraine". And yet, it does appear inside Ukraine. Why?

Example: Krause

Another example: https://uk.wikipedia.org/wiki/Krause_Publications. I believe this one was not just recently re-crawled but newly "discovered" through our sitemap submission.

GCS-ukwiki-inspect-Kraus 2025-11-13.png (1,794×1,456 px, 117 KB)

GSERP Ukraine Kraus 2025-11-26-Chrome-vpn-acceptlang en-ru-uk.png (1,797×1,628 px, 401 KB)

GSERP English Kraus 2025-12-01.png (1,564×1,288 px, 208 KB)

This exhibits the same strange phenomenon of duplicate results with a triplet for the same subject. But this time much worse. English sorts first, followed by a Russian duplicate, followed by several other websites. Ukrainian doesn't appear until below the fold, after Google's UI for related queries, on what is essentially a virtual "Page 2".

  1. https://en.wikipedia.org/wiki/Krause_Publications
  2. https://ru.wikipedia.org/wiki/Krause_Publications
  3. Other website
  4. Other website
  5. Google UI for suggested questions
  6. (Virtual "Page 2") https://uk.wikipedia.org/wiki/Krause_Publications
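
To compare such result lists across queries, it helps to record at which position each language edition first appears. A minimal sketch, with the Krause result order above encoded by hand (non-Wikipedia entries and Google UI blocks are represented as None, since their URLs aren't recorded here; the function name is mine):

```python
from urllib.parse import urlsplit


def first_rank_by_host(results):
    """Map each hostname to the 1-based position of its first result."""
    ranks = {}
    for position, url in enumerate(results, start=1):
        if url is None:  # other website or Google UI block, not tracked
            continue
        ranks.setdefault(urlsplit(url).netloc, position)
    return ranks


# Result order observed for "Krause Publications" from Ukraine, as listed above.
krause_results = [
    "https://en.wikipedia.org/wiki/Krause_Publications",
    "https://ru.wikipedia.org/wiki/Krause_Publications",
    None,
    None,
    None,
    "https://uk.wikipedia.org/wiki/Krause_Publications",
]

ranks = first_rank_by_host(krause_results)
for host, position in sorted(ranks.items(), key=lambda item: item[1]):
    print(f"{position}. {host}")
```

Collecting these positions for a handful of recently indexed titles over time would show whether the Ukrainian edition moves up as the index recovers.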

Example: Banana

Another example: a search for "Banana". The Banana fruit article on Ukrainian Wikipedia isn't anywhere in the results, not even below the fold.

In English we get:

GSERP English Banana 2025-12-01.png (1,533×1,618 px, 618 KB)

  1. AI Overview about the fruit in English
  2. AI Overview links to supporting content about the fruit on English Wikipedia
  3. Knowledge graph sidebar about the fruit in English
  4. Knowledge graph links to the fruit on English Wikipedia.
  5. Organic result 1: starting with the same English Wikipedia article.

In Ukraine we get:

GSERP Ukraine Banana 2025-12-01 p1.png (2,019×1,620 px, 435 KB) GSERP Ukraine Banana 2025-12-01 p2.png (1,837×1,620 px, 540 KB) GSERP Ukraine Banana 2025-12-01 p3.png (740×838 px, 67 KB)

  1. AI Overview about the Banana fruit in Ukrainian.
  2. (?) AI Overview links to "Banana tree" on Russian Wikipedia
  3. (?) AI Overview links to "Banana tree" on Ukrainian Wikipedia
  4. Knowledge graph about the fruit in Ukrainian.
  5. Knowledge graph linking to the fruit on Ukrainian Wikipedia.
  6. (?) Organic result 1: buying a computer game called "Banana".
  7. (?) Organic result 2: Google Gemini ad about something called "Nano Banana".
  8. (?) Organic result 3: Ukrainian Wikipedia link about Banana tree.
  9. (?) Organic result 4: Russian Wikipedia link about Banana fruit.
  10. Google UI for suggested questions
  11. (?) Organic result 5: (Other website)
  12. (?) Organic result 6: (Other website)
  13. (?) Organic result 7: (Other website)
  14. (?) Organic result 8: (Other website)
  15. Google UI for suggested queries

1) the impact has been less severe on English Wikipedia and other large wikis,

+ Various other comments about long tail effects….

@Krinkle btw reading all this also makes me very concerned about the fact that we noindex new pages until they are patrolled. I still suspect (and always have) that this is VERY bad for our SEO, but very difficult to detect.

But… possibly those effects are also easier to discover on the smaller sites? We could disable that functionality on a smaller wiki and see the effects on traffic after a few months?

@AntonProtsiukWMUA Are you still noticing indexing issues with ukwiki content? I'm hoping that the .m deprecation has resolved most/all of these, but please let me know if there is still un-indexed new content.

@TheDJ I don't believe new pages are no-index by default on any wiki other than English. I just checked ukwiki's Special:NewPages, found an unpatrolled page, and Googled the title – it came up as the first result. I followed the same process on a number of other wikis and saw the same. But please let me know if I'm mistaken!

@Maryana, thanks for your work! We haven't noticed indexing issues recently.