Determine impact of $wgMFNoindexPages search traffic to arwiki, zhwiki, hiwiki, itwiki, nlwiki, kowiki
Closed, Resolved (Public)

Description

Background:

This task is about measuring the impact (if any) on search engine-referred traffic on the following wikis:
arwiki
zhwiki
hiwiki
itwiki
nlwiki
kowiki

If the impact on search traffic is negative, we will not release the change to all projects.

Needed for analysis:

  • date(s) the changes were deployed. The change was deployed on October 2, 2019.
  • check overall pageviews, pre & post
  • check search-referred pageviews, pre & post

Related:

  • check that Google Search Console is now tracking non-index pages correctly

Event Timeline

This will not be completed before Megan's leave and is not one of Web's core metrics/goals. We need to reevaluate as a team.

@ovasileva Can you let me know what decisions are resting on this and any risks if we don't tackle this question? We don't have extra bandwidth so would have to drop something else to do this. Happy to discuss further in a 1:1.

Experiment continues to run while awaiting the analysis.

From discussion with Olga: not urgent, but it is important because we need to make a decision about whether to roll this out across all sites.

Benefit of analysis: if we see increases in traffic, we should roll it out because otherwise we're losing out on traffic that we could have.

Risk of negative impact: small-medium, but haven't observed dips and don't expect this change to make things worse.

Discussed with Olga in 1:1 today:

Action

  • Will review this task and previous task and analysis to let Olga know of any questions.
  • Evaluate if this is something that I (Maya) can handle or will need data analyst support.
  • Inform PA team in upcoming meeting of priority/urgency.

@ovasileva @Mayakp.wiki : @mpopov and I were discussing this task in relation to efforts to monitor Google Search Console (GSC) data (https://phabricator.wikimedia.org/T246694). Since this is a recommended change from Google, we propose doing a quick/high-level check of pageviews and GSC data pre- and post-rollout, but not a full statistical analysis.

Related note:
GSC is still showing clicks to both the mobile and desktop sites, even though we expected to no longer see data for the mobile site. (See Hindi examples below)

Screen Shot 2020-03-02 at 10.41.10 AM.png (1×2 px, 248 KB)

Screen Shot 2020-03-02 at 10.41.17 AM.png (1×2 px, 255 KB)

P.S. We expect to no longer see impressions & clicks to hi.m.wikipedia (which this change is supposed to mark as alternate and hi.wikipedia as canonical) based on the information at https://webmasters.googleblog.com/2019/02/consolidating-your-website-traffic-on.html

Given that this has been pushed back so much, I'm thinking we should prioritize this as one of the first things Megan takes on when she comes back.

@ovasileva - Can you confirm the dates that these changes were deployed to the test wikis?

Yup - the change was deployed Oct 2, 2019 (T206497#5540674)

We checked all of those wikis in the Google Search Console and we're still seeing numbers for the mobile sites, but based on what Google wrote in https://webmasters.googleblog.com/2019/02/consolidating-your-website-traffic-on.html we would expect to see 0's, since the premise is that all the impressions and clicks for the alternate site would be rolled into the canonical site's numbers.

Hm…

Started with this notification in GSC:

Your site has been switched to Mobile First Indexing
The majority of Google's crawl requests to your site will be made using a mobile crawler.
Switch date: April 21, 2020

went on to https://webmasters.googleblog.com/2016/11/mobile-first-indexing.html?authuser=3 and saw

Sites do not have to make changes to their canonical links; we’ll continue to use these links as guides to serve the appropriate results to a user searching on desktop or mobile.

then went to look at https://developers.google.com/search/mobile-sites/mobile-seo/separate-urls

On the desktop page, add a rel="alternate" tag pointing to the corresponding mobile URL. This helps Googlebot discover the location of your site's mobile pages.
On the mobile page, add a rel="canonical" tag pointing to the corresponding desktop URL.

On the desktop page (http://www.example.com/page-1), add the following annotation:

<link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/page-1">

On the mobile page (http://m.example.com/page-1), the required annotation should be:

<link rel="canonical" href="http://www.example.com/page-1">

I noticed that our mobile pages have the canonical part (for example https://it.m.wikipedia.org/wiki/COVID-19)

<link rel="canonical" href="https://it.wikipedia.org/wiki/COVID-19">

And that our desktop pages (e.g. https://it.wikipedia.org/wiki/COVID-19) have

<link rel="alternate" media="only screen and (max-width: 720px)" href="//it.m.wikipedia.org/wiki/COVID-19">
<link rel="canonical" href="https://it.wikipedia.org/wiki/COVID-19">

which makes me wonder if we're seeing issues because

  1. The desktop page has the canonical bit. It might not need to?
  2. I wonder if href="// instead of href="https:// is causing problems? The example suggests we might need to include the protocol. Maybe Googlebot is not smart enough to resolve the protocol-relative URL correctly for search performance purposes.
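
For reference on the second question, here is a minimal Python sketch of how a standards-following URL resolver treats the protocol-relative href. The class and function names (LinkTagParser, extract_links) are made up for illustration, and this says nothing about how Googlebot itself resolves the link; it only shows that standard resolution inherits the scheme of the page the tag appears on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTagParser(HTMLParser):
    """Collect rel="canonical" and rel="alternate" <link> tags from a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []  # tuples of (rel, absolute href, media or None)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel in ("canonical", "alternate") and href:
            # urljoin resolves "//host/path" against the scheme of page_url
            self.links.append((rel, urljoin(self.page_url, href), attr.get("media")))


def extract_links(page_url, html):
    """Hypothetical helper: return the canonical/alternate links found in html."""
    parser = LinkTagParser(page_url)
    parser.feed(html)
    return parser.links


# The two tags quoted above from the desktop page
desktop_html = (
    '<link rel="alternate" media="only screen and (max-width: 720px)" '
    'href="//it.m.wikipedia.org/wiki/COVID-19">\n'
    '<link rel="canonical" href="https://it.wikipedia.org/wiki/COVID-19">'
)
for link in extract_links("https://it.wikipedia.org/wiki/COVID-19", desktop_html):
    print(link)
```

When the desktop page is fetched over https, the alternate href resolves to https://it.m.wikipedia.org/wiki/COVID-19, so a resolver that follows the URL standard should end up at the same address as an explicit https:// link.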

Also, from https://www.paulirish.com/2010/the-protocol-relative-url/ and https://stackoverflow.com/questions/4831741/can-i-change-all-my-http-links-to-just/37654145#37654145:

this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.

I wonder if omitting the canonical bit for desktop pages (since Google's documentation doesn't mention anything about a canonical page declaring itself as the canonical version of itself) and switching from the protocol-relative URL to an https:// version would yield the results we expect.

Although:

Screen Shot 2020-05-20 at 10.43.21 AM.png (1×2 px, 337 KB)

which states that the page is an alternate page with a proper canonical tag

So I don't know what's going on in GSC but at least the tagging DOES appear to be working, disregard the comment above I suppose?

Digging deeper into it.m.wikipedia.org:

Screen Shot 2020-05-20 at 10.48.00 AM.png (1×2 px, 189 KB)

Screen Shot 2020-05-20 at 10.51.06 AM.png (1×2 px, 299 KB)

Screen Shot 2020-05-20 at 10.51.16 AM.png (558×1 px, 117 KB)

The URL selected by Google as the authoritative version of this page. Other versions can be served in search results, depending on factors such as the user's device type or language. This is not available in the live test, as Google selects a canonical URL only after a page is indexed.

So it appears that in some cases, despite us explicitly marking the desktop page as the canonical version, Google's system has decided to ignore that and use the mobile page as the canonical.

Which means, if I'm interpreting the situation correctly, that when Google has accepted the canonical tag, impressions & clicks to the mobile domain (e.g. it.m.wikipedia.org) are indeed included in the impressions & clicks for the desktop domain (it.wikipedia.org). But for pages where Google has NOT accepted the canonical tag and continues to treat the alternate (mobile) version as the canonical, those impressions & clicks will show up in the mobile domain's performance report.
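
To make that interpretation concrete, here is a small Python sketch of the attribution rule as I understand it. It models my hypothesis only, not documented GSC behavior; the field names (google_selected_canonical, clicks), the page titles, and the numbers are all invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def clicks_by_reported_host(pages):
    """Sum clicks under the host of the canonical URL Google selected per page."""
    totals = defaultdict(int)
    for page in pages:
        host = urlsplit(page["google_selected_canonical"]).hostname
        totals[host] += page["clicks"]
    return dict(totals)


# Hypothetical mobile pages, both declaring the desktop URL as canonical.
pages = [
    # Google accepted our canonical tag: clicks roll up to the desktop domain.
    {"google_selected_canonical": "https://it.wikipedia.org/wiki/Page_A", "clicks": 120},
    # Google ignored our tag and kept the mobile URL as canonical.
    {"google_selected_canonical": "https://it.m.wikipedia.org/wiki/Page_B", "clicks": 30},
]
print(clicks_by_reported_host(pages))
```

Under this model the mobile property never drops to 0 as long as Google keeps at least one mobile URL as its selected canonical, which would match what we see in GSC.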

I completed a review of pre and post-deployment pageview trends to both the desktop and mobile versions of the test wikis, focusing on external search-related traffic, to determine if the change resulted in any significant impacts.

TL;DR: The results do not indicate any negative impact on search-referred traffic to the test wikis. There were no changes in traffic to either the desktop or mobile versions of the site that could be clearly attributed to the deployment of this change.

See the summary below, and the codebase and report for further details.

There was no sudden decline or increase in search-referred traffic following the change. In addition, the traffic changes after the October 2nd deployment follow the same patterns as in previous years and also appear on a set of wikis I reviewed where the change was not deployed [1], indicating that these fluctuations are likely seasonal and not attributable to this deployment.

search_yoy.png (2×4 px, 1 MB)

To confirm, I reviewed average daily pageviews during the two-week period (15 days) before and after deployment for this year and compared them to the changes that occurred over the same period the previous year.

Both mobile and desktop daily average search-referred pageviews went up slightly (3.47% on desktop and 0.03% on mobile) following the deployment. This percent increase was only slightly lower than the percent increase seen in 2018 over the same period. In addition, the percent increase in average daily search-referred pageviews around the deployment is similar to the changes in traffic observed for the wikis reviewed that were not in the test.
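
The comparison described above can be sketched roughly as follows in Python. This is an illustration of the arithmetic only, not the code from the linked codebase; the function names are hypothetical and the daily counts in the example are made-up placeholders, not the real pageview data.

```python
from statistics import mean


def pct_change(before, after):
    """Percent change in mean daily pageviews from the pre to the post window."""
    pre, post = mean(before), mean(after)
    return (post - pre) * 100.0 / pre


def yoy_gap(before, after, prev_before, prev_after):
    """This year's pre/post percent change minus last year's over the same dates.

    A value near zero suggests the movement around the deployment matches the
    seasonal pattern seen the year before.
    """
    return pct_change(before, after) - pct_change(prev_before, prev_after)


# Placeholder 15-day windows of daily search-referred pageviews (not real data)
this_year_pre, this_year_post = [100] * 15, [110] * 15
last_year_pre, last_year_post = [200] * 15, [230] * 15

print(pct_change(this_year_pre, this_year_post))
print(yoy_gap(this_year_pre, this_year_post, last_year_pre, last_year_post))
```

The same two functions can be applied per wiki and per platform (desktop, mobile), and to the non-test wikis, to see whether the test wikis move differently from the rest.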

The impact on overall pageview traffic was fairly flat as well.

pageview_yoy.png (2×4 px, 1 MB)

[1] While there was no designated control group for this test, I reviewed traffic trends for the following set of wikis where the change was not deployed to compare overall trends: Bhojpuri (bhwiki), Cherokee (chrwiki), Kazakh (kkwiki), Catalan (cawiki), French (frwiki), Yoruba (yowiki), Kalmyk (xalwiki).

@ovasileva - Let me know if you have any questions.

Discussed results with Olga in the June 11th meeting. As there was no identified negative impact, this change will likely be rolled out to all sites. I will publish a summary of my findings and a link to the report on https://www.mediawiki.org/wiki/Reading/Search_Engine_Optimization.

Thanks @MNeisler! Resolving this. Follow up and implementation will be continued in T255458: Enable $wgMFNoindexPages for all wikis