Enable $wgMFNoindexPages for: Italian, Dutch, Korean, Arabic, Chinese, and Hindi Wikipedias
Open, High · Public · 2 Story Points

Description

Background

In T205495 (Enable $wgMFNoindexPages for beta), we added alternate tags for the mobile versions of pages on the beta cluster so that the site can be indexed properly. We would like to test these on more wikis to determine their effect on traffic from mobile search engines.

Acceptance criteria:

  • add link rel="alternate" to the following projects (as in T205495):
NOTE: tentative deployment June ?
  • arwiki
  • zhwiki
  • hiwiki
  • add link rel="alternate" to the following projects (as in T205495):
NOTE: tentative deployment June ?
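
For illustration, the pair of tags described in Google's separate-URLs guidance can be sketched as below. This is a minimal sketch, not MobileFrontend's implementation: the helper names are hypothetical, the 640px breakpoint is taken from Google's published example, and the URL rewrite assumes a plain `<lang>.wikipedia.org` host.

```python
from html import escape
from urllib.parse import urlsplit


def to_mobile_url(desktop_url: str) -> str:
    """Insert the .m. label after the language subdomain (assumed host layout)."""
    parts = urlsplit(desktop_url)
    lang, rest = parts.netloc.split(".", 1)
    return parts._replace(netloc=f"{lang}.m.{rest}").geturl()


def alternate_link(mobile_url: str, max_width_px: int = 640) -> str:
    """Tag for the desktop page, pointing at its mobile counterpart."""
    media = f"only screen and (max-width: {max_width_px}px)"
    return (
        f'<link rel="alternate" media="{escape(media)}" '
        f'href="{escape(mobile_url)}">'
    )


def canonical_link(desktop_url: str) -> str:
    """Tag for the mobile page, pointing back at the desktop URL."""
    return f'<link rel="canonical" href="{escape(desktop_url)}">'


if __name__ == "__main__":
    desktop = "https://ar.wikipedia.org/wiki/Foo"  # hypothetical page
    print(alternate_link(to_mobile_url(desktop)))
    print(canonical_link(desktop))
```

The desktop page carries the `alternate` tag and the mobile page carries the `canonical` tag, so a crawler can treat the two URLs as one document.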

Event Timeline

Restricted Application added subscribers: alanajjar, Cosine02, revi, Aklapper. · Oct 9 2018, 12:21 AM
Liuxinyu970226 added a subscriber: Liuxinyu970226. · Edited · Oct 9 2018, 1:10 AM

Should we also enable this on Turkish Wikipedia? Not only people in Turkey (although, as in China, they have to use VPN+IPBE to contribute) but also people in the Caucasus, Cyprus, the former Yugoslavia, and Central Asian countries can use Turkish wikis to some degree. As far as I know, Turkish, like a number of Turkic languages written in Latin script, treats I ı and İ i as separate letters, but searching with site:tr.m.wikipedia.org currently can't handle this correctly, so I would need feedback from a trwiki test to see how the process can be improved.

Imarlier claimed this task. · Nov 15 2018, 9:57 PM

Change 473889 had a related patch set uploaded (by Imarlier; owner: Imarlier):
[operations/mediawiki-config@master] wmf-config: Enable wgMFNoindexPages for 6 wikis

https://gerrit.wikimedia.org/r/473889

Imarlier updated the task description. · Nov 16 2018, 3:05 PM

Can we please put this on hold until the other interventions (T208755 and T209720) are done and analyzed?

A patch for T206497 is in the works for several wikis to point mobile to desktop. Will this impact the a/b test in any way if enabled?

Theoretically, this would affect both the control and test pages roughly equally so the sameAs A/B test shouldn't be affected too much due to it being a true randomized controlled experiment, but the sitemaps test would be greatly contaminated by this. So it would be best to hold off on this for the time being.

We're ok here. We actually took this into account when planning the sitemaps launch with 3 wikis with sitemaps, 3 with $wgMFNoindexPages, and 3 with both. @mpopov is checking to confirm wikis that we can use as a control.

@ovasileva just reminded me that we did discuss this before, but I forgot with everything else that was going on.

Alright, I got mixed results re: controls so we'll do the best we can with what we have. This indexing thing isn't so much about more traffic as just fixing errors, so we should be okay anyway.

So never mind what I said earlier, it's alright to go ahead with this :)

Change 473889 merged by jenkins-bot:
[operations/mediawiki-config@master] wmf-config: Enable wgMFNoindexPages for 6 wikis

https://gerrit.wikimedia.org/r/473889

Change 476060 had a related patch set uploaded (by Imarlier; owner: Imarlier):
[operations/mediawiki-config@master] config: move wgMFNoindexPages to InitialiseSettings-labs

https://gerrit.wikimedia.org/r/476060

Legoktm added a subscriber: Legoktm.

+Wikimedia-Site-requests since this is a request for a configuration change.

The premise on T205495 was:

We are not currently providing Google the metadata that they request in order to properly index mobile sites...

To date, no one has responded to cscott on T198970#4706473 about how the Google indexing pipeline works. What indications do we have that this will make any difference for Google? What are the contingency plans in case this experiment goes wrong?

It doesn't seem like this is ready for a production deployment yet.

Restricted Application added a subscriber: Dereckson. · Nov 28 2018, 6:40 AM

We have an explicit indication that this will make a difference for Google, provided by Google: https://developers.google.com/search/mobile-sites/mobile-seo/separate-urls. What we are doing here is implementing Google's recommendation, in order to validate that it actually does improve mobile site results.

The contingency plan is to revert the change (set wgMFNoindexPages=false for all wikis).
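
As a rough model of that rollback, the sketch below treats the setting as a per-wiki map with a default. This is an assumption for illustration only: the helper names are hypothetical, the dictionary shape only loosely imitates a per-wiki configuration map, and the database names for Italian, Dutch, and Korean are inferred from the task title rather than listed in the description.

```python
# The six wikis named in the task title, as database names (partly inferred).
TEST_WIKIS = ("itwiki", "nlwiki", "kowiki", "arwiki", "zhwiki", "hiwiki")


def enable_for(wikis):
    """Build a per-wiki map: off by default, on for the test wikis."""
    setting = {"default": False}
    setting.update({wiki: True for wiki in wikis})
    return setting


def resolve(setting, wiki):
    """Return the value that applies to one wiki, falling back to the default."""
    return setting.get(wiki, setting["default"])


def revert(setting):
    """Contingency plan: drop every override so all wikis resolve to False."""
    return {"default": False}


if __name__ == "__main__":
    enabled = enable_for(TEST_WIKIS)
    print(resolve(enabled, "arwiki"), resolve(enabled, "enwiki"))
    print(resolve(revert(enabled), "arwiki"))
```

Because the revert replaces the whole map rather than editing individual entries, it does not depend on knowing which wikis were enabled.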

Legoktm added a subscriber: cscott. · Nov 28 2018, 8:37 PM

I expect that's true for most websites, but we already know that Google has a special pipeline for Wikipedia content, and they aren't scraping our webpages. What I'm asking is if anyone has explicitly asked our Google contacts on whether this change would be helpful, and whether it will even make a difference. Maybe @cscott could help here?

Change 476060 merged by jenkins-bot:
[operations/mediawiki-config@master] config: move wgMFNoindexPages to InitialiseSettings-labs

https://gerrit.wikimedia.org/r/476060

kchapman removed Imarlier as the assignee of this task. · Jan 25 2019, 2:39 AM
kchapman added a project: Performance-Team.
kchapman added a subscriber: kchapman.

Ian had taken this on, but it isn't really in Performance's remit. Is this something Readers Web wants to take on?

kchapman moved this task from Inbox to Radar on the Performance-Team board. · Jan 29 2019, 6:30 PM
kchapman edited projects, added Performance-Team (Radar); removed Performance-Team.
pmiazga claimed this task. · May 21 2019, 4:14 PM

I'm trying to understand what's missing before we can start working on this task.
@Legoktm It looks like you would like to check with the Google team to find out whether this task is worth doing, right? If something goes wrong (fewer pageviews, etc.), we will revert this patch.
@cscott I read your comment (https://phabricator.wikimedia.org/T198970#4706473). Is it still up to date? Did we contact Google? Is there anything I can help with?
@mpopov are you still happy to continue with this patch?

I don't think anything has changed since my last comment (T206497#4783049), but I have not discussed this topic since then.

Also note that since then, proposals like T214998: Remove .m. subdomain, serve mobile and desktop variants through the same URL have been raised, which would get rid of the need for this entirely.

Just wanted to clarify that this task would be a test only. Based on the results we would either consider making the change for more wikis or reverting it back to the current state.

It looks like no one knows for certain what will happen if we enable $wgMFNoindexPages. I propose enabling it on a couple of wikis and then checking the results. It won't hurt us.

ovasileva updated the task description. · May 30 2019, 11:46 AM
pmiazga removed pmiazga as the assignee of this task. · May 30 2019, 4:30 PM
pmiazga moved this task from Needs Prioritization to Upcoming on the Readers-Web-Backlog board.
pmiazga added a subscriber: pmiazga.

Not knowing what enabling something is going to do seems like a bad reason to just do it, unless I'm misunderstanding your comment. I don't really understand the resistance to just asking our contacts for advice (though cscott kind of already did that).

ovasileva updated the task description. · Jun 4 2019, 4:42 PM
ovasileva updated the task description. · Jun 4 2019, 4:50 PM

We will reach out to our contacts from Google. However, we know that they cannot share details of their ranking algorithms or confirm/reject that we are getting special preference in any way, so we are not expecting to get much detail from them on this subject. What we do know is what they say on their public documentation. This task is based on these public recommendations around the requested data for properly indexing the mobile site and the analysis done in T198970. Based on this, we devised this test, similar to the tests we already completed for sitemaps T206496 and the sameAs schema.org property T208755.

ovasileva set the point value for this task to 2. · Jul 9 2019, 4:49 PM
cscott added a comment. · Jul 9 2019, 5:01 PM

I find it puzzling that "we know they cannot". We know for a fact that they are using a special pipeline for handling wiki content, and we have worked with them to build it. We can see their User-Agent in our logs ("MediaWikiCrawler-Google/2.0"). I've talked to Google engineers personally. They come to Wikimania. I can understand that Google is a big company, and sometimes it can be hard to find the right person at Google who actually knows how things work. But talking to the wrong person at Google isn't proof of anything.

Per T206497#5274358, this probably belongs in the PO backlog, but it is ready to work on if and when we need to.