Properly canonicalize Special:MobileCite pages and article backlink
Closed, Resolved | Public | 3 Estimated Story Points

Description

Metadata and backlinking for the lazy-loaded references fallback page should be set so that we avoid unnecessary concentrations of references-only pages in search engine listings. We should also give users a way back to the main article in case someone shares a references-only page URL with them.

In the meantime we can do the following things:

  • Add a link on the page back to the originating page
  • Remove the <link rel="canonical" ...> element from the output.
  • Change the indexing policy to <meta name="robots" content="noindex,nofollow"/>

Background

When mobile pages are indexed by search engines and when you do a search on mobile, the result can end up linking to the mobile version of the site.

How it works today

As a student, I have literature about a subject that I want to know more about. I do a search on a popular search engine for the title of the book, and one of the first results links to a Wikipedia article that has that book as a reference. What's cool about that is that I immediately get more information about the subject for which the book is a reference.

How it will work

When a user lands on a reference page instead of the article, it's important to make it easy and clear for the user that these are the references for a specific article.

Based on initial discussion with a search engine contact, the initial plan of trialing this solution is the right way to go in order to look at search engine result sets and traffic.

Screen Shot 2016-06-01 at 9.31.59 AM.png (625×921 px, 182 KB)

Event Timeline

Jdlrobson set the point value for this task to 3. (Jun 1 2016, 4:36 PM)

Note: guidance on how to set the robots tag can be found in this patch: https://gerrit.wikimedia.org/r/247532

Jdlrobson triaged this task as Medium priority. (Jun 1 2016, 4:37 PM)

@Jdlrobson I thought the secondary page was a very temporary placeholder that will be remedied by T135539. If that is the case, is this extra work a waste of time? Making plans for a future in which references are held on a separate page makes me nervous about the promise to address T135539 ASAP.

This is unrelated to T135539 and that is prioritised higher.

This task is not "a waste of time" as search engines will follow links and not necessarily include content that requires JS to be loaded. This page is not going away, we are reducing the likelihood of getting there but the page still exists and will thus still be indexed.

@Jdlrobson I would rather we look into what Google needs in order to find JS-loaded content, to support people finding the actual article. I don't think having a user land on a reference page is a good idea. AJAX has been around for long enough that I find it hard to believe Google will not index it. We also have Google webmaster tools and can test to see if it's being crawled correctly.

dr0ptp4kt added a subscriber: Nirzar.

Punting this to the next sprint.

I agree with the notion that going to a references page from a set of search engine results would be unusual. This said, @Nirzar, can you think of a reason we would want to support that and how it might be made user friendly? If the answer is No, understood.

Typically, the canonical URL is the thing upon which the bulk of indexing occurs. As an example, when you visit the mdot URL https://en.m.wikipedia.org/wiki/Droid_(robot) you'll see the following in the HTML source:

<link rel="canonical" href="https://en.wikipedia.org/wiki/Droid_(robot)"/>

This signals to the crawler it should really look at the desktop URL for the full material. However, it's also true that various factors are applied at the search provider infrastructure to analyze similarity between desktop and mobile pages.
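
As a rough illustration of what a crawler reads from that element, the sketch below pulls the canonical URL out of a page's HTML using Python's standard `html.parser`. The names `CanonicalFinder` and `find_canonical` are made up for this example, and real crawlers apply many more signals than this single tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link .../> are also routed here.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# The element quoted above, wrapped in a minimal <head> for the demo.
sample = (
    '<head><link rel="canonical" '
    'href="https://en.wikipedia.org/wiki/Droid_(robot)"/></head>'
)
print(find_canonical(sample))  # prints https://en.wikipedia.org/wiki/Droid_(robot)
```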

We need a little more time to think about adding such an enhancement. There are a couple reasons why this is okay:

  1. We plan to roll out lazy loading references on just one small mobile web Wikipedia to start, which will give us some practical exposure to the real world implications. It will be easy enough to look at search referred traffic from Google on that domain to get a sense if something is amiss.
  2. As this suggests, I'm going to give a contact a heads up about our plan and see if there's any feedback we need to take into consideration.
dr0ptp4kt renamed this task from "Index Special:MobileCite in Google" to "Properly canonicalize Special:MobileCite pages and article backlink". (Jun 6 2016, 8:44 PM)
dr0ptp4kt updated the task description.

The default indexing policy is already set as "noindex,nofollow", as can be seen at https://github.com/wikimedia/mediawiki/blob/cb66cb2045c880cbc4af627814194e696506bd22/includes/specialpage/SpecialPage.php#L714. Special:MobileCite extends SpecialPage, so no further action is needed.

I am not sure, I'll look into it. Thanks for checking, @dr0ptp4kt.

Edit: The function to add the robots policy wasn't being called. The below patch takes care of it.
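
The situation described here (a default policy exists on the base class, but the subclass never calls the function that emits it) can be sketched generically. The Python below is only an analogy: every class and method name is hypothetical and it is not MediaWiki's actual code, which is written in PHP.

```python
class Output:
    """Hypothetical stand-in for the object that collects page metadata."""

    def __init__(self):
        self.robot_policy = None  # nothing emitted until a page sets it


class SpecialPageSketch:
    """Hypothetical base class that knows the default indexing policy."""

    def get_robot_policy(self):
        return "noindex,nofollow"

    def set_headers(self, out):
        # The policy only reaches the output if this method is called.
        out.robot_policy = self.get_robot_policy()

    def execute(self, out):
        self.set_headers(out)


class CiteWithoutHeaders(SpecialPageSketch):
    """Overrides execute() and forgets to call set_headers()."""

    def execute(self, out):
        pass  # page body would be rendered here; no robots policy is set


class CiteWithHeaders(SpecialPageSketch):
    """Overrides execute() and calls set_headers() first."""

    def execute(self, out):
        self.set_headers(out)
        # page body would be rendered here


broken, fixed = Output(), Output()
CiteWithoutHeaders().execute(broken)
CiteWithHeaders().execute(fixed)
print(broken.robot_policy, fixed.robot_policy)
```

Inheriting a default is not enough in this pattern; the subclass still has to invoke the method that writes the policy to the output.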

Change 295308 had a related patch set uploaded (by Bmansurov):
Add a link back to the article from Special:MobileCite

https://gerrit.wikimedia.org/r/295308

@dr0ptp4kt

  • Make the canonical link Special:MobileCite/p/Title of Page (on the mobile domain)

Currently this URL doesn't exist. For example, if the page title is "Refs", then the page /Special:MobileCite/p/Refs is non-existent. So is the above requirement asking us to create such a page too? If yes, what would the content of the page be?

I'm moving the patch to "Code Review" for now until my question above is answered.

@bmansurov, my apologies, the criteria should have read as follows, which is the technically accurate thing to do (one shouldn't mix rel canonical with noindex robots directives). I'll update the Description.

  • Remove the <link rel="canonical" ...> element from the output.
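
One way to verify this criterion on rendered output is to flag any page that carries both a canonical link and a noindex robots directive. The sketch below uses Python's standard `html.parser`; the names `HeadSignals` and `has_conflicting_signals` are hypothetical, and the sample markup is invented for the demo rather than taken from Special:MobileCite.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the canonical link and robots directives found in the HTML."""

    def __init__(self):
        super().__init__()
        self.has_canonical = False
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "link":
            rel_values = (attr_map.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.has_canonical = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.robots.add(part.strip().lower())


def has_conflicting_signals(html):
    """True when the page declares a canonical URL and is also noindex."""
    signals = HeadSignals()
    signals.feed(html)
    return signals.has_canonical and "noindex" in signals.robots


# Invented markup for illustration only.
mixed = (
    '<head><link rel="canonical" href="https://example.org/page"/>'
    '<meta name="robots" content="noindex,nofollow"/></head>'
)
clean = '<head><meta name="robots" content="noindex,nofollow"/></head>'
print(has_conflicting_signals(mixed), has_conflicting_signals(clean))
```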

Change 295379 had a related patch set uploaded (by Bmansurov):
WIP: Remove canonical URL from Special:MobileCite

https://gerrit.wikimedia.org/r/295379

Change 295308 merged by jenkins-bot:
Add an indexing policy and a link back to the article from Special:MobileCite

https://gerrit.wikimedia.org/r/295308

Change 295379 merged by jenkins-bot:
Remove canonical URL from Special:MobileCite

https://gerrit.wikimedia.org/r/295379

Jdlrobson removed a project: Patch-For-Review.

Note there is an open comment about using the isSpecial method. Let's resolve this before signing off.

Change 295766 had a related patch set uploaded (by Bmansurov):
Check if a page is a special page without creating a new Title

https://gerrit.wikimedia.org/r/295766

Change 295766 merged by jenkins-bot:
Check if a page is a special page without creating a new Title

https://gerrit.wikimedia.org/r/295766

The edit icon is a bit offset, meaning the centering looks strange. Can we add a CSS rule to hide it on this page?

Change 296494 had a related patch set uploaded (by Bmansurov):
Hide edit icon placeholder on Special:MobileCite

https://gerrit.wikimedia.org/r/296494

Change 296494 merged by jenkins-bot:
Hide edit icon placeholder on Special:MobileCite

https://gerrit.wikimedia.org/r/296494

Jdlrobson claimed this task.

Looks great! Thanks @bmansurov for leading this!

Change 298803 had a related patch set uploaded (by Jdlrobson):
Hide edit icon placeholder on Special:MobileCite

https://gerrit.wikimedia.org/r/298803

Change 298803 abandoned by Jdlrobson:
Hide edit icon placeholder on Special:MobileCite

Reason:
Heh I guess that solves the problem then :) Thanks Florian.

https://gerrit.wikimedia.org/r/298803