
Google uses unsighted version of the page
Closed, Resolved, Public


Screenshot of Google search for "alexandre pato" on Oct. 15

The hu.wikipedia page for Alexandre Pato was vandalized recently [1] and was only restored after about three weeks. hu.wikipedia uses flagged revisions, with the last sighted version shown to non-logged-in users; the review log [2] shows that the vandalized version was never sighted. Despite this, Google's Knowledge Graph still picked up the first few sentences from the vandalized version (see attached image; interestingly, the result snippet is from the sighted version, but the KG snippet is from the vandalized one).

This might not be a bug in the strict sense; Google may simply be doing something tricky and requesting something other than the standard non-logged-in HTML version (for example, action=raw always shows the newest version). At any rate, it is annoying (I found out about it from a newspaper article that used the Pato article as an example of the unreliable nature of Wikipedia). It would be nice if we could figure out where exactly Google is getting its data from.
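To make the two request paths concrete, here is a minimal Python sketch that builds both URLs for the affected page. The helper names `reader_view_url` and `raw_wikitext_url` are made up for illustration, the `/wiki/` and `/w/index.php` paths are the usual Wikimedia layout, and nothing here confirms which of the two (if either) Google actually requests.

```python
from urllib.parse import quote, urlencode


def reader_view_url(host: str, title: str) -> str:
    """Standard article URL; with flagged revisions, anonymous readers
    are shown the last sighted version here."""
    return f"https://{host}/wiki/{quote(title)}"


def raw_wikitext_url(host: str, title: str) -> str:
    """action=raw URL; as noted above, this returns the newest revision,
    whether or not it has been sighted."""
    query = urlencode({"title": title, "action": "raw"})
    return f"https://{host}/w/index.php?{query}"
```

Fetching both URLs while an unsighted edit is pending and comparing the text should show whether a given consumer could have picked up the unsighted content through the second path.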


Version: unspecified
Severity: minor


google-flagrev.png (768×1 px, 175 KB)



Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:33 AM)
bzimport set Reference to bz56526.
bzimport added a subscriber: Unknown Object (MLST).

Same problem yesterday in the German Wikipedia with Eriksen:

While the entry was on Wikipedia for only a few minutes, and visible to registered users only, it stayed on Google for longer. The media showed a Google screenshot...

This should be fixed, since it can damage Wikipedia's reputation.

Tacsipacsi subscribed.

Do we know what exactly Google uses? Could we perhaps get them to use an extra parameter asking for a stable revision? The German market is maybe big enough for Google to care about it.
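As a rough sketch of what "asking for a stable revision" could look like on the reuser's side: the code below assumes the FlaggedRevs query module (`action=query&prop=flagged`) returns a `flagged` object with a `stable_revid` field for each page, and that the reuser then pins the fetch with the standard `oldid` parameter. The response shape, the helper names and the sample data are assumptions for illustration, not a description of what Google does or of any agreed interface.

```python
from typing import Optional
from urllib.parse import urlencode


def stable_revid(api_response: dict) -> Optional[int]:
    """Return the stable revision ID from a parsed prop=flagged response,
    or None if the page has no sighted revision (assumed response shape)."""
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        flagged = page.get("flagged")
        if flagged and "stable_revid" in flagged:
            return int(flagged["stable_revid"])
    return None


def pinned_revision_url(host: str, title: str, revid: int) -> str:
    """URL that requests one specific revision via the oldid parameter."""
    query = urlencode({"title": title, "oldid": revid})
    return f"https://{host}/w/index.php?{query}"


# Hypothetical sample data, shaped like the assumed API response.
sample = {
    "query": {
        "pages": {
            "1": {"title": "Example", "flagged": {"stable_revid": 12345}}
        }
    }
}

revid = stable_revid(sample)
if revid is not None:
    url = pinned_revision_url("de.wikipedia.org", "Example", revid)
```

A page that has never been sighted yields `None`, so the caller still has to decide what to show in that case.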

These days this issue is probably more relevant to Wikimedia Enterprise than to MediaWiki-extensions-FlaggedRevs (and is very likely a duplicate of something).

Probably not yet; it's a new project. But providing stable, vandalism-free, high-availability content APIs to big tech companies is its exact purpose. Trying to initiate a conversation with Google about how they use MediaWiki APIs seems like an unhelpful distraction when WM Enterprise intends to initiate a conversation with them about instead using a different set of APIs custom-made for that purpose.

@Ladsgroup @Tgr - cool ticket, thanks for flagging this. Been following peripherally.

So in Okapi (Enterprise), we will include flaggedrevs and other notable signals from projects to augment reusers' ability to judge the revision. Still figuring out what that looks like in product form, but we did some work on T263885 to start to lay out some of the best areas where we could find signals (thoughts welcome!). This type of stuff is intended to be included in the schema that we provide (see T281499) -- however, at the end of the day, the extent to which folks use it is up to them.

As we get closer, I really want to start writing up docs specific to projects to lay out the best practices as a reuser. I'm specifically thinking that if the German Wikipedia community uses flaggedrevs consistently, then it is actually a very reliable data point for this purpose and we should recommend usage consistently. We'll get there and I believe our product will definitely help situations like this.

JArguello-WMF claimed this task.