
Edits which has been reverted and revision deleted over 40 hours ago were visible on page previews
Open, Needs Triage, Public

Description

https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=11198468

There must be a way to purge the cache of page previews, especially if the content has been revision deleted. I repeatedly tried to purge both the article that had been edited to include the vandalism and the page that hosted the link and displayed the page preview, without immediate success. About 10 minutes after my first attempted purge, the vandalism was no longer visible.

Please make it easier to purge the content of a page preview, and always clear a page's cache when a revision deletion happens on that page.

Event Timeline

Looks like the page preview just pulls the extract from https://en.wikipedia.org/api/rest_v1/page/summary/{title} - did something go wrong with RB cache purging? In theory, the reverting edit should have triggered RB to update this extract, and the more recent purges of the edited page should have done so too.
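For anyone who wants to check what the summary endpoint currently returns for a title, here is a minimal Python sketch. The helper names (`summary_url`, `fetch_summary`, `preview_fields`) and the User-Agent string are made up for illustration; only the URL pattern comes from the comment above, and the field names are the ones the endpoint returns in its JSON.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical User-Agent for this example; use one that identifies your own tool.
USER_AGENT = "preview-cache-check/0.1 (example script)"


def summary_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build the REST summary URL for a page title."""
    # Spaces become underscores; everything else (including "/") is percent-encoded.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{host}/api/rest_v1/page/summary/{encoded}"


def fetch_summary(title: str, host: str = "en.wikipedia.org") -> dict:
    """Fetch the summary JSON. This performs a network request."""
    req = Request(summary_url(title, host), headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def preview_fields(summary: dict) -> dict:
    """Pick out the fields that show which revision the extract was built from."""
    return {
        "revision": summary.get("revision"),
        "timestamp": summary.get("timestamp"),
        "extract": summary.get("extract"),
    }
```

Calling `preview_fields(fetch_summary("Jeddah Tower"))` would show the revision ID and timestamp behind the extract, which can then be compared with the page history.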

I'm not sure about the purges. Sometimes you need a null edit in order to really bypass some caches.

As for the initial cause, there was a single edit (the revert at xx:21). If that purge request was lost (it sometimes happens; are they still being sent through UDP?), the old version could still be presented by some caches (depending on preferences, sometimes only to logged-out users).

Masumrezarock100 renamed this task from Edits which has been reverted and revision deleted over 40 hours ago were visible on page previous to Edits which has been reverted and revision deleted over 40 hours ago were visible on page previews. Oct 12 2019, 11:20 PM
Masumrezarock100 subscribed.

Assuming you meant "Page previews", I changed the title of this task.

@mobrovac @Pchelolo Is it possible that the Restbase endpoint is serving an outdated extract? If I remember right it has some caching built-in.

I looked at the page in question and I see it has the correct content matching the wikitext. What could have happened is that the purge request itself fell through, so RB did have the new version stored, but Varnish was serving the old one until it fell out of cache.
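To put a number on "until it fell out of cache", one can read the standard HTTP `Cache-Control` and `Age` response headers and estimate how much longer a shared cache is allowed to keep serving the same copy. The sketch below is generic HTTP arithmetic, not a description of how Varnish is configured at Wikimedia; the function name `remaining_freshness` and the header values in the usage note are hypothetical.

```python
from typing import Mapping, Optional


def remaining_freshness(headers: Mapping[str, str]) -> Optional[int]:
    """Estimate the seconds a shared cache may still serve this response.

    Uses s-maxage if present, else max-age, minus the Age header.
    Returns None when the response carries no usable lifetime.
    """
    lowered = {k.lower(): v for k, v in headers.items()}

    # Parse "s-maxage=1209600, max-age=300" into {"s-maxage": "1209600", ...}.
    directives = {}
    for part in lowered.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    lifetime = directives.get("s-maxage", directives.get("max-age"))
    if lifetime is None or not lifetime.isdigit():
        return None

    age = lowered.get("age", "0").strip()
    age_seconds = int(age) if age.isdigit() else 0
    return max(0, int(lifetime) - age_seconds)
```

With made-up headers such as `{"Cache-Control": "s-maxage=1209600, max-age=300", "Age": "259200"}`, the function returns 950400, i.e. a cached copy that is three days old could in principle be served for another eleven days if no purge reaches the cache.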

@mobrovac can you take this task and push it forward?

Given that the page in question does not exhibit the problem any more, I'm not sure there's something to do here.

@DragonflySixtyseven mentioned in #wikipedia-en on freenode an off-wiki complaint that https://en.wikipedia.org/wiki/Jeddah_Tower had a popup with vandalism in it. That popup showed an old revision of Skyscraper design and construction from 3 days ago that was quickly reverted. I confirmed using the REST API that the old revision was still being used as the summary. I then purged the page (using the UTC purge clock) and re-sent the API request. The API returned the newest revision, which is also what was shown by Popups.
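The check described above (compare the revision behind the summary with the page's current revision, purge, then ask again) can be sketched as follows. This assumes the Action API's `action=query` with `prop=revisions` and `action=purge` (which must be sent as a POST); the helper names and the User-Agent string are hypothetical, and whether a purge actually reaches every cache layer is exactly what this task is about.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical User-Agent for this example.
USER_AGENT = "preview-cache-check/0.1 (example script)"


def latest_revid_query(title: str) -> dict:
    """Action API parameters asking for the current revision ID of a page."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }


def latest_revid(payload: dict) -> int:
    """Extract the revision ID from a formatversion=2 query response."""
    return payload["query"]["pages"][0]["revisions"][0]["revid"]


def summary_is_stale(summary: dict, current_revid: int) -> bool:
    """True if the summary was built from something other than the current revision."""
    # The summary endpoint returns "revision" as a string.
    return int(summary["revision"]) != current_revid


def api_get(params: dict, host: str = "en.wikipedia.org") -> dict:
    """GET request against the Action API. Performs a network request."""
    url = f"https://{host}/w/api.php?{urlencode(params)}"
    with urlopen(Request(url, headers={"User-Agent": USER_AGENT}), timeout=10) as resp:
        return json.load(resp)


def purge(title: str, host: str = "en.wikipedia.org") -> dict:
    """POST action=purge for one title. Performs a network request."""
    body = urlencode({"action": "purge", "titles": title, "format": "json"}).encode()
    req = Request(f"https://{host}/w/api.php", data=body, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A typical session would call `latest_revid(api_get(latest_revid_query(title)))`, compare it with the summary via `summary_is_stale`, call `purge(title)` if they differ, and then re-fetch the summary to see whether the revision changed.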

I am not sure if anyone had tried purging the page and the issue is now gone, but the issue was reported at https://en.wikipedia.org/wiki/Wikipedia_talk:In_the_news#Spam. The IP took a screenshot and uploaded it to imgur. I'm not sure of the reporting rules and norms here; hopefully this is enough information.

This is still happening on fiwiki. The method was to vandalize Template:Yhdiste (Chembox) with images containing pornography (diff). The vandalism was reverted within 15 minutes. However, the vandalized image was still visible in previews a day later. Purging or null-editing the articles didn't help; making a real edit did update the page (diff).

However, the template is used in other articles, and the vandalized image was still visible in those.

I was able to replicate the problem:

https://fi.wikipedia.org/api/rest_v1/page/summary/Mevalonihappo

{"type":"standard","title":"Mevalonihappo","displaytitle":"<span class=\"mw-page-title-main\">Mevalonihappo</span>","namespace":{"id":0,"text":""},"wikibase_item":"Q241678","titles":{"canonical":"Mevalonihappo","normalized":"Mevalonihappo","display":"<span class=\"mw-page-title-main\">Mevalonihappo</span>"},"pageid":847795,"thumbnail":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/Porno.webm/320px--Porno.webm.jpg","width":320,"height":180},"originalimage":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/Porno.webm/1920px--Porno.webm.jpg","width":1920,"height":1080},"lang":"fi","dir":"ltr","revision":"22796540","tid":"24dfb481-a14c-11ef-ae7d-745d52b14388","timestamp":"2024-11-12T23:16:28Z","description":"kemiallinen yhdiste","description_source":"central","content_urls":{"desktop":{"page":"https://fi.wikipedia.org/wiki/Mevalonihappo","revisions":"https://fi.wikipedia.org/wiki/Mevalonihappo?action=history","edit":"https://fi.wikipedia.org/wiki/Mevalonihappo?action=edit","talk":"https://fi.wikipedia.org/wiki/Keskustelu:Mevalonihappo"},"mobile":{"page":"https://fi.m.wikipedia.org/wiki/Mevalonihappo","revisions":"https://fi.m.wikipedia.org/wiki/Special:History/Mevalonihappo","edit":"https://fi.m.wikipedia.org/wiki/Mevalonihappo?action=edit","talk":"https://fi.m.wikipedia.org/wiki/Keskustelu:Mevalonihappo"}},"extract":"Mevalonihappo (C6H12O4) on karboksyylihappo, jonka rakenteessa on myös kaksi hydroksyyliryhmää. Mevalonihappo on tärkeä yhdiste terpeenien ja steroidien biosynteesissä. Yhdisteen ionimuoto ja suolat ovat mevalonaatteja.","extract_html":"<p><b>Mevalonihappo</b> (C<sub>6</sub>H<sub>12</sub>O<sub>4</sub>) on karboksyylihappo, jonka rakenteessa on myös kaksi hydroksyyliryhmää. Mevalonihappo on tärkeä yhdiste terpeenien ja steroidien biosynteesissä. Yhdisteen ionimuoto ja suolat ovat mevalonaatteja.</p>"}
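When checking many articles that transclude the same template, it may help to pull just the image file name and revision out of each summary response rather than reading the full JSON. A small sketch, assuming the `thumbnail.source` URL follows the `.../thumb/<x>/<xy>/<file name>/<width>px-...` layout seen in the response above; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def thumbnail_file(summary: dict) -> Optional[str]:
    """Return the file name behind the summary's thumbnail, or None if there is none."""
    thumb = summary.get("thumbnail")
    if not thumb or "source" not in thumb:
        return None
    segments = urlsplit(thumb["source"]).path.split("/")
    # In a thumb URL the original file name is the second-to-last path segment;
    # otherwise fall back to the last segment.
    name = segments[-2] if "thumb" in segments and len(segments) >= 2 else segments[-1]
    return unquote(name)


def cached_state(summary: dict) -> dict:
    """Summarize what a cached summary response was built from."""
    return {
        "title": summary.get("title"),
        "revision": summary.get("revision"),
        "timestamp": summary.get("timestamp"),
        "thumbnail_file": thumbnail_file(summary),
    }
```

Comparing `cached_state(...)["thumbnail_file"]` against the image the template currently uses shows at a glance whether an article's preview is still built from the vandalized version.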

Confirmed with a cat image on testwiki that the bug also affects search result thumbnails, i.e. the search bar at the top of Wikipedia, which uses RESTBase as its backend.

The same issue seems to exist on mlwikisource too.
Sample - https://ml.wikisource.org/wiki/%E0%B4%B8%E0%B5%82%E0%B4%9A%E0%B4%BF%E0%B4%95:Doothavakyam_Gadyam.djvu

It shows a "module missing" message, even though the module was already created on April 19 this year.

This is still happening. See https://en.wikipedia.org/wiki/Talk:Killing_of_Austin_Metcalf#Racist_Language for a report of revision deleted content still being displayed in previews almost two hours (and possibly longer) after the vandalism was reverted (and which was on the page only for a single minute).

There should be some mechanism whereby editors can explicitly purge the cache of article previews, and as requested in the opening comment from 2019 a revision deletion (or oversight) of a cached revision should always invalidate the cache. It's not impossible there could be legal implications if the revdelled content is a copyright violation or libellous.

Got a report via Discord of vandalism appearing in the preview for https://en.wikipedia.org/wiki/Formula_One_cars, which is a redirect. I tried purging the redirect page, and it didn't work. Then I purged the redirect target, which also did nothing. Then I purged the redirect page again and it worked.

As a temporary workaround, wouldn't it be possible to force purge pages automatically as and when a page is loaded?

Another similar issue in VRT #2025051210011409

Is this happening more now? Seems like multiple reports for multiple articles within just a few days.

image.png (436×866 px, 286 KB)

Please forgive me if this isn't welcome, but I suspect that in the current political climate in the USA this may well become more common in the short to medium term. Furthermore, I'm very concerned that, even though there's no evidence of it happening yet, this would be a way for vandalism to introduce even potentially oversightable information into previews, and as far as I can see there is no guaranteed way to remove it quickly (and even if there is, most of us non-techy contributors have no idea what it is). Previews are also built into much more than desktop: they're automatic on mobile (as far as I'm aware), for example, meaning the content is highly visible even to logged-out users and non-contributors. If nothing else, it should be a high priority to add a noticeable and easy way to purge the previews for local contributors who revert an edit. Ideally it would be automatic, but failing that a clear way to purge is needed.

If nothing else, it should be a high priority to add a noticeable and easy way to purge the previews

Perhaps more importantly, it also needs to be reliable.

As an oversighter, I'm not aware of any oversightable vandalism cached in previews (yet), although this is not something that we regularly check for. However, the Killing of Austin Metcalf vandalism was racist towards someone charged with a serious crime, in a manner that could be argued to be prejudicial to a trial.

Ideally any revert and any oversight action on a page should invalidate any and all caches of that page (that are not equal to the version reverted to), but there is always going to be a need for a manually-triggered purge option as well.

doctaxon subscribed.

While revision-deleted or oversighted content stays visible for a time, I guess that this is a security issue for the WMF Trust and Safety Production Team.

Dreamy_Jazz subscribed.

While revision-deleted or oversighted content stays visible for a time, I guess that this is a security issue for the WMF Trust and Safety Production Team.

Product Safety and Integrity don't own revision deletion per the maintainers page on MediaWiki, so I'd say it's probably not within the team scope.

The maintainers page says revision deletion is owned by the MW Interfaces team who are tagged already on this task

BPirkle subscribed.

After some off-phab discussion, this may be more related to PCS than revision deletion. Tagging CTT because (at least per this page) they're the owners. Moving to MWI's Radar column for visibility in case that's incorrect and/or we need to be involved.

Change #1240805 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/services/restbase@master] pcs: propagate cache-control to upstream service on cache bust

https://gerrit.wikimedia.org/r/1240805

Change #1240805 abandoned by Kosta Harlan:

[mediawiki/services/restbase@master] pcs: propagate cache-control to upstream service on cache bust

Reason:

The fix should go elsewhere

https://gerrit.wikimedia.org/r/1240805