
Deleted files sometimes remain visible to non-privileged users if permanently linked
Closed, Duplicate · Public

Description

https://upload.wikimedia.org/wikipedia/commons/archive/2/2a/20121125171455!Columbia_City_Cinema_main_hall.jpg is a deleted revision of the file https://commons.wikimedia.org/wiki/File:Columbia_City_Cinema_main_hall.jpg

However, it is visible when using a static history link. I am not an admin, but I can see that (deleted) file.

When I asked another user why this was, the response I got was "looks like it was not moved to the filearchive table". Might there be more files like this?

Event Timeline

Restricted Application added a project: Security. Aug 17 2015, 4:59 PM
Steinsplitter moved this task from Incoming to Backlog on the Commons board. Aug 17 2015, 5:05 PM
csteipp closed this task as Invalid. Aug 18 2015, 9:54 PM
csteipp claimed this task.
csteipp added a subscriber: csteipp.

Looks like this original link is 404-ing now. It feels to me like a possible caching issue. It looks like the deletion is working correctly when tested.

I'm going to tentatively mark this as resolved. If anyone is able to reproduce this, let's reopen.

Josve05a reopened this task as Open. Aug 18 2015, 10:23 PM (edited)

User:FireflySixtySeven and User:Huon on IRC reproduced it as well.

Had they visited it already? I also have a 404, it still sounds like the cache at the moment...

Ah, esams still has it cached; eqiad gives a 404.

@BBlack, do you know who in ops could verify that purges are typically going to esams, or if this is an indication of a larger issue?

I've confirmed realtime flow of PURGE traffic on esams cache instances at the varnish level, with requests that look like:

16 ReqEnd       c 2734020121 1439942720.255194902 1439942720.255226612 0.000029087 0.000018358 0.000013351
16 ReqStart     c 127.0.0.1 42514 2734020122
16 RxRequest    c PURGE
16 RxURL        c /wikipedia/commons/8/82/David_Ispiryan_1.JPG
16 RxProtocol   c HTTP/1.1
16 RxHeader     c Host: upload.wikimedia.org
16 RxHeader     c User-Agent: vhtcpd
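
To tally purges like the one above from a longer capture, the log lines can be grouped per request. The following is a minimal Python sketch, assuming the line layout shown in the excerpt (descriptor, tag, side, data). The name `purge_requests` is illustrative and not part of any Varnish tooling.

```python
import re

# One log line: "<fd> <tag> <c|b|-> <data>", as in the capture above.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\w+)\s+[cb-]\s+(.*)$")


def purge_requests(lines):
    """Return (host, url) pairs for every PURGE request found in the lines."""
    pending = {}  # fd -> fields collected for the request in progress
    found = []

    def flush(fd):
        req = pending.pop(fd, None)
        if req and req.get("method") == "PURGE" and "url" in req:
            found.append((req.get("host"), req["url"]))

    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a log line
        fd, tag, data = match.group(1), match.group(2), match.group(3).strip()
        if tag == "ReqStart":
            flush(fd)
            pending[fd] = {}
        elif tag == "ReqEnd":
            flush(fd)
        elif tag == "RxRequest":
            pending.setdefault(fd, {})["method"] = data
        elif tag == "RxURL":
            pending.setdefault(fd, {})["url"] = data
        elif tag == "RxHeader":
            name, _, value = data.partition(":")
            if name.strip().lower() == "host":
                pending.setdefault(fd, {})["host"] = value.strip()

    for fd in list(pending):
        flush(fd)  # a capture may end mid-request, as the excerpt above does
    return found
```

Fed the seven lines above, this yields a single pair: the host `upload.wikimedia.org` and the URL of `David_Ispiryan_1.JPG`.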

We also have ganglia stats on PURGE traffic flowing through vhtcpd on the nodes. The rate pattern is always spiky, but doesn't seem to show any major dropouts. This is the 1w view (scroll past the generic cpu/mem/net stats):

http://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=Upload+caches+esams&h=&tab=m&vn=&hide-hf=false&m=vhtcpd_inpkts_recvd&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name

In general, our purges are not 100% reliable, but they're usually fairly reliable.

I went ahead and manually hit the one image in question in esams; it should be gone now, in case it was just a rare failure. If this is systemic, we can dig deeper on why.

So this particular instance should be dealt with now. @Josve05a are you seeing any more instances?

Josve05a added a comment. Aug 19 2015, 12:30 AM (edited)

Not that I know of, at least (I have no way to find any either, unless I stalk images being deleted). So perhaps it was just a one-time thing (although it seems odd).

Thanks @Josve05a. I'm going to close this for now, but please reopen if you see any further discrepancies.

Krenair closed this task as Resolved. Aug 23 2015, 6:16 PM
Krenair added a subscriber: Krenair.

(actually close)

Not that I know of, at least (I have no way to find any either, unless I stalk images being deleted). So perhaps it was just a one-time thing (although it seems odd).

This isn't the first time something like this has happened. HTCP is an unreliable protocol, and purges sometimes don't go through.

HTCP is an unreliable protocol, and purges sometimes don't go through.

Yep. A more correct status for this report seems to be "Declined" (or "Invalid" if we think reliable purging is nearly an unsolvable problem of CS).

HTCP is an unreliable protocol, and purges sometimes don't go through.

Yep. A more correct status for this report seems to be "Declined" (or "Invalid" if we think reliable purging is nearly an unsolvable problem of CS).

I vaguely remember Tim talking about alternatives to multicast UDP somewhere. Such things certainly do exist now (but, from what I understand, didn't really, or at least weren't mature enough, at the time HTCP was chosen). See things like ZeroMQ and PGM.

csteipp reopened this task as Open. Aug 24 2015, 9:12 PM

I actually didn't close because I wanted to stalk the deletion log a bit. I'm seeing more cases of this (the file 404s from eqiad, but is cached and served from esams).

I vaguely remember Tim talking about alternatives to multicast UDP somewhere. Such things certainly do exist now (but, from what I understand, didn't really, or at least weren't mature enough, at the time HTCP was chosen). See things like ZeroMQ and PGM.

See also: T45449#471604

The caching issue is a known problem. If I remember correctly, it was reported years ago, but I can't find the bug.

Yeah, known issue. I didn't realize it was as bad as it is.

We could possibly have a bot watch esams and purge stuff again if it doesn't get removed.

Krenair added a comment. Aug 25 2015, 9:04 PM (edited)

I made a script at terbium:~krenair/T109331.php which goes through the list of the last 100 deleted file entries on commons and checks that they 404 from upload-lb.esams.
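
The PHP script itself is not shown in this task. As a rough illustration of the same idea, here is a minimal Python sketch, not the original script: it lists recent file deletions through the MediaWiki `logevents` API and reports upload URLs that do not return 404. The names `upload_url`, `recently_deleted_files`, `http_status`, `still_served` and the User-Agent string are assumptions made for this example. It queries whichever edge the client is routed to rather than upload-lb.esams specifically, and it assumes the usual MediaWiki hashed upload layout (first one and two hex digits of the MD5 of the underscored file name).

```python
import hashlib
import json
import urllib.error
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"
UPLOAD_BASE = "https://upload.wikimedia.org"
USER_AGENT = "deleted-file-cache-check/0.1 (illustrative sketch)"  # hypothetical


def upload_url(filename):
    """Build the upload URL for a Commons file name (assumed hashed layout)."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    path = "/wikipedia/commons/{}/{}/{}".format(
        digest[0], digest[:2], urllib.parse.quote(name)
    )
    return UPLOAD_BASE + path


def recently_deleted_files(limit=100):
    """Return file names from the most recent deletion log entries on Commons."""
    query = urllib.parse.urlencode({
        "action": "query", "list": "logevents", "letype": "delete",
        "leaction": "delete/delete", "lenamespace": 6,
        "lelimit": limit, "format": "json",
    })
    request = urllib.request.Request(
        API_URL + "?" + query, headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Strip the "File:" prefix; skip entries whose title is hidden.
    return [e["title"].split(":", 1)[1]
            for e in data["query"]["logevents"] if "title" in e]


def http_status(url):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": USER_AGENT}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def still_served(filenames, status_of=http_status):
    """Return upload URLs of deleted files that do not answer with 404."""
    return [url for url in map(upload_url, filenames) if status_of(url) != 404]
```

Calling `still_served(recently_deleted_files())` would print nothing by itself; it returns the list of URLs worth re-purging. A file that was deleted and later re-uploaded under the same name would show up as a false positive.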

If the rate of missed purges is truly higher in esams than elsewhere, then there's likely a technical issue here that we can investigate and address (not to prevent all possible missed purges, but to bring the excessive rate at esams back into line).

Jdforrester-WMF triaged this task as High priority. Sep 4 2015, 6:50 PM
Jdforrester-WMF moved this task from Untriaged to Next up on the Multimedia board.

T129845#2173254

matmarex added a parent task: Restricted Task. Apr 2 2016, 9:48 PM

@csteipp I think this is pretty much common knowledge by now. Do you think it'd be okay to make this task public, so to make it clear that someone is aware of the issue?

For the record, I attempted to use eraseArchivedFile.php to clear the cache, but it doesn't work. I imagine this is already known by those smarter than me, but since I know it purges some caches (like thumbnails), I thought it might do something here. It could obviously be used to force-delete the file from the servers, but that's not generally what we do.

T129845#2173254 and https://commons.wikimedia.org/w/index.php?title=Commons:Administrators%27_noticeboard&oldid=192030706#Deleted_videos_still_accessible_at_Facebook (both mentioned above) are about this file:

The file seems to be still cached in Varnish.

$ curl -I https://upload.wikimedia.org/wikipedia/commons/1/18/Pee_Jaun_(Official_Video_Song)_Ft._Farhan_Saeed_Uploded_by_Free_download_Links_BD.webm
HTTP/1.1 200 OK
Date: Sat, 02 Apr 2016 23:55:47 GMT
Content-Type: video/webm
Content-Length: 28563766
Connection: keep-alive
X-Object-Meta-Sha1base36: a1r0ggxeg3zflgapjy04gutf9tyo5sq
Last-Modified: Sat, 02 Apr 2016 15:05:19 GMT
Etag: b6cf4ed823a37252ed4585033000bc88
X-Timestamp: 1459609518.07927
X-Trans-Id: txfe475a81178a4defb836c-0056ffe0fd
X-Varnish: 1021489314 1018498251, 546653600 463751166, 914281061 913860628
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Age: 31494
X-Cache: cp1050 hit(4), cp3046 hit(28), cp3035 frontend hit(1)
Strict-Transport-Security: max-age=31536000
Set-Cookie: WMF-Last-Access=02-Apr-2016;Path=/;HttpOnly;Expires=Wed, 04 May 2016 12:00:00 GMT
X-Analytics: https=1;nocookies=1
X-Client-IP: 91.218.200.230
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Timing-Allow-Origin: *
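
The X-Cache header in the response above is what shows that the object was served from cache: each comma-separated entry names a cache host and whether it had a hit. A minimal Python sketch for pulling that apart follows, assuming the entry layout seen in this task (`host [frontend] hit|miss|pass(count)`). The function names are illustrative, and labelling entries without `frontend` as "backend" is an assumption of this example.

```python
import re

ENTRY_RE = re.compile(
    r"^(?P<host>\S+)\s+(?:(?P<layer>frontend)\s+)?"
    r"(?P<result>hit|miss|pass)(?:\((?P<hits>\d+)\))?$"
)


def parse_x_cache(value):
    """Split an X-Cache header value into one dict per cache host."""
    entries = []
    for part in value.split(","):
        match = ENTRY_RE.match(part.strip())
        if match is None:
            continue  # unknown entry format; skip rather than guess
        entries.append({
            "host": match.group("host"),
            "layer": match.group("layer") or "backend",  # assumed label
            "result": match.group("result"),
            "hits": int(match.group("hits") or 0),
        })
    return entries


def cache_hits(value):
    """Return the hosts that reported a cache hit."""
    return [e["host"] for e in parse_x_cache(value) if e["result"] == "hit"]
```

For the header above, `cache_hits` returns all three hosts, which is consistent with the 200 response for a file that had already been deleted.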
MaxSem added a subscriber: MaxSem. Apr 3 2016, 2:02 AM

I've purged it with purgeList.php and it appears gone. Now the question is why it didn't happen at the moment of deletion? Braindump:

  • No purge was issued
  • Not all purges go through
  • Race condition: purge happens before the file is gone from Swift
  • Edge case with percent encoding of parentheses, for example (for reference, I didn't encode them in my purge command)
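
To make the last hypothesis concrete: the same path can be written with or without percent-encoded parentheses, and if a cache keys on the raw URL string, a purge for one spelling would not necessarily match an object stored under the other. Whether the caches here behave that way is not established in this task. The sketch below only enumerates the spellings; `purge_variants` and the example file name are hypothetical.

```python
from urllib.parse import quote, unquote


def purge_variants(path):
    """Return the distinct spellings of path, with and without encoded parentheses."""
    decoded = unquote(path)
    variants = {
        quote(decoded, safe="/"),    # parentheses become %28 and %29
        quote(decoded, safe="/()"),  # parentheses left as-is
    }
    return sorted(variants)


# Hypothetical file name, used only to show the two forms.
for variant in purge_variants("/wikipedia/commons/1/18/Example_(clip).webm"):
    print(variant)
```

A purge tool that wanted to rule this case out could issue a purge for each returned spelling.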

I think this is pretty much common knowledge by now. Do you think it'd be okay to make this task public, so to make it clear that someone is aware of the issue?

+1. Moreover, I'm quite sure there's some other old, public bug out there describing how deleted images are not always purged.

matmarex renamed this task from Deleted file visible to non-privileged users if permanently linked to Deleted files sometimes remain visible to non-privileged users if permanently linked. Apr 4 2016, 2:08 PM

I think this is pretty much common knowledge by now. Do you think it'd be okay to make this task public, so to make it clear that someone is aware of the issue?

+1. Moreover, I'm quite sure there's some other old, public bug out there describing how deleted images are not always purged.

Likely; it has been reported multiple times in recent years.

matmarex added subscribers: Dereckson, Urbanecm, JEumerus and 2 others.

Another example: https://commons.wikimedia.org/wiki/File:Sajid-kpanda.webm deleted, https://upload.wikimedia.org/wikipedia/commons/7/75/Sajid-kpanda.webm still accessible.

$ curl -I https://upload.wikimedia.org/wikipedia/commons/7/75/Sajid-kpanda.webm
HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 13:26:15 GMT
Content-Type: video/webm
Content-Length: 513112755
Connection: keep-alive
X-Object-Meta-Sha1base36: 9p5m45i48eyiu0hgcg8rxdis09j6abi
Last-Modified: Mon, 11 Apr 2016 14:27:17 GMT
Etag: f601a89f8e5b783c2fcf023a287d1cd7
X-Timestamp: 1460384836.70472
X-Trans-Id: txbe8483118bb54296932b1-00570bb46a
X-Varnish: 718224630 711003208, 537033183 313367721, 502112080
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Age: 82700
X-Cache: cp1064 hit(19), cp3034 hit(1188), cp3044 frontend pass(0)
Strict-Transport-Security: max-age=31536000
Set-Cookie: WMF-Last-Access=12-Apr-2016;Path=/;HttpOnly;Expires=Sat, 14 May 2016 12:00:00 GMT
X-Analytics: https=1;nocookies=1
X-Client-IP: 91.218.200.230
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Timing-Allow-Origin: *
matmarex changed the visibility from "Custom Policy" to "Public (No Login Required)". Apr 12 2016, 1:27 PM
matmarex changed the edit policy from "Custom Policy" to "All Users".
matmarex changed Security from Software security bug to None.
Restricted Application added a subscriber: Malyacko. Apr 12 2016, 1:27 PM

I made the task public; with this many duplicates filed, there is really no reason not to.

@csteipp, are you currently working on this? You're set as the assignee.

csteipp removed csteipp as the assignee of this task. Apr 12 2016, 3:05 PM

@BBlack pointed out that if the upload.wikimedia.org URL is still accessible after deleting a file, doing action=purge on the deleted file's page on commons.wikimedia.org should delete it for good (e.g. https://commons.wikimedia.org/w/index.php?title=File:Sajid-kiptus.webm&action=purge). I tried it with this file, and it didn't seem to work at first, but then the file disappeared after a few minutes; I'm not sure whether that was caused by the purge action or was unrelated. The next time someone runs into this problem, please check whether action=purge works and report back. In general this is apparently a "known issue with no easy fix".
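
For anyone who wants to try this on several files, the purge URL has the same shape each time. A small Python sketch that builds it is below; `purge_url` is a name made up for this example, and the sketch only constructs the URL without requesting it.

```python
from urllib.parse import urlencode

INDEX_URL = "https://commons.wikimedia.org/w/index.php"


def purge_url(file_name):
    """Build the action=purge URL for a Commons file page."""
    title = "File:" + file_name.replace(" ", "_")
    # Keep ":" unescaped so the result matches the URL form used in this task.
    return INDEX_URL + "?" + urlencode({"title": title, "action": "purge"}, safe=":")


print(purge_url("Sajid-kiptus.webm"))
```

Depending on the MediaWiki version and on whether you are logged in, visiting such a URL may show a confirmation step rather than purging immediately.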

Restricted Application added a subscriber: Poyekhali. Apr 14 2016, 4:28 PM
Dereckson added a comment. Apr 14 2016, 4:42 PM (edited)

These return 404 now; that was probably just the time it took for the cache to expire. So the task goal is to trigger a removal from cache immediately after the file is deleted?

NahidSultan added a comment. Apr 14 2016, 5:02 PM (edited)

These return 404 now; that was probably just the time it took for the cache to expire.

It's still visible here, though that's probably my browser.

So the task goal is to trigger a removal from cache immediately after the file is deleted?

Yes, if that's possible, because there is no point in immediate deletion if they (the Facebook users) can still access the files. Usually they announce in the Facebook group/page that they are uploading some movie to Wikimedia, and then downloads take place immediately after the links are shared. So there is no other way (at the moment) to discourage them.

CTRL + SHIFT + R should do the trick in your browser. It's not possible to invalidate a locally cached file remotely as long as your browser is happy with the version it has and doesn't ask for a new one.

It's not possible to invalidate a locally cached file remotely as long as your browser is happy with the version it has and doesn't ask for a new one.

Well, the task's goal is NOT to invalidate the cached version in the user's browser, but the one on the server. Similar request: T132419

Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 4:34 PM

Confirming that this file still is viewable on upload.wikimedia.org. I have never viewed the file before so it can't be my cache.

I still get a 404 for https://upload.wikimedia.org/wikipedia/commons/7/7f/Sajid-Monkey-Bizness.webm.

On IRC, I've asked 7 people; they all get a 404. Note that they probably all use the European upload-lb.esams.wikimedia.org.

Could you open a console/terminal, run ping upload.wikimedia.org, and tell us what IP you get?

It was on a Windows mobile phone on LTE data (AT&T), so I don't have a console. When viewing on my computer, it gives me a 404.

It was on a Windows mobile phone on LTE data (AT&T), so I don't have a console. When viewing on my computer, it gives me a 404.

Actually, I could enable internet sharing and use a laptop...

I still get a 404 for https://upload.wikimedia.org/wikipedia/commons/7/7f/Sajid-Monkey-Bizness.webm.
On IRC, I've asked 7 people; they all get a 404. Note that they probably all use the European upload-lb.esams.wikimedia.org.
Could you open a console/terminal, run ping upload.wikimedia.org, and tell us what IP you get?

I can still access it:

ubuntu@ubuntu:~$ ping upload.wikimedia.org
PING upload.wikimedia.org (198.35.26.112) 56(84) bytes of data.
64 bytes from upload-lb.ulsfo.wikimedia.org (198.35.26.112): icmp_seq=1 ttl=49 time=354 ms
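
The ping output above identifies the serving site through the reverse DNS name (`upload-lb.ulsfo.wikimedia.org`). The same lookup can be done without ping; here is a minimal Python sketch, assuming reverse DNS names keep the `upload-lb.<site>.wikimedia.org` shape seen in this task. Both function names are made up for this example.

```python
import socket


def site_from_hostname(hostname):
    """Extract the site label from a name like upload-lb.ulsfo.wikimedia.org."""
    labels = hostname.lower().rstrip(".").split(".")
    if (len(labels) == 4 and labels[0] == "upload-lb"
            and labels[2:] == ["wikimedia", "org"]):
        return labels[1]
    return None  # name does not follow the assumed pattern


def current_upload_site(name="upload.wikimedia.org"):
    """Resolve name and return (ip, site), with site None if it cannot be determined."""
    ip = socket.gethostbyname(name)
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return ip, None
    return ip, site_from_hostname(reverse_name)
```

Reporting the pair returned by `current_upload_site()` alongside the HTTP status gives the same information that was requested via ping, and helps tell a per-site cache problem from a browser cache problem.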
Jay8g added a subscriber: Jay8g. Apr 28 2016, 4:43 AM

I too can still see the file, and I get the same IP from ping.

Nemo_bis removed a subscriber: Nemo_bis. May 22 2016, 6:50 AM
Base added a subscriber: Base. Nov 28 2016, 6:03 PM