Here is an example: https://upload.wikimedia.org/wikipedia/office/f/fd/401k_Rollover_Contribution_Form.pdf
(open it in incognito and so on)
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
Zuul: Follow IncidentReporting -> ReportIncident extension rename | integration/config | master | +6 -6 |
Found it while debugging T338765.
The ACL has access to public but not the private:
T338765#8958173
Mmm, that's not ideal, but is it intended?
A few questions:
My last question arises because, picking a public example, https://commons.wikimedia.org/wiki/File:Geraldine_Ulmar_in_Gilbert_and_Sullivan%27s_The_Mikado.jpg: if you visit any of its images and thumbs, you get an upload.w.o page without auth, e.g. https://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/Geraldine_Ulmar_in_Gilbert_and_Sullivan%27s_The_Mikado.jpg/530px-Geraldine_Ulmar_in_Gilbert_and_Sullivan%27s_The_Mikado.jpg
I don't see how that could ever work for a private wiki unless those URLs work without auth?
As to the number of buckets question, e.g. for office-wiki:
root@ms-fe2009:~# swift list | grep office
wikipedia-office-local-deleted
wikipedia-office-local-public
wikipedia-office-local-temp
wikipedia-office-local-thumb
wikipedia-office-local-transcoded
wikipedia-office-timeline-render
For answers to most of your questions, go to a page on office wiki, e.g. https://office.wikimedia.org/wiki/Contact_list
Read access must go through mw with thumb.php
This is a list of private wikis. I can get you the canonical list.
All others should be fine for now
Sorry, I am too stupid. All the images on the contact page are from commons anyway aren't they? And in any case if I want to e.g. download the CEO's picture, the "download" button in media viewer is just a link to the upload.wm.o page https://upload.wikimedia.org/wikipedia/commons/f/fb/Maryana_Iskander.jpg
Note that the image URL depends only on the file name and can easily be reconstructed by anyone: the /f/fd/ part in the example from the task description is just the first character, then the first two characters, of the MD5 hash of the file name (underscores, no File: prefix).
Example:
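As a rough illustration of the point above, the hashed path can be rebuilt with standard tools. This is only a sketch: `hash_path` is a made-up helper name, it assumes GNU coreutils `md5sum`, and it assumes the name passed in is already in canonical form (underscores, no File: prefix).

```shell
# Print the hashed storage path "<h1>/<h1h2>/<name>" for a file name.
# Assumption: the name is already canonical (underscores, no "File:" prefix).
hash_path() {
    name=$1
    # md5sum prints "<32 hex digits>  -"; keep only the digest.
    digest=$(printf '%s' "$name" | md5sum | cut -c1-32)
    first=$(printf '%s' "$digest" | cut -c1)
    first_two=$(printf '%s' "$digest" | cut -c1-2)
    printf '%s/%s/%s\n' "$first" "$first_two" "$name"
}

# File name taken from the task description.
hash_path '401k_Rollover_Contribution_Form.pdf'
```

Nothing here needs access to the wiki, which is why the URL cannot be treated as a secret.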
not all of them:
<figure class="mw-halign-center" typeof="mw:File"><a href="/wiki/File:Mo_abualruz_picture.jpg" class="mw-file-description"><img src="/w/thumb.php?f=Mo_abualruz_picture.jpg&width=134" decoding="async" class="mw-file-element" srcset="/w/thumb.php?f=Mo_abualruz_picture.jpg&width=201 1.5x, /w/thumb.php?f=Mo_abualruz_picture.jpg&width=268 2x" data-file-width="800" data-file-height="800" width="134" height="134"></a><figcaption></figcaption></figure>
Put another way, look at the HTML of https://office.wikimedia.org/wiki/Special:NewFiles
I think that if we take the global-read off wikipedia-office-local-public it will no longer be possible to download original images from office wiki at all.
Is that wrong? [I mean, maybe we should do so anyway and fix it later, but...]
It is wrong :D MediaWiki on private wikis serves the images with this URL, not the Swift one:
https://office.wikimedia.org/w/thumb.php?f=Mo_abualruz_picture.jpg&width=134
Try this logged in and logged out.
That means as long as mw:media has access to those containers, it should be able to retrieve the images and pass them to the authorized user.
And for non-thumb images, the image page links to mw too:
e.g. go to
https://office.wikimedia.org/wiki/File:CA_KPIs_-_Q2.pdf
The link in the page is https://office.wikimedia.org/w/img_auth.php/e/e6/CA_KPIs_-_Q2.pdf which means mw first authorizes the request and then proxies the file to the user.
Thank you for your patience!
Of the following containers, does thumbor-private need write access to anything other than wikipedia-office-local-thumb and wikipedia-office-local-transcoded?
wikipedia-office-local-deleted
wikipedia-office-local-public
wikipedia-office-local-temp
wikipedia-office-local-thumb
wikipedia-office-local-transcoded
wikipedia-office-timeline-render
Probably -deleted: an admin should be able to see the thumbnail of a deleted image for undeletion or other reasons. Not sure about timeline-render; regardless, it should be really low priority, so we can skip it for now.
Current state:
wikipedia-office-local-deleted:
    Read ACL: mw:thumbor-private,mw:media
    Write ACL: mw:thumbor-private,mw:media
wikipedia-office-local-public:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
wikipedia-office-local-temp:
    Read ACL: mw:thumbor-private,mw:media
    Write ACL: mw:thumbor-private,mw:media
wikipedia-office-local-thumb:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
wikipedia-office-local-transcoded:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
wikipedia-office-timeline-render:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
So I think we want:
for c in wikipedia-office-local-public wikipedia-office-local-thumb wikipedia-office-local-transcoded wikipedia-office-timeline-render ; do swift post "$c" --read-acl 'mw:thumbor-private,mw:media' --write-acl 'mw:thumbor-private,mw:media' ; done
?
Done, and I think working correctly - https://office.wikimedia.org/w/thumb.php?f=Abbrev-bot.png&width=120 gives me a thumb if logged in or an error otherwise:
Error generating thumbnail Access denied. You do not have permission to access the source file.
I think upload.wm.o should now DTRT:
mvernon@ms-fe2012:~$ curl -o /tmp/foo -v -H "Host: upload.wikimedia.org" http://ms-fe2012.codfw.wmnet/wikipedia/office/e/e6/CA_KPIs_-_Q2.pdf
[...]
* Connected to ms-fe2012.codfw.wmnet (10.192.48.44) port 80 (#0)
> GET /wikipedia/office/e/e6/CA_KPIs_-_Q2.pdf HTTP/1.1
> Host: upload.wikimedia.org
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
(this being the variant of the traditional kitten test for a new proxy cf https://wikitech.wikimedia.org/wiki/Swift/How_To#Add_a_proxy_node_to_the_cluster )
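The manual check above could be wrapped in a small helper so it is easy to repeat per file. This is a sketch only: `anon_read_denied` is a hypothetical name, and treating exactly 401 as the "denied" answer is an assumption based on the single response shown above.

```shell
# Succeed only if an unauthenticated GET for an object is refused.
# $1: Swift frontend host to query directly
# $2: object path as it would appear on upload.wikimedia.org
anon_read_denied() {
    frontend=$1
    path=$2
    # -w '%{http_code}' prints just the status code; the body is discarded.
    status=$(curl -s -o /dev/null -w '%{http_code}' \
        -H 'Host: upload.wikimedia.org' "http://${frontend}${path}")
    echo "${path}: HTTP ${status}"
    # Assumption: 401 is the only status counted as "denied", as seen above.
    [ "$status" = "401" ]
}

# Usage, with the host and path from the check above:
# anon_read_denied ms-fe2012.codfw.wmnet /wikipedia/office/e/e6/CA_KPIs_-_Q2.pdf
```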
Let's fix collab so we can close the public task:
for c in wikipedia-collab-local-public wikipedia-collab-local-thumb wikipedia-collab-local-transcoded wikipedia-collab-timeline-render ; do swift post "$c" --read-acl 'mw:thumbor-private,mw:media' --write-acl 'mw:thumbor-private,mw:media' ; done
look good? Current state:
root@ms-fe2009:~# for i in $(swift list | grep collab); do echo "$i:" ; swift stat "$i" | grep "ACL" ; done
wikipedia-collab-local-deleted:
    Read ACL: mw:thumbor-private,mw:media
    Write ACL: mw:thumbor-private,mw:media
wikipedia-collab-local-public:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
wikipedia-collab-local-temp:
    Read ACL: mw:thumbor-private,mw:media
    Write ACL: mw:thumbor-private,mw:media
wikipedia-collab-local-thumb:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
wikipedia-collab-local-transcoded:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
wikipedia-collab-timeline-render:
    Read ACL: mw:thumbor,mw:media,.r:*
    Write ACL: mw:thumbor,mw:media
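The state listing above can be narrowed to just the problem containers, i.e. those whose Read ACL still includes `.r:*` (the Swift referrer wildcard, which permits unauthenticated reads). A minimal sketch, assuming `swift stat` prints a "Read ACL:" line as in the output above; `world_readable_containers` is a made-up name.

```shell
# Print the containers matching a pattern whose Read ACL contains ".r:*".
world_readable_containers() {
    pattern=$1
    swift list | grep -- "$pattern" | while read -r container; do
        # stdin is redirected so the inner command cannot eat the list.
        if swift stat "$container" < /dev/null \
                | grep 'Read ACL' | grep -qF '.r:*'; then
            echo "$container"
        fi
    done
}

# Usage: world_readable_containers collab
```

An empty result after the ACL change would be the expected outcome for a private wiki.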
Okay, for that I suggest we make a ticket for traffic but that should be pretty low priority given that logged in users wouldn't load the file directly (by hitting https://upload.wikimedia.org/wikipedia/office/f/fd/401k_Rollover_Contribution_Form.pdf) so it won't get cached in edges (unless mw does it internally...)
You can get the list of private wikis in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/thumbor/values.yaml#87 and do a for loop on that.
There is some sadness here because we're not consistent in naming, but...
# download https://noc.wikimedia.org/conf/dblists/private.dblist
# Strip comments and the wiki/wikimedia suffix, and turn the first "_" into "-".
for i in $( sed -re '/^#/d;s/wiki(media)?$//;s/_/-/' private.dblist ); do
    # Container names use either a wikimedia- or a wikipedia- prefix.
    if swift list | grep -q "wikimedia-${i}-local-public" ; then
        prefix="wikimedia-$i"
    elif swift list | grep -q "wikipedia-${i}-local-public" ; then
        prefix="wikipedia-$i"
    else
        echo "$i not found"; continue
    fi
    echo "$prefix"
    # Restrict the four containers that were world-readable.
    for suffix in local-public local-thumb local-transcoded timeline-render ; do
        swift post "${prefix}-${suffix}" --read-acl 'mw:thumbor-private,mw:media' --write-acl 'mw:thumbor-private,mw:media'
    done
done
?
I'll tweak to add an echo and paste the output for plausibility checking...
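For plausibility checking without touching Swift, the name mapping and the resulting commands from the script above can be printed rather than run. A sketch under stated assumptions: both function names are invented for illustration, and `sed -r` assumes GNU sed, as in the script above.

```shell
# Map a database name from private.dblist to the fragment used in
# container names, with the same sed expression as the script above.
db_to_fragment() {
    printf '%s\n' "$1" | sed -re '/^#/d;s/wiki(media)?$//;s/_/-/'
}

# Print (but do not run) the swift post commands for one container prefix.
print_acl_commands() {
    prefix=$1
    for suffix in local-public local-thumb local-transcoded timeline-render; do
        echo "swift post ${prefix}-${suffix}" \
            "--read-acl 'mw:thumbor-private,mw:media'" \
            "--write-acl 'mw:thumbor-private,mw:media'"
    done
}

# Usage:
# print_acl_commands "wikipedia-$(db_to_fragment officewiki)"
```

Note that `s/_/-/` only replaces the first underscore, so a database name with two underscores would need a closer look.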
That would work. One fun aspect of this is that if thumbor doesn't have a wiki in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/thumbor/values.yaml#87 then we change the containers to be accessible only by thumbor-private, but thumbor itself would still try to access them via the public account.
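One way to spot that mismatch ahead of time is to diff the two lists. This sketch assumes both have already been extracted into plain files with one name per line in the same naming form, which is not how values.yaml stores them; the extraction step is left out and `missing_from` is a made-up name.

```shell
# Print every entry of the first list that is absent from the second.
# Lines starting with "#" in the first file are skipped, as in private.dblist.
missing_from() {
    grep -v '^#' "$1" | grep -vxFf "$2"
}

# Usage (hypothetical file names):
# missing_from private.dblist thumbor-private-wikis.txt
```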
@Ladsgroup do you want to try and fix that values.yaml file? and/or have me restrict the set of private wikis I'm trying to fix now?
OK, I've run that script in both eqiad and codfw, so this is fixed for now.
I think @Ladsgroup is kindly fixing the thumbor charts.
New private wikis are created thus:
https://wikitech.wikimedia.org/wiki/Add_a_wiki#IMPORTANT:_For_Private_Wikis
which, per IRC, ends up calling https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/blob/master/filebackend/setZoneAccess.php
which to my untrained eye ought to be doing something half-way plausible with ACLs ?
Thanks @Ladsgroup and @MatthewVernon for the quick action on this. I'd guess that attempting to analyze potential exposure for something like this would be a bit of a nightmare.
We could query hadoop for anything like https://upload.wikimedia.org/wikipedia/office/... and that should work™, given that private wikis have a different URL for loading (https://office.wikimedia.org/w/img_auth.php/...), but that only goes back three months and this has probably been broken for years.
Yeah, but this is not the first time the file backend code in MediaWiki has had serious and critical bugs (we fixed several just in the last couple of months), and it's also not written in a readable or understandable manner. Until further notice, any new private wiki should be double-checked by SRE if it has local upload enabled, IMHO.
I've updated the notes on creating new wikis to note the need to get container permissions fixed, and also added https://wikitech.wikimedia.org/wiki/Swift/How_To#Checking_/_Fixing_container_ACLs_for_private_wikis so that future-us will know what needs doing.
The immediate issue from this task seems resolved - is that correct? I'm not sure what follow-up work still needs to happen.
A couple of follow ups would be useful:
We definitely should run a hadoop query (or a set of queries) to get a sense of access over the past 90 days. I pulled database codes / domain names from canonical_data.wikis where status = "open" and visibility = "private" and got the following list:
database_code | domain |
---|---|
advisorswiki | advisors.wikimedia.org |
arbcom_cswiki | arbcom-cs.wikipedia.org |
arbcom_dewiki | arbcom-de.wikipedia.org |
arbcom_enwiki | arbcom-en.wikipedia.org |
arbcom_fiwiki | arbcom-fi.wikipedia.org |
arbcom_nlwiki | arbcom-nl.wikipedia.org |
arbcom_ruwiki | arbcom-ru.wikipedia.org |
auditcomwiki | auditcom.wikimedia.org |
boardgovcomwiki | boardgovcom.wikimedia.org |
boardwiki | board.wikimedia.org |
chairwiki | chair.wikimedia.org |
chapcomwiki | affcom.wikimedia.org |
checkuserwiki | checkuser.wikimedia.org |
collabwiki | collab.wikimedia.org |
ecwikimedia | ec.wikimedia.org |
electcomwiki | electcom.wikimedia.org |
execwiki | exec.wikimedia.org |
fdcwiki | fdc.wikimedia.org |
grantswiki | grants.wikimedia.org |
id_internalwikimedia | id-internal.wikimedia.org |
iegcomwiki | iegcom.wikimedia.org |
ilwikimedia | il.wikimedia.org |
legalteamwiki | legalteam.wikimedia.org |
movementroleswiki | movementroles.wikimedia.org |
noboard_chapterswikimedia | noboard-chapters.wikimedia.org |
officewiki | office.wikimedia.org |
ombudsmenwiki | ombuds.wikimedia.org |
otrs_wikiwiki | vrt-wiki.wikimedia.org |
projectcomwiki | projectcom.wikimedia.org |
stewardwiki | steward.wikimedia.org |
sysop_itwiki | sysop-it.wikipedia.org |
techconductwiki | techconduct.wikimedia.org |
wg_enwiki | wg-en.wikipedia.org |
wikimaniateamwiki | wikimaniateam.wikimedia.org |
Could we run queries to get unauthorized access for these wikis? We should take up the question of notifying legal once we have a sense of the scale of the problem.
Re: cached data — is there any way of getting all potentially vulnerable files and forcing them to be purged from the cache?
Change 934654 had a related patch set uploaded (by QChris; author: Christian Aistleitner):
[integration/config@master] Zuul: Follow IncidentReporting -> ReportIncident extension rename
Yeah, it should be rather easy: query webrequest with uri_host = 'upload.wikimedia.org' and url_path like '/wikipedia/office/%' (written off the top of my head, not sure it works 100%).
Re: cached data — is there any way of getting all potentially vulnerable files and forcing them to be purged from the cache?
So they get cached for seven days, and by now it's easier to just let them expire. My thinking was along the lines of possibly having swift set the cache header to private in case something similar happens in the future. But it's honestly so low-prio that we can simply not do it.
Change 934654 abandoned by Hashar:
[integration/config@master] Zuul: Follow IncidentReporting -> ReportIncident extension rename
Reason:
Kosta went to do the same via https://gerrit.wikimedia.org/r/c/integration/config/+/935044/ which I have merged before noticing your change :)
Given that more than ninety days have passed since this bug got fixed, we don't have any logs of who might have accessed the private files. I suggest closing this and filing follow ups for fixing setZone and other issues?
Sounds fine. Is there anything keeping this task from becoming public? After a quick glance, I'm not seeing any obvious PII?
No complaints from me (after all, the docs update at least hints that there has been a problem in this area).