
Information leak on wikidata-externalid-url
Closed, Invalid
Public


If the service wikidata-externalid-url is important enough for WMF, then WMF itself should provide this service in a secure environment. This is not something a random user should operate.

Right now this service is used all over nowiki because P1630 was assumed safe in production. This is not safe at all, as a user accessing any of the external links using this service will expose their activity instead of a simple link.

If this is acceptable for WMF, fine, but then a full security audit of the service should be made and any developers or maintainers should sign the necessary privacy agreements.

Event Timeline

Seriously, is it necessary to create an information leak for this code?

The URL redirect script tackles two basic issues: IDs that need cleaning up (such as ISNI, which is supposed to be entered as an ID with space characters, while the URL requires the spaces to be removed) and formatter URLs that require more sophisticated handling than a single $1 substitution, for example the IMDB case, where the first two characters of the ID determine the specific formatter URL to be used. It's not clear to me where the best place is for either of those pieces of logic. Wikibase could have some code for this (feel free to import what I've written), perhaps exposed as some sort of service, but anybody using the P1630 values directly wouldn't benefit from that. For now, if there's some protocol for wiping log files, or for not recording them at all on the Tool Labs server, I'd be happy to implement that too. I have no interest in these log files.

Or if there's some privacy agreement to sign as jeblad suggested then I'm happy to do that too. I met Lydia Pintscher in person last week so she can vouch for who I am :)
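
For readers following the thread, the two cases described above (cleaning an ID before substitution, and choosing a formatter URL from the first two characters of the ID) can be sketched roughly as follows. This is not the code of the actual tool; the formatter URLs, the prefix table, and the function names are illustrative assumptions.

```python
import re

# Assumed formatter URLs, for illustration only; the real P1630 values may differ.
ISNI_FORMATTER = "https://isni.org/isni/$1"
IMDB_FORMATTERS = {
    "tt": "https://www.imdb.com/title/$1/",
    "nm": "https://www.imdb.com/name/$1/",
    "co": "https://www.imdb.com/company/$1/",
}


def format_url(formatter: str, external_id: str) -> str:
    """Plain $1 substitution, as a simple formatter URL would do it."""
    return formatter.replace("$1", external_id)


def isni_url(raw_id: str) -> str:
    """ISNI IDs are stored with spaces, but the URL needs them removed."""
    cleaned = re.sub(r"\s+", "", raw_id)
    return format_url(ISNI_FORMATTER, cleaned)


def imdb_url(raw_id: str) -> str:
    """The first two characters of the ID select the formatter URL."""
    prefix = raw_id[:2]
    try:
        formatter = IMDB_FORMATTERS[prefix]
    except KeyError:
        raise ValueError(f"unrecognised IMDb ID prefix: {prefix!r}") from None
    return format_url(formatter, raw_id)
```

Because both functions are pure string transformations, the same logic could in principle run wherever the link is rendered, without sending the reader through an intermediate host.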

Sorry, can either @jeblad or @ArthurPSmith clarify what the concern is here?

Where is this tool being used on Wikidata? Is it default loaded/available on that project?

This is not safe at all, as a user accessing any of the external links using this service will expose their activity instead of a simple link.

What is the user exposing here? Do you mean the user might be redirected to an external site using this tool? Can you provide a specific use case that would be problematic here?


Wikidata uses several properties that are partial representations of the identifiers used on external sites. One example is the IMDB code for movies like For Your Eyes Only. Those codes are not valid URLs, but they are turned into URLs by using format descriptions. These URL format descriptions now include redirects to a service on WMFlabs, which is operated by @ArthurPSmith. Those URL formats are used to construct valid URLs on other projects, like nowiki, which can be seen in the external links for the movie. The first link to the movie on IMDB now routes through the service; it does not go straight to the external site. When a user follows this link, the user exposes his or her activity to a third party that previously was not involved in browsing Norwegian Wikipedia, giving the operator of that service access to the user's surfing habits.

Note that this is not a feature the user chose to use, it is a feature that all access to the external site is filtered through. It can be compared to sending all external requests through a link shortener.

WMF does not give anyone unrestricted access to server logs for Wikipedia. What this service does is give @ArthurPSmith access to the surfing habits of readers following links to external sites, for all properties where those URL formats are in use. That gives him unprecedented access to readers' surfing habits on and even off Wikipedia, as the logs will include how the user left Wikipedia.

I can't really see a good use case for this service; it is a convenience feature of some kind. If it is a necessary feature, then access to the tool should be governed by the same strict rules as any other such tool, and the operator should sign any necessary privacy agreement.

Other than that, the tool needs a security audit. It does not make any attempt to clean up its arguments or escape its output. As it is now it can be used for redirecting, but after 5.1.2 it can't be used for fancier injection of header arguments. It can, however, be used for attacks against other sites. That is not limited to use on Wikidata or other WMF sites.
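
As a rough illustration of the kind of argument cleanup and output escaping meant here, a redirect helper could reject unexpected characters and percent-encode the ID before building the target URL. This is only a sketch under assumptions: the allowed character set, the length limit, and the function name are hypothetical and not taken from the tool.

```python
import re
from urllib.parse import quote

# Hypothetical allow-list: letters, digits, space, dot, underscore, hyphen.
# Anything else (slashes, CR/LF, control characters) is rejected outright.
ID_PATTERN = re.compile(r"[A-Za-z0-9 ._-]{1,64}")


def safe_redirect_target(formatter: str, external_id: str) -> str:
    """Validate the ID, then percent-encode it before $1 substitution."""
    # fullmatch avoids the pitfall of "$" matching before a trailing newline.
    if not ID_PATTERN.fullmatch(external_id):
        raise ValueError("external ID contains unexpected characters")
    return formatter.replace("$1", quote(external_id, safe=""))
```

With this shape, an ID carrying a line break or a path separator never reaches the Location header, and only the formatter URL decides which site the redirect points to.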

I have described a solution on Wikidata:Project chat. The solution avoids the external service altogether.

Seriously, is it necessary to create an information leak for this code?

What is leaked exactly? Toollabs is covered by the same privacy policy as the wikis. Logs don't contain the users' IP addresses: - [13/Feb/2016:06:33:05 +0000] "GET /multichill/nowcommons.php?language=%s&page=%s&filter= HTTP/1.1" 404 345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"

As far as I know, IP addresses are only available to a select group of admins who have signed an NDA.

Ha, if I'd actually looked at the logs I would have known that. Yes all the IP addresses in the file are a 10.68 address, which is locally identified as "tools-proxy....wmflabs" so yes, no external IP addresses are visible to the service.
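
The check described above (confirming that every address in the log belongs to the internal proxy) could be sketched like this. It assumes that the client address, when present, is the first whitespace-separated field of each log line, and that "a 10.68 address" corresponds to the network 10.68.0.0/16; both are assumptions, not details confirmed in this thread.

```python
import ipaddress

# Assumption: "10.68" addresses are taken to mean the 10.68.0.0/16 network.
PROXY_NETWORK = ipaddress.ip_network("10.68.0.0/16")


def external_clients(log_lines):
    """Return the set of logged client addresses outside the proxy network."""
    found = set()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        # An empty line, or "-" in the first field, means no address was logged.
        if not fields or fields[0] == "-":
            continue
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address at all
        if address not in PROXY_NETWORK:
            found.add(str(address))
    return found
```

An empty result would be consistent with the observation that no external IP addresses are visible to the service.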

@jeblad I'm resolving this as invalid, as the initial claim of an information leak seems to be incorrect. However, you might want to open a separate Phabricator ticket with your detailed suggestion on how to do formatter URLs better; I think it's a promising approach to allow pulling components from the "regular expression" syntax.

sbassett triaged this task as Medium priority. Oct 16 2019, 5:37 PM
sbassett removed a project: Cloud-Services.