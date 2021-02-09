
Maniphest T273741

Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons
Open, Medium, Public

Assigned To
None
Authored By
Joe
Wed, Feb 3, 11:21 AM
Tags
Subscribers
aaroncarson0
Aawarapam
Addshore
Aklapper
Amorymeltzer
AMuigai
amy_rc
View All 77 Subscribers
Tokens
"Barnstar" token, awarded by valerio.bozzolan.
"Barnstar" token, awarded by jijiki.
"Barnstar" token, awarded by mmodell.
"Cookie" token, awarded by Ladsgroup.
"Y So Serious" token, awarded by Prtksxna.
"Meh!" token, awarded by KartikMistry.
"The World Burns" token, awarded by Amire80.
"Cup of Joe" token, awarded by Elitre.

Description

Please avoid adding drive-by comments such as "hello from Hacker News" to this task as they are not helpful. Thank you.

We've noticed today that we get about 90M requests per day from various ISPs in India, all with the same characteristics:

URL: https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/AsterNovi-belgii-flower-1mb.jpg/1280px-AsterNovi-belgii-flower-1mb.jpg
Referer: "-"
User-Agent: "-"

These requests are very strange: they come from wildly different IPs and follow a daily traffic pattern, so we are hypothesising that some mobile app predominantly used in India hotlinks the above image, e.g. for a splash screen.

We need to investigate this further, as these requests constitute about 20% of all requests we get in EQSIN for media.
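For a sense of scale, the two figures quoted in the description can be turned into rough rates. This is back-of-the-envelope arithmetic only: it assumes both figures are approximate and averages the traffic evenly over the day, which the daily pattern mentioned in the description says it is not. The variable names are illustrative.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the description.
# Assumption: both inputs are rough estimates, and the "average" ignores the
# daily traffic pattern.

REQUESTS_PER_DAY = 90_000_000   # "about 90M requests per day"
SHARE_OF_EQSIN_MEDIA = 0.20     # "about 20% of all requests we get in EQSIN for media"
SECONDS_PER_DAY = 24 * 60 * 60

# Average request rate for this one URL.
avg_requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY

# Implied total media requests per day in EQSIN, if 90M is about 20% of them.
implied_total_per_day = REQUESTS_PER_DAY / SHARE_OF_EQSIN_MEDIA

print(f"average rate: ~{avg_requests_per_second:.0f} requests/second")
print(f"implied EQSIN media total: ~{implied_total_per_day / 1e6:.0f}M requests/day")
```

In other words, a single thumbnail averages on the order of a thousand requests per second.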

Details

Project | Branch | Lines +/- | Subject
operations/puppet | production | +40 -0 | upload-frontend: ban a specific url with no referer nor UA

Event Timeline

There are a very large number of changes, so older changes are hidden.
nshahquinn-wmf added a comment.Mon, Feb 8, 7:21 PM
In T273741#6812424, @mforns wrote:

If it's an app, it would need to be very popular.
Maybe Aarogya Setu, the app for reducing Covid infections?
IIUC it's mandatory in India.

I just installed it and poked around (I was able to do all the initial setup and get to the main functionality), but I didn't see that photo anywhere. There actually weren't any photos at all (just illustrations and videos), so it seems very unlikely it was somewhere in the app that I didn't see.

MoritzMuehlenhoff added a subscriber: MoritzMuehlenhoff.Mon, Feb 8, 9:53 PM
Mvolz added a subscriber: Mvolz.Mon, Feb 8, 10:17 PM
Legoktm added a subscriber: Legoktm.Mon, Feb 8, 10:33 PM
Michaelrhanson added a subscriber: Michaelrhanson.Mon, Feb 8, 10:41 PM

I found several places where this URL is being used in sample code, which might have been picked up by somebody and built into an app:

https://stackoverflow.com/questions/18586466/foursqaure-photo-add-against-checkin
https://stackoverflow.com/questions/18232898/node-js-http-get-with-node-js-step-module
https://html.developreference.com/article/14455997/Downloading+image+from+the+web+with+imagemagick+and+saving+to+parse

seems like this particular flower has been kicking around as a sample image for quite a few years.

Daniel.gayo added a subscriber: Daniel.gayo.Mon, Feb 8, 10:42 PM

Could it be this app?

https://apps.apple.com/hk/app/iclass-corporate/id1439400748?l=en

The picture appears in a screenshot...

cscott added a subscriber: cscott.Mon, Feb 8, 10:48 PM
Legoktm added a subscriber: spinda.EditedMon, Feb 8, 11:07 PM

@spinda found that this image is used in quite a few different places:

Michaelrhanson added a comment.Mon, Feb 8, 11:10 PM

Hm! It is included in the imagenet URL list, I think. Could we be looking at some CV training pipeline that's not caching properly?

http://image-net.org/api/text/imagenet.synset.geturls?wnid=n11934807

ssingh added a comment.EditedMon, Feb 8, 11:14 PM
In T273741#6813531, @Michaelrhanson wrote:

I found several places where this URL is being used in sample code, which might have been picked up by somebody and built into an app:

https://stackoverflow.com/questions/18586466/foursqaure-photo-add-against-checkin
https://stackoverflow.com/questions/18232898/node-js-http-get-with-node-js-step-module
https://html.developreference.com/article/14455997/Downloading+image+from+the+web+with+imagemagick+and+saving+to+parse

seems like this particular flower has been kicking around as a sample image for quite a few years.

It is most likely an app, given the header information above and some other connection attributes. The question is which app, though: some of us have gone through the popular apps in India but haven't been able to identify it. It is also possible that the code is embedded in some app that requests the image but does not display it.

ssingh added a comment.Mon, Feb 8, 11:14 PM
In T273741#6813536, @Daniel.gayo wrote:

Could it be this app?

https://apps.apple.com/hk/app/iclass-corporate/id1439400748?l=en

The picture appears in a screenshot...

Unlikely, given the volume of the requests and the popularity/rating of this app.

fdans added a subscriber: fdans.Mon, Feb 8, 11:18 PM
In T273741#6813616, @Michaelrhanson wrote:

Hm! It is included in the imagenet URL list, I think. Could we be looking at some CV training pipeline that's not caching properly?

http://image-net.org/api/text/imagenet.synset.geturls?wnid=n11934807

That's an interesting idea, and that list includes several other Commons images, but none with as much traffic as OP's.

fdans added a comment.EditedMon, Feb 8, 11:29 PM

As was suggested on Twitter, this surge coincides almost perfectly with the ban of TikTok, as well as 223 other Chinese apps, in India (Wiki article).

AntiCompositeNumber added a project: Commons.Mon, Feb 8, 11:46 PM
Hubzi added a subscriber: Hubzi.Mon, Feb 8, 11:48 PM
Peteskomoroch added a subscriber: Peteskomoroch.Tue, Feb 9, 12:24 AM
Joe added a comment.Tue, Feb 9, 12:26 AM

Another suggestion coming from Twitter is https://play.google.com/store/apps/details?id=com.app.rcn, which doesn't seem popular enough (in India specifically) to cause that volume of requests anyway.

At this point, I'd bet it's one of those two mobile apps, possibly both.

I would suggest that we start banning requests for this image without a UA, while we try to contact the app authors. It will likely have the side-effect of breaking some code samples using that (admittedly beautiful) photo.
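The proposed ban boils down to a narrow predicate on the request. The following is an illustrative Python sketch of that logic, not the actual cache configuration: `should_block`, `is_absent`, and the choice to treat a missing header, an empty header, and the logged value "-" as equivalent are all assumptions.

```python
from typing import Mapping, Optional

# Path of the hotlinked thumbnail, taken from the task description.
TARGET_PATH = (
    "/wikipedia/commons/thumb/1/16/AsterNovi-belgii-flower-1mb.jpg/"
    "1280px-AsterNovi-belgii-flower-1mb.jpg"
)


def is_absent(value: Optional[str]) -> bool:
    """Treat a missing header, an empty one, and the logged "-" as absent (assumption)."""
    return value is None or value.strip() in ("", "-")


def should_block(path: str, headers: Mapping[str, str]) -> bool:
    """Return True only for requests to the target image that carry no User-Agent."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return path == TARGET_PATH and is_absent(lowered.get("user-agent"))


# A request with no headers is blocked; a browser-like request is not.
print(should_block(TARGET_PATH, {}))
print(should_block(TARGET_PATH, {"User-Agent": "Mozilla/5.0"}))
```

A rule this narrow leaves browsers and most well-behaved clients untouched, but it would also catch any code sample that fetches the image without setting a User-Agent.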

Ladsgroup added a comment.Tue, Feb 9, 12:32 AM

I don't have much knowledge about India's internet infrastructure, but I do have experience with how Iran blocks apps/websites: they show you a page from a reserved IP (so it's not accessible from the outside) saying "Sorry, this page is not accessible in Iran". The image might be part of such a page, and we can't see it because of that, especially given that the rise coincides with the block of TikTok in India. What happens if you try to access TikTok (or any other blocked app/website) in India? I totally understand Iran's "internet" is different from what most countries have, so I might be talking rubbish here.

Xxpor added a subscriber: Xxpor.Tue, Feb 9, 12:44 AM
Mahir256 added a subscriber: Mahir256.Tue, Feb 9, 12:48 AM
Preinheimer added a subscriber: Preinheimer.Tue, Feb 9, 12:59 AM

Going to the TikTok website from India results in the regular TikTok page loading, with a banner from TikTok saying that the service is unavailable in India. Not a dedicated block page.

Screenshot

(I have access to proxy servers in India).

AntiCompositeNumber added a subscriber: AntiCompositeNumber.Tue, Feb 9, 1:02 AM
Dzahn added a subscriber: Dzahn.EditedTue, Feb 9, 1:13 AM

https://newshimalaya.com/2021/02/09/%E2%9A%93-t273741-investigate-unusual-media-traffic-pattern-for-asternovi-belgii-flower-1mb-jpg-on-commons/

^ wut? I tried to search for links to this image and found... this Phabricator ticket content on a Nepali news site?

varenc added a subscriber: varenc.Tue, Feb 9, 1:35 AM
In T273741#6813823, @Preinheimer wrote:

Going to the TikTok website from India results in the regular TikTok page loading, with a banner from TikTok saying that the service is unavailable in India. Not a dedicated block page.

The lack of a User-Agent and all other distinguishing headers means this can't be coming from a web browser.

I would look up the IPs and see if some of them are from IP blocks associated with cellular providers. Since in general only mobile phones are on those IPs, that'll provide some strong evidence that this is from a mobile app. My guess is an online connectivity check? I assume the detailed IP/request logs aren't public or I'd go investigate this myself. 🙃
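A minimal sketch of that lookup, using Python's standard `ipaddress` module. The prefix list is a placeholder built from documentation-only ranges; real cellular-provider prefixes would have to come from an external source, and the function names are hypothetical.

```python
import ipaddress
from typing import Iterable

# Placeholder prefixes from the ranges reserved for documentation.
# Assumption: a real list of cellular-provider prefixes is obtained elsewhere.
HYPOTHETICAL_CELLULAR_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_cellular(address: str) -> bool:
    """True if the address falls inside any of the listed prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in HYPOTHETICAL_CELLULAR_PREFIXES)


def cellular_share(addresses: Iterable[str]) -> float:
    """Fraction of the given client addresses that look cellular."""
    flags = [is_cellular(address) for address in addresses]
    return sum(flags) / len(flags) if flags else 0.0


# Made-up sample of client addresses, for illustration only.
sample = ["192.0.2.10", "198.51.100.7", "2001:db8::1", "2001:db8:1::5"]
print(cellular_share(sample))
```

A high share would support the mobile-app theory, though it would not identify the app by itself.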

Peteskomoroch removed a subscriber: Peteskomoroch.Tue, Feb 9, 2:05 AM
SuperHamster added a subscriber: SuperHamster.Tue, Feb 9, 2:47 AM
aaroncarson0 added a subscriber: aaroncarson0.Tue, Feb 9, 3:00 AM
Izno added a subscriber: Izno.Tue, Feb 9, 3:09 AM
tomglynch added a subscriber: tomglynch.EditedTue, Feb 9, 3:16 AM

Hi all, I've been doing a bit of research into possible apps that could be causing this and found two potential culprits that I am currently investigating.

The first is Mitron TV, (news article here), an Indian TikTok alternative which was made available again on the app store June 6th.

The second is Say Namaste, (news article here), an Indian Zoom alternative which was launched on the app stores June 9th.

Both fall into the timeline of huge increases, have millions of users and may be using '1280px-AsterNovi-belgii-flower-1mb.jpg' to check the user's internet connection - especially for Say Namaste to ensure video connectivity. I've reached out to some developers at both companies and will report back. Let me know your thoughts.

EDIT: I have also noticed the dates match the reopening after lockdown for the whole of India: "This first phase of reopening was termed as "Unlock 1.0"[13] and permitted shopping malls, religious places, hotels and restaurants to reopen from 8 June." from Wikipedia

Tom

MZMcBride added a subscriber: MZMcBride.Tue, Feb 9, 3:17 AM
ssingh added a comment.Tue, Feb 9, 3:29 AM

Thank you everyone for the comments and suggestions. I just wanted to share that we have identified the app and will update this task tomorrow. (And yes, it is a mobile app.)

Vahurzpu added a subscriber: Vahurzpu.Tue, Feb 9, 3:34 AM
Chlod added a subscriber: Chlod.Tue, Feb 9, 3:51 AM
Michaelbrabec added a subscriber: Michaelbrabec.Tue, Feb 9, 3:54 AM
mfkp69 added a subscriber: mfkp69.Tue, Feb 9, 4:02 AM
Phuzion added a subscriber: Phuzion.Tue, Feb 9, 4:10 AM
PatsagornY added a subscriber: PatsagornY.Tue, Feb 9, 4:15 AM
mmodell added a subscriber: mmodell.Tue, Feb 9, 4:33 AM
In T273741#6813839, @Dzahn wrote:

^ wut? I tried to search for links to this image and found... this Phabricator ticket content on a Nepali news site?

Looks like it's just a big rss aggregator?

mmodell awarded a token.Tue, Feb 9, 4:33 AM
This comment was removed by mmodell.
rootkea added a subscriber: rootkea.Tue, Feb 9, 4:54 AM
TheOv3rminD added a subscriber: TheOv3rminD.Tue, Feb 9, 4:55 AM
In T273741#6813995, @mmodell wrote:

Also, hello hacker news! https://news.ycombinator.com/item?id=26072025

Hello From us Hacker News readers ;)

Str0nArm added a subscriber: Str0nArm.Tue, Feb 9, 6:43 AM
Ltrlg added a subscriber: Ltrlg.Tue, Feb 9, 7:17 AM
Devenvdev added a subscriber: Devenvdev.Tue, Feb 9, 7:20 AM
Zardula added a subscriber: Zardula.Tue, Feb 9, 7:27 AM
This comment was removed by Zardula.
Majavah updated the task description. (Show Details)Tue, Feb 9, 7:30 AM
Thibaut120094 added a subscriber: Thibaut120094.Tue, Feb 9, 7:30 AM
miyuru added a subscriber: miyuru.Tue, Feb 9, 7:45 AM
R4356th added a subscriber: R4356th.Tue, Feb 9, 8:27 AM
Gilles mentioned this in T274228: Phabricator should cache tasks for a few minutes for logged-out users.Tue, Feb 9, 8:58 AM
Shizhao added a subscriber: Shizhao.Tue, Feb 9, 9:00 AM

Rename to a new filename?

Asartea added a subscriber: Asartea.Tue, Feb 9, 10:36 AM
Amorymeltzer added a subscriber: Amorymeltzer.Tue, Feb 9, 10:59 AM
IKhitron added a subscriber: IKhitron.Tue, Feb 9, 12:43 PM
semenko added a subscriber: semenko.Tue, Feb 9, 12:48 PM
Matafagafo added a subscriber: Matafagafo.Tue, Feb 9, 1:22 PM
MBH added a subscriber: MBH.Tue, Feb 9, 1:34 PM
GeneralNotability added a subscriber: GeneralNotability.Tue, Feb 9, 1:47 PM
wkandek added a subscriber: wkandek.Tue, Feb 9, 1:52 PM
lmata added a subscriber: lmata.Tue, Feb 9, 2:34 PM
Tks4Fish added a subscriber: Tks4Fish.Tue, Feb 9, 2:58 PM
rabbbit added a subscriber: rabbbit.Tue, Feb 9, 3:08 PM
DannyS712 added a subscriber: DannyS712.Tue, Feb 9, 3:22 PM
AntiCompositeNumber added a comment.Tue, Feb 9, 4:01 PM
In T273741#6814266, @Shizhao wrote:

Rename to a new filename?

The Commons community generally avoids moving files, as it can break attribution and cause issues for (reasonable) external reusers. "Turn it off and see who screams" is a valid method when less disruptive methods fail, but it appears to be unnecessary in this case. Others have previously suggested serving a different file for users with the matching user agent header, which wouldn't break all external links to the file. While that solution would require more work than simply moving the file, it may also have been more effective (depending on how it was used).

Mvolz updated the task description. (Show Details)Tue, Feb 9, 4:01 PM
gerritbot added a comment.Tue, Feb 9, 4:27 PM

Change 663004 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] upload-frontend: ban a specific url with no referer nor UA

https://gerrit.wikimedia.org/r/663004

gerritbot added a project: Patch-For-Review.Tue, Feb 9, 4:27 PM
Chlod removed a subscriber: Chlod.Tue, Feb 9, 4:29 PM
ssingh added a comment.Tue, Feb 9, 5:17 PM

Update:

Thank you for the interest in this task! Like we shared yesterday, we have identified that the traffic is coming from a popular mobile app in India. We have initiated contact with the app developers, and are waiting to hear back from them. In the meantime, given the volume of requests, we have decided to ban those specific requests until the issue is resolved. While we will refrain from naming the app at this time, we can share that it is not on the list of apps mentioned in this task. Nevertheless, we thank you for your comments and suggestions on how to debug this!

Since there has been some interest in how we narrowed it down to this particular app:

  1. The header attributes suggested that this was a mobile app. We then queried Hive and determined that the connections with these header values (User-Agent and Referer) were mostly from IPv6 addresses, further confirming the theory that this was a popular mobile app.
  2. We then tried to isolate connections from geographical regions and ISPs in India but it was clear that there was no pattern there, as users were spread across the country.
  3. A few things were clear given the volume of the requests: it was a popular app with traffic throughout the day (and even late at night) with a peak on December 31 2020, suggesting that it may be a chat or social media app.
  4. We noticed that the image/app gained popularity somewhere around the time India blocked Chinese internet services and websites, thus affecting popular apps in India like TikTok. (This was pointed out by a user.)
  5. Based on the information above, we gathered a list of popular chat and social media applications in the country, especially apps that gained popularity after the above censorship event.
  6. We first started by downloading and running these apps to see if we could identify the image in their splash screens or within the apps. We also asked the community on the ground and there are many unnamed people who helped us with this -- thank you!
  7. This unfortunately didn't work as none of the apps we tested had the image anywhere -- neither in the splash screen nor in the apps themselves. The community in India was equally surprised given the popularity of this image/app and the fact that they had not seen it in their daily usage.
  8. It was then speculated that the app fetches the image but does not show it. (This was based on this comment.)
  9. To recap, we were aware of the following at this stage:
    • it is a popular chat/social media mobile app used in India
    • it sets the User-Agent and Referer to '-'
    • it fetches the image from Wikimedia Commons but does not display it
  10. To narrow down the app, we decided to observe connections to the image from clients (phones) to our servers. We did this by opening the popular apps one-by-one and noting down the time. After doing this for all the apps, we then ran this query in Hive: SELECT * FROM wmf.webrequest WHERE year=2021 AND month=2 AND day=9 AND parse_media_file_url(uri_path).base_name='/wikipedia/commons/1/16/AsterNovi-belgii-flower-1mb.jpg' AND webrequest_source='upload' AND uri_host = 'upload.wikimedia.org' AND user_agent='-' AND ip=<IP>;
  11. We then found the specific app that was making the request by matching the time when it was opened against the time the image was requested from our servers, restricting the results to the User-Agent '-' and to the IP we tested from.
  12. By this time, we had isolated the app and were convinced that this is the one that is fetching the image on startup. We could not find the image anywhere in the app, confirming our theory that it fetches the image but does not display it.
  13. To further confirm this finding and to ensure that we had the correct app, we decided to log DNS queries from a phone by setting up a local resolver to capture DNS traffic. After pointing the phone towards it and launching the app, we noticed that it was indeed the one looking up upload.wikimedia.org on startup.
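
Steps 10 and 11 amount to matching app-launch times against request timestamps. A minimal sketch of that matching follows; the app names, timestamps, and the 10-second tolerance window are all made up for illustration, and this is not the tooling that was actually used.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def apps_matching_requests(
    launches: Dict[str, datetime],
    request_times: List[datetime],
    window: timedelta = timedelta(seconds=10),
) -> List[str]:
    """Return apps whose launch was followed by a request within `window`."""
    matches = []
    for app, launched_at in launches.items():
        # A match is a request at or shortly after the launch, never before it.
        if any(timedelta(0) <= seen - launched_at <= window for seen in request_times):
            matches.append(app)
    return matches


# Hypothetical test log: when each app was opened on the test phone.
launches = {
    "app_a": datetime(2021, 2, 9, 10, 0, 0),
    "app_b": datetime(2021, 2, 9, 10, 5, 0),
    "app_c": datetime(2021, 2, 9, 10, 10, 0),
}
# Hypothetical timestamps of image requests from the test IP with User-Agent "-".
request_times = [datetime(2021, 2, 9, 10, 5, 3)]

print(apps_matching_requests(launches, request_times))
```

Spacing the launches several minutes apart keeps the windows from overlapping, so each request can be attributed to at most one app.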
fdans added a comment.Tue, Feb 9, 5:23 PM

@ssingh said it yesterday on chat but this is such stellar data detective work. Congrats on finding the culprit!!

Majavah added a comment.Tue, Feb 9, 5:25 PM

Is the effect that the block will have in the app known?

rootkea removed a subscriber: rootkea.Tue, Feb 9, 5:25 PM
jijiki awarded a token.Tue, Feb 9, 5:40 PM
calbon added a subscriber: calbon.Tue, Feb 9, 5:49 PM

Just to second what @fdans said, the data detective work was great and this was such a fun ticket to watch.

Joe added a comment.Tue, Feb 9, 6:02 PM
In T273741#6815874, @Majavah wrote:

Is the effect that the block will have in the app known?

No, hence we tried to reach out to them, although it seems that there is no good way to get in touch with them through email (I sent an email to all publicly available channels, only to get back an autoresponder that assumes I'm a user of the app and asks for my phone number). I eventually resorted to DMing their CEO on Twitter.

I think this is more than enough courtesy on our part. Anyways, the block will clearly link to this task as the reason for the block, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/663004/4/modules/varnish/templates/upload-frontend.inc.vcl.erb#380, so that whoever is responsible for this can figure out how to reach us.

Aawarapam added a subscriber: Aawarapam.Wed, Feb 10, 5:17 AM

The traffic spikes closely match Indian holidays: 2 Oct, 5 Sept, 14 Nov, 31 Dec, 12-14 Jan, etc.
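One way to check such an observation is to flag days whose request count is well above a typical day and see how many of them fall on a given list of dates. The sketch below does that with a median-based threshold; the daily counts are invented toy numbers, not real traffic data, and the threshold factor and function names are assumptions.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Set


def spike_days(daily_counts: Dict[date, int], factor: float = 1.5) -> List[date]:
    """Days whose request count exceeds `factor` times the median day."""
    baseline = median(daily_counts.values())
    return sorted(day for day, count in daily_counts.items() if count > factor * baseline)


def share_of_spikes_on(daily_counts: Dict[date, int], dates: Set[date], factor: float = 1.5) -> float:
    """Fraction of spike days that fall on one of the given dates."""
    spikes = spike_days(daily_counts, factor)
    if not spikes:
        return 0.0
    return sum(day in dates for day in spikes) / len(spikes)


# Toy counts in arbitrary units, invented for illustration only.
toy_counts = {
    date(2020, 12, 29): 90,
    date(2020, 12, 30): 95,
    date(2020, 12, 31): 200,
    date(2021, 1, 1): 100,
    date(2021, 1, 2): 92,
}
dates_of_interest = {date(2020, 12, 31)}

print(spike_days(toy_counts))
print(share_of_spikes_on(toy_counts, dates_of_interest))
```

A median baseline is used so that the spikes themselves do not inflate the notion of a typical day.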

amy_rc added a subscriber: amy_rc.Wed, Feb 10, 10:44 AM
Joe added a comment.EditedWed, Feb 10, 8:43 PM
In T273741#6816099, @Joe wrote:
In T273741#6815874, @Majavah wrote:

Is the effect that the block will have in the app known?

No, hence we tried to reach out to them, although it seems that there is no good way to get in touch with them through email (I sent an email to all publicly available channels, only to get back an autoresponder that assumes I'm a user of the app and asks for my phone number). I eventually resorted to DMing their CEO on Twitter.

I think this is more than enough courtesy on our part. Anyways, the block will clearly link to this task as the reason for the block, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/663004/4/modules/varnish/templates/upload-frontend.inc.vcl.erb#380, so that whoever is responsible for this can figure out how to reach us.

Small - and happy! - status update: we managed to get in touch with the app developers, who confirmed our suspicions were correct, and promised a quick turnaround in posting an updated version of the application. We will thus hold back the banning of the URL for now, awaiting confirmation of the desired effect, to reduce the potentially harmful impact on the application's users.

Given how much "sample code" we found around the internet using that url, it might still be a good idea to merge the patch later just to prevent this from happening again.

ttaylor added a subscriber: ttaylor.Thu, Feb 11, 3:11 AM
Arrbee added a subscriber: Arrbee.Thu, Feb 11, 10:19 AM
valerio.bozzolan awarded a token.Thu, Feb 11, 11:10 AM
valerio.bozzolan added a subscriber: valerio.bozzolan.
eamedina added a subscriber: eamedina.Thu, Feb 11, 1:00 PM
crusnov added a subscriber: crusnov.Thu, Feb 11, 5:20 PM
Kevin-Cox added a subscriber: Kevin-Cox.Thu, Feb 11, 6:46 PM
Content licensed under Creative Commons Attribution-ShareAlike 3.0 (CC-BY-SA) unless otherwise noted; code licensed under GNU General Public License (GPL) or other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct. · Wikimedia Foundation · Privacy Policy · Code of Conduct · Terms of Use · Disclaimer · CC-BY-SA · GPL