Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons
Closed, ResolvedPublic

Assigned To
Authored By
Joe
Feb 3 2021, 11:21 AM
Referenced Files
F34101275: asteri-requests.png
Feb 12 2021, 7:27 AM
F34095671: Screenshot at 2021-02-08 17-18-10.png
Feb 9 2021, 1:18 AM
Tokens
"Burninate" token, awarded by Harej. "The World Burns" token, awarded by DannyS712. "Barnstar" token, awarded by Asartea. "Barnstar" token, awarded by valerio.bozzolan. "Barnstar" token, awarded by jijiki. "Barnstar" token, awarded by mmodell. "Cookie" token, awarded by Ladsgroup. "Y So Serious" token, awarded by Prtksxna. "Meh!" token, awarded by KartikMistry. "The World Burns" token, awarded by Amire80. "Cup of Joe" token, awarded by Elitre.

Description

Please avoid adding drive-by comments such as "hello from Hacker News" to this task as they are not helpful. Thank you.

We've noticed today that we get about 90M requests per day from various ISPs in India, all with the same characteristics:

URL: https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/AsterNovi-belgii-flower-1mb.jpg/1280px-AsterNovi-belgii-flower-1mb.jpg
Referer: "-"
User-Agent: "-"

These are very strange: they come from wildly different IPs and follow a daily traffic pattern, so we are hypothesising that some mobile app predominantly used in India hotlinks the above image, e.g. for a splash screen.

We need to investigate this further, as this kind of request constitutes about 20% of all requests we get in EQSIN for media.
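
As a rough illustration of the request signature described above, the following Python sketch flags log records that match all three characteristics and computes their share of a sample. The record layout (a dict with `url`, `referer` and `user_agent` keys) and both function names are assumptions made for illustration; they are not the format of the actual request logs.

```python
# URL taken from the task description.
TARGET_URL = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/"
    "AsterNovi-belgii-flower-1mb.jpg/1280px-AsterNovi-belgii-flower-1mb.jpg"
)


def matches_pattern(record):
    """Return True if a record has the URL, Referer and User-Agent from this task.

    `record` is a hypothetical dict with 'url', 'referer' and 'user_agent' keys.
    """
    return (
        record.get("url") == TARGET_URL
        and record.get("referer") == "-"
        and record.get("user_agent") == "-"
    )


def share_of_matching(records):
    """Return the fraction of records that match the pattern (0.0 if empty)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if matches_pattern(r)) / len(records)
```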

Event Timeline

https://newshimalaya.com/2021/02/09/%E2%9A%93-t273741-investigate-unusual-media-traffic-pattern-for-asternovi-belgii-flower-1mb-jpg-on-commons/

^ wut? I tried to search for links to this image and found... this Phabricator ticket content on a Nepali news site?

Screenshot at 2021-02-08 17-18-10.png (855×1 px, 310 KB)

Going to the TikTok website from India results in the regular TikTok page loading, with a banner from TikTok saying that the service is unavailable in India. Not a dedicated block page.

The lack of a User-Agent and all other distinguishing headers means this can't be coming from a web browser.

I would look up the IPs and see if some of them are from IP blocks associated with cellular providers. Since in general only mobile phones are on those IPs, that'll provide some strong evidence that this is from a mobile app. My guess is an online connectivity check? I assume the detailed IP/request logs aren't public or I'd go investigate this myself. 🙃
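
The IP lookup suggested here could be sketched roughly as follows, using Python's standard `ipaddress` module. The network list is a placeholder built from documentation-only ranges, and the names `HYPOTHETICAL_CELLULAR_BLOCKS`, `is_cellular` and `cellular_fraction` are made up for this example; real carrier prefixes would have to come from a registry or ASN database.

```python
import ipaddress

# Placeholder prefixes reserved for documentation (RFC 5737, RFC 3849).
# They stand in for real cellular-provider blocks, which are NOT known here.
HYPOTHETICAL_CELLULAR_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_cellular(ip_string, blocks=HYPOTHETICAL_CELLULAR_BLOCKS):
    """Return True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(ip_string)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in blocks)


def cellular_fraction(ip_strings, blocks=HYPOTHETICAL_CELLULAR_BLOCKS):
    """Return the fraction of addresses that fall inside the given networks."""
    ips = list(ip_strings)
    if not ips:
        return 0.0
    return sum(1 for ip in ips if is_cellular(ip, blocks)) / len(ips)
```

A high fraction over a sample of client addresses would support the mobile-app theory, but only to the extent that the prefix list is accurate.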

Hi all, I've been doing a bit of research into possible apps that could be causing this and found two potential culprits that I am currently investigating.

The first is Mitron TV, (news article here), an Indian TikTok alternative which was made available again on the app store June 6th.

The second is Say Namaste, (news article here), an Indian Zoom alternative which was launched on the app stores June 9th.

Both fall into the timeline of huge increases, have millions of users and may be using '1280px-AsterNovi-belgii-flower-1mb.jpg' to check the user's internet connection - especially for Say Namaste to ensure video connectivity. I've reached out to some developers at both companies and will report back. Let me know your thoughts.

EDIT: I have also noticed the dates match the reopening after lockdown for the whole of India: "This first phase of reopening was termed as "Unlock 1.0"[13] and permitted shopping malls, religious places, hotels and restaurants to reopen from 8 June." from Wikipedia

Tom

Thank you everyone for the comments and suggestions. I just wanted to share that we have identified the app and will update this task tomorrow. (And yes, it is a mobile app.)

^ wut? I tried to search for links to this image and found... this Phabricator ticket content on a Nepali news site?

Looks like it's just a big rss aggregator?

Rename to a new filename?

The Commons community generally avoids moving files, as it can break attribution and cause issues for (reasonable) external reusers. "Turn it off and see who screams" is a valid method when less disruptive methods fail, but it appears to be unnecessary in this case. Others have previously suggested serving a different file for users with the matching user agent header, which wouldn't break all external links to the file. While that solution would require more work than simply moving the file, it may also have been more effective (depending on how it was used).

Change 663004 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] upload-frontend: ban a specific url with no referer nor UA

https://gerrit.wikimedia.org/r/663004

Update:

Thank you for the interest in this task! Like we shared yesterday, we have identified that the traffic is coming from a popular mobile app in India. We have initiated contact with the app developers, and are waiting to hear back from them. In the meantime, given the volume of requests, we have decided to ban those specific requests until the issue is resolved. While we will refrain from naming the app at this time, we can share that it is not on the list of apps mentioned in this task. Nevertheless, we thank you for your comments and suggestions on how to debug this!

Since there has been some interest in how we narrowed it down to this particular app:

  1. The header attributes suggested that this was a mobile app. We then queried Hive and determined that the connections with these header values (User-Agent and Referer) were mostly from IPv6 addresses, further confirming the theory that this was a popular mobile app.
  2. We then tried to isolate connections from geographical regions and ISPs in India but it was clear that there was no pattern there, as users were spread across the country.
  3. A few things were clear given the volume of the requests: it was a popular app with traffic throughout the day (and even late at night) with a peak on December 31 2020, suggesting that it may be a chat or social media app.
  4. We noticed that the image/app gained popularity somewhere around the time India blocked Chinese internet services and websites, thus affecting popular apps in India like TikTok. (This was pointed out by a user.)
  5. Based on the information above, we gathered a list of popular chat and social media applications in the country, especially apps that gained popularity after the above censorship event.
  6. We first started by downloading and running these apps to see if we could identify the image in their splash screens or within the apps. We also asked the community on the ground and there are many unnamed people who helped us with this -- thank you!
  7. This unfortunately didn't work as none of the apps we tested had the image anywhere -- neither in the splash screen nor in the apps themselves. The community in India was equally surprised given the popularity of this image/app and the fact that they had not seen it in their daily usage.
  8. It was then speculated that the app fetches the image but does not show it. (This was based on this comment.)
  9. To recap, we were aware of the following at this stage:
    • it is a popular chat/social media mobile app used in India
    • it sets the User-Agent and Referer to '-'
    • it fetches the image from Wikimedia Commons but does not display it
  10. To narrow down the app, we decided to observe connections to the image from clients (phones) to our servers. We did this by opening the popular apps one-by-one and noting down the time. After doing this for all the apps, we then ran this query in Hive: SELECT * FROM wmf.webrequest WHERE year=2021 AND month=2 AND day=9 AND parse_media_file_url(uri_path).base_name='/wikipedia/commons/1/16/AsterNovi-belgii-flower-1mb.jpg' AND webrequest_source='upload' AND uri_host = 'upload.wikimedia.org' AND user_agent='-' AND ip=<IP>;
  11. We then found the specific app that was making the request by matching the time when it was opened against the time the image was requested from our servers, restricting the results to the User-Agent '-' and to the IP we tested from.
  12. By this time, we had isolated the app and were convinced that this is the one that is fetching the image on startup. We could not find the image anywhere in the app, confirming our theory that it fetches the image but does not display it.
  13. To further confirm this finding and to ensure that we had the correct app, we decided to log DNS queries from a phone by setting up a local resolver to capture DNS traffic. After pointing the phone towards it and launching the app, we noticed that it was indeed the one looking up upload.wikimedia.org on startup.
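
The time-matching idea in steps 10 and 11 can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `find_matching_apps`, the app names used with it, and the 10-second window are all hypothetical, and the request timestamps are assumed to have already been extracted from the Hive query results for the test IP.

```python
from datetime import datetime, timedelta


def find_matching_apps(app_launches, request_times, window_seconds=10):
    """Return apps whose launch was followed by an image request shortly after.

    app_launches:  dict mapping a (hypothetical) app name to its launch time.
    request_times: datetimes of image requests seen from the test IP.
    window_seconds: assumed maximum delay between launch and request.
    """
    window = timedelta(seconds=window_seconds)
    matches = []
    for app, launched in app_launches.items():
        # A request counts if it falls within [launch, launch + window].
        if any(timedelta(0) <= t - launched <= window for t in request_times):
            matches.append(app)
    return matches
```

For example, with launches noted for `app_a` at 12:00:00 and `app_b` at 12:05:00, and a single request logged at 12:05:03, only `app_b` would be reported. Launching the apps several minutes apart keeps the windows from overlapping.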

@ssingh said it yesterday on chat but this is such stellar data detective work. Congrats on finding the culprit!!

Is the effect that the block will have in the app known?

Just to second what @fdans said, the data detective work was great and this was such a fun ticket to watch.

In T273741#6815874, @Majavah wrote:

Is the effect that the block will have in the app known?

No, hence we tried to reach out to them, although it seems that there is no good way to get in touch with them through email (I sent an email to all publicly available channels, only to get back an autoresponder that assumes I'm a user of the app and asks for my phone number). I eventually resorted to DMing their CEO on Twitter.

I think this is more than enough courtesy on our part. Anyways, the block will clearly link to this task as the reason for the block, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/663004/4/modules/varnish/templates/upload-frontend.inc.vcl.erb#380, so that whoever is responsible for this can figure out how to reach us.

The traffic spikes closely match Indian holidays: 2 Oct, 5 Sept, 14 Nov, 31 Dec, 12-14 Jan, etc.

Small - and happy! - status update: we managed to get in touch with the app developers, who confirmed our suspicions were correct, and promised a quick turnaround in posting an updated version of the application. We will thus hold back the banning of the URL for now, awaiting confirmation of the desired effect, to reduce the potential harmful impact on the application's users.

Given how much "sample code" we found around the internet using that url, it might still be a good idea to merge the patch later just to prevent this from happening again.

According to data in turnilo, the flow of requests has gone down significantly after the app authors fixed the issue on their side:

asteri-requests.png (684×1 px, 108 KB)

I'm inclined to resolve the task once we have a longer-range confirmation that the traffic is indeed gone for good.

Our traffic also decreased significantly in EQSIN.

This sort of suggests the developer community needs an example image to use in example code, the way we do for URLs (e.g. http://example.com ).

Although I assume hosting it ourselves would be out of scope. (We actually do have a category for these on commons though! https://commons.wikimedia.org/wiki/Example_images )

There is https://placeholder.com/ which seems to position itself as just that, placeholder images for designers.

Quick fly-by, but this image corresponds to an internet meme in Hong Kong, though it seems unlikely that this affects EQSIN servers.

I suspect it is being used as something like a placeholder image on large sites; the requests may be coming from the server side.

Need more info on UAs before some conclusion can be made.

The requests in question used an empty User-Agent string. See also User-Agent policy, especially the last paragraph about as-needed enforcement.

Joe claimed this task.

Given the original issue is definitely resolved, there is no point in keeping this task open. Kudos to the app developers who managed to fix this issue promptly!

Change 663004 abandoned by Giuseppe Lavagetto:
[operations/puppet@production] upload-frontend: ban a specific url with no referer nor UA

Reason:
Solved in another way

https://gerrit.wikimedia.org/r/663004