
Accounts created via enwiki's WP:ACC have weird user-agents in checkuser data
Closed, Resolved (Public)

Description

A fellow steward approached me, asking why so many accounts were being created from 172.16.0.107. An example from login.wikimedia.org is below:

[Screenshot: CheckUser data from login.wikimedia.org]

Investigation showed that the IP belongs to enwiki's [[WP:ACC]]:

urbanecm@tools-sgebastion-07  ~
$ host 172.16.0.107
107.0.16.172.in-addr.arpa domain name pointer accounts-appserver5.account-creation-assistance.eqiad1.wikimedia.cloud.
urbanecm@tools-sgebastion-07  ~
$

However, the data in the screenshot above has a confusing UA, 0, which tells nobody anything useful. The same UA is stored in enwiki's CU database:

mysql:research@dbstore1003.eqiad.wmnet [enwiki]> select * from cu_changes where cuc_id=1077092657\G
*************************** 1. row ***************************
        cuc_id: 1077092657
 cuc_namespace: 2
     cuc_title:
      cuc_user: <redacted>
 cuc_user_text: <redacted>
cuc_actiontext: was created
   cuc_comment:
     cuc_minor: 0
   cuc_page_id: 0
cuc_this_oldid: 0
cuc_last_oldid: 0
      cuc_type: 3
 cuc_timestamp: <redacted>
        cuc_ip: 172.16.0.107
    cuc_ip_hex: AC10006B
       cuc_xff:
   cuc_xff_hex: NULL
     cuc_agent: 0
1 row in set (0.001 sec)

mysql:research@dbstore1003.eqiad.wmnet [enwiki]>
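For reference, both the reverse-DNS name looked up with host above and the cuc_ip_hex value in this row can be derived from the IP address alone. A minimal Python sketch using the standard ipaddress module; that CheckUser computes cuc_ip_hex exactly this way (eight upper-case hex digits for an IPv4 address) is an assumption here, although it does reproduce the value shown:

```python
import ipaddress

ip = ipaddress.ip_address("172.16.0.107")

# PTR query name used by `host 172.16.0.107`: octets reversed under in-addr.arpa
ptr_name = ip.reverse_pointer
print(ptr_name)  # 107.0.16.172.in-addr.arpa

# cuc_ip_hex-style value: the 32-bit address as 8 upper-case hex digits
ip_hex = f"{int(ip):08X}"
print(ip_hex)  # AC10006B
```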

I thought this was an ACC issue and was going to file a task for the ACC folks to fix, but... data in event.mediawiki_api_request _do_ have some UA (generated through P17494):

dt | user_agent | database | params[action]
2021-10-18T01:48:03Z | MediaWiki/1.38.0-wmf.4 | enwiki | edit
2021-10-18T01:32:29Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | query
2021-10-18T01:48:03Z | MediaWiki/1.38.0-wmf.4 | enwiki | query
2021-10-18T01:43:51Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | titleblacklist
2021-10-18T01:46:14Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | titleblacklist
2021-10-18T01:29:09Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | antispoof
2021-10-18T01:45:05Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | antispoof
2021-10-18T01:45:05Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | titleblacklist
2021-10-18T01:48:02Z | MediaWiki/1.38.0-wmf.4 | enwiki | query
2021-10-18T01:45:02Z | Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team) | enwiki | antispoof

Apparently, some of the requests have a MediaWiki user agent, some have a properly formatted ACC agent, and CheckUser ends up with yet another UA, 0?
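Grouping the sample rows by user-agent product makes the split easier to see. A small Python sketch over the ten rows from the event.mediawiki_api_request sample (the grouping key, the text before the first "/", is just a convenient assumption):

```python
from collections import defaultdict

ACC = "Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team)"
MW = "MediaWiki/1.38.0-wmf.4"

# (user_agent, params[action]) for the ten sample rows, in table order
sample = [
    (MW, "edit"), (ACC, "query"), (MW, "query"), (ACC, "titleblacklist"),
    (ACC, "titleblacklist"), (ACC, "antispoof"), (ACC, "antispoof"),
    (ACC, "titleblacklist"), (MW, "query"), (ACC, "antispoof"),
]

actions_by_product = defaultdict(set)
for user_agent, action in sample:
    product = user_agent.split("/", 1)[0]  # "MediaWiki" or "Wikipedia-ACC Tool"
    actions_by_product[product].add(action)

for product, actions in sorted(actions_by_product.items()):
    print(f"{product}: {sorted(actions)}")
```

In this sample, the titleblacklist and antispoof checks only ever carry the ACC agent, while the single edit request (and some query requests) carry the MediaWiki/ agent.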

This...looks like a bug on WMF's end rather than in ACC?


Event Timeline

I've done a little bit of digging on ACC's side, and while I can't give an in-depth analysis just now, I can confirm the following points:

At the moment, these are vague, disjointed thoughts, and I haven't done anything to confirm that this is actually what's happening; that will need to wait until tonight, when I have a debugger available to trace a request end to end. I suspect ACC isn't setting a UA for OAuth requests because the OAuth client apparently provides no way to do so, and that this is the underlying cause.

Change 731740 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/oauthclient-php@master] Let users to supply user-agent

https://gerrit.wikimedia.org/r/731740

I set up a demo version of the OAuth library and played with it for a while. Changing Client::makeCurlCall to add a new header, User-Agent: foo, caused MediaWiki to see that new UA everywhere. Adding support for that looks easy, so I went ahead and uploaded a patch; reviews welcome.
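The idea behind the patch is that the caller, rather than the HTTP layer, chooses the User-Agent sent with each API call. The real change is in PHP (Client::makeCurlCall); the sketch below only illustrates the same idea in Python with urllib.request, and make_api_request is a hypothetical helper, not part of oauthclient-php:

```python
import urllib.request
from typing import Optional

ACC_USER_AGENT = "Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team)"


def make_api_request(url: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a request, attaching the caller's User-Agent if given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


req = make_api_request(
    "https://en.wikipedia.org/w/api.php?action=query&meta=userinfo&format=json",
    ACC_USER_AGENT,
)
# urllib stores header names via str.capitalize(), so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```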


As for the zero user agent: CheckUser blindly passes the result of $wgRequest->getHeader( 'User-Agent' ) to the database. When no user agent is sent, that call returns false. Since MariaDB represents false/true as 0/1, that explains where the 0 comes from.

This should likely be fixed to store an empty string instead; I'll file a (public) task for that.
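The 0 can be reproduced outside MediaWiki. The sketch below uses Python's sqlite3 as a stand-in for MariaDB (an assumption; the coercion rules differ in detail) and a hypothetical get_header() that, like the call described above, returns False for a missing header. Storing the raw value yields the string '0', while normalising False to an empty string first, as proposed, stores '':

```python
import sqlite3
from typing import Mapping, Union


def get_header(headers: Mapping[str, str], name: str) -> Union[str, bool]:
    """Hypothetical stand-in for $wgRequest->getHeader(): returns False if the header is absent."""
    wanted = name.lower()  # HTTP header names are case-insensitive
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return False


def agent_for_storage(headers: Mapping[str, str]) -> str:
    """Proposed fix: store an empty string instead of False when no UA was sent."""
    agent = get_header(headers, "User-Agent")
    return "" if agent is False else agent


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cu_changes (cuc_id INTEGER PRIMARY KEY, cuc_agent TEXT)")
no_ua = {}  # a request that sent no User-Agent header

# Current behaviour: the raw False is bound and coerced to the text '0'.
conn.execute("INSERT INTO cu_changes VALUES (?, ?)", (1, get_header(no_ua, "User-Agent")))
# Proposed behaviour: an empty string is stored.
conn.execute("INSERT INTO cu_changes VALUES (?, ?)", (2, agent_for_storage(no_ua)))

stored = dict(conn.execute("SELECT cuc_id, cuc_agent FROM cu_changes"))
print(stored)  # {1: '0', 2: ''}
```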


I'm still very confused about the MediaWiki/ agents mentioned in the description. When setting the X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet; log header in the OAuth library for debugging purposes, the verbose logs did not include a user-agent header at all (as I'd expect). However:

[urbanecm@stat1005 ~]$ kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 -o -1 -t eqiad.mediawiki.api-request 2> /dev/null | jq -c 'select(.meta.domain | contains("meta.wikimedia.org"))' | jq -c 'select(.performer.user_text | contains("Martin Urbanec"))'
{"$schema":"/mediawiki/api/request/1.0.0","meta":{"request_id":"c52cc659-2c23-4fcd-9429-f38446339e9d","id":"c20e59f8-a777-4519-b463-91ff333c1160","dt":"2021-10-18T14:23:16Z","domain":"meta.wikimedia.org","stream":"mediawiki.api-request"},"http":{"method":"GET","client_ip":"redacted","request_headers":{"user-agent":"MediaWiki/1.38.0-wmf.4"}},"performer":{"user_text":"Martin Urbanec","user_id":7034294},"database":"metawiki","backend_time_ms":12,"api_error_codes":["readapidenied"],"params":{"action":"query","format":"json","meta":"userinfo"}}
^C
[urbanecm@stat1005 ~]$

In other words, the analytics cluster seems to receive an incorrect UA. Funnily enough, the verbose logs (which are supposed to contain the same data) don't have that incorrect UA:

2021-10-18 14:25:20 [865dd832-63c0-4614-8c6e-07c066c56643] mwdebug1001 metawiki 1.38.0-wmf.4 api-request INFO:  {"$schema":"/mediawiki/api/request/1.0.0","meta":{"request_id":"865dd832-63c0-4614-8c6e-07c066c56643","id":"a50d4993-1b0a-4de5-9a6f-3fb566fa575f","dt":"2021-10-18T14:25:20Z","domain":"meta.wikimedia.org","stream":"mediawiki.api-request"},"http":{"method":"GET","client_ip":"redacted"},"performer":{"user_text":"Martin Urbanec","user_id":7034294},"database":"metawiki","backend_time_ms":22,"api_error_codes":["readapidenied"],"params":{"action":"query","format":"json","meta":"userinfo"}}

It looks like something in the analytics pipeline adds a UA if one is missing? CC @Ottomata. Happy to file a new task for this; I noticed it accidentally while trying to understand where the MediaWiki/ user agents in the description came from.
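The discrepancy can be checked mechanically by comparing the event read from Kafka with the one in the mwdebug verbose log. A minimal Python sketch, using copies of those two events trimmed to their http field (user_agent_of is a hypothetical helper):

```python
import json
from typing import Optional

# Copies of the two api-request events quoted in this task, trimmed to "http".
kafka_event = json.loads(
    '{"http": {"method": "GET", "client_ip": "redacted",'
    ' "request_headers": {"user-agent": "MediaWiki/1.38.0-wmf.4"}}}'
)
mwdebug_event = json.loads('{"http": {"method": "GET", "client_ip": "redacted"}}')


def user_agent_of(event: dict) -> Optional[str]:
    """Return http.request_headers['user-agent'], or None if the event has none."""
    return event.get("http", {}).get("request_headers", {}).get("user-agent")


for name, event in (("kafka", kafka_event), ("mwdebug log", mwdebug_event)):
    print(f"{name}: {user_agent_of(event)!r}")
```

The logged event has no user agent at all while the Kafka copy does, which is why the question is whether a later stage of the pipeline fills it in.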

Change 731740 merged by jenkins-bot:

[mediawiki/oauthclient-php@master] Let users to supply user-agent

https://gerrit.wikimedia.org/r/731740

This is now fixed on ACC's side too, as of 18:46 UTC. Account creations and edits made by the ACC tool should have the correct user agent attached; thanks @Urbanecm and @Majavah! If you spot any other instances where we're still not setting a user agent (I'd be surprised, but it's not outside the realm of possibility), please do let me know :)

Urbanecm assigned this task to stwalkerster.


I did a quick check on a few ACC requests I just handled; it looks okay now:

{"$schema":"/mediawiki/api/request/1.0.0","meta":{"request_id":"49c91ee7-4654-4d22-a4f0-a65fdfe247e5","id":"b47f2523-2bb0-48ff-a339-4fe62d789abd","dt":"2021-10-25T22:00:02Z","domain":"en.wikipedia.org","stream":"mediawiki.api-request"},"http":{"method":"POST","client_ip":"172.16.0.107","request_headers":{"user-agent":"Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team)"}},"performer":{"user_text":"Martin Urbanec","user_id":23204849},"database":"enwiki","backend_time_ms":9,"params":{"action":"query","format":"json","assert":"user","meta":"tokens","type":"createaccount"}}
In [7]: df = spark.run("select distinct http.request_headers['user-agent'] from event.mediawiki_api_request where year=2021 and month=10 and day in (25, 24, 23) and http['client_ip']='172.16.0.107'")
PySpark executors will use /usr/lib/anaconda-wmf/bin/python3.

In [8]: df
Out[8]:
  http.request_headers AS `request_headers`[user-agent]
0  Wikipedia-ACC Tool/0.1 (+https://accounts.wmfl...
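The Spark query above boils down to "distinct user agents among events from one client IP". The same check over newline-delimited JSON events, as kafkacat prints them, can be sketched in plain Python; the two input lines are events quoted in this task, trimmed to their http field, and distinct_agents is a hypothetical helper:

```python
import json
from typing import Iterable, Set

EVENTS = [
    '{"http": {"method": "POST", "client_ip": "172.16.0.107", "request_headers":'
    ' {"user-agent": "Wikipedia-ACC Tool/0.1 (+https://accounts.wmflabs.org/internal.php/team)"}}}',
    '{"http": {"method": "GET", "client_ip": "redacted", "request_headers":'
    ' {"user-agent": "MediaWiki/1.38.0-wmf.4"}}}',
]


def distinct_agents(lines: Iterable[str], client_ip: str) -> Set[str]:
    """Distinct user-agent values among events sent from client_ip."""
    agents = set()
    for line in lines:
        http = json.loads(line).get("http", {})
        if http.get("client_ip") != client_ip:
            continue  # skip events from other clients
        agent = http.get("request_headers", {}).get("user-agent")
        if agent is not None:
            agents.add(agent)
    return agents


print(distinct_agents(EVENTS, "172.16.0.107"))
```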

Resolving.

Urbanecm updated the task description.
Urbanecm changed the visibility from "Custom Policy" to "Public (No Login Required)".
Urbanecm changed the edit policy from "Custom Policy" to "All Users".

And published, after double-checking that no PII is in this task.