Record IP reputation data for account creations and edits
Open, Needs Triage (Public)

Description

Now that iPoid-Service is in production, we have easy access to IP reputation data. This task proposes creating an event schema, then fetching IP reputation data whenever an account is created or an edit is made, and logging that data in an event.

That would allow us to verify how useful ipoid's data is as a predictor of user activity. For example: what percentage of accounts created from IPs known to ipoid had been blocked one month later? What percentage of edits made from IPs known to ipoid were eventually reverted?
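
A minimal sketch of the first question, assuming the logged events have already been joined with block data. The `AccountCreation` record, its field names, and the sample rows are all hypothetical and only illustrate the shape of the comparison:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AccountCreation:
    # Hypothetical record: one row per account creation, already joined
    # with whether ipoid knew the IP and whether the account was blocked.
    username: str
    known_to_ipoid: bool
    blocked_after_one_month: bool


def blocked_percentage(
    records: Iterable[AccountCreation], known_to_ipoid: bool
) -> Optional[float]:
    """Percentage of accounts in the chosen group that ended up blocked."""
    group = [r for r in records if r.known_to_ipoid == known_to_ipoid]
    if not group:
        return None  # no data for this group, so no percentage
    blocked = sum(1 for r in group if r.blocked_after_one_month)
    return 100.0 * blocked / len(group)


# Made-up sample data, not real measurements.
sample = [
    AccountCreation("ExampleA", True, True),
    AccountCreation("ExampleB", True, False),
    AccountCreation("ExampleC", False, False),
    AccountCreation("ExampleD", False, False),
]

print(blocked_percentage(sample, known_to_ipoid=True))
print(blocked_percentage(sample, known_to_ipoid=False))
```

Comparing the two groups is what would tell us whether "known to ipoid" carries any signal. The same shape would work for the reverted-edits question.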

We could also further break down these queries by including the full metadata provided by ipoid in the event, so we could for example distinguish between VPNs and callback proxies, or different proxy providers, etc.

Event Timeline

I'm planning to create a schema like "ip_reputation_log", looking something like:

Field name    Type     Notes
ip            string   The IP address used by the performer
ip_metadata   string   The JSON blob from ipoid
(data points from ip_metadata extracted as fields)
identifier    int      The revision ID or log entry ID associated with the action
action        string   The type of action, e.g. "edit", "createaccount"

Also include "performer" which would add the username.
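
To make the proposal concrete, here is a rough sketch of assembling one such event. It follows the fields proposed in this comment, not necessarily the schema that is eventually merged. The function name `build_ip_reputation_event` and the `ALLOWED_ACTIONS` set are assumptions for illustration, and the IP is a documentation address:

```python
import json
from typing import Any, Dict, Optional

# Only the two actions named in the proposal; anything else is rejected.
ALLOWED_ACTIONS = {"edit", "createaccount"}


def build_ip_reputation_event(
    ip: str,
    ip_metadata: Dict[str, Any],
    identifier: int,
    action: str,
    performer: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an event dict matching the proposed field list."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    event: Dict[str, Any] = {
        "ip": ip,
        # The proposal stores the ipoid response as a JSON string.
        "ip_metadata": json.dumps(ip_metadata, sort_keys=True),
        # Revision ID for edits, log entry ID for account creations.
        "identifier": int(identifier),
        "action": action,
    }
    if performer is not None:
        event["performer"] = performer
    return event


example = build_ip_reputation_event(
    ip="192.0.2.1",
    ip_metadata={"risks": [], "tunnels": []},
    identifier=12345,
    action="edit",
    performer="ExampleUser",
)
print(example)
```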

The ip_metadata contains the following info:

ip                     The IP address queried
as                     Autonomous System Details
client.behaviors       Behaviors of clients on this IP
client.concentration   Location concentration of clients on this IP
client.count           Average number of clients observed per day
client.countries       Number of countries clients have come from
client.proxies         Call-back proxies running from devices on this IP
client.spread          The geographic spread of clients (km^2)
client.types           Types of client devices observed
infrastructure         The classification of infrastructure this IP is in
organization           The organization operating the IP address
location               Maxmind GeoLite2 location data
services               Protocols and services running on this IP (e.g. OpenVPN)
tunnels                VPN/Proxy/Anonymization details and operator information
risks                  Risks and threats from this IP address

We'd probably want to add most of these into distinct fields, to make it easier to query.
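
One possible way to get distinct fields is to flatten the nested blob into top-level keys. This is only a sketch: the helper name `flatten_metadata`, the underscore separator, and the sample values are assumptions, and the real ipoid response may be shaped differently:

```python
from typing import Any, Dict


def flatten_metadata(
    metadata: Dict[str, Any], parent_key: str = "", sep: str = "_"
) -> Dict[str, Any]:
    """Turn nested dicts into flat keys, e.g. client.count -> client_count."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_metadata(value, name, sep))
        else:
            # Scalars and lists are kept as-is under the flattened name.
            flat[name] = value
    return flat


# Made-up sample loosely following the field list above.
sample_metadata = {
    "ip": "192.0.2.1",
    "client": {"count": 12, "countries": 3, "proxies": ["EXAMPLE_PROXY"]},
    "risks": ["EXAMPLE_RISK"],
}

flattened = flatten_metadata(sample_metadata)
print(flattened)
```

Lists such as `client.proxies` or `risks` stay as lists here. Whether those should become array fields or be split further is a separate decision.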

Change 1011281 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[schemas/event/secondary@master] Add ip_reputation/score schema

https://gerrit.wikimedia.org/r/1011281

Change 1011281 merged by jenkins-bot:

[schemas/event/secondary@master] Add ip_reputation/score schema

https://gerrit.wikimedia.org/r/1011281

kostajh renamed this task from "Record IP reputation data for account creations, logins, and edits" to "Record IP reputation data for account creations and edits". (Mar 21 2024, 7:53 AM)
kostajh updated the task description.

Change #1013723 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/WikimediaEvents@master] WIP: Record IP reputation data with edits

https://gerrit.wikimedia.org/r/1013723

Change #1015106 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] zuul: Add EventBus to phan dependencies for WikimediaEvents

https://gerrit.wikimedia.org/r/1015106

Change #1015106 merged by jenkins-bot:

[integration/config@master] zuul: Add EventBus to phan dependencies for WikimediaEvents

https://gerrit.wikimedia.org/r/1015106

Change #1015114 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/WikimediaEvents@master] Record IP reputation data for account creation

https://gerrit.wikimedia.org/r/1015114

Change #1015295 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] beta: Disable wgWikimediaEventsIPoidUrl

https://gerrit.wikimedia.org/r/1015295

Change #1015296 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] WikimediaEvents: Set IPoid URL

https://gerrit.wikimedia.org/r/1015296

Change #1015299 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] EventStreamConfig: Register ip_reputation/score

https://gerrit.wikimedia.org/r/1015299

Change #1017812 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[schemas/event/secondary@master] ip_reputation/score: Add action for account auto creation

https://gerrit.wikimedia.org/r/1017812

Change #1017813 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/WikimediaEvents@master] IPReputationHooks: Record account autocreation events

https://gerrit.wikimedia.org/r/1017813

Change #1017812 merged by jenkins-bot:

[schemas/event/secondary@master] ip_reputation/score: Add action for account auto creation

https://gerrit.wikimedia.org/r/1017812

Change #1013723 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Record IP reputation data with edits

https://gerrit.wikimedia.org/r/1013723

Change #1015114 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Record IP reputation data for account creation

https://gerrit.wikimedia.org/r/1015114

Change #1017813 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] IPReputationHooks: Record account autocreation events

https://gerrit.wikimedia.org/r/1017813

Change #1015296 abandoned by Kosta Harlan:

[operations/mediawiki-config@master] WikimediaEvents: Set IPoid URL

Reason:

https://gerrit.wikimedia.org/r/1015296

Change #1015299 abandoned by Kosta Harlan:

[operations/mediawiki-config@master] EventStreamConfig: Register ip_reputation/score

Reason:

https://gerrit.wikimedia.org/r/1015299

Change #1015295 merged by jenkins-bot:

[operations/mediawiki-config@master] WikimediaEvents: Set IPoid URL and enable ip_reputation/score

https://gerrit.wikimedia.org/r/1015295

Mentioned in SAL (#wikimedia-operations) [2024-04-17T13:33:33Z] <logmsgbot> lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:1015295|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (T354597)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-17T13:36:32Z] <logmsgbot> lucaswerkmeister-wmde@deploy1002 kharlan and lucaswerkmeister-wmde: Backport for [[gerrit:1015295|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (T354597)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Note for visibility: this was reverted because it didn’t work on mwdebug (there was a logstash error about “Event submitted for unregistered stream name "mediawiki.ip_reputation.score"”). The revert just isn’t attached to this task.
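
For anyone reading along, the failure mode can be sketched roughly like this. It is a simplified illustration and not the actual EventBus or EventLogging code: the registry contents, the `submit_event` function, and the exception class are all hypothetical.

```python
from typing import Any, Dict, Set


class UnregisteredStreamError(Exception):
    """Raised when an event targets a stream missing from the registry."""


def submit_event(
    stream_name: str, event: Dict[str, Any], registered_streams: Set[str]
) -> Dict[str, Any]:
    """Accept the event only if its stream name has been registered."""
    if stream_name not in registered_streams:
        raise UnregisteredStreamError(
            f'Event submitted for unregistered stream name "{stream_name}"'
        )
    return {"stream": stream_name, "event": event}


# Hypothetical registry that does not yet include the new stream.
registry = {"mediawiki.example_stream"}

try:
    submit_event("mediawiki.ip_reputation.score", {"action": "edit"}, registry)
except UnregisteredStreamError as error:
    print(error)

# Once the stream is registered, the same call is accepted.
registry.add("mediawiki.ip_reputation.score")
accepted = submit_event("mediawiki.ip_reputation.score", {"action": "edit"}, registry)
print(accepted["stream"])
```

The point is only that enabling the producer is not enough: the stream name also has to be registered where the submission is checked.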

Change #1020929 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt)

https://gerrit.wikimedia.org/r/1020929

Change #1020929 merged by jenkins-bot:

[operations/mediawiki-config@master] WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt)

https://gerrit.wikimedia.org/r/1020929

Change #1021338 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] ext-EventLogging: Add mediawiki.ip_reputation.score

https://gerrit.wikimedia.org/r/1021338

Change #1021338 merged by jenkins-bot:

[operations/mediawiki-config@master] ext-EventLogging: Add mediawiki.ip_reputation.score

https://gerrit.wikimedia.org/r/1021338

Mentioned in SAL (#wikimedia-operations) [2024-04-18T07:25:13Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:1020929|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt) (T354597)]], [[gerrit:1021338|ext-EventLogging: Add mediawiki.ip_reputation.score (T354597)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-18T07:28:16Z] <urbanecm@deploy1002> kharlan and urbanecm: Backport for [[gerrit:1020929|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt) (T354597)]], [[gerrit:1021338|ext-EventLogging: Add mediawiki.ip_reputation.score (T354597)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Change #1021355 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score

https://gerrit.wikimedia.org/r/1021355

Mentioned in SAL (#wikimedia-operations) [2024-04-18T07:47:40Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:1020929|WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt) (T354597)]], [[gerrit:1021338|ext-EventLogging: Add mediawiki.ip_reputation.score (T354597)]] (duration: 22m 27s)

Change #1021355 merged by jenkins-bot:

[operations/mediawiki-config@master] EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score

https://gerrit.wikimedia.org/r/1021355

Mentioned in SAL (#wikimedia-operations) [2024-04-18T07:51:14Z] <kharlan@deploy1002> Started scap: Backport for [[gerrit:1021355|EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score (T354597)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-18T07:54:16Z] <kharlan@deploy1002> urbanecm and kharlan: Backport for [[gerrit:1021355|EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score (T354597)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-04-18T08:10:51Z] <kharlan@deploy1002> Finished scap: Backport for [[gerrit:1021355|EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score (T354597)]] (duration: 19m 36s)