Record recent IP contributions in the ipinfo_ip_changes table [M]
Closed, Invalid (Public)

Description

Background

Per the outcome of T292623: [SPIKE] Investigate getting global contribution information for IP Info [8H], we need to record in our ipinfo_ip_changes table that a logged-out user recently made a contribution. Because IP addresses are regularly reassigned (especially in the case of a roaming mobile device), we should purge these records regularly.

AC

  • When a logged-out user makes a contribution, a corresponding row is inserted into the ipinfo_ip_changes table in a central database
    • The ipc_ip_hex field is set to the hexadecimal representation of the IP, i.e. IPUtils::toHex( $ip )
  • The central database is configurable
    • $wgDBserver in development
    • wikishared in production

[] When a logged-out user makes a contribution, a job is enqueued that purges rows that are older than X days (90 by default); see T299017
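
As a rough sketch of the row-level operations these criteria describe, the two helpers below build the row for a contribution and purge rows past the retention window. Only the table name, the ipc_ip_hex field and IPUtils::toHex() come from this task; the function names, the ipc_timestamp column and the $maxAgeDays parameter are assumptions made for illustration.

```php
<?php
use Wikimedia\IPUtils;
use Wikimedia\Rdbms\IDatabase;

/**
 * Build the row to insert for a logged-out contribution.
 * NOTE: ipc_timestamp is a hypothetical column; the task only names ipc_ip_hex.
 */
function buildIpChangeRow( IDatabase $dbw, string $ip ): array {
	return [
		'ipc_ip_hex' => IPUtils::toHex( $ip ),
		// timestamp() with no argument formats the current time for this DB type
		'ipc_timestamp' => $dbw->timestamp(),
	];
}

/**
 * Delete rows older than $maxAgeDays (90 by default, per the AC).
 */
function purgeOldIpChanges( IDatabase $dbw, int $maxAgeDays = 90 ): void {
	$cutoff = $dbw->timestamp( time() - $maxAgeDays * 86400 );
	$dbw->delete(
		'ipinfo_ip_changes',
		[ 'ipc_timestamp < ' . $dbw->addQuotes( $cutoff ) ],
		__METHOD__
	);
}
```

A caller holding a primary connection could run $dbw->insert( 'ipinfo_ip_changes', buildIpChangeRow( $dbw, $ip ), __METHOD__ ) and then purgeOldIpChanges( $dbw ).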

Notes

We need to understand how a contribution is logged in the first place and how to hook that into IP Info.

  • We create a PageSaveComplete hook handler that:
    • Checks if the editor is logged out; and, if so
    • Queues a job that inserts a row into the ipinfo_ip_changes table in the wikishared database and purges rows that are older than X seconds from that table
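
The handler outlined above might look roughly like this. It is a sketch under assumptions, not the agreed implementation: the namespace, the class name, the job type 'ipinfoRecordIpChange' and the 'ip' job parameter are all hypothetical, and it relies on a logged-out user's name being their IP address.

```php
<?php
namespace MediaWiki\IPInfo\HookHandler; // hypothetical namespace

use JobQueueGroup;
use JobSpecification;
use MediaWiki\Storage\Hook\PageSaveCompleteHook;

class RecordIpChangeHandler implements PageSaveCompleteHook {
	/** @var JobQueueGroup */
	private $jobQueueGroup;

	public function __construct( JobQueueGroup $jobQueueGroup ) {
		$this->jobQueueGroup = $jobQueueGroup;
	}

	/** @inheritDoc */
	public function onPageSaveComplete(
		$wikiPage, $user, $summary, $flags, $revisionRecord, $editResult
	) {
		// Only logged-out contributions are recorded.
		if ( $user->isRegistered() ) {
			return;
		}

		// Defer the write so the save request itself is not slowed down.
		// 'ipinfoRecordIpChange' is a hypothetical job type.
		$this->jobQueueGroup->lazyPush( new JobSpecification(
			'ipinfoRecordIpChange',
			[ 'ip' => $user->getName() ]
		) );
	}
}
```

Both the hook handler and the job type would still need to be registered in the extension's extension.json.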

Event Timeline

Hello @phuedx, we have questions regarding the process of this ticket, especially around the choice of the hook handler (PageSaveComplete vs. RevisionFromEditComplete) and the job being run by cron.
Thank you!

I reached for the PageSaveComplete hook reflexively as I've used it before. The RevisionFromEditComplete hook also seems like a good candidate. From a very brief glance, I can't see a case where one hook would be called but the other wouldn't. We should make sure that whichever hook is chosen covers the cases where a row is inserted into the ip_changes table. That investigation could be captured as part of this task or in a separate spike. I'll be bold and suggest the former, but I defer to the team's judgement during their next AHaT Estimation meeting.

Re. running the job: My recommendation is/was to leverage MediaWiki's Job Queue rather than using cron. By enqueuing a short-lived job to update the table in a hook handler, we keep the table very-nearly-almost up to date without negatively impacting performance for logged-out users.
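
To illustrate the Job Queue approach, a minimal job class might look like the following. The class name, the job type 'ipinfoRecordIpChange' and the 'ip' parameter are hypothetical, and the hard-coded database name stands in for a value that would come from configuration.

```php
<?php
use MediaWiki\MediaWikiServices;
use Wikimedia\IPUtils;

class RecordIpChangeJob extends Job implements GenericParameterJob {
	public function __construct( array $params ) {
		// 'ipinfoRecordIpChange' is a hypothetical job type.
		parent::__construct( 'ipinfoRecordIpChange', $params );
	}

	public function run() {
		// Assumption: in practice this would be read from configuration.
		$centralDB = 'wikishared';

		$dbw = MediaWikiServices::getInstance()
			->getDBLoadBalancerFactory()
			->getMainLB( $centralDB )
			->getConnectionRef( DB_PRIMARY, [], $centralDB );

		$dbw->insert(
			'ipinfo_ip_changes',
			[ 'ipc_ip_hex' => IPUtils::toHex( $this->params['ip'] ) ],
			__METHOD__
		);

		// Returning true marks the job as having succeeded.
		return true;
	}
}
```

Because the job runs outside the request that saved the edit, the cost of the write is not paid by the logged-out user.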

We are going to have to write a maintenance script to bootstrap the table. This should be captured in a task.

@phuedx how does the wikishared database work, and are there any examples of tools already using it that can be linked to this ticket?

It's just another production database, which is used by several tools to store global information. We can connect to the wikishared database in the same way that we connect to production databases for other wikis:

// Elsewhere...
$centralDB = 'wikishared';

$services = \MediaWiki\MediaWikiServices::getInstance();
$loadBalancer = $services
  ->getDBLoadBalancerFactory()
  ->getMainLB( $centralDB );
// Pass the domain explicitly so the connections target the central database
// rather than the current wiki.
$dbw = $loadBalancer->getConnectionRef( DB_PRIMARY, [], $centralDB );
$dbr = $loadBalancer->getConnectionRef( DB_REPLICA, [], $centralDB );

I looked here for examples of other codebases that use the database in production and found:

ARamirez_WMF renamed this task from Record recent IP contributions in the ipinfo_ip_changes table to Record recent IP contributions in the ipinfo_ip_changes table [M]. Jan 11 2022, 5:04 PM
ARamirez_WMF renamed this task from Record recent IP contributions in the ipinfo_ip_changes table [M] to Record recent IP contributions in the ipinfo_ip_changes table [L]. Jan 11 2022, 5:08 PM
phuedx renamed this task from Record recent IP contributions in the ipinfo_ip_changes table [L] to Record recent IP contributions in the ipinfo_ip_changes table. Jan 11 2022, 5:11 PM
phuedx updated the task description.
Tchanders renamed this task from Record recent IP contributions in the ipinfo_ip_changes table to Record recent IP contributions in the ipinfo_ip_changes table [M]. Jan 12 2022, 4:34 AM