Page MenuHomePhabricator

Create ipinfo_ip_changes table [M]
Closed, ResolvedPublic

Description

Per the outcome of T292623: [SPIKE] Investigate getting global contribution information for IP Info [8H], we need to create a table to track the most recent edits by logged-out users.

  1. Seek approval from DBA before merging the ticket,
  2. Get DBA to +2 the patch,
  3. Make them run the maintenance script once in prod.

AC

  • The ipinfo_ip_changes schema is defined in the codebase. The (MySQL) schema is as follows:
    • ipc_id: bigint unsigned (required, automatically incrementing primary key)
    • ipc_ip_hex: varbinary(35) (required)
    • ipc_timestamp: binary(14) (required)
    • index on ipc_ip_hex
    • index on ipc_timestamp for purge performance - T292623#7516250
  • Merge: https://gerrit.wikimedia.org/r/754806 (to be done by DBAs)
  • Restart sanitarium hosts to apply the above filter (to be done by DBAs)
  • The ipinfo_ip_changes table is created on all wikis that the extension is loaded on

Notes

  1. As of writing, the IPInfo extension is only loaded on the Beta Cluster

Event Timeline

AGueyte updated the task description. (Show Details)
AGueyte updated the task description. (Show Details)
AGueyte updated the task description. (Show Details)
Tchanders renamed this task from Create ipinfo_ip_changes table to Create ipinfo_ip_changes table [M].Jan 12 2022, 4:40 AM

@Ladsgroup For the first part of this task:

  1. Seek approval from DBA before merging the ticket

...do we need to do anything more than the discussions we had with you on T292623: [SPIKE] Investigate getting global contribution information for IP Info [8H], plus code review on the patch to introduce the table (coming soon)?

Thanks for your help!

Change 754032 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/IPInfo@master] Create ipinfo_ip_changes table

https://gerrit.wikimedia.org/r/754032

@phuedx Do we also want an index on ipc_ip_hex?

@Ladsgroup For the first part of this task:

  1. Seek approval from DBA before merging the ticket

...do we need to do anything more than the discussions we had with you on T292623: [SPIKE] Investigate getting global contribution information for IP Info [8H], plus code review on the patch to introduce the table (coming soon)?

Thanks for your help!

Not that I'm aware of but I think @Marostegui should be in the loop as well (and having an estimation of the size would be great). I assume this goes to x1 (due to its central nature) and x1 should be okay-ish in terms of capacity.

We don't replicate x1 to wiki replicas (for now) but it would be good to know whether the content of this table could be made public or not.

@phuedx Do we also want an index on ipc_ip_hex?

Good catch. Yes. We'll be querying the table by that field the majority of the time.

We don't replicate x1 to wiki replicas (for now) but it would be good to know whether the content of this table could be made public or not.

I think this data should be fine to be public. It's a record of when an IP made a contribution, which would be publicly visible via Special:Contributions/<ip> anyway

We don't replicate x1 to wiki replicas (for now) but it would be good to know whether the content of this table could be made public or not.

I think this data should be fine to be public. It's a record of when an IP made a contribution, which would be publicly visible via Special:Contributions/<ip> anyway

I think making it public would leak cases where the IP has been marked as deleted/suppressed.

Ah, thanks @JJMC89

Testing/reviewing notes for AHT devs:

  • to create the table locally, run php maintenance/update.php from the mediawiki core directory with the above patch
  • note that the schema outlined in the task description is for MySQL.

We don't replicate x1 to wiki replicas (for now) but it would be good to know whether the content of this table could be made public or not.

I think this data should be fine to be public. It's a record of when an IP made a contribution, which would be publicly visible via Special:Contributions/<ip> anyway

I think making it public would leak cases where the IP has been marked as deleted/suppressed.

If we are not sure, I would recommend to make it private.
Keep in mind that at the moment, x1 content isn't replicated at all to our public clouddb databases, so nothing would be exposed but we want to be sure for the future.
So I am going to go ahead and create the patch to make it private for now.

Change 754806 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] realm.pp: Add ipinfo_ip_changes to private tables

https://gerrit.wikimedia.org/r/754806

Change 754806 merged by Marostegui:

[operations/puppet@production] realm.pp: Add ipinfo_ip_changes to private tables

https://gerrit.wikimedia.org/r/754806

So I am going to go ahead and create the patch to make it private for now.

Thanks @Marostegui - sounds like the best course of action.

I have a couple more questions about deployment (pinging @phuedx too):

  • Do we need to do add this to createExtensionTables.php, or is the LoadExtensionSchemaUpdatesHook handler enough? (I notice some recent extensions missing from createExtensionTables.php, e.g. DiscussionTools, but I'm not sure why)
  • Since this table is intended for the wikishared database, do we need to do anything to prevent it also being added to each wiki's database? (The application will only add to and read from wikishared, which will be controlled by a config.)
  • Might there be any privacy concerns when adding this to the beta cluster?

I don't know how those work on top of my head but check how url shortener does it, it's also on x1. HTH.

I have a couple more questions about deployment (pinging @phuedx too):

  • Do we need to do add this to createExtensionTables.php, or is the LoadExtensionSchemaUpdatesHook handler enough? (I notice some recent extensions missing from createExtensionTables.php, e.g. DiscussionTools, but I'm not sure why)

The LoadExtensionSchemaUpdatesHook is preferable IMO as it's more extensible.

  • Since this table is intended for the wikishared database, do we need to do anything to prevent it also being added to each wiki's database? (The application will only add to and read from wikishared, which will be controlled by a config.)

I defer to DBA on this. In the (default?) case where the table is created in all databases, all but wikishared.ipinfo_ip_changes would be empty. Is that a problem?

  • Since this table is intended for the wikishared database, do we need to do anything to prevent it also being added to each wiki's database? (The application will only add to and read from wikishared, which will be controlled by a config.)

I defer to DBA on this. In the (default?) case where the table is created in all databases, all but wikishared.ipinfo_ip_changes would be empty. Is that a problem?

Yes unfortunately: This would mean a lot of inodes in s3 (that's why we had to move a set of a wiki off of it to s5) and this can also become problematic in backups. The other problem with this approach is that if due to a software bug (which I has seen happening), the code take local connection instead of connection to x1, it wouldn't explode and that can lead to data corruption. It's better that it would explode instead.

OK. In order to avoid creating a lot of inodes unnecessarily, we could add schema updates conditionally in the LoadExtensionSchemaUpdatesHook handler, e.g.

IPInfo/includes/Hook/SchemaUpdatesHookHandler.php
class SchemaUpdatesHookHandler implements LoadExtensionSchemaUpdatesHook {
  public function onLoadExtensionSchemaUpdates( $updater ) {
    // NOTE: The global service locator has not been initialised in this context.
    global $wgIPInfoSharedDatabase;

    if ( $updater->getDB()->getDomainID() === $wgIPInfoSharedDatabase ) {
      // ...
    }
  }
}

LoadExtensionSchemaUpdatesHook hooks into update.php but we don't run update.php in production (it would bring everything down). That's for third party systems. We need to deploy this manually but Thalia done that before (you can do that on your own for creation of tables but for the rest a DBA has to do it, an example: T285149). If you need help in deploying it, let me know.

Thanks for clarifying, @Ladsgroup. I was getting the deployment scenarios (first- and third-party) mixed up.

Thanks everyone. My understanding at this point:

  • Since we do run update.php for the Beta Cluster, once we merge the patch, ipinfo_ip_changes will soon be added there
  • We'll need a configurable database getter that uses wikishared on both beta and production, once we deploy there. (The UrlShortener also does this.) We'll do this in T297701: Record recent IP contributions in the ipinfo_ip_changes table [M].
  • When we come to deploy IPInfo to production we'll want to add the table manually to wikishared only. (And given that it's only adding one table to one database, maybe no point in using createExtensionTables.php.)

Change 754032 merged by jenkins-bot:

[mediawiki/extensions/IPInfo@master] Create ipinfo_ip_changes table

https://gerrit.wikimedia.org/r/754032