Page MenuHomePhabricator

Create core ip_changes view for replicas
Closed, ResolvedPublic

Description

The ip_changes table is now live and populated on all wikis. We'd like to have this on the replica hosts.

Per T173891#3546336 we will need to create a specialized view that does not include revisions-deleted IPs.

Event Timeline

Marostegui moved this task from Triage to Pending comment on the DBA board.
Marostegui added subscribers: jcrespo, Marostegui.

Hi,

We will take care of this, but if possible try to avoid assigning it directly to Jaime (unless he specified it) so any of the DBAs can pick it up.

jcrespo edited projects, added acl*security; removed Schema-change-in-production.

We will take care of this

Actually, we will not, as you can see on https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change creating tables is not something that #DBAs do, as we are not mediawiki deployers. The only thing we ask is to block on us to check labs filtering, but we will not execute it.

The only step here is to check filtering is working. For that, given this is a supposedly a public table that contains IPs, I only need the ok from security to replicate to labs and that its data is ok according to privacy policy- I will then add it to the list of public tables. After that, cloud team can add it to too to the list of available tables.

We will take care of this

Actually, we will not, as you can see on https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change creating tables is not something that #DBAs do, as we are not mediawiki deployers. The only thing we ask is to block on us to check labs filtering, but we will not execute it.

Thanks Jaime! First time I deal with a CREATE table statement :-)

The only step here is to check filtering is working. For that, given this is a supposedly a private table that contains IPs, I only need the ok from security to replicate to labs and that its data is ok according to privacy policy- I will then add it to the list of public tables. After that, cloud team can add it to too to the list of available tables.

Makes sense!

if possible try to avoid assigning it directly to Jaime

I assigned it to Jaime because the documentation said the task should be "owned by DBA", which should probably be changed to "tagged for the DBA" (or something less confusing).

given this is a supposedly a public table that contains IPs, I only need the ok from security to replicate to labs and that its data is ok according to privacy policy

It only records IPs in cases where a logged out user makes an edit, which should be public data. (The records are also deleted from the table when the revision is deleted.) The code that interacts with the table can be found at https://gerrit.wikimedia.org/r/#/c/349457/.

It only records IPs in cases where a logged out user makes an edit, which should be public data

I trust you, I just want explicit Security ok, we will do a couple of patches and you will get unblocked. They have much more familiarity with mediawiki code than I do and @Bawolff recently did a full audit of all tables, so definitely they should be looped in, even if only to say "ok if dbas are ok with it".

Should the data be made available on the labs replicas and/or dumps: Yes, nothing in the table is private data

The table should not be in dumps - it could contain revision deleted IPs.

In labs the table is fine provided the view of it somehow deals with revdeleted stuff. Perhaps the view of it could do INNER JOIN revision on ipc_rev_id = rev_id AND rev_deleted&4=0

So, in conclusion, I'm fine with creating this table, and its fine to "replicate" to the labs hosts, but it needs a specialized view before exposing to users at labs.

The table should not be in dumps - it could contain revision deleted IPs.

@Bawolff: How are revision deleted IPs currently dealt with in dumps?

The table should not be in dumps - it could contain revision deleted IPs.

@Bawolff: How are revision deleted IPs currently dealt with in dumps?

Normally the tables involved (e.g. revision) aren't dumped, and instead users must rely on the xml dumps which redact the user in those cases.

AFAIK, tables aren't automatically added to the dump but need to be requested specificly, so this is probably not something you have to worry about. @ArielGlenn is the master of all things dump, and would know more about this then me.

Yes, mediawiki is used for dumping, not sql, as a precaution.

@kaldari - this means that you should not close this ticket (as we dbas and cloud have to add filtering rules), but you are unblocked from creating them on production. Please consider creating them on testwiki first if that allows some kind of production testing.

Ran foreachwiki sql.php /srv/mediawiki/php/maintenance/archives/patch-ip_changes.sql. Could someone verify that the new ip_changes table is live in all the databases? (Not sure if there's a handy tool for that.)

Ran foreachwiki sql.php /srv/mediawiki/php/maintenance/archives/patch-ip_changes.sql. Could someone verify that the new ip_changes table is live in all the databases? (Not sure if there's a handy tool for that.)

It appears on the masters for all the databases on the sX.dblists. So it looks good.

if possible try to avoid assigning it directly to Jaime

I assigned it to Jaime because the documentation said the task should be "owned by DBA", which should probably be changed to "tagged for the DBA" (or something less confusing).

Changed - thanks for pointing that out

The table should not be in dumps - it could contain revision deleted IPs.

@Bawolff: How are revision deleted IPs currently dealt with in dumps?

Normally the tables involved (e.g. revision) aren't dumped, and instead users must rely on the xml dumps which redact the user in those cases.

AFAIK, tables aren't automatically added to the dump but need to be requested specificly, so this is probably not something you have to worry about. @ArielGlenn is the master of all things dump, and would know more about this then me.

Indeed the revision table is not dumped; we have a whitelist of tables to be dumped, which can be found here:
https://phabricator.wikimedia.org/source/operations-puppet/browse/HEAD/modules/snapshot/files/dumps/table_jobs.yaml;refs/changes/80/346280/2
Tables marked as 'private' used to be dumped, but we turned that off a while ago, since we don't use these dumps as a backup of any sort.

@MusikAnimal: Do we want this table (minus the revdeleted content) available on Labs? (It will require creating a specialized view of the table.)

@MusikAnimal: Do we want this table (minus the revdeleted content) available on Labs? (It will require creating a specialized view of the table.)

I believe so, yes :)

MusikAnimal renamed this task from create production ip_changes table for RangeContributions to Create core ip_changes view for replicas.Sep 28 2017, 11:28 PM
MusikAnimal updated the task description. (Show Details)

The table is now live and populated on all wikis, so I've repurposed this ticket for creating the replicas view. Hope that is OK. Any update on that? Anything I can do to help? People have been asking about this :)

I can see the table on the replicas, so what is pending is cloud-services-team to create the view

The table is now live and populated on all wikis, so I've repurposed this ticket for creating the replicas view. Hope that is OK. Any update on that? Anything I can do to help? People have been asking about this :)

You can submit a patch to modules/role/templates/labs/db/views/maintain-views.yaml in operations/puppet

The view should do an inner join against revision to make sure no entries with rev_deleted&4=4 are included.

bd808 subscribed.

Should the data be made available on the labs replicas and/or dumps: Yes, nothing in the table is private data

The table should not be in dumps - it could contain revision deleted IPs.

In labs the table is fine provided the view of it somehow deals with revdeleted stuff. Perhaps the view of it could do INNER JOIN revision on ipc_rev_id = rev_id AND rev_deleted&4=0

So, in conclusion, I'm fine with creating this table, and its fine to "replicate" to the labs hosts, but it needs a specialized view before exposing to users at labs.

Change 383935 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] maintain-views: Add ip_changes view

https://gerrit.wikimedia.org/r/383935

Change 383935 merged by Andrew Bogott:
[operations/puppet@production] maintain-views: Add ip_changes view

https://gerrit.wikimedia.org/r/383935

This should be ready for maintain-views to be run on labsdb* to actually create the view in each wikidb. I think that sudo maintain-views --table ip_changes --all-databases will work. The --help for maintain-views says it only works with fullviews tables, but the code looks like it is actually there to handle customviews as well.

bd808 triaged this task as Medium priority.Oct 13 2017, 8:31 PM
bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
bd808 assigned this task to Andrew.
bd808 added a subscriber: Andrew.

@Andrew merged my config patch and ran the script to create the views.

$ sql --cluster web enwiki
(u3518@enwiki.web.db.svc.eqiad.wmflabs) [enwiki_p]> describe ip_changes;
+-------------------+------------------+------+-----+----------------+-------+
| Field             | Type             | Null | Key | Default        | Extra |
+-------------------+------------------+------+-----+----------------+-------+
| ipc_rev_id        | int(10) unsigned | NO   |     | 0              |       |
| ipc_rev_timestamp | binary(14)       | NO   |     |                |       |
| ipc_hex           | varbinary(35)    | NO   |     |                |       |
+-------------------+------------------+------+-----+----------------+-------+
3 rows in set (0.00 sec)