The ip_changes table is now live and populated on all wikis. We'd like to have this on the replica hosts.
Per T173891#3546336 we will need to create a specialized view that does not include revisions-deleted IPs.
The ip_changes table is now live and populated on all wikis. We'd like to have this on the replica hosts.
Per T173891#3546336 we will need to create a specialized view that does not include revisions-deleted IPs.
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
maintain-views: Add ip_changes view | operations/puppet | production | +8 -0 |
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | Andrew | T173891 Create core ip_changes view for replicas | |||
Resolved | MaxSem | T175105 Run ip_changes maintenance script on each wiki |
Hi,
We will take care of this, but if possible try to avoid assigning it directly to Jaime (unless he specified it) so any of the DBAs can pick it up.
We will take care of this
Actually, we will not, as you can see on https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change creating tables is not something that #DBAs do, as we are not mediawiki deployers. The only thing we ask is to block on us to check labs filtering, but we will not execute it.
The only step here is to check filtering is working. For that, given this is a supposedly a public table that contains IPs, I only need the ok from security to replicate to labs and that its data is ok according to privacy policy- I will then add it to the list of public tables. After that, cloud team can add it to too to the list of available tables.
Thanks Jaime! First time I deal with a CREATE table statement :-)
The only step here is to check filtering is working. For that, given this is a supposedly a private table that contains IPs, I only need the ok from security to replicate to labs and that its data is ok according to privacy policy- I will then add it to the list of public tables. After that, cloud team can add it to too to the list of available tables.
Makes sense!
if possible try to avoid assigning it directly to Jaime
I assigned it to Jaime because the documentation said the task should be "owned by DBA", which should probably be changed to "tagged for the DBA" (or something less confusing).
given this is a supposedly a public table that contains IPs, I only need the ok from security to replicate to labs and that its data is ok according to privacy policy
It only records IPs in cases where a logged out user makes an edit, which should be public data. (The records are also deleted from the table when the revision is deleted.) The code that interacts with the table can be found at https://gerrit.wikimedia.org/r/#/c/349457/.
It only records IPs in cases where a logged out user makes an edit, which should be public data
I trust you, I just want explicit Security ok, we will do a couple of patches and you will get unblocked. They have much more familiarity with mediawiki code than I do and @Bawolff recently did a full audit of all tables, so definitely they should be looped in, even if only to say "ok if dbas are ok with it".
Should the data be made available on the labs replicas and/or dumps: Yes, nothing in the table is private data
The table should not be in dumps - it could contain revision deleted IPs.
In labs the table is fine provided the view of it somehow deals with revdeleted stuff. Perhaps the view of it could do INNER JOIN revision on ipc_rev_id = rev_id AND rev_deleted&4=0
So, in conclusion, I'm fine with creating this table, and its fine to "replicate" to the labs hosts, but it needs a specialized view before exposing to users at labs.
The table should not be in dumps - it could contain revision deleted IPs.
@Bawolff: How are revision deleted IPs currently dealt with in dumps?
Normally the tables involved (e.g. revision) aren't dumped, and instead users must rely on the xml dumps which redact the user in those cases.
AFAIK, tables aren't automatically added to the dump but need to be requested specificly, so this is probably not something you have to worry about. @ArielGlenn is the master of all things dump, and would know more about this then me.
Yes, mediawiki is used for dumping, not sql, as a precaution.
@kaldari - this means that you should not close this ticket (as we dbas and cloud have to add filtering rules), but you are unblocked from creating them on production. Please consider creating them on testwiki first if that allows some kind of production testing.
Ran foreachwiki sql.php /srv/mediawiki/php/maintenance/archives/patch-ip_changes.sql. Could someone verify that the new ip_changes table is live in all the databases? (Not sure if there's a handy tool for that.)
Indeed the revision table is not dumped; we have a whitelist of tables to be dumped, which can be found here:
https://phabricator.wikimedia.org/source/operations-puppet/browse/HEAD/modules/snapshot/files/dumps/table_jobs.yaml;refs/changes/80/346280/2
Tables marked as 'private' used to be dumped, but we turned that off a while ago, since we don't use these dumps as a backup of any sort.
@MusikAnimal: Do we want this table (minus the revdeleted content) available on Labs? (It will require creating a specialized view of the table.)
The table is now live and populated on all wikis, so I've repurposed this ticket for creating the replicas view. Hope that is OK. Any update on that? Anything I can do to help? People have been asking about this :)
I can see the table on the replicas, so what is pending is cloud-services-team to create the view
You can submit a patch to modules/role/templates/labs/db/views/maintain-views.yaml in operations/puppet
The view should do an inner join against revision to make sure no entries with rev_deleted&4=4 are included.
Change 383935 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] maintain-views: Add ip_changes view
Change 383935 merged by Andrew Bogott:
[operations/puppet@production] maintain-views: Add ip_changes view
This should be ready for maintain-views to be run on labsdb* to actually create the view in each wikidb. I think that sudo maintain-views --table ip_changes --all-databases will work. The --help for maintain-views says it only works with fullviews tables, but the code looks like it is actually there to handle customviews as well.
@Andrew merged my config patch and ran the script to create the views.
$ sql --cluster web enwiki (u3518@enwiki.web.db.svc.eqiad.wmflabs) [enwiki_p]> describe ip_changes; +-------------------+------------------+------+-----+----------------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+------------------+------+-----+----------------+-------+ | ipc_rev_id | int(10) unsigned | NO | | 0 | | | ipc_rev_timestamp | binary(14) | NO | | | | | ipc_hex | varbinary(35) | NO | | | | +-------------------+------------------+------+-----+----------------+-------+ 3 rows in set (0.00 sec)