Page MenuHomePhabricator

Drop externallinks.el_from_namespace on wmf databases
Closed, ResolvedPublic

Description

After abandon of patch set https://gerrit.wikimedia.org/r/#/c/163470/ the schema change from T86415 should be rollbacked.

So please drop the column externallinks.el_from_namespace on all wmf wikis to bring the schema back to the current tables.sql. Thanks.

Feel free to WONTFIX this, if you think that the column should be part of the schema, because remove would also take cpu time, but it could free some space (for the column and for index).

ALTER to run: ALTER TABLE externallinks DROP COLUMN el_from_namespace, DROP KEY el_backlinks_to;

Progress:

s1

  • eqiad
  • codfw

s2

s3

s4

s5

s6

s7

s8

wikitech

  • eqiad
  • codfw

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Ran into this one. Take for example https://commons.wikimedia.org/wiki/File:Armor_of_Emperor_Ferdinand_I_(1503%E2%80%931564)_MET_DP-12881-008.jpg . It incorrectly returns el_from_namespace=0 instead of 6:

MariaDB [commonswiki_p]> SELECT * FROM externallinks WHERE el_to LIKE 'http://www.metmuseum.org/art/collection/search/%' AND el_from=61213894 LIMIT 1;
+-----------+----------+-------------------+------------------------------------------------------+-------------------------------------------------------+

el_idel_fromel_from_namespaceel_toel_index

+-----------+----------+-------------------+------------------------------------------------------+-------------------------------------------------------+

+-----------+----------+-------------------+------------------------------------------------------+-------------------------------------------------------+
1 row in set (0.00 sec)

jcrespo moved this task from Triage to Backlog on the DBA board.
jcrespo added subscribers: Marostegui, jcrespo.

Removing blocked because as far as I can see, this is not blocking anything or anyone (people should just not use the column until it is deleted). This doesn't mean it shouldn't be done, just that it is not urgent- it is still a DBA task, but of "clean up" level of priority.

The fact that this was deployed to production but not merged reinforces the policy of only deploying already merged patches. CC @Marostegui

This also includes the drop of KEY el_backlinks_to (el_from_namespace,el_to(60),el_from), which is not being used really:

s1:

root@db1089.eqiad.wmnet[sys]> select * from schema_unused_indexes where index_name='el_backlinks_to';
+---------------+---------------+-----------------+
| object_schema | object_name   | index_name      |
+---------------+---------------+-----------------+
| enwiki        | externallinks | el_backlinks_to |
+---------------+---------------+-----------------+
1 row in set (0.06 sec)

s4:

root@db1084.eqiad.wmnet[(none)]> select * from sys.schema_unused_indexes where index_name='el_backlinks_to';
+---------------+---------------+-----------------+
| object_schema | object_name   | index_name      |
+---------------+---------------+-----------------+
| commonswiki   | externallinks | el_backlinks_to |
+---------------+---------------+-----------------+
1 row in set (0.06 sec)

s8:

root@db1087.eqiad.wmnet[(none)]> select * from sys.schema_unused_indexes where index_name='el_backlinks_to';
+---------------+---------------+-----------------+
| object_schema | object_name   | index_name      |
+---------------+---------------+-----------------+
| wikidatawiki  | externallinks | el_backlinks_to |
+---------------+---------------+-----------------+
1 row in set (0.04 sec)

Ideally one slave will have all the stuff dropped and will be left for a few days to make sure no queries have any regression.

I have removed the column from db2075 (codfw - s5) and I am going to leave it like that for a few days, to make sure nothing writes to that column. If that happens replication will break and we'll know.

After that, probably an eqiad host will be altered so we can see if any query shows some regressions.

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore2001
  • dbstore1002
  • db2094
  • db2089
  • db2084
  • db2075 T114117#4483970
  • db2066
  • db2059
  • db2052
  • db2038
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070

I have altered db1100 and I will leave it there and check if there are any regressions on queries as that host receives reads

I wasn't sure why query results looks weird, so I eventually found this. Maybe you should set this number to -1 or something in all wikis for now, I don't know.

I wasn't sure why query results looks weird, so I eventually found this. Maybe younshould set this number to -1 or something in all wikis for now, I don't know.

Can you elaborate a bit more on this? I guess you are talking about the same thing as T114117#3492177 ?
What issue are you seeing?
What do you mean with -1 on all wikis?

I did not work with this table a lot of time, so I run a simple query to remember:

select * from externallinks limit 10

to see the schema.
I saw that el_from_namespace was 0, seems like all 10 examples where articles. So I used this field in a new query, something like

#...
and el_from_namespace in (0, 6, 10, 14, 100, 118)
#...

Run the query, get the results. Everything looks ok. Exported to pagepile, started to work. And only then saw links to talk namespaces. If the value of this field was always -1, and not unfortunately fitting one of actually existing namespaces, I would (1) saw it in the first query and understood it's not working, and (2), the second query was empty in place of bringing wrong results.
Summary: for now, all entries marked as they were in articles namespace, and it's wrong.

Thanks for clarifying it!
So that problem will be gone once this task is completed and the column gets dropped everywhere :-)

Absolutely. But looks like it takes years, so I suggested something simple until then.

Absolutely. But looks like it takes years, so I suggested something simple until then.

We have it started now at least and should be completed in some weeks hopefully :-)
We have so many tasks that it
You approach could work, but we try to avoid updating things directly on the database and we'd prefer to use a script that uses MW framework, to make sure it is consistent and safe, and it is probably not worth the time anymore that we have started and almost dropped it in a whole section (s5).

Thanks again!

Mentioned in SAL (#wikimedia-operations) [2018-08-22T12:15:37Z] <marostegui> Deploy schema change on db1070 (s5 primary master) - T67448 T114117 T51191

s6 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061

s2 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1125
  • db1122
  • db1105
  • db1103
  • db1095
  • db1090
  • db1076
  • db1074
  • db1066

This has broken views from dewiki (as an IRC user noticed this morning). That table is a full view, so when a column drops, the view is broken (and I confirmed the table is the same from s5-master through to the labsdb replicas). So if they will replicate down like that (and any s5 dbs now have the view broken I imagine?), then I can null or drop the column in a custom view (perhaps temporarily), or I can just do a rerun every time a shard is going to replicate the change down.

This has broken views from dewiki (as an IRC user noticed this morning). That table is a full view, so when a column drops, the view is broken (and I confirmed the table is the same from s5-master through to the labsdb replicas). So if they will replicate down like that (and any s5 dbs now have the view broken I imagine?), then I can null or drop the column in a custom view (perhaps temporarily), or I can just do a rerun every time a shard is going to replicate the change down.

So s6 is also probably broken as it was finished a few hours ago. So a re-run there is probably needed.
I would suggest I ping you everytime the change reaches labs and you can do a re-run and don't have to keep subscribed to this task?

That'd be great. I somehow worry that if I change the view, I'll forget about it later, even though it is a more straightforward thing.

Re-running that table on all replicas then.

ok I will ping you every time. I can also narrow it to the wikis affected, so you only have to run it for the given wikis.
Once s3 is reached (around 800 wikis) you can do a full re-run.

Up to you!

s5 - dewiki
s6 - frwiki jawiki ruwiki

A full re-run on just one table is fine when commonswiki isn't locked. rolls dice to see--yup, it is

I have to also do a full re-run on the table to change to a custom view anyway (which will require depooling if things are locked up). At least for now, I don't need to bother with that.

A full re-run on just one table is fine when commonswiki isn't locked. rolls dice to see--yup, it is

I have to also do a full re-run on the table to change to a custom view anyway (which will require depooling if things are locked up). At least for now, I don't need to bother with that.

Sure! You are the expert :)

@Bstorm s2 changes have been replicated to labs - whenever you can, please re-run the views there.
Thanks!

s8 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1124
  • db1109
  • db1104
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071

@Bstorm s8 changes have been replicated to labs - whenever you can, please re-run the views there.
Thanks!

s4 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1125
  • db1121
  • db1103
  • db1102
  • db1097
  • db1091
  • db1084
  • db1081
  • db1068

Mentioned in SAL (#wikimedia-operations) [2018-08-29T05:55:26Z] <marostegui> Deploy schema change on db1073 (labswiki) - T114117 T51191 T67448

s7 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1125
  • db1101
  • db1098
  • db1094
  • db1090
  • db1086
  • db1079
  • db1062

@Bstorm these changes have been replicated to s4 and s7, please re-run the views when you get a chance. Thank you!

s3 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1124
  • db1123
  • db1095
  • db1078
  • db1077
  • db1075

@Bstorm s3 is finished. You can re-run the views.
Only s1 pending which most likely be done in a couple of weeks.

Thanks for the patience!

Mentioned in SAL (#wikimedia-operations) [2018-09-14T07:49:20Z] <marostegui> Deploy schema change on s4 eqiad master (db1068) - T67448 T114117 T51191

Mentioned in SAL (#wikimedia-operations) [2018-09-17T05:28:00Z] <marostegui> Deploy schema change on s1 eqiad master (db1067) - T67448 T114117 T51191

@Bstorm can you run the views again? This is the last time, as s1 was the only pending wiki to alter in eqiad.
Thank you!

Thanks! So if I understand correctly, externallinks isn't going anywhere, correct? Am I safe to start using it again on the replicas?

Yeah, it was just a column drop :-)

Marostegui changed the task status from Open to Stalled.Oct 24 2018, 9:55 AM
Marostegui moved this task from In progress to Pending comment on the DBA board.
Marostegui changed the task status from Stalled to Open.Nov 12 2018, 9:09 AM

Mentioned in SAL (#wikimedia-operations) [2018-11-13T05:39:27Z] <marostegui> Deploy schema change on db2048 (s1 codfw master), this will create lag on s1 codfw - T114117

Marostegui updated the task description. (Show Details)

Mentioned in SAL (#wikimedia-operations) [2020-04-24T19:41:31Z] <Amir1> applying T114117 on labswiki (wikitech)