Page MenuHomePhabricator

Archive and drop education program (ep_*) tables on all wikis
Closed, ResolvedPublic

Description

Placeholder task until I find more wikis this applies to. The list so far:

  • arwiki cawiki cswiki dewikiversity elwiki enwiki enwikinews eswiki fawiki frwikisource hewiki hewiktionary itwikiversity lvwiki mkwiki nlwiki ptwiki ruwiki srwiki svwiki ukwiki legalteamwiki
  • Tables to drop
ep_articles
ep_cas
ep_courses
ep_events
ep_instructors
ep_oas
ep_orgs
ep_revisions
ep_students
ep_users_per_course

Cf: T125618: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018

From: T200316
Can we table SQL dumps of the ep_ tables (only course_token seems to be private data we don't expose on labs replicas) on all the wikis that have education program, publish them to somewhere like https://dumps.wikimedia.org/other/wep/ before we clean up and remove the tables from wikis (and then labs replicas)?

Deletion should probably wait a little while, but filing as a cleanup task as part of removing the extension

See also: T200316#4449887

List of databases to get them removed from:

  • s1
  • enwiki
  • s2
  • cswiki
  • nlwiki
  • ptwiki
  • svwiki
  • s3
  • dewikiversity
  • elwiki
  • enwikinews
  • frwikisource
  • hewiktionary
  • itwikiversity
  • legalteamwiki
  • lvwiki
  • mkwiki
  • sewiki
  • s4 - nothing
  • s5
  • srwiki
  • s6
  • ruwiki
  • s7
  • arwiki
  • cawiki
  • eswiki
  • fawiki
  • hewiki
  • ukwiki
  • s8 - nothing

Details

Related Gerrit Patches:

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Reedy added a comment.Sep 17 2018, 4:35 PM

I may have time to work on this these week (the dumping) not sure if they will be available shortly after, as not sure what that entails for custom dumps.

Should just be a case of making sure they're named well, gzip (or similar) and then should be fairly easy to get them into the right place for people to access

Should just be a case of making sure they're named well, gzip (or similar)

I can do that.

should be fairly easy to get them into the right place for people to access

I will put it available to other global roots so the right people can do that, will notify ticket here.

Reedy added a comment.Sep 17 2018, 4:39 PM

Thanks! :)

If you want to produce them and put them somewhere, I can grab them, put them in the right directory (other/something) and set up a web page in puppet that describes them, as a one-off; we do this from time to time for various datasets.

jcrespo claimed this task.Sep 18 2018, 11:09 AM
jcrespo moved this task from Next to In progress on the DBA board.
jcrespo added a comment.EditedSep 18 2018, 11:20 AM

List of tables to archive:

root@labsdb1010:~$ mysql -BN --skip-ssl -e "select table_schema, table_name FROM information_schema.tables WHERE table_schema like '%wik%' and table_name like 'ep\_%' and table_schema not like '%\_p'"
arwiki  ep_articles
arwiki  ep_cas
arwiki  ep_courses
arwiki  ep_events
arwiki  ep_instructors
arwiki  ep_oas
arwiki  ep_orgs
arwiki  ep_revisions
arwiki  ep_students
arwiki  ep_users_per_course
cawiki  ep_articles
cawiki  ep_cas
cawiki  ep_courses
cawiki  ep_events
cawiki  ep_instructors
cawiki  ep_oas
cawiki  ep_orgs
cawiki  ep_revisions
cawiki  ep_students
cawiki  ep_users_per_course
cswiki  ep_articles
cswiki  ep_cas
cswiki  ep_courses
cswiki  ep_events
cswiki  ep_instructors
cswiki  ep_oas
cswiki  ep_orgs
cswiki  ep_revisions
cswiki  ep_students
cswiki  ep_users_per_course
dewikiversity   ep_articles
dewikiversity   ep_cas
dewikiversity   ep_courses
dewikiversity   ep_events
dewikiversity   ep_instructors
dewikiversity   ep_oas
dewikiversity   ep_orgs
dewikiversity   ep_revisions
dewikiversity   ep_students
dewikiversity   ep_users_per_course
elwiki  ep_articles
elwiki  ep_cas
elwiki  ep_courses
elwiki  ep_events
elwiki  ep_instructors
elwiki  ep_oas
elwiki  ep_orgs
elwiki  ep_revisions
elwiki  ep_students
elwiki  ep_users_per_course
enwiki  ep_articles
enwiki  ep_cas
enwiki  ep_courses
enwiki  ep_events
enwiki  ep_instructors
enwiki  ep_oas
enwiki  ep_orgs
enwiki  ep_revisions
enwiki  ep_students
enwiki  ep_users_per_course
enwikinews      ep_articles
enwikinews      ep_cas
enwikinews      ep_courses
enwikinews      ep_events
enwikinews      ep_instructors
enwikinews      ep_oas
enwikinews      ep_orgs
enwikinews      ep_revisions
enwikinews      ep_students
enwikinews      ep_users_per_course
eswiki  ep_articles
eswiki  ep_cas
eswiki  ep_courses
eswiki  ep_events
eswiki  ep_instructors
eswiki  ep_oas
eswiki  ep_orgs
eswiki  ep_revisions
eswiki  ep_students
eswiki  ep_users_per_course
fawiki  ep_articles
fawiki  ep_cas
fawiki  ep_courses
fawiki  ep_events
fawiki  ep_instructors
fawiki  ep_oas
fawiki  ep_orgs
fawiki  ep_revisions
fawiki  ep_students
fawiki  ep_users_per_course
frwikisource    ep_articles
frwikisource    ep_cas
frwikisource    ep_courses
frwikisource    ep_events
frwikisource    ep_instructors
frwikisource    ep_oas
frwikisource    ep_orgs
frwikisource    ep_revisions
frwikisource    ep_students
frwikisource    ep_users_per_course
hewiki  ep_articles
hewiki  ep_cas
hewiki  ep_courses
hewiki  ep_events
hewiki  ep_instructors
hewiki  ep_oas
hewiki  ep_orgs
hewiki  ep_revisions
hewiki  ep_students
hewiki  ep_users_per_course
hewiktionary    ep_articles
hewiktionary    ep_cas
hewiktionary    ep_courses
hewiktionary    ep_events
hewiktionary    ep_instructors
hewiktionary    ep_oas
hewiktionary    ep_orgs
hewiktionary    ep_revisions
hewiktionary    ep_students
hewiktionary    ep_users_per_course
itwikiversity   ep_articles
itwikiversity   ep_cas
itwikiversity   ep_courses
itwikiversity   ep_events
itwikiversity   ep_instructors
itwikiversity   ep_oas
itwikiversity   ep_orgs
itwikiversity   ep_revisions
itwikiversity   ep_students
itwikiversity   ep_users_per_course
lvwiki  ep_articles
lvwiki  ep_cas
lvwiki  ep_courses
lvwiki  ep_events
lvwiki  ep_instructors
lvwiki  ep_oas
lvwiki  ep_orgs
lvwiki  ep_revisions
lvwiki  ep_students
lvwiki  ep_users_per_course
mkwiki  ep_articles
mkwiki  ep_cas
mkwiki  ep_courses
mkwiki  ep_events
mkwiki  ep_instructors
mkwiki  ep_oas
mkwiki  ep_orgs
mkwiki  ep_revisions
mkwiki  ep_students
mkwiki  ep_users_per_course
nlwiki  ep_articles
nlwiki  ep_cas
nlwiki  ep_courses
nlwiki  ep_events
nlwiki  ep_instructors
nlwiki  ep_oas
nlwiki  ep_orgs
nlwiki  ep_revisions
nlwiki  ep_students
nlwiki  ep_users_per_course
ptwiki  ep_articles
ptwiki  ep_cas
ptwiki  ep_courses
ptwiki  ep_events
ptwiki  ep_instructors
ptwiki  ep_oas
ptwiki  ep_orgs
ptwiki  ep_revisions
ptwiki  ep_students
ptwiki  ep_users_per_course
ruwiki  ep_articles
ruwiki  ep_cas
ruwiki  ep_courses
ruwiki  ep_events
ruwiki  ep_instructors
ruwiki  ep_oas
ruwiki  ep_orgs
ruwiki  ep_revisions
ruwiki  ep_students
ruwiki  ep_users_per_course
sewiki  ep_articles
sewiki  ep_cas
sewiki  ep_courses
sewiki  ep_events
sewiki  ep_instructors
sewiki  ep_oas
sewiki  ep_orgs
sewiki  ep_revisions
sewiki  ep_students
sewiki  ep_users_per_course
srwiki  ep_articles
srwiki  ep_cas
srwiki  ep_courses
srwiki  ep_events
srwiki  ep_instructors
srwiki  ep_oas
srwiki  ep_orgs
srwiki  ep_revisions
srwiki  ep_students
srwiki  ep_users_per_course
svwiki  ep_articles
svwiki  ep_cas
svwiki  ep_courses
svwiki  ep_events
svwiki  ep_instructors
svwiki  ep_oas
svwiki  ep_orgs
svwiki  ep_revisions
svwiki  ep_students
svwiki  ep_users_per_course
ukwiki  ep_articles
ukwiki  ep_cas
ukwiki  ep_courses
ukwiki  ep_events
ukwiki  ep_instructors
ukwiki  ep_oas
ukwiki  ep_orgs
ukwiki  ep_revisions
ukwiki  ep_students
ukwiki  ep_users_per_course

Because it is almost a random list, this is not trivial- not difficult either, but needs to make sure I don't miss anything.

@Reedy legalteamwiki that you mention on your original task will not be on this list, as these are only public wikis, those should be directly dropped?

Reedy added a comment.Sep 18 2018, 2:16 PM

@Reedy legalteamwiki that you mention on your original task will not be on this list, as these are only public wikis, those should be directly dropped?

Yeah, that was seemingly installed for some testing purposes. But never used. Those can just be dropped.

The tables cannot be dropped from other wikis just yet, as the extension is still enabled (but mostly disabled) on other wikis.

So around the time we archive (sometime this week? we didn't want to disable without giving people the data to access), we should look at turning it off properly on all wikis, then look at undeploying it.

The tables cannot be dropped from other wikis just yet, as the extension is still enabled (but mostly disabled) on other wikis.

Thank you for the clarification, I thought the "deletion will take a while" was outdated, I don't delete anything without proper confirmation, but that makes it more clear.

jcrespo removed jcrespo as the assignee of this task.EditedSep 19 2018, 3:07 PM
jcrespo moved this task from In progress to Done on the DBA board.

So public dumps are ready:

root@labsdb1010:/srv/tmp/ep_dumps$ rm *; mysql -BN --skip-ssl -e "select table_schema, table_name FROM information_schema.tables WHERE table_schema like '%wik%' and table_name like 'ep\_%' and table_schema not like '%\_p'" | while read db table; do echo "Dumping $db.$table..."; mysqldump --skip-ssl --single-transaction $db $table | pv | pigz -c > $db.$table.$(date +%Y%m%d).gz; done
root@labsdb1010:/srv/tmp$ tar -cvf ep_dumps.tar ep_dumps

To examine those, tar xf ep_dumps.tar && zless ep_dumps/arwiki.ep_articles.20180919.gz
To load those into a database, tar xf ep_dumps.tar && zcat ep_dumps/* | mysql database

I (or any other root) can put those anywhere within our infrastructure by running:

root@neodymium:~$ /home/jynus/wmfmariadbpy/wmfmariadbpy/transfer.py labsdb1010.eqiad.wmnet:/srv/tmp/ep_dumps.tar <target_host>:<target_dir>

It would be nice if someone could check one of the databases to make sure nothing was private beyond the token- which shouldn't be an issue, as all these were already public on wikireplicas, but just in case (I checked one of them and saw nothing weird).

(I will come back to this when actual drops can be done.)

CC @ArielGlenn @Reedy

Thanks!

Can someone put them somewhere I can actually get them? As I don't believe I actually have access to labsdb1010 (via ssh anyway) :)

Not overly fussed where, whether it's deploy1001, a bastion, or similar. That way I can get them sanity checked, and then we can get them into place and then move on with getting EP undeployed

The file is at: deploy1001:/home/reedy/ep_dumps.tar

Reedy updated the task description. (Show Details)Sep 25 2018, 12:26 PM

Dumps LGTM

My only question would be about how we package them...

One big tar solves the problem, but chances are they won't want every wikis EP. Individual table files are annoying for downloading

So, I guess a tar.gz for each project/wiki is enough, which I've now done on deploy1001

Change 462691 had a related patch set uploaded (by Reedy; owner: Reedy):
[operations/puppet@production] Add a place to link Education Program extension sql dumps

https://gerrit.wikimedia.org/r/462691

Hey @Bstorm the dumps that @jcrespo did LGTM, but I have repackaged them slightly to make them easier for end users to download

I've added a puppet patch above (462691) to add the link to https://dumps.wikimedia.org/other/ so I just need someone to create an "educationprogram" folder on the labstore hosts in the appropriate place, and then copy deploy1001:/home/reedy/ep_dumps_repackaged.tar over to the labstore host, and extract the contents into the educationprogram folder...

I guess just then some rsyncing (do 1006 and 1007 replicate between themselves?) and then this chapter should be just about done :)

@ArielGlenn, there's a lot going on with the rsyncing on dumps servers. I imagine that there's some kind of notion of what gets synced and to-where somewhere in that mess. I'll take a look around, but if you know off the top of your head how something like this normally would get added, I'm all ears :)

The files are in place now on both servers.

Change 462691 merged by Bstorm:
[operations/puppet@production] Add a place to link Education Program extension sql dumps

https://gerrit.wikimedia.org/r/462691

Change 462726 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] dumps: add initial slash to url path

https://gerrit.wikimedia.org/r/462726

Change 462726 merged by Bstorm:
[operations/puppet@production] dumps: add initial slash to url path

https://gerrit.wikimedia.org/r/462726

Reedy added a comment.Sep 25 2018, 2:54 PM

Looks right to me! Thanks! :)

What is pending here?

  1. Disable the extension everywhere?
  2. Get a list of wikis where it is/was disabled?
  3. Do the proper table DROP?
Reedy added a subscriber: Urbanecm.Sep 28 2018, 5:58 PM

What is pending here?

  1. Disable the extension everywhere?

That's done

  1. Get a list of wikis where it is/was disabled?
arwiki cawiki cswiki dewikiversity elwiki enwiki enwikinews eswiki fawiki frwikisource hewiki hewiktionary itwikiversity lvwiki mkwiki nlwiki ptwiki ruwiki srwiki svwiki ukwiki legalteamwiki
  1. Do the proper table DROP?

Eventually. I'm not sure whether we should hold off on doing this for a little while, as in T188407#4620016 @Urbanecm is using them to do some "exports" to wikitext type pages.

Granted, it wouldn't be too much effort to ask him to import the SQL dumps and do it that way, but is a little more work for him

Urbanecm added a comment.EditedSep 29 2018, 6:01 PM

I think we should wait for some time before dropping the data. I exported the data from EP to wikitext pages on cswiki (as I know the expected page naming there). Maybe other wikis would like to use the same (or similar) solution as well - I don't know. If some wiki wants me to, I'm willing to export EP data to wikitext pages for them as well - but I'm currently not going to ask them about it.

Thanks guys!
Just to clarify, I didn't mean to delete anything just yet, just wanted to know what was pending (specially for the DBA side) :-)
Will not start any drops until you guys are comfortable with it!
Thanks!

greg added a subscriber: greg.Nov 8 2018, 8:24 PM

I think we should wait for some time before dropping the data. I exported the data from EP to wikitext pages on cswiki (as I know the expected page naming there). Maybe other wikis would like to use the same (or similar) solution as well - I don't know. If some wiki wants me to, I'm willing to export EP data to wikitext pages for them as well - but I'm currently not going to ask them about it.

What would the expected timeline be for us to wait on that? eg: what is "some time"? :)

Personally, I have no problem with dropping the data - I have transfered the data for cs.wiki and I don't care much about other wikis. What about sending a notification via User-notice and drop the data after a month since the notice was posted?

What about sending a notification via User-notice and drop the data after a month since the notice was posted?

Agreed

Suggested wording for the notice: "Tables used by nowadays archived extension Education Program will be archived and dropped after a month of this notice. If you want the data to be easily accessible, we recommend dumping them to wikitext pages. ".

greg added a comment.Nov 16 2018, 10:07 PM

So, basically the end of the year now. Thanks @Johan !

Izno added a subscriber: Izno.Nov 20 2018, 1:49 AM
Restricted Application added a subscriber: alanajjar. · View Herald TranscriptNov 20 2018, 8:12 AM
Jony added a subscriber: Jony.Nov 20 2018, 5:23 PM
NickK added a subscriber: NickK.Nov 21 2018, 12:22 AM

Is it there a guide on how to make a wikitext export? I would need it for ukwiki.

Please also note that wikis that used the extension this year might need this data longer, e.g. for annual affiliate reporting. Deletion in December might not be a good idea from this point of view.

Is it there a guide on how to make a wikitext export? I would need it for ukwiki.
Please also note that wikis that used the extension this year might need this data longer, e.g. for annual affiliate reporting. Deletion in December might not be a good idea from this point of view.

I've created https://gist.github.com/urbanecm/8a090da58429b121067bf491d1e9a510 for that. If you want me to dump the data onto regular pages, please let me know, I can do that.

Hey!
So as per https://meta.wikimedia.org/wiki/Tech/News/2018/47 are we good to go now and drop these tables? Realistically I don't think this will happen before February anyways, but, do we (DBAs) have green light to drop them from now on?

greg added a comment.Jan 11 2019, 11:26 PM

Hey!
So as per https://meta.wikimedia.org/wiki/Tech/News/2018/47 are we good to go now and drop these tables? Realistically I don't think this will happen before February anyways, but, do we (DBAs) have green light to drop them from now on?

yup!

Marostegui updated the task description. (Show Details)Jan 18 2019, 10:31 AM
Restricted Application added subscribers: Petar.petkovic, Base. · View Herald TranscriptJan 18 2019, 10:31 AM
Marostegui updated the task description. (Show Details)Jan 18 2019, 10:33 AM
Marostegui moved this task from Backlog to Next on the DBA board.Jan 23 2019, 6:57 AM
greg removed a subscriber: greg.Jan 23 2019, 7:22 AM

Is it there a guide on how to make a wikitext export? I would need it for ukwiki.
Please also note that wikis that used the extension this year might need this data longer, e.g. for annual affiliate reporting. Deletion in December might not be a good idea from this point of view.

I've created https://gist.github.com/urbanecm/8a090da58429b121067bf491d1e9a510 for that. If you want me to dump the data onto regular pages, please let me know, I can do that.

@Urbanecm Can you tell me names of all templates which should be made for this script? So I can do this for srwiki with my bot.

Is it there a guide on how to make a wikitext export? I would need it for ukwiki.
Please also note that wikis that used the extension this year might need this data longer, e.g. for annual affiliate reporting. Deletion in December might not be a good idea from this point of view.

I've created https://gist.github.com/urbanecm/8a090da58429b121067bf491d1e9a510 for that. If you want me to dump the data onto regular pages, please let me know, I can do that.

@Urbanecm Can you tell me names of all templates which should be made for this script? So I can do this for srwiki with my bot.

Hi @Zoranzoki21, of course! There are several templates needed:

Hope this helps.

Mentioned in SAL (#wikimedia-operations) [2019-01-28T20:54:09Z] <Zoranzoki21> Starting wikitext export of content of database for education program on srwiki - T174802 (21:54 UTC+1)

Mentioned in SAL (#wikimedia-operations) [2019-01-28T21:02:23Z] <Zoranzoki21> Done wikitext export of content of database for education program on srwiki - T174802 (duration: 8 minutes)

Marostegui updated the task description. (Show Details)Feb 12 2019, 7:05 AM

Mentioned in SAL (#wikimedia-operations) [2019-02-12T07:09:48Z] <marostegui> Rename ep_* tables on db1089 (s1) - T174802

I have renamed all the tables on enwiki on db1089:

root@db1089.eqiad.wmnet[enwiki]> show tables like 'T174%';
+-----------------------------+
| Tables_in_enwiki (T174%)    |
+-----------------------------+
| T174802_ep_articles         |
| T174802_ep_cas              |
| T174802_ep_courses          |
| T174802_ep_events           |
| T174802_ep_instructors      |
| T174802_ep_oas              |
| T174802_ep_orgs             |
| T174802_ep_revisions        |
| T174802_ep_students         |
| T174802_ep_users_per_course |
+-----------------------------+
10 rows in set (0.00 sec)

I will leave them like that for a few days before starting to drop them everywhere.

In order to revert this:

set session sql_log_bin=0;
rename table T174802_ep_articles to ep_articles;
rename table T174802_ep_cas to ep_cas;
rename table T174802_ep_courses to ep_courses;
rename table T174802_ep_events to ep_events;
rename table T174802_ep_instructors to ep_instructors;
rename table T174802_ep_oas to ep_oas;
rename table T174802_ep_orgs to ep_orgs;
rename table T174802_ep_revisions to ep_revisions;
rename table T174802_ep_students to ep_students;
rename table T174802_ep_users_per_course to ep_users_per_course;
Marostegui moved this task from Next to In progress on the DBA board.
Marostegui updated the task description. (Show Details)Feb 12 2019, 7:16 AM

Mentioned in SAL (#wikimedia-operations) [2019-02-18T08:27:21Z] <marostegui> Drop ep_* tables from s5 (srwiki) - T174802

Marostegui updated the task description. (Show Details)Feb 18 2019, 8:32 AM

Mentioned in SAL (#wikimedia-operations) [2019-02-18T09:28:42Z] <marostegui> Drop ep_* from s6 (ruwiki) - T174802

Marostegui updated the task description. (Show Details)Feb 18 2019, 9:32 AM

Mentioned in SAL (#wikimedia-operations) [2019-02-18T10:40:04Z] <marostegui> Drop tables ep_* from s2 (cswiki nlwiki ptwiki svwiki) T174802

Marostegui updated the task description. (Show Details)Feb 18 2019, 10:50 AM

Mentioned in SAL (#wikimedia-operations) [2019-02-18T13:59:19Z] <marostegui> Drop ep_* tables from s7 - T174802

Marostegui updated the task description. (Show Details)Feb 18 2019, 2:22 PM

Mentioned in SAL (#wikimedia-operations) [2019-02-19T07:21:17Z] <marostegui> Drop ep_* tables on s3 - T174802

Marostegui updated the task description. (Show Details)Feb 19 2019, 7:49 AM

Mentioned in SAL (#wikimedia-operations) [2019-02-19T07:51:40Z] <marostegui> Drop ep_* tables on s1 - T174802

Marostegui updated the task description. (Show Details)Feb 19 2019, 8:06 AM
Marostegui updated the task description. (Show Details)Feb 19 2019, 8:24 AM

Change 491413 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] filtered_tables.txt: Remove the ep_* tables

https://gerrit.wikimedia.org/r/491413

Marostegui updated the task description. (Show Details)Feb 19 2019, 8:34 AM
Marostegui updated the task description. (Show Details)Feb 19 2019, 8:41 AM

Change 491413 merged by Marostegui:
[operations/puppet@production] filtered_tables.txt: Remove the ep_* tables

https://gerrit.wikimedia.org/r/491413

Marostegui closed this task as Resolved.Feb 19 2019, 9:22 AM
Marostegui updated the task description. (Show Details)

This is all done.
The only pending follow up is to remove the views which has its own task T216481: Remove views on ep_* tables on the wikireplicas hosts