
Drop MoodBar tables from all wikis
Closed, Resolved · Public

Description

The extension has been undeployed; its tables should eventually be dropped from all wikis:

moodbar_feedback
moodbar_feedback_response

Tables dropped from:

  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7


Event Timeline

Reedy created this task.Dec 13 2016, 1:56 AM
Restricted Application added a subscriber: Aklapper.Dec 13 2016, 1:56 AM
Reedy added a comment.Dec 13 2016, 2:01 AM

No rush to remove this one, but it should eventually. Need to check if the data has any use for anyone (Analytics or research, maybe?) before dropping it completely

Marostegui moved this task from Triage to Backlog on the DBA board.Feb 23 2017, 8:46 AM
demon triaged this task as Low priority.May 9 2017, 10:56 PM
demon updated the task description.

No rush to remove this one, but it should eventually. Need to check if the data has any use for anyone (Analytics or research, maybe?) before dropping it completely

Analytics any idea if these are used?

Nuria added a subscriber: Nuria.

Not on our end that we know of. Are these tables also going to be deleted from analytics store?

Nuria moved this task from Incoming to Radar on the Analytics board.Jul 6 2017, 4:09 PM

Not on our end that we know of. Are these tables also going to be deleted from analytics store?

They do exist on db1047 and dbstore1002 (enwiki db, not on log), so I would treat those like any other instance and I would delete from them too.

Nuria added a comment.Jul 6 2017, 8:19 PM

@Marostegui: then the best way I can think of to find out whether anyone is using them is to send an e-mail to analytics@ (give people a month to respond, noting the date by which you will delete the tables if no response is received) and, if there are no responses, delete the tables.

demon added a subscriber: demon.Jul 6 2017, 8:23 PM

A month?! I can't imagine there's any useful data to be gathered out of this. MoodBar was a complete and absolute failure.

Nuria added a comment.Jul 6 2017, 8:29 PM

@demon : a month is the standard time we give users to reply to usage requests, see some data that used this extension: https://meta.wikimedia.org/wiki/Research:MoodBar/First_month_of_activity

Again, regardless of failure of feature we just want to ping users with an acceptable timeframe

demon added a comment.Jul 6 2017, 8:32 PM

/me shrugs

The thing to consider here is that the people who are likely in the loop about what's happening on the mediawiki dbs are not in the loop about research being done on those clones on dbstore1002 (analytics-store) or vice versa. So I think it's totally fine to delete them from the production boxes whenever folks wish to do that. But we should give some warning to researchers who may be running queries on those tables, or have plans to. I can write to the typical research lists to ask this question if you like, so this task doesn't stall. But if this situation comes up again we should think about a good process.

Nuria added a comment.Jul 6 2017, 10:35 PM

I can write to the typical research lists to ask this question if you like, so this task doesn't stall. But if this situation comes up again we should think about a good process.

Let's assume that everyone interested in data from the research list is subscribed to analytics-l@. @Marostegui can send an announcement there and wait for replies. It does not need to be harder than that.
If @Marostegui prefers that we are the ones e-mailing analytics@, that also works.

The last research I could find using these tables is from 2012, so there is a 99% chance no one cares.

I can write to the typical research lists to ask this question if you like, so this task doesn't stall. But if this situation comes up again we should think about a good process.

Let's assume that everyone interested in data from the research list is subscribed to analytics-l@. @Marostegui can send an announcement there and wait for replies. It does not need to be harder than that.
If @Marostegui prefers that we are the ones e-mailing analytics@, that also works.

The last research I could find using these tables is from 2012, so there is a 99% chance no one cares.

@Nuria that sounds good, we can delete them from production but not from dbstore1002 until the announcement is sent.
I would prefer if you guys could handle the announcement within the research list, if that doesn't cause too much disruption for you guys.

Thanks!

Nuria added a comment.Jul 7 2017, 8:09 PM

@Marostegui e-mail sent cc to you

A month?! I can't imagine there's any useful data to be gathered out of this. MoodBar was a complete and absolute failure.

Looks like @Nuria already explained sufficiently that pinging Analytics-l and waiting a month or so is worth it.
But just to directly address the general fallacy here: The fact that a feature was a failure does not mean at all that the data it produced as an experiment can't be useful. For one, it enables people who are thinking about similar features to verify that it was a failure back then, and potentially also to understand why it was a failure (and help figure out what could be improved to make it work in the future).

demon added a comment.Jul 7 2017, 11:29 PM

I understand that not all things are failures. But whatever, moving on.

To keep things connected: See now also the mailing list discussion.

Been about a month and a half. Bump?

@Nuria can we get rid of these tables finally?

Nuria added a comment.Oct 2 2017, 3:12 PM

It sounds (per the e-mail conversation) like researchers are interested in the data: https://lists.wikimedia.org/pipermail/wiki-research-l/2017-July/005931.html so I do not think we can remove them.

Marostegui closed this task as Declined.Oct 2 2017, 3:34 PM

How can something last written in 2013 still be useful?

root@db1052:/srv/sqldata/enwiki# ls -lh moodbar_feedback*.ibd
-rw-rw---- 1 mysql mysql 40M Mar 12  2013 moodbar_feedback.ibd
-rw-rw---- 1 mysql mysql 13M Mar 12  2013 moodbar_feedback_response.ibd

I don't like the idea of deleting it in production and leaving it on the dbstore/db1047 servers: if we needed to reclone any of those, we would not be able to get the tables back.
As they are tiny anyway, let's revisit this in another year or something :-)

Reedy added a comment.Oct 2 2017, 3:40 PM

I don't see why it can't just be exported to an sql dump file, and archived somewhere. Possibly then imported to another db cluster (analytics or something) if someone wants it in future
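
The export suggested here can be sketched as a small helper that builds a `mysqldump` invocation for just the two MoodBar tables of one wiki. This is only an illustration: the output file name is hypothetical, connection options (host, credentials) are deliberately left out, and `--single-transaction` is an assumption that is only appropriate because the tables are InnoDB.

```python
import shlex

# The two tables named in the task description.
MOODBAR_TABLES = ("moodbar_feedback", "moodbar_feedback_response")


def dump_command(wiki_db, out_file, tables=MOODBAR_TABLES):
    """Build a mysqldump argv that exports only the given tables of one wiki database."""
    return [
        "mysqldump",
        "--single-transaction",        # consistent snapshot for InnoDB tables
        f"--result-file={out_file}",   # write the dump to a file instead of stdout
        wiki_db,
        *tables,
    ]


if __name__ == "__main__":
    # Hypothetical file name; prints the command line rather than running it.
    print(shlex.join(dump_command("enwiki", "moodbar_enwiki.sql")))
```

The resulting file could then be archived or imported into another cluster later, as proposed above.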

jcrespo reopened this task as Open.Oct 2 2017, 4:03 PM
jcrespo added a subscriber: jcrespo.

If those tables are not in use in production, they have to be dropped from the production boxes. Unless someone else wants to become the owner of the mediawiki dbs, that is for mediawiki developers and maintainers to decide and, as far as I can tell, Reedy, demon and other mediawiki-involved people have expressed their willingness for that to happen. On the maintenance side, both Manuel and I seem to agree (I say that as a personal comment), because these tables create a huge cleaning and maintenance burden: each extra object per wiki is multiplied by 20,000 due to the redundancy provided in production.

If all of that is true, the table(s) must disappear from production.

However, that doesn't mean analytics-interested people should have nothing to say. Production has a different maintenance cycle than analytics/research/etc, and much less burden to maintain. As long as someone takes care of enforcing the privacy policies of Wikimedia and provides the resources to maintain them (both human and servers), as in "Would it be possible to place historical research datasets like this on another server", I see no problem with keeping those on separate databases (different from the main replication shards). A month from this date should be given for someone in charge to copy them elsewhere, and then all of them will be dropped from the main "production" area (*wik* databases). Note that I am assuming the data in those tables is fully public; if it is private, this is a separate topic, as there are rules in place to anonymize and drop old private content that cannot be violated.

Nuria added a comment.Oct 2 2017, 4:39 PM

I don't see why it can't just be exported to an sql dump file, and archived somewhere. Possibly then imported to another db cluster (analytics or something) if someone wants it in future

I think this would work: the tables can be removed from production but still be present on the analytics boxes. But this goes against @Marostegui's comment above, correct? I only see two possibilities: 1) do not delete the tables and 2) delete them from prod but maintain them on the analytics replicas (this second option, I think, probably requires additional work from our DBAs).

Any other options?

See my proposal, it keeps both manuel and you happy (but requires work from those wanting to keep these around, which I think is fair :-P).

Nuria added a comment.Oct 2 2017, 5:17 PM

but requires work from those wanting to keep these around, which I think is fair

@jcrespo from research team?

Nuria added a comment.Oct 2 2017, 5:20 PM

No, ok, you mean analytics, right? To be clear, we do not have a use for that data ourselves, but I think it should not be deleted if it is of interest for research. Would you be so kind as to outline what you want us to do?

jcrespo added a comment.EditedOct 2 2017, 5:21 PM

Whoever wants to keep them! I say it is fair because normally when you ask whether to keep them around, everybody is for it; if you ask who wants to keep them around and take care of archiving them, we may not get so many hands. Hereby I announce the deletion, to be performed a month from now; whoever wants to keep them (no matter the department), organize yourselves to archive them and follow the privacy rules. 0:-)

Nuria: I say we DBAs will not maintain that; that doesn't mean analytics should, if it is research who wants it.

Nuria added a comment.Oct 2 2017, 5:26 PM

@jcrespo: there is a staging database in the analytics replicas, could those tables be copied there before you delete them for all wikis? That is the best I can think of right now.

Yes, that was actually my implicit suggestion. Other things can be suggested, and we will help, we just need to take them outside the *wik* dbs.

Nuria added a comment.Oct 2 2017, 5:30 PM

@jcrespo ok, I wasn't clear. Then maybe we can put them in a better-named database like "mediawiki-archive"?

I like Jaime's solution :-)
Placing them on the staging database would work pretty well for me too, as we could simply issue the drop on the master and replication would not touch the copies in the staging database on the analytics replicas.

Nuria added a comment.Oct 3 2017, 4:01 PM

@Marostegui: let's put them on a mediawiki-archive database; the staging database (if I am not mistaken) has open permissions for everyone to delete/update. If that is possible, I think that would be best.

@Marostegui: let's put them on a mediawiki-archive database; the staging database (if I am not mistaken) has open permissions for everyone to delete/update. If that is possible, I think that would be best.

Just to be clear, you are talking about dbstore1002/db1047?
We also have to keep in mind that there are thousands of tables (two per wiki basically), so we would need to rename them to something like:

wikiname_moodbar_feedback
wikiname_moodbar_feedback_response

Before (or during) the import there.

If dbstore1002 is involved, I would wait until T168303 is in a better state
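
One way to do the renaming "during the import" is to rewrite the table names inside each per-wiki dump before loading it. The sketch below is only an illustration and assumes the dump quotes table names with backticks (as mysqldump does by default); the function name and the sample statements, including the `mbf_id` column, are made up for the example. A blind textual replace could also touch matching backticked strings inside row data, so a real run would need checking.

```python
import re

# Matches exactly the two backtick-quoted MoodBar table names.
_TABLE_RE = re.compile(r"`(moodbar_feedback(?:_response)?)`")


def prefix_tables(dump_sql, wiki):
    """Prefix backtick-quoted MoodBar table names in a dump with the wiki name."""
    return _TABLE_RE.sub(lambda m: f"`{wiki}_{m.group(1)}`", dump_sql)


if __name__ == "__main__":
    # Illustrative dump fragment, not taken from a real export.
    sample = (
        "DROP TABLE IF EXISTS `moodbar_feedback_response`;\n"
        "CREATE TABLE `moodbar_feedback` (`mbf_id` int);"
    )
    print(prefix_tables(sample, "nlwiki"))
```

Because the pattern includes the surrounding backticks, `moodbar_feedback` is not matched as a prefix of `moodbar_feedback_response`, and other identifiers are left alone.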

demon added a comment.Oct 4 2017, 8:25 PM

Just to be clear, you are talking about dbstore1002/db1047?
We also have to keep in mind that there are thousands of tables (two per wiki basically), so we would need to rename them to something like:

wikiname_moodbar_feedback
wikiname_moodbar_feedback_response

Before (or during) the import there.

If dbstore1002 is involved, I would wait until T168303 is in a better state

Could/should we drop the ones that are completely empty already--assuming some wikis never actually used it. Would that make it more manageable?

Nuria added a comment.Oct 4 2017, 9:27 PM

Could/should we drop the ones that are completely empty already--assuming some wikis never actually used it. Would that make it more manageable?

Yes, please. I think that makes loads of sense.

Just to be clear, you are talking about dbstore1002/db1047?
We also have to keep in mind that there are thousands of tables (two per wiki basically), so we would need to rename them to something like:

wikiname_moodbar_feedback
wikiname_moodbar_feedback_response

Before (or during) the import there.

If dbstore1002 is involved, I would wait until T168303 is in a better state

Could/should we drop the ones that are completely empty already--assuming some wikis never actually used it. Would that make it more manageable?

Yes - totally
I didn't check yet how many actually have data there :-)

I have checked the tables across the masters:

s1: has data on enwiki
s2: has data only on:
nlwiki
s3: has data only on:

frwikisource
incubatorwiki
itwikivoyage
sewikimedia
tawiki
testwiki

s4: empty
s5: empty
s6: empty
s7: empty
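
A check like the one summarised above could be scripted roughly as follows. This is a sketch and not the method actually used on the masters: it only builds a per-table "has any rows" query and then splits wikis by the answers. The wiki names and flags in the usage example are placeholders, not real results.

```python
MOODBAR_TABLES = ("moodbar_feedback", "moodbar_feedback_response")


def has_rows_query(wiki, table):
    """SQL returning 1 if the table holds at least one row, else 0."""
    return f"SELECT EXISTS (SELECT 1 FROM `{wiki}`.`{table}`) AS has_rows;"


def wikis_with_data(results):
    """results maps wiki -> {table: has_rows flag}; return wikis with any data, sorted."""
    return sorted(
        wiki for wiki, flags in results.items() if any(flags.values())
    )


if __name__ == "__main__":
    for table in MOODBAR_TABLES:
        print(has_rows_query("examplewiki", table))
    # Placeholder answers, as they might be collected from running the queries.
    fake_results = {
        "wiki_a": {"moodbar_feedback": 1, "moodbar_feedback_response": 0},
        "wiki_b": {"moodbar_feedback": 0, "moodbar_feedback_response": 0},
    }
    print(wikis_with_data(fake_results))
```

Using `EXISTS` avoids a full `COUNT(*)` scan when all that matters is whether a table is empty.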

demon added a comment.Oct 5 2017, 7:48 AM

Testwiki we can drop for sure. So that just leaves 7 total wikis with viable data. Farrrrrrrr better.

Thanks @demon! I will exclude testwiki from the list of wikis the tables need to be imported from

I have backed up the tables on:

root@dbstore1001:/srv/tmp/T153033# pwd
/srv/tmp/T153033
root@dbstore1001:/srv/tmp/T153033# ls -lh
total 16M
-rw-r--r-- 1 root root  14M Oct  6 07:56 moodbar_s1.sql
-rw-r--r-- 1 root root 1.8M Oct  6 07:58 moodbar_s2.sql
-rw-r--r-- 1 root root  90K Oct  6 08:01 moodbar_s3.tar.gz
Marostegui moved this task from Backlog to In progress on the DBA board.

I have imported those tables into db1047 and dbstore1002 with the name of the wiki at the start:

root@EVENTLOGGING m4[staging]> select @@hostname;
+------------+
| @@hostname |
+------------+
| db1047     |
+------------+
1 row in set (0.00 sec)

root@EVENTLOGGING m4[staging]> show tables like '%mood%';
+-----------------------------------------+
| Tables_in_staging (%mood%)              |
+-----------------------------------------+
| enwiki_moodbar_feedback                 |
| enwiki_moodbar_feedback_response        |
| frwikisource_moodbar_feedback           |
| frwikisource_moodbar_feedback_response  |
| incubatorwiki_moodbar_feedback          |
| incubatorwiki_moodbar_feedback_response |
| itwikivoyage_moodbar_feedback           |
| itwikivoyage_moodbar_feedback_response  |
| nlwiki_moodbar_feedback                 |
| nlwiki_moodbar_feedback_response        |
| sewikimedia_moodbar_feedback            |
| sewikimedia_moodbar_feedback_response   |
| tawiki_moodbar_feedback                 |
| tawiki_moodbar_feedback_response        |
+-----------------------------------------+
14 rows in set (0.00 sec)

root@DBSTORE[staging]> select @@hostname;
+-------------+
| @@hostname  |
+-------------+
| dbstore1002 |
+-------------+
1 row in set (0.00 sec)

root@DBSTORE[staging]> show tables like '%mood%';
+-----------------------------------------+
| Tables_in_staging (%mood%)              |
+-----------------------------------------+
| enwiki_moodbar_feedback                 |
| enwiki_moodbar_feedback_response        |
| frwikisource_moodbar_feedback           |
| frwikisource_moodbar_feedback_response  |
| incubatorwiki_moodbar_feedback          |
| incubatorwiki_moodbar_feedback_response |
| itwikivoyage_moodbar_feedback           |
| itwikivoyage_moodbar_feedback_response  |
| nlwiki_moodbar_feedback                 |
| nlwiki_moodbar_feedback_response        |
| sewikimedia_moodbar_feedback            |
| sewikimedia_moodbar_feedback_response   |
| tawiki_moodbar_feedback                 |
| tawiki_moodbar_feedback_response        |
+-----------------------------------------+
14 rows in set (0.00 sec)

I will start dropping the tables in production next week, not going to drop anything on a Friday :-)
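
For illustration, the per-wiki drop statements could be generated like this. It is a sketch only; the helper name is hypothetical and the list of wikis on each shard would have to come from the real configuration.

```python
MOODBAR_TABLES = ("moodbar_feedback", "moodbar_feedback_response")


def drop_statements(wikis, tables=MOODBAR_TABLES):
    """Return one DROP TABLE statement per wiki database and table."""
    return [
        f"DROP TABLE IF EXISTS `{wiki}`.`{table}`;"
        for wiki in wikis
        for table in tables
    ]


if __name__ == "__main__":
    # Two wiki names used purely as an example input.
    for statement in drop_statements(["enwiki", "nlwiki"]):
        print(statement)
```

Since the archived copies live in the `staging` database under wiki-prefixed names, statements of this shape do not match them when they replicate.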

Marostegui updated the task description.Oct 9 2017, 6:23 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-09T06:25:25Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s6 - T153033

Marostegui updated the task description.Oct 9 2017, 6:26 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-09T07:33:00Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s2 - T153033

Marostegui updated the task description.Oct 9 2017, 7:51 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-09T08:52:09Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s4 - T153033

Marostegui updated the task description.Oct 9 2017, 8:53 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-09T09:50:39Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s5 - T153033

Marostegui updated the task description.Oct 9 2017, 9:52 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-09T10:21:58Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s7 - T153033

Marostegui updated the task description.Oct 9 2017, 10:32 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-10T06:27:36Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s1 - T153033

Marostegui updated the task description.Oct 10 2017, 6:30 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-10T06:35:25Z] <marostegui> Drop moodbar_feedback and moodbar_feedback_response from s3 - T153033

Marostegui closed this task as Resolved.Oct 11 2017, 6:27 AM
Marostegui updated the task description.
Aklapper edited projects, added Analytics-Radar; removed Analytics.Jun 10 2020, 6:44 AM