Page MenuHomePhabricator

Clean up skin user preferences in user_properties table on Wikimedia wikis
Open, NormalPublic

Description

This bug tracks the request to write and run a maintenance script on Wikimedia wikis to clean up the skin user preferences. Many skins no longer exist. Many users are storing unneeded values, causing the table to be bloated (cf. T54777).

As noted at T47179#531440, Wikipedia:Database reports/User preferences#Skin has been messy since it was created. It'd be great to see it cleaned up.

The database report currently looks something like this:

SkinUsers
MediaWiki:Skinname-7190151
MediaWiki:Skinname-054155
MediaWiki:Skinname-11763
MediaWiki:Skinname-23075
MediaWiki:Skinname-amethyst1292
MediaWiki:Skinname-chick32061
Cologne Blue80642
Modern69010
MonoBook2185224
MediaWiki:Skinname-myskin16285
Nostalgia21256
MediaWiki:Skinname-on16
MediaWiki:Skinname-simple14245
MediaWiki:Skinname-standard34502
Vector166182

Details

Reference
bz52778

Event Timeline

bzimport raised the priority of this task from to Normal.Nov 22 2014, 2:09 AM
bzimport set Reference to bz52778.
bzimport added a subscriber: Unknown Object (MLST).
Reedy added a comment.Aug 13 2013, 4:43 AM

effectively: DELETE FROM user_preferences WHERE up_property = 'skin' AND up_value NOT IN Skin::getSkinNames()

var_dump( Skin::getSkinNames() );

array(4) {

["monobook"]=>
string(8) "MonoBook"
["vector"]=>
string(6) "Vector"
["modern"]=>
string(6) "Modern"
["cologneblue"]=>
string(11) "CologneBlue"

}

But the up_values need lowercasing to match

Wouldn't take too much to write a script to do something like that...

What about the Skinname- (is this just 'no preference'?), Skinname-0, Skinname-1, and Skinname-2? Are the numeric ones interpreted as actual skins? If so, they should be converted over to the names.

demon added a comment.Aug 14 2013, 6:40 PM

(In reply to comment #2)

What about the Skinname- (is this just 'no preference'?), Skinname-0,
Skinname-1, and Skinname-2? Are the numeric ones interpreted as actual
skins?
If so, they should be converted over to the names.

Skinname- was possibly "default" a long time ago. Or a database artifact. In any case, it's invalid and these users are currently getting Vector.

Skinname-[0-2] used to refer to specific skins, they do have a 1:1 mapping. The ones that point to skins that are still around can be converted, others are using Vector and can be removed.

I wrote a script that does all of this...someone should just run it across all wikis :)

Thank you for volunteering, Chad ;).

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

demon added a comment.Aug 14 2013, 8:25 PM

(In reply to comment #4)

Thank you for volunteering, Chad ;).

Get me a window and I'll be more than happy to.

(In reply to comment #5)

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance extension. Here you go: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.git/HEAD/cleanupSkinPrefs.php

Reedy added a comment.Aug 14 2013, 8:43 PM

(In reply to comment #6)

(In reply to comment #5)

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance
extension. Here you go:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.
git/HEAD/cleanupSkinPrefs.php

That's not batched though...

demon added a comment.Aug 14 2013, 9:05 PM

(In reply to comment #7)

(In reply to comment #6)

(In reply to comment #5)

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance
extension. Here you go:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.
git/HEAD/cleanupSkinPrefs.php

That's not batched though...

Really the only wiki that maybe needs batching is enwiki. The rest have a minuscule number of rows to update.

There's the DELETE part too.. Both should be easily updated to use batched queries

Our delete wrapper in it's current form doesn't take options...

Ideally

			do {
				$res = $dbw->delete(
					'user_properties',
					array(
						'up_property' => 'skin',
						'up_value NOT IN (' . $dbw->makeList( $dontChange ) . ')'
					),
					__METHOD__,
					array( 'LIMIT' => 50 )
				);
				wfWaitForSlaves();
			} while ( $res->numRows() === 50 );

but in the current state... Ugh.

			do {
				$res = $dbw->query(
					"DELETE FROM user_properties WHERE up_property='skin' AND up_value NOT IN ({$dbw->makeList( $dontChange )}) LIMIT 50",
					__METHOD__
				);
				wfWaitForSlaves();
			} while ( $res->numRows() === 50 );

Logging a bug for it!

(In reply to comment #9)

There's the DELETE part too.. Both should be easily updated to use batched
queries
Our delete wrapper in it's current form doesn't take options...

"DELETE ...LIMIT" isn't part of the standard, and it's only supported in MySQL and Sqlite :)

(In reply to comment #6)

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance
extension. Here you go:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.git/HEAD/cleanupSkinPrefs.php

Nice to see that there's a script for this (or a start to one). :-)

It's in the wrong place, though. The most natural place to find such a script would be in the maintenance directory of MediaWiki core (this is where I looked after Matt posted his question in comment 6). Other wikis will want to clean out their skin preferences as well, presumably. There's nothing Wikimedia-specific about this, as far as I can tell, so the WikimediaMaintenance extension is simply the wrong place for this script.

He7d3r updated the task description. (Show Details)Apr 16 2015, 1:36 PM
He7d3r set Security to None.
Elitre added a subscriber: Elitre.Sep 11 2015, 4:10 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 11 2015, 4:10 PM
Krenair added a subscriber: Krenair.EditedSep 29 2015, 7:49 PM

For enwiki alone:

mysql> select up_value, count(*) as c from user_properties where up_property = 'skin' group by up_value order by c desc;
+-------------+---------+
| up_value    | c       |
+-------------+---------+
|             | 7183420 |
| monobook    | 2182565 |
| vector      |  162135 |
| cologneblue |   83875 |
| modern      |   76036 |
| 0           |   53927 |
| standard    |   34108 |
| chick       |   31920 |
| nostalgia   |   21110 |
| myskin      |   16184 |
| simple      |   14140 |
| 2           |    3060 |
| 1           |    1752 |
| amethyst    |    1291 |
| minerva     |      39 |
| on          |      16 |
+-------------+---------+
16 rows in set (1 min 8.72 sec)
demon added a comment.Sep 29 2015, 8:59 PM

Could we get a report for all wikis? Would help in gauging how much cleanup to do.

Sure, see tin:~krenair/T54778.out. It's all in PHP serialized objects because sql.php.

demon added a comment.Sep 29 2015, 9:38 PM

Cleaned up all small and medium wikis.

greg added a subscriber: greg.Sep 30 2015, 3:54 PM
demon added a comment.Nov 20 2015, 9:10 PM

Propose rejecting this outright.

Propose rejecting this outright.

Doing it would make reporting easier. E.g. https://phabricator.wikimedia.org/T47179#531339 does not group them in a fully readable way.

However, reporting is still possible without this. It would just need some additional work (T47179: Skin and active editor correlation).

ori closed this task as Declined.Nov 20 2015, 10:23 PM
ori claimed this task.
ori added a subscriber: ori.

The Wikimedia Foundation database team is currently operating as a skeleton crew of one full-time DBA. There are 138 open tasks with the #database tag. The previous attempt to make progress on this task resulted in data loss. TL;DR: no.

Restricted Application removed a subscriber: Liuxinyu970226. · View Herald TranscriptNov 20 2015, 10:23 PM
In T54778#1821943, @ori wrote:

The Wikimedia Foundation database team is currently operating as a skeleton crew of one full-time DBA. There are 138 open tasks with the #database tag. The previous attempt to make progress on this task resulted in data loss. TL;DR: no.

I don't see how any of what you write here makes this task declined. Stalled, perhaps. There's still real value in keeping a tidy database (cf. T54777).

demon added a comment.Nov 21 2015, 6:17 AM

I don't see how any of what you write here makes this task declined. Stalled, perhaps. There's still real value in keeping a tidy database (cf. T54777).

Our DBA would disagree. "no it [the cleanup] shouldn't [be done] - dropping rows like this has 0 effect on both disk space and performance due to fragmentation"

So... the disk space concerns on T54777: user_properties table bloat are not a real problem then?

I don't see how any of what you write here makes this task declined. Stalled, perhaps. There's still real value in keeping a tidy database (cf. T54777).

Our DBA would disagree. "no it [the cleanup] shouldn't [be done] - dropping rows like this has 0 effect on both disk space and performance due to fragmentation"

For the purposes of this task, I don't care about disk space or performance. Neither are of those were mentioned when filing this task. The rows in user_properties are queried as part of generating a database report. It's possible that we could put skin usage stats in the site_stats table or something.

MediaWiki isn't the only consumer of user_properties, and even if it were, doing database cleanup has as much merit as cleanup/tidying of anything. There's no rush; it should be possible to do this type of cleanup in a safe and methodical way.

It's fine to decline the task if we think it's too hard for the benefit here and now; we can reopen in a brighter future (perhaps one where Phabricator search works! then we'll query for declined blockers of T18660).

So... the disk space concerns on T54777: user_properties table bloat are not a real problem then?

This is the important part imho. T54777 was used in the past to block development; if that's not a real concern, please decline that one as well so that we can resume improving MediaWiki's feature without needless worries for the database. :)

Niharika reopened this task as Open.May 1 2018, 12:47 AM
Niharika added a project: Wikimedia-Rdbms.
Niharika added a subscriber: Niharika.

Hi. I ran into this today while running queries to generate some metrics for a task. It would be great if we can get these rows cleaned up. Reopening this to get a revaluation from our DBAs.

wikiadmin@db1093(frwiki)>select count(*) as cnt, up_value from user_properties where up_property='skin' group by up_value;
+--------+-------------+
| cnt    | up_value    |
+--------+-------------+
| 441785 |             |
|   2516 | 0           |
|     91 | 1           |
|    167 | 2           |
|     80 | amethyst    |
|    777 | chick       |
|   3337 | cologneblue |
|     51 | minerva     |
|     28 | minervaneue |
|   4483 | modern      |
| 143377 | monobook    |
|    507 | myskin      |
|    791 | nostalgia   |
|    391 | simple      |
|    292 | timeless    |
|  12723 | vector      |
+--------+-------------+

wikiadmin@db1076(zhwiki)>select count(*) as cnt, up_value from user_properties where up_property='skin' group by up_value;
+--------+-------------+
| cnt    | up_value    |
+--------+-------------+
| 449352 |             |
|   7209 | 0           |
|    274 | 1           |
|    731 | 2           |
|     17 | amethyst    |
|    976 | chick       |
|   8492 | cologneblue |
|    160 | minerva     |
|     28 | minervaneue |
|   8002 | modern      |
| 154341 | monobook    |
|    847 | myskin      |
|   2414 | nostalgia   |
|   1169 | simple      |
|    249 | timeless    |
|  14155 | vector      |
+--------+-------------+

wikiadmin@db1080(enwiki)>select count(*) as cnt, up_value from user_properties where up_property='skin' group by up_value;
+---------+-------------+
| cnt     | up_value    |
+---------+-------------+
| 7178562 |             |
|   53957 | 0           |
|    1752 | 1           |
|    3056 | 2           |
|    1290 | amethyst    |
|   31846 | chick       |
|   87450 | cologneblue |
|    1409 | minerva     |
|   81987 | modern      |
| 2190909 | monobook    |
|   16171 | myskin      |
|   21128 | nostalgia   |
|   14097 | simple      |
|    1850 | timeless    |
|  162338 | vector      |
+---------+-------------+
kaldari added a subscriber: kaldari.May 1 2018, 4:28 PM

I agree that this would be good to finally clean-up. If nothing else, it causes lots of confusion for people trying to analyze our preference stats (like Niharika was doing). The only reason this task was abandoned is that the maintenance script was buggy (https://gerrit.wikimedia.org/r/#/c/242374/) and should probably be rewritten from scratch (and at the time we only had 1 DBA to help with the effort).

kaldari removed ori as the assignee of this task.May 1 2018, 4:29 PM