Page MenuHomePhabricator

Clean up skin user preferences in user_properties table on Wikimedia wikis
Open, MediumPublic

Description

This bug tracks the request to write and run a maintenance script on Wikimedia wikis to clean up the skin user preferences. Many skins no longer exist. Many users are storing unneeded values, causing the table to be bloated (cf. T54777).

As noted at T47179#531440, Wikipedia:Database reports/User preferences#Skin has been messy since it was created. It'd be great to see it cleaned up.

The database report currently looks something like this:

SkinUsers
MediaWiki:Skinname-7190151
MediaWiki:Skinname-054155
MediaWiki:Skinname-11763
MediaWiki:Skinname-23075
MediaWiki:Skinname-amethyst1292
MediaWiki:Skinname-chick32061
Cologne Blue80642
Modern69010
MonoBook2185224
MediaWiki:Skinname-myskin16285
Nostalgia21256
MediaWiki:Skinname-on16
MediaWiki:Skinname-simple14245
MediaWiki:Skinname-standard34502
Vector166182

Details

Reference
bz52778

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:09 AM
bzimport set Reference to bz52778.
bzimport added a subscriber: Unknown Object (MLST).

effectively: DELETE FROM user_preferences WHERE up_property = 'skin' AND up_value NOT IN Skin::getSkinNames()

> var_dump( Skin::getSkinNames() );
array(4) {
  ["monobook"]=>
  string(8) "MonoBook"
  ["vector"]=>
  string(6) "Vector"
  ["modern"]=>
  string(6) "Modern"
  ["cologneblue"]=>
  string(11) "CologneBlue"
}

But the up_values need lowercasing to match

Wouldn't take too much to write a script to do something like that...

What about the Skinname- (is this just 'no preference'?), Skinname-0, Skinname-1, and Skinname-2? Are the numeric ones interpreted as actual skins? If so, they should be converted over to the names.

(In reply to comment #2)

What about the Skinname- (is this just 'no preference'?), Skinname-0,
Skinname-1, and Skinname-2? Are the numeric ones interpreted as actual
skins?
If so, they should be converted over to the names.

Skinname- was possibly "default" a long time ago. Or a database artifact. In any case, it's invalid and these users are currently getting Vector.

Skinname-[0-2] used to refer to specific skins, they do have a 1:1 mapping. The ones that point to skins that are still around can be converted, others are using Vector and can be removed.

I wrote a script that does all of this...someone should just run it across all wikis :)

Thank you for volunteering, Chad ;).

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

(In reply to comment #4)

Thank you for volunteering, Chad ;).

Get me a window and I'll be more than happy to.

(In reply to comment #5)

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance extension. Here you go: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.git/HEAD/cleanupSkinPrefs.php

(In reply to comment #6)

(In reply to comment #5)

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance
extension. Here you go:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.
git/HEAD/cleanupSkinPrefs.php

That's not batched though...

(In reply to comment #7)

(In reply to comment #6)

(In reply to comment #5)

(In reply to comment #3)

I wrote a script that does all of this...someone should just run it across
all wikis :)

Can you post or link to this?

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance
extension. Here you go:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.
git/HEAD/cleanupSkinPrefs.php

That's not batched though...

Really the only wiki that maybe needs batching is enwiki. The rest have a minuscule number of rows to update.

There's the DELETE part too.. Both should be easily updated to use batched queries

Our delete wrapper in it's current form doesn't take options...

Ideally

			do {
				$res = $dbw->delete(
					'user_properties',
					array(
						'up_property' => 'skin',
						'up_value NOT IN (' . $dbw->makeList( $dontChange ) . ')'
					),
					__METHOD__,
					array( 'LIMIT' => 50 )
				);
				wfWaitForSlaves();
			} while ( $res->numRows() === 50 );

but in the current state... Ugh.

			do {
				$res = $dbw->query(
					"DELETE FROM user_properties WHERE up_property='skin' AND up_value NOT IN ({$dbw->makeList( $dontChange )}) LIMIT 50",
					__METHOD__
				);
				wfWaitForSlaves();
			} while ( $res->numRows() === 50 );

Logging a bug for it!

(In reply to comment #9)

There's the DELETE part too.. Both should be easily updated to use batched
queries

Our delete wrapper in it's current form doesn't take options...

"DELETE ...LIMIT" isn't part of the standard, and it's only supported in MySQL and Sqlite :)

(In reply to comment #6)

That would be the "cleanupSkinPrefs" script in the WikimediaMaintenance
extension. Here you go:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikimediaMaintenance.git/HEAD/cleanupSkinPrefs.php

Nice to see that there's a script for this (or a start to one). :-)

It's in the wrong place, though. The most natural place to find such a script would be in the maintenance directory of MediaWiki core (this is where I looked after Matt posted his question in comment 6). Other wikis will want to clean out their skin preferences as well, presumably. There's nothing Wikimedia-specific about this, as far as I can tell, so the WikimediaMaintenance extension is simply the wrong place for this script.

He7d3r set Security to None.

For enwiki alone:

mysql> select up_value, count(*) as c from user_properties where up_property = 'skin' group by up_value order by c desc;
+-------------+---------+
| up_value    | c       |
+-------------+---------+
|             | 7183420 |
| monobook    | 2182565 |
| vector      |  162135 |
| cologneblue |   83875 |
| modern      |   76036 |
| 0           |   53927 |
| standard    |   34108 |
| chick       |   31920 |
| nostalgia   |   21110 |
| myskin      |   16184 |
| simple      |   14140 |
| 2           |    3060 |
| 1           |    1752 |
| amethyst    |    1291 |
| minerva     |      39 |
| on          |      16 |
+-------------+---------+
16 rows in set (1 min 8.72 sec)

Could we get a report for all wikis? Would help in gauging how much cleanup to do.

Sure, see tin:~krenair/T54778.out. It's all in PHP serialized objects because sql.php.

Cleaned up all small and medium wikis.

Propose rejecting this outright.

Propose rejecting this outright.

Doing it would make reporting easier. E.g. https://phabricator.wikimedia.org/T47179#531339 does not group them in a fully readable way.

However, reporting is still possible without this. It would just need some additional work (T47179: Skin and active editor correlation).

ori claimed this task.
ori subscribed.

The Wikimedia Foundation database team is currently operating as a skeleton crew of one full-time DBA. There are 138 open tasks with the #database tag. The previous attempt to make progress on this task resulted in data loss. TL;DR: no.

In T54778#1821943, @ori wrote:

The Wikimedia Foundation database team is currently operating as a skeleton crew of one full-time DBA. There are 138 open tasks with the #database tag. The previous attempt to make progress on this task resulted in data loss. TL;DR: no.

I don't see how any of what you write here makes this task declined. Stalled, perhaps. There's still real value in keeping a tidy database (cf. T54777).

I don't see how any of what you write here makes this task declined. Stalled, perhaps. There's still real value in keeping a tidy database (cf. T54777).

Our DBA would disagree. "no it [the cleanup] shouldn't [be done] - dropping rows like this has 0 effect on both disk space and performance due to fragmentation"

So... the disk space concerns on T54777: user_properties table bloat are not a real problem then?

I don't see how any of what you write here makes this task declined. Stalled, perhaps. There's still real value in keeping a tidy database (cf. T54777).

Our DBA would disagree. "no it [the cleanup] shouldn't [be done] - dropping rows like this has 0 effect on both disk space and performance due to fragmentation"

For the purposes of this task, I don't care about disk space or performance. Neither are of those were mentioned when filing this task. The rows in user_properties are queried as part of generating a database report. It's possible that we could put skin usage stats in the site_stats table or something.

MediaWiki isn't the only consumer of user_properties, and even if it were, doing database cleanup has as much merit as cleanup/tidying of anything. There's no rush; it should be possible to do this type of cleanup in a safe and methodical way.

It's fine to decline the task if we think it's too hard for the benefit here and now; we can reopen in a brighter future (perhaps one where Phabricator search works! then we'll query for declined blockers of T18660).

So... the disk space concerns on T54777: user_properties table bloat are not a real problem then?

This is the important part imho. T54777 was used in the past to block development; if that's not a real concern, please decline that one as well so that we can resume improving MediaWiki's feature without needless worries for the database. :)

Niharika added a project: MediaWiki-libs-Rdbms.
Niharika subscribed.

Hi. I ran into this today while running queries to generate some metrics for a task. It would be great if we can get these rows cleaned up. Reopening this to get a revaluation from our DBAs.

wikiadmin@db1093(frwiki)>select count(*) as cnt, up_value from user_properties where up_property='skin' group by up_value;
+--------+-------------+
| cnt    | up_value    |
+--------+-------------+
| 441785 |             |
|   2516 | 0           |
|     91 | 1           |
|    167 | 2           |
|     80 | amethyst    |
|    777 | chick       |
|   3337 | cologneblue |
|     51 | minerva     |
|     28 | minervaneue |
|   4483 | modern      |
| 143377 | monobook    |
|    507 | myskin      |
|    791 | nostalgia   |
|    391 | simple      |
|    292 | timeless    |
|  12723 | vector      |
+--------+-------------+

wikiadmin@db1076(zhwiki)>select count(*) as cnt, up_value from user_properties where up_property='skin' group by up_value;
+--------+-------------+
| cnt    | up_value    |
+--------+-------------+
| 449352 |             |
|   7209 | 0           |
|    274 | 1           |
|    731 | 2           |
|     17 | amethyst    |
|    976 | chick       |
|   8492 | cologneblue |
|    160 | minerva     |
|     28 | minervaneue |
|   8002 | modern      |
| 154341 | monobook    |
|    847 | myskin      |
|   2414 | nostalgia   |
|   1169 | simple      |
|    249 | timeless    |
|  14155 | vector      |
+--------+-------------+

wikiadmin@db1080(enwiki)>select count(*) as cnt, up_value from user_properties where up_property='skin' group by up_value;
+---------+-------------+
| cnt     | up_value    |
+---------+-------------+
| 7178562 |             |
|   53957 | 0           |
|    1752 | 1           |
|    3056 | 2           |
|    1290 | amethyst    |
|   31846 | chick       |
|   87450 | cologneblue |
|    1409 | minerva     |
|   81987 | modern      |
| 2190909 | monobook    |
|   16171 | myskin      |
|   21128 | nostalgia   |
|   14097 | simple      |
|    1850 | timeless    |
|  162338 | vector      |
+---------+-------------+

I agree that this would be good to finally clean-up. If nothing else, it causes lots of confusion for people trying to analyze our preference stats (like Niharika was doing). The only reason this task was abandoned is that the maintenance script was buggy (https://gerrit.wikimedia.org/r/#/c/242374/) and should probably be rewritten from scratch (and at the time we only had 1 DBA to help with the effort).

kaldari removed ori as the assignee of this task.May 1 2018, 4:29 PM
DannyS712 removed a project: User-DannyS712.
DannyS712 subscribed.

Unlicking this cookie

What about just a general remove deprecated user preference script that can remove entries for a specified preference with the specified value, and manually running it for each skin that doesn't exist anymore?

What about just a general remove deprecated user preference script that can remove entries for a specified preference with the specified value, and manually running it for each skin that doesn't exist anymore?

Increasing the amount of human input increases the chance of error.

If we're going to do it, I guess we need a list of known bad/invalid values (reviewed and confirmed), and a mapping of what they want replacing with, or if they should just be removed outright because they fallback to the defaults

I want to take a look at this but I'm still a bit confused about T114208. Was it because it was accidentally removing monobook as skin values too (why?) or something more complex was in play. Does anyone remember more details of that?

I want to take a look at this but I'm still a bit confused about T114208. Was it because it was accidentally removing monobook as skin values too (why?) or something more complex was in play. Does anyone remember more details of that?

IIRC, the script didn't cope with all of the ludicrous multiple different ways we store the skin preference. The original attempt to fix the script lists some improvements, before it was just deleted instead.