updateCollation.php script prohibitively slow for very large wikis
Closed, Resolved · Public

Description

The updateCollation.php script is awfully slow. It took over a week for
fr.wp (T56680), and it'll probably take months if we ever decide to run
it on en.wp. That kinda sucks.

I'm not sure what can be done, whether it's just a problem with the WMF
configuration, or what, so I'm just filing this and asking for comments.
Please resolve as INVALID if we in fact can't do anything about this.

Possible causes:

  • The workaround from T47970, which makes the script use an index that's not entirely perfect for the task (but likely good enough; no idea how much slower that makes it).
  • Slave synchronisation, in which case maybe we can do something with ops involvement? Don't ask me.
  • There's just too much data being sent back and forth between PHP and the database, in which case we can't do anything (unless we implement collating entirely database-side, which I've been told is a bad idea). A simplified sketch of the script's batch query follows this list.
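
For context, the script processes categorylinks in batches: it reads a batch of rows ordered by the collation sort columns, recomputes each sortkey in PHP, and writes the rows back. Below is a simplified sketch of the batch read, with made-up offset values; the real query, quoted later in this task, expands the tuple comparison into an OR chain so that MySQL can use an index for it.

-- Simplified sketch only; 'uca-default', 'Example', 1 and 12345 are
-- hypothetical offset values marking where the previous batch left off.
SELECT cl_from, cl_to, cl_sortkey_prefix, cl_collation,
       page_namespace, page_title
FROM categorylinks
JOIN page ON cl_from = page_id
WHERE (cl_collation, cl_to, cl_type, cl_from)
      > ('uca-default', 'Example', 1, 12345)
ORDER BY cl_collation, cl_to, cl_type, cl_from
LIMIT 100;
-- PHP then recomputes each row's sortkey under the new collation and
-- issues one UPDATE per row, remembering the last tuple as the next offset.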

I'm CC-ing competent people. Help?

Event Timeline

There are a very large number of changes, so older changes are hidden.

matmarex added a subscriber: gerritbot.

The patch should be approved and merged first, then I would apply the index live to the databases (on a different task, which may take some time), then we would run the script.

Change 272416 merged by jenkins-bot:
Add new index to make updateCollation.php painless

https://gerrit.wikimedia.org/r/272416

Thanks Jaime, the patch is merged now. I filed T130692 about adding the new indexes.
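
For reference, here is a sketch of the index that change adds. The key name and column order are inferred from the EXPLAIN output quoted further down; see the Gerrit change for the authoritative DDL.

-- Inferred from the cl_collation_ext key name and the query's ORDER BY
-- columns; https://gerrit.wikimedia.org/r/272416 has the real definition.
ALTER TABLE categorylinks
  ADD INDEX cl_collation_ext (cl_collation, cl_to, cl_type, cl_from);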

Testwiki has been updated with the new index. We should run updateCollation there and see how fast it is.

> Testwiki has been updated with the new index. We should run updateCollation there and see how fast it is.

It's probably not representative, given the number of categorylinks rows it has in comparison to other big wikis.

Yeah, looks like testwiki only has 40K rows.

jcrespo claimed this task.

I am no longer a blocker for large-scale testing. Thanks to the new MariaDB 10 servers, this change is now easy (easy =/= fast; this may take days to apply to all wikis when that is required).

But you should be able to at least check its correctness. If I am around, please ping me for the maintenance so I can evaluate potential lag issues.

jcrespo removed jcrespo as the assignee of this task.

Sorry, resolved by mistake.

The following queries were found to be slow (>~10 seconds to execute) on the s3 master:

Hits 	Tmax (s) 	Tavg (s) 	Tsum (s) 	Hosts 	Users 	Schemas
546	28	21	11,687	db1075	wikiadmin	ruwiktionary
SELECT /* UpdateCollation::execute */ cl_from, cl_to, cl_sortkey_prefix, cl_collation, cl_sortkey, cl_type+0 AS "cl_type_numeric", page_namespace, page_title FROM `categorylinks`, `page` WHERE (cl_collation > 'uca-ru' OR (cl_collation = 'uca-ru' AND cl_to > 'Nomina_feminina') OR (cl_collation = 'uca-ru' AND cl_to = 'Nomina_feminina' AND cl_type > 1) OR (cl_collation = 'uca-ru' AND cl_to = 'Nomina_feminina' AND cl_type = 1 AND cl_from > '180587')) AND (cl_from = page_id) ORDER BY cl_collation, cl_to, cl_type, cl_from LIMIT 100 /* 41bc78be50cf89cb95f818d642787e15 db1075 ruwiktionary 19s */
1	26	26	26	db1075	wikiadmin	ruwiktionary
SELECT /* UpdateCollation::execute */ cl_from, cl_to, cl_sortkey_prefix, cl_collation, cl_sortkey, cl_type+0 AS "cl_type_numeric", page_namespace, page_title FROM `categorylinks`, `page` WHERE (cl_from = page_id) ORDER BY cl_collation, cl_to, cl_type, cl_from LIMIT 100 /* 3f310ddb21c2f635a6fd8c551c76f3ee db1075 ruwiktionary 26s */

The new index is not being used:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: index
possible_keys: PRIMARY
          key: name_title
      key_len: 261
          ref: NULL
         rows: 1100761
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: ref
possible_keys: cl_from,cl_timestamp,cl_sortkey,cl_collation_ext
          key: cl_from
      key_len: 4
          ref: ruwiktionary.page.page_id
         rows: 4
        Extra: Using where
2 rows in set (0.00 sec)

Doing this query, however, takes 0 seconds:

SELECT STRAIGHT_JOIN /* UpdateCollation::execute */ cl_from, cl_to, cl_sortkey_prefix, cl_collation, cl_sortkey, cl_type+0 AS "cl_type_numeric", page_namespace, page_title FROM `categorylinks` JOIN `page` WHERE (cl_collation > 'uca-ru' OR (cl_collation = 'uca-ru' AND cl_to > 'Nomina_feminina/ru') OR (cl_collation = 'uca-ru' AND cl_to = 'Nomina_feminina/ru' AND cl_type > 1) OR (cl_collation = 'uca-ru' AND cl_to = 'Nomina_feminina/ru' AND cl_type = 1 AND cl_from > '186127')) AND (cl_from = page_id) ORDER BY cl_collation, cl_to, cl_type, cl_from LIMIT 100;

MariaDB  db1075 ruwiktionary > EXPLAIN SELECT STRAIGHT_JOIN /* UpdateCollation::execute */ cl_from, cl_to, cl_sortkey_prefix, cl_collation, cl_sortkey, cl_type+0 AS "cl_type_numeric", page_namespace, page_title FROM `categorylinks` JOIN `page` WHERE (cl_collation > 'uca-ru' OR (cl_collation = 'uca-ru' AND cl_to > 'Nomina_feminina/ru') OR (cl_collation = 'uca-ru' AND cl_to = 'Nomina_feminina/ru' AND cl_type > 1) OR (cl_collation = 'uca-ru' AND cl_to = 'Nomina_feminina/ru' AND cl_type = 1 AND cl_from > '186127')) AND (cl_from = page_id) ORDER BY cl_collation, cl_to, cl_type, cl_from LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: range
possible_keys: cl_from,cl_timestamp,cl_sortkey,cl_collation_ext
          key: cl_collation_ext
      key_len: 291
          ref: NULL
         rows: 4992443
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: ruwiktionary.categorylinks.cl_from
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

We are literally talking about more than a 2600x speedup. Please make sure you throttle updates at that speed (I saw lag happening even at slow speeds); otherwise, I will have to abort the maintenance. Also, shouldn't you SELECT FOR UPDATE, to prevent the selected rows from being deleted before you edit them?
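
For readers unfamiliar with the suggestion, here is a minimal sketch of what a locking read would look like in this loop; it is not what the script currently does.

-- Hypothetical locking variant of the batch read. FOR UPDATE holds row
-- locks on the selected rows until COMMIT, so a concurrent deletion cannot
-- remove them between the SELECT and the follow-up UPDATEs.
BEGIN;
SELECT cl_from, cl_to, cl_sortkey_prefix
FROM categorylinks JOIN page ON cl_from = page_id
-- (keyset WHERE clause as in the real query, omitted here for brevity)
ORDER BY cl_collation, cl_to, cl_type, cl_from
LIMIT 100
FOR UPDATE;
-- ...recompute sortkeys and issue the UPDATEs...
COMMIT;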

Is anyone looking into @jcrespo's recommendations? We will need to run this script on all wikis sooner rather than later, as we're still running an unmaintained (upstream) version of libicu on all of our appservers, and we've been waiting to do that for a long time (see T86096).

@jcrespo to clarify, you are recommending we always add STRAIGHT_JOIN to the query?

> Please make sure you throttle updates at that speed

The script calls wfWaitForSlaves() regularly. Are you saying it should do so more often? Or should we have additional throttling beyond waiting for slaves?

> SELECT FOR UPDATE

is presumably not being used in order to avoid locking things. Very minor referential integrity issues are considered acceptable in this case, since the data being updated all functionally depends on other data (so it's not canonical). If you advise me that there would be no performance or lock-contention concern with SELECT FOR UPDATE, we can add it.

In T58041#2266715, @Joe wrote:

> Is anyone looking into @jcrespo's recommendations? We will need to run this script on all wikis sooner rather than later, as we're still running an unmaintained (upstream) version of libicu on all of our appservers, and we've been waiting to do that for a long time (see T86096).

Presumably that is me and matmarex (with my volunteer hat on; I assume matmarex is wearing his volunteer hat too). CommTech is also sort of interested in this feature, but I don't think they are taking responsibility for it. Personally, the amount of time I have to devote to this is rather limited right now (yadda yadda... why is there no MW core team...).

Yes, Kaldari is planning to work on this. He's away for a week, but I think he'll pick this up late next week.

> The script calls wfWaitForSlaves() regularly. Are you saying it should do so more often? Or should we have additional throttling beyond waiting for slaves?

I am saying that, despite that, there was a 10-second lag at the start of the process. Why, I do not know, but if it is a one-time thing I suppose it is ok, as long as the rest of the process creates no lag.

> Very minor referential integrity issues are considered acceptable in this case

If the potential issues are known and accepted on purpose, then it is ok.

> you are recommending we always add STRAIGHT_JOIN to the query

It is a workaround (index hints are evil for many reasons), but if that allows faster execution, so be it.

Change 288032 had a related patch set uploaded (by Brian Wolff):
Use STRAIGHT_JOIN on updateCollation.php per jcrespo

https://gerrit.wikimedia.org/r/288032

Change 288036 had a related patch set uploaded (by Brian Wolff):
use slave for row estimate in updateCollation.php

https://gerrit.wikimedia.org/r/288036

> I am saying that, despite that, there was a 10-second lag at the start of the process. Why, I do not know, but if it is a one-time thing I suppose it is ok, as long as the rest of the process creates no lag.

I took another look at the script; it seems to do a COUNT(*) right at the beginning to generate a progress bar. I'm going to guess that's the culprit.
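
For the record, that initial count is what the "use slave for row estimate" patch above targets. A rough sketch of the difference; the EXPLAIN-based approach is approximately what MediaWiki's estimateRowCount() does, so treat the details as an assumption.

-- Exact count: walks an entire index; on a multi-million-row categorylinks
-- table this can take many seconds, and it was running on the master just
-- to size a progress bar.
SELECT COUNT(*) FROM categorylinks;

-- Cheap alternative: read the optimizer's row estimate instead, and do it
-- on a slave so the master never sees the query.
EXPLAIN SELECT COUNT(*) FROM categorylinks;  -- use the "rows" column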

>> Very minor referential integrity issues are considered acceptable in this case

> If the potential issues are known and accepted on purpose, then it is ok.

OTOH, it's entirely possible the code was originally written that way in error. All things being equal, it would certainly be better to use SELECT FOR UPDATE if that won't cause any problems. It's just not horrible if minor referential integrity issues happen.

> It is a workaround (index hints are evil for many reasons), but if that allows faster execution, so be it.

On my local test wiki I actually had to use FORCE INDEX to make it use the index in all cases (sometimes it decides that it's cheaper to do a full table scan and filesort the results). I'm assuming that's due to my local wiki (a) having a tiny categorylinks table and (b) running a really old version of MySQL. I assume that something like FORCE INDEX is not needed here.
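
For comparison, the heavier hint mentioned above would look something like this; a sketch only, not something proposed for the actual patch.

-- FORCE INDEX leaves the optimizer no choice of access path for
-- categorylinks, whereas STRAIGHT_JOIN only pins the join order (which
-- turned out to be sufficient on the production servers).
SELECT /* UpdateCollation::execute */ cl_from, cl_to, cl_sortkey_prefix
FROM categorylinks FORCE INDEX (cl_collation_ext)
JOIN page ON cl_from = page_id
ORDER BY cl_collation, cl_to, cl_type, cl_from
LIMIT 100;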

Anyhow, see the above two patches.

Change 288032 merged by jenkins-bot:
Use STRAIGHT_JOIN on updateCollation.php per jcrespo

https://gerrit.wikimedia.org/r/288032

Change 288036 merged by jenkins-bot:
use slave for row estimate in updateCollation.php

https://gerrit.wikimedia.org/r/288036

Language-related folks: This was brought up in today's Scrum of Scrums by Operations. Help if you can, please (they were unsure of who to ask specifically).

> Language-related folks: This was brought up in today's Scrum of Scrums by Operations. Help if you can, please (they were unsure of who to ask specifically).

What remains to be done? Since the patches were merged this morning, the next step at this stage would be to test again, which falls on Operations' shoulders.

@greg To me this looks like database usage optimization work, which isn't our area of expertise.

> Language-related folks: This was brought up in today's Scrum of Scrums by Operations. Help if you can, please (they were unsure of who to ask specifically).

> What remains to be done? Since the patches were merged this morning, the next step at this stage would be to test again, which falls on Operations' shoulders.

I am happy to help, since I'm the person working on what is blocked by this; how do you suggest we should test the script? I am pretty sure it's safe to run on any wiki without causing disruption even if I have to stop it abruptly; is that correct?

@Joe Yeah. I suggest trying ruwikisource, ruwiktionary and ruwiki per T129411, which need this script to be run anyway.

(The recently merged patches would have to be deployed first if you want to do this soon.)

In T58041#2288418, @Joe wrote:

>> Language-related folks: This was brought up in today's Scrum of Scrums by Operations. Help if you can, please (they were unsure of who to ask specifically).

>> What remains to be done? Since the patches were merged this morning, the next step at this stage would be to test again, which falls on Operations' shoulders.

> I am happy to help, since I'm the person working on what is blocked by this; how do you suggest we should test the script? I am pretty sure it's safe to run on any wiki without causing disruption even if I have to stop it abruptly; is that correct?

Yes, that's correct; the script can be killed at any point and it will not cause problems.

Basically, we just need to verify that the current version uses the correct cl_collation_ext index (note that the current version was committed just yesterday, so it's not yet in the currently deployed wmf branch). jcrespo was previously using the script on ruwiktionary when he encountered issues, so that would be the ideal project to test on.

Just run updateCollation.php --force on ruwiktionary to verify that:

  • Slave lag does not spike
  • Things don't seem super slow (e.g. it shouldn't be more than a second between the script outputting "Selecting next 100 rows..." and "processing...")

When I was reading the script earlier, I misread the code and thought it was waiting for slaves every 20 UPDATEs. It's actually doing it every 2000 UPDATEs (r97146). So if it's causing slave lag, we might need to increase the frequency of the wait-for-slaves call.

> jcrespo was previously using the script on ruwiktionary

I've never run this script anywhere; you are probably confusing me with someone else?

Change 288364 had a related patch set uploaded (by Brian Wolff):
Make updateCollation wait for slaves every 500 (instead of 2000)

https://gerrit.wikimedia.org/r/288364

>> jcrespo was previously using the script on ruwiktionary

> I've never run this script anywhere; you are probably confusing me with someone else?

My apologies. I misinterpreted T58041#2262583 as saying you were running it, but T129411 clarifies that it was rkaldari who was running it, and you were monitoring.

So sorry, looks like I made a real mess of this bug...

Anyway, @jcrespo indicated last time around that the process was causing some slave lag. I originally thought that didn't make sense, because when I read the script I thought it was calling wfWaitForSlaves() every 20 writes, which seemed extremely conservative. However, I fail at reading, and it was actually doing it every 2000, which actually seems kind of high. Hence yet another patch to the script, to bring that down to every 500 writes. (From what I understand, slave lag can also be caused by filesorting large tables, which was happening due to MySQL not using the index, so maybe it wasn't related to the number of updates. I don't really know enough about MySQL replication to say, but that patch definitely shouldn't hurt.)

So one more patch before being ready to test again.

@Bawolff thanks for the work on this; when you're done with the last patch, I'd say we should cherry-pick the changes to the current wmf branch and go on with SWATting them: the script in its current form is broken and needs fixing.

Reads on the master cannot cause lag. Large updates can, where large means "slow" (even if only a few records are updated); so can updates that generate too many writes at once. I am not really concerned by processes that take days to apply (I do those every day), but they must minimize disruption to users' writes. Lag is impossible to avoid (it exists by definition when you have asynchronous slaves), but we should check for it every few steps; 1000 updates or 1 second are the typical answers I give when asked how frequently it should be checked.

I think the index created is a good step towards making things faster as well, which is why I am giving all the support I can with the schema changes and recommendations, but I am not very involved with MediaWiki code itself. But feel free to ask and send reviews. I also offered to be present during the run of the script; those with an NDA should be familiar with tools such as https://tendril.wikimedia.org/chart?hosts=db1047&vars=seconds_behind_master&mode=value and https://tendril.wikimedia.org/report/slow_queries?host=^db1057&user=wikiuser&schema=wik&qmode=eq&query=&hours=1 . I know these tools are awkward, but I am working on making (some of) them available on graphite soon.
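
As an aside, this is roughly what the script's wfWaitForSlaves() call boils down to at the SQL level; the binlog file name and position below are made up for illustration.

-- On the master: capture the current binlog coordinates.
SHOW MASTER STATUS;  -- e.g. File: db1075-bin.002537, Position: 447821 (hypothetical)

-- On each replica: block until it has replayed up to those coordinates,
-- or until the timeout (here 10 seconds) expires.
SELECT MASTER_POS_WAIT('db1075-bin.002537', 447821, 10);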

Change 288364 merged by jenkins-bot:
Make updateCollation wait for slaves every 500 (instead of 2000)

https://gerrit.wikimedia.org/r/288364

Change 288384 had a related patch set uploaded (by Brian Wolff):
Make updateCollation wait for slaves every 500 (instead of 2000)

https://gerrit.wikimedia.org/r/288384

Change 288385 had a related patch set uploaded (by Brian Wolff):
Use STRAIGHT_JOIN on updateCollation.php per jcrespo

https://gerrit.wikimedia.org/r/288385

Change 288386 had a related patch set uploaded (by Brian Wolff):
use slave for row estimate in updateCollation.php

https://gerrit.wikimedia.org/r/288386

Change 288385 merged by jenkins-bot:
Use STRAIGHT_JOIN on updateCollation.php per jcrespo

https://gerrit.wikimedia.org/r/288385

Change 288386 merged by jenkins-bot:
use slave for row estimate in updateCollation.php

https://gerrit.wikimedia.org/r/288386

Change 288384 merged by jenkins-bot:
Make updateCollation wait for slaves every 500 (instead of 2000)

https://gerrit.wikimedia.org/r/288384

Change 288411 had a related patch set uploaded (by Brian Wolff):
Use STRAIGHT_JOIN on updateCollation.php per jcrespo

https://gerrit.wikimedia.org/r/288411

Change 288412 had a related patch set uploaded (by Brian Wolff):
Make updateCollation wait for slaves every 500 (instead of 2000)

https://gerrit.wikimedia.org/r/288412

Change 288413 had a related patch set uploaded (by Brian Wolff):
use slave for row estimate in updateCollation.php

https://gerrit.wikimedia.org/r/288413

Change 288411 merged by jenkins-bot:
Use STRAIGHT_JOIN on updateCollation.php per jcrespo

https://gerrit.wikimedia.org/r/288411

Change 288412 merged by jenkins-bot:
Make updateCollation wait for slaves every 500 (instead of 2000)

https://gerrit.wikimedia.org/r/288412

Change 288413 merged by jenkins-bot:
use slave for row estimate in updateCollation.php

https://gerrit.wikimedia.org/r/288413

In T58041#2288620, @Joe wrote:

> @Bawolff thanks for the work on this; when you're done with the last patch, I'd say we should cherry-pick the changes to the current wmf branch and go on with SWATting them: the script in its current form is broken and needs fixing.

@Joe OK, so all patches are merged and SWATted. The next step is to test the script again and see if it works quickly this time.

Just got back from vacation. I see you guys have been busy! I can go ahead and try running this on ruwiktionary and see if we have better luck this time...

Running on ruwiktionary now. It's flying through it!

@kaldari nice! So we can finally run it for every wiki once I upgrade the libicu version HHVM is linked to, I guess!

Remember to add a new request or reopen T130692 for the pending reindexing work.

Nemo_bis subscribed.

What is left to do in MediaWiki core? T130692 is a Wikimedia-servers task; it can't block a MediaWiki task.

In core, we now need to verify that the script is no longer prohibitively slow for large wikis. The easiest way to do that is to run it on a few large Wikimedia wikis, which is blocked by that task.

@jcrespo which wiki should we test the script on? I see from T130692 that the schema change has been applied to quite a few shards already.

I spoke with @jcrespo on IRC and we agreed on testing how long a forced run of the script will take on some databases in s2; I will start testing on ptwiki.

Mentioned in SAL [2016-05-23T10:32:29Z] <_joe_> running updateCollations.php --force on ptwiki, T58041

The script ran on ptwiki (around 9 M rows) with --force in less than 5 hours, and with no database impact.

In T58041#2318772, @Joe wrote:

> The script ran on ptwiki (around 9 M rows) with --force in less than 5 hours, and with no database impact.

Not bad. Assuming it scales linearly, enwiki with 105 M rows would take around two days, and frwiki with 33 M rows around 15 hours (and per T56680, it took over a week back in the day). I think that's no longer prohibitively slow, and we could close this bug?

kaldari claimed this task.
kaldari moved this task from Ready to Q1 2018-19 on the Community-Tech-Sprint board.