Increase $wgHTTPImportTimeout to a higher value on WMF wikis
Open, Low, Public

Description

To allow longer transwiki imports, let's try increasing $wgHTTPImportTimeout to, say, 50 or 55 seconds. Operations requested that it be kept below 60 seconds.

Event Timeline

TTO created this task. Jan 13 2017, 2:46 AM

Change 331946 merged by jenkins-bot:
Increase $wgHTTPImportTimeout to 50 seconds

https://gerrit.wikimedia.org/r/331946

Mentioned in SAL (#wikimedia-operations) [2017-01-18T14:39:31Z] <zfilipin@tin> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:331946|Increase $wgHTTPImportTimeout to 50 seconds (T155209)]] (duration: 00m 39s)
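
For reference, the change described above amounts to a one-line configuration override. The fragment below is only a sketch: it assumes the setting is assigned directly as a global in wmf-config/CommonSettings.php, and the merged patch may structure it differently.

```php
<?php
// Sketch only: the exact form of the merged patch (gerrit 331946) is not
// reproduced in this task, so the placement and style here are assumptions.

// Timeout, in seconds, for the HTTP requests MediaWiki makes internally
// during transwiki imports (Special:Import).
// Operations asked that this stay below 60 seconds.
$wgHTTPImportTimeout = 50;
```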

TTO added a comment. Jan 18 2017, 2:50 PM

The HTTP timeout has been increased to 50 seconds. I managed to import all 2,063 revisions of "Digital television" from enwiki to testwiki.

Please try some imports on your local wikis and see what this does for you. I'll leave this task open for a few days for people's feedback.

Nemo_bis added a comment.

I tried to import 45k+ revisions of [[w:en:George W. Bush]] on https://test.wikipedia.org/w/index.php?title=Special:Import (which is clearly an unreasonable thing to do) and I got a blank page. We could run more tests just to document how many revisions sysops can reasonably expect to be able to import, but IMHO it's not too bad if a privileged user performing a rather extraordinary action sometimes gets an unfriendly result.

Joe added a comment. Jan 19 2017, 9:36 AM

@Nemo_bis a blank page usually means that something other than a timeout has happened. Probably a memory limit was hit; if we want to be able to import tens of thousands of revisions, we might want to turn that into an async job instead.

Nemo_bis added a comment.

> @Nemo_bis a blank page usually means that something other than a timeout has happened.

True. On the other hand, the timeout on the API request is a de facto limit on the quantity of data you receive (and probably on the amount of memory required to process it), since I expect both to be highly correlated with processing and transfer time. Or not?

> Probably a memory limit was hit; if we want to be able to import tens of thousands of revisions, we might want to turn that into an async job instead.

Yes, this is a separate feature request to consider seriously. This task should only be considered a way to get the most out of the current system, I think.

TTO added a comment. Jan 19 2017, 11:03 AM

>> Probably a memory limit was hit; if we want to be able to import tens of thousands of revisions, we might want to turn that into an async job instead.

> Yes, this is a separate feature request to consider seriously. This task should only be considered a way to get the most out of the current system, I think.

The import system could really do with a substantial overhaul to provide features like warnings/confirmations, more options on where to store the imported pages, and asynchronicity. But it doesn't seem worth the effort, given how infrequently its limitations cause problems.