Page MenuHomePhabricator

Increase curl timeout for importImages.php
Closed, ResolvedPublic

Description

Per Filippo's comment below... It seems uploading large files (around the 4GB limit) via importImages.php is occasionally hitting the timeout causing the uploads to finish

Can we plumb in a larger request timeout (possibly via a separate $wg, or reusing $wgCopyUploadTimeout) for importImages.php type uploads like we already have for Copy Uploads; both of which are used for uploading/importing larger files?

/**
 * Different timeout for upload by url
 * This could be useful since when fetching large files, you may want a
 * timeout longer than the default $wgHTTPTimeout. False means fallback
 * to default.
 *
 * @var int|bool
 *
 * @since 1.22
 */
$wgCopyUploadTimeout = false;

I took a look at this, and it seems we're hitting a curl timeout on the mw side ? I'm not sure how long is the (default?) timeout, swift thinks the client disconnected before the operation was completed (status code 499)

Jun 30 19:57:39 ms-be1046 object-server: 10.64.32.221 - - [30/Jun/2019:19:57:39 +0000] "PUT /sdi1/49471/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" 499 - "PUT http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" "tx18c227e0922040c49af4c-005d191302" "proxy-server 8839" 305.1190 "-" 2166 0
Jun 30 19:57:39 ms-be1027 object-server: 10.64.32.221 - - [30/Jun/2019:19:57:39 +0000] "PUT /sdh1/49471/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" 499 - "PUT http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" "tx18c227e0922040c49af4c-005d191302" "proxy-server 8839" 305.1753 "-" 1808 0
Jun 30 19:57:39 ms-be1022 object-server: 10.64.32.221 - - [30/Jun/2019:19:57:39 +0000] "PUT /sdl1/49471/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" 499 - "PUT http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" "tx18c227e0922040c49af4c-005d191302" "proxy-server 8839" 305.7051 "-" 1796 0
Jun 30 19:57:39 ms-fe1008 proxy-server: 10.64.16.77 10.64.32.221 30/Jun/2019/19/57/39 PUT /v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm HTTP/1.0 499 - wikimedia/multi-http-client%20v1.0 AUTH_tkc4732bce3... 2860122112 89 fe6dcea4261da5ac78c912ebef159e98 tx18c227e0922040c49af4c-005d191302 - 305.0647 - - 1561924354.261385918 1561924659.326113939 0

from logstash (ignoring the fact that the message couldn't actually be indexed for unrelated reasons) (curl error: 28) Timeout was reached]

[2019-06-30T19:57:34,863][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2019.06.30", :_type=>"mediawiki", :_routing=>nil}, 2019-06-30T19:57:34.000Z mwmaint1002 HTTP 100 (Continue) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"src":"./Hurtigruten.05.11.1920x1080.NRK2.webm","dst":"mwstore://local-swift-eqiad/local-public/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm","overwrite":true,"headers":{"x-content-dimensions":""},"dstExists":false}'): (curl error: 28) Timeout was reached], :response=>{"index"=>{"_index"=>"logstash-2019.06.30", "_type"=>"mediawiki", "_id"=>"AWup9tYHABdYYWJYyYdA", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"[err] is defined as a field in mapping [mediawiki] but this name is already used for an object in other types"}}}}

Can we also clarify what protected $timeout = 'default'; resolves to?

Event Timeline

Reedy created this task.Jul 1 2019, 11:27 AM
Restricted Application added projects: Commons, Multimedia. · View Herald TranscriptJul 1 2019, 11:27 AM
Restricted Application added subscribers: Liuxinyu970226, Aklapper. · View Herald Transcript
Reedy updated the task description. (Show Details)Jul 1 2019, 12:52 PM

Can we also clarify what protected $timeout = 'default'; resolves to?

According to https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT.html, default is no timeout. Anyway, that class isn't used here, it goes through MultiHttpClient, which says protected $reqTimeout = 300;. That's five minutes, which doesn't look like enough.

Reedy added a comment.Jul 1 2019, 8:55 PM

Can we also clarify what protected $timeout = 'default'; resolves to?

According to https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT.html, default is no timeout. Anyway, that class isn't used here, it goes through MultiHttpClient, which says protected $reqTimeout = 300;. That's five minutes, which doesn't look like enough.

Mine took quite a bit longer than 5 minutes and succeeded, yours took only just over 5 minutes and failed...

True, but importUpload.php takes some time for prep stuff, before it actually starts a request. No idea how long that stuff takes. In T226953, I tried to change reqTimeout to 3000 (50 minutes) and it instantly succeeded. Yes, it's possible that it'd succeed anyways, but, well, we can't be 100% sure I think.

Reedy added a comment.Jul 1 2019, 10:31 PM

True, but importUpload.php takes some time for prep stuff, before it actually starts a request. No idea how long that stuff takes. In T226953, I tried to change reqTimeout to 3000 (50 minutes) and it instantly succeeded. Yes, it's possible that it'd succeed anyways, but, well, we can't be 100% sure I think.

We don't want to unconditionally raise it ideally... I'd suggest upto 10 or 15 minutes is probably enough of an increase, if it can succeed at 5 minutes sometimes, doubling or tripling it seems enough for the time being.

That wasn't mean as a permanent solution, just wanted to make sure i found the right timeouting place.

Reedy added a comment.Jul 1 2019, 10:47 PM

Sure, but it does seem like we probably need to increase it

Change 520135 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/core@master] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/520135

Was trying to find a way how to raise only for the script, but couldn't understand the method call flow. Might try that tomorrow once more, if you think it's necessary.

Reedy added a comment.Jul 1 2019, 11:26 PM

I don't think there's an easy way without building it in...

$backend = MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->getBackend();
$c = new ReflectionClass( $backend );
$p = $c->getProperty( 'backends' );
$p->setAccessible( true );
$backends = $p->getValue( $backend );
foreach( $backends as $b ) { var_dump( get_class( $b ) ); }
string(16) "SwiftFileBackend"
string(16) "SwiftFileBackend"

But basically, MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->getBackend()->backends contains the SwiftFileBackend instances if you want a starting point...

I don't think there's an easy way without building it in...

$backend = MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->getBackend();
$c = new ReflectionClass( $backend );
$p = $c->getProperty( 'backends' );
$p->setAccessible( true );
$backends = $p->getValue( $backend );
foreach( $backends as $b ) { var_dump( get_class( $b ) ); }
string(16) "SwiftFileBackend"
string(16) "SwiftFileBackend"

But basically, MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->getBackend()->backends contains the SwiftFileBackend instances if you want a starting point...

Tried, giving up for this. Do you think we can just increase the default timeout for this timebeing, so it's at least possible to upload the files? This can be left opened then for somebody to provide a better solution sometime in the future.

I have not forgotten about T202352, which converts MultiHttpClient to use Guzzle internally. If no one else gets to that first, I'll eventually get back to it, and will update that patch for this timeout change, plus any additional changes that have occurred since I last worked on that. Guzzle (on our stack) will ultimately use curl anyway, and the timeout change will still be relevant.

Change 520135 merged by jenkins-bot:
[mediawiki/core@master] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/520135

Change 522951 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/core@wmf/1.34.0-wmf.13] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/522951

Urbanecm claimed this task.Jul 14 2019, 2:47 PM
Restricted Application added a project: User-Urbanecm. · View Herald TranscriptJul 14 2019, 2:47 PM
Urbanecm triaged this task as Normal priority.Jul 14 2019, 2:47 PM
Urbanecm moved this task from Backlog to To deploy - scheduled for SWAT on the User-Urbanecm board.

Change 522951 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.13] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/522951

Mentioned in SAL (#wikimedia-operations) [2019-07-15T11:24:11Z] <urbanecm@deploy1001> Synchronized php-1.34.0-wmf.13/includes/libs/http/MultiHttpClient.php: SWAT: [[:gerrit:522951|Raise default reqTimeout in MultiHttpClient]] (T226979) (duration: 00m 51s)

Urbanecm closed this task as Resolved.Jul 15 2019, 11:26 AM