
Increase curl timeout for importImages.php
Closed, ResolvedPublic

Description

Per Filippo's comment below... It seems uploading large files (around the 4 GB limit) via importImages.php is occasionally hitting the timeout, causing the uploads to fail.

Can we plumb in a larger request timeout (possibly via a separate $wg, or by reusing $wgCopyUploadTimeout) for importImages.php uploads, like we already have for Copy Uploads? Both are used for uploading/importing larger files.

/**
 * Different timeout for upload by url
 * This could be useful since when fetching large files, you may want a
 * timeout longer than the default $wgHTTPTimeout. False means fallback
 * to default.
 *
 * @var int|bool
 *
 * @since 1.22
 */
$wgCopyUploadTimeout = false;
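
For reference, the existing knob is set in LocalSettings.php like any other setting; a minimal sketch (the value is illustrative):

// LocalSettings.php: allow upload-by-url fetches up to 30 minutes
// rather than falling back to $wgHTTPTimeout (1800 is just an example).
$wgCopyUploadTimeout = 1800;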

I took a look at this, and it seems we're hitting a curl timeout on the MW side? I'm not sure how long the (default?) timeout is; Swift thinks the client disconnected before the operation was completed (status code 499):

Jun 30 19:57:39 ms-be1046 object-server: 10.64.32.221 - - [30/Jun/2019:19:57:39 +0000] "PUT /sdi1/49471/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" 499 - "PUT http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" "tx18c227e0922040c49af4c-005d191302" "proxy-server 8839" 305.1190 "-" 2166 0
Jun 30 19:57:39 ms-be1027 object-server: 10.64.32.221 - - [30/Jun/2019:19:57:39 +0000] "PUT /sdh1/49471/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" 499 - "PUT http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" "tx18c227e0922040c49af4c-005d191302" "proxy-server 8839" 305.1753 "-" 1808 0
Jun 30 19:57:39 ms-be1022 object-server: 10.64.32.221 - - [30/Jun/2019:19:57:39 +0000] "PUT /sdl1/49471/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" 499 - "PUT http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm" "tx18c227e0922040c49af4c-005d191302" "proxy-server 8839" 305.7051 "-" 1796 0
Jun 30 19:57:39 ms-fe1008 proxy-server: 10.64.16.77 10.64.32.221 30/Jun/2019/19/57/39 PUT /v1/AUTH_mw/wikipedia-commons-local-public.30/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm HTTP/1.0 499 - wikimedia/multi-http-client%20v1.0 AUTH_tkc4732bce3... 2860122112 89 fe6dcea4261da5ac78c912ebef159e98 tx18c227e0922040c49af4c-005d191302 - 305.0647 - - 1561924354.261385918 1561924659.326113939 0

from logstash (ignoring the fact that the message couldn't actually be indexed, for unrelated reasons): (curl error: 28) Timeout was reached

[2019-06-30T19:57:34,863][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-2019.06.30", :_type=>"mediawiki", :_routing=>nil}, 2019-06-30T19:57:34.000Z mwmaint1002 HTTP 100 (Continue) in 'SwiftFileBackend::doStoreInternal' (given '{"async":false,"src":"./Hurtigruten.05.11.1920x1080.NRK2.webm","dst":"mwstore://local-swift-eqiad/local-public/3/30/Hurtigruten.05.11.1920x1080.NRK2.webm","overwrite":true,"headers":{"x-content-dimensions":""},"dstExists":false}'): (curl error: 28) Timeout was reached], :response=>{"index"=>{"_index"=>"logstash-2019.06.30", "_type"=>"mediawiki", "_id"=>"AWup9tYHABdYYWJYyYdA", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"[err] is defined as a field in mapping [mediawiki] but this name is already used for an object in other types"}}}}

Can we also clarify what protected $timeout = 'default'; resolves to?

Event Timeline

Restricted Application added subscribers: Liuxinyu970226, Aklapper.

Can we also clarify what protected $timeout = 'default'; resolves to?

According to https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT.html, the default is no timeout. Anyway, that class isn't used here; the upload goes through MultiHttpClient, which has protected $reqTimeout = 300;. That's five minutes, which doesn't look like enough.
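
For what it's worth, MultiHttpClient already accepts both timeouts as constructor options, so code that builds its own client can override the 300-second default; a minimal sketch:

// Per-instance override of MultiHttpClient's defaults; both option
// keys exist in MultiHttpClient::__construct() in this era of core.
$client = new MultiHttpClient( [
	'reqTimeout' => 3000, // seconds for the whole request
	'connTimeout' => 10,  // seconds to establish the connection
] );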

Mine took quite a bit longer than 5 minutes and succeeded, yours took only just over 5 minutes and failed...

True, but importImages.php takes some time for prep work before it actually starts a request. No idea how long that takes. In T226953, I tried changing reqTimeout to 3000 (50 minutes) and it succeeded immediately. Yes, it's possible it would have succeeded anyway, but we can't be 100% sure, I think.

Ideally, we don't want to unconditionally raise it... I'd suggest up to 10 or 15 minutes is probably enough of an increase; if it can sometimes succeed at 5 minutes, doubling or tripling it seems enough for the time being.

That wasn't meant as a permanent solution; I just wanted to make sure I'd found the place where the timeout is hit.

Sure, but it does seem like we probably need to increase it.

Change 520135 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/core@master] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/520135

I was trying to find a way to raise it only for the script, but couldn't follow the method call flow. I might try once more tomorrow, if you think it's necessary.

I don't think there's an easy way without building it in...

// Inspect the protected 'backends' property of the local repo's backend
// wrapper to see which concrete backends are in play.
$backend = MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->getBackend();
$c = new ReflectionClass( $backend );
$p = $c->getProperty( 'backends' );
$p->setAccessible( true );
$backends = $p->getValue( $backend );
foreach ( $backends as $b ) { var_dump( get_class( $b ) ); }
string(16) "SwiftFileBackend"
string(16) "SwiftFileBackend"

But basically, MediaWiki\MediaWikiServices::getInstance()->getRepoGroup()->getLocalRepo()->getBackend()->backends contains the SwiftFileBackend instances if you want a starting point...
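
A hypothetical continuation of the snippet above, assuming SwiftFileBackend keeps its MultiHttpClient in a protected $http property (true in this era of core, but verify against your version):

// Hypothetical: bump reqTimeout on each backend's MultiHttpClient via
// reflection, reusing $backends from the snippet above.
foreach ( $backends as $b ) {
	$bc = new ReflectionClass( $b );
	$httpProp = $bc->getProperty( 'http' ); // assumed property name
	$httpProp->setAccessible( true );
	$client = $httpProp->getValue( $b );

	$cc = new ReflectionClass( $client );
	$timeoutProp = $cc->getProperty( 'reqTimeout' );
	$timeoutProp->setAccessible( true );
	$timeoutProp->setValue( $client, 3000 ); // 50 minutes, as tested above
}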

I tried, but I'm giving up on this. Do you think we can just increase the default timeout for the time being, so it's at least possible to upload the files? This can then be left open for somebody to provide a better solution in the future.

I have not forgotten about T202352, which converts MultiHttpClient to use Guzzle internally. If no one else gets to that first, I'll eventually get back to it, and will update that patch for this timeout change, plus any additional changes that have occurred since I last worked on that. Guzzle (on our stack) will ultimately use curl anyway, and the timeout change will still be relevant.
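
For reference, Guzzle exposes equivalent knobs as standard request options, so the raised default should map over directly when that conversion lands; a minimal sketch:

use GuzzleHttp\Client;

// Guzzle's counterparts to MultiHttpClient's reqTimeout/connTimeout
// ('timeout' and 'connect_timeout' are standard Guzzle request options).
$client = new Client( [
	'timeout' => 3000,        // total request timeout, seconds
	'connect_timeout' => 10,  // connection timeout, seconds
] );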

Change 520135 merged by jenkins-bot:
[mediawiki/core@master] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/520135

Change 522951 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[mediawiki/core@wmf/1.34.0-wmf.13] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/522951

Urbanecm triaged this task as Medium priority. Jul 14 2019, 2:47 PM
Urbanecm moved this task from Backlog to To deploy - scheduled for SWAT on the User-Urbanecm board.

Change 522951 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.13] Raise default reqTimeout in MultiHttpClient

https://gerrit.wikimedia.org/r/522951

Mentioned in SAL (#wikimedia-operations) [2019-07-15T11:24:11Z] <urbanecm@deploy1001> Synchronized php-1.34.0-wmf.13/includes/libs/http/MultiHttpClient.php: SWAT: [[:gerrit:522951|Raise default reqTimeout in MultiHttpClient]] (T226979) (duration: 00m 51s)

An alternative way to fix this would be to have importImages.php pass reqTimeout/connTimeout as an option to LocalFile::publish(). The options are passed down to FileRepo::publishBatch(), which requires a patch to forward them to FileBackend. From there, the options array will apparently be passed down to SwiftFileBackend::doCreateInternal(), requiring only doc comment changes along the way. Parameters set in the request arrays in doCreateInternal() will be passed to MultiHttpClient.
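
A hypothetical sketch of what the importImages.php end of that could look like once the forwarding patches exist; the option keys mirror MultiHttpClient's and are not accepted by publish() today:

// Hypothetical: pass HTTP timeouts through the publish() options array,
// to be forwarded via FileRepo::publishBatch() down to the backend.
$archive = $image->publish( $file, $flags, [
	'reqTimeout' => 3000, // assumed key, mirroring MultiHttpClient
	'connTimeout' => 10,  // assumed key, mirroring MultiHttpClient
] );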

It could also be addressed via constructor configuration. This would affect all callers of SwiftFileBackend, not just importImages.php. Timeout options would be added to the constructor options in $wgFileBackends, which is set in WMF's filebackend.php. Then a patch to SwiftFileBackend::__construct() would pass the options through to MultiHttpClient::__construct().
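
With the constructor route (the one taken by the patch below), the site configuration would look roughly like this; most keys are elided and the values are illustrative:

// $wgFileBackends entry (e.g. in WMF's filebackend.php), abridged.
$wgFileBackends[] = [
	'class' => SwiftFileBackend::class,
	'name' => 'local-swift-eqiad',
	// ... auth, container and sharding options ...
	'connTimeout' => 10, // seconds to establish the connection
	'reqTimeout' => 900, // seconds for the whole request (15 minutes)
];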

Change 596535 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] In SwiftFileBackend allow HTTP timeouts to be set in the constructor

https://gerrit.wikimedia.org/r/596535

Change 596535 merged by jenkins-bot:
[mediawiki/core@master] In SwiftFileBackend allow HTTP timeouts to be set in the constructor

https://gerrit.wikimedia.org/r/596535