
Server-side upload request for D. Benjamin Miller
Closed, ResolvedPublicRequest

Description

Please upload the following file(s) to Wikimedia Commons:

My username is D. Benjamin Miller. Thank you.

Event Timeline

Note that we may be deprecating the server-side upload queue, as it's largely unmaintained and results in people waiting months for nothing to happen. I'd recommend trying to upload the files yourself with UploadWizard first; if you run into problems, please report any failures or errors as specifically as possible so we can try to track them down and fix them.

I've tried numerous times with the chunked uploader. I can get the files into the stash perfectly reliably, but every attempt to publish the file (via async + the MW API sandbox) fails, always at the publish stage. I've tried a bunch of times over the past few days with no luck; I keep getting stashfailed timeouts when trying to publish. This is probably because these files are large (close to, but not above, the upload limit).

Anyway, if there's anything you can do to help on that front, please let me know. I find that somewhere between 4 and 5 GB, files time out. I have my version of The Black Watch (1929).webm in the stash right now — 1bpco17baagc.uzyqdi.3678332.webm (4.86 GB) but it just won't publish.

Another typical error:

{
    "error": {
        "code": "stashfailed",
        "info": "Internal error: Server failed to publish temporary file.",
        "docref": "See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
    },
    "servedby": "mw-api-ext.eqiad.main-865c7549cc-whmmm"
}
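For reference, the flow I'm using follows the standard MediaWiki chunked-upload protocol. A minimal sketch (assuming a `requests`-style session that is already logged in; the session, API URL, and CSRF token handling are placeholders, not my exact script — the `action=upload` parameter names are from the API docs):

```python
# Sketch of the MediaWiki chunked-upload-to-stash flow. Publishing afterwards
# is a separate action=upload call with filekey=..., filename=..., async=1.
import os

CHUNK_SIZE = 100 * 1024 * 1024  # hypothetical chunk size, 100 MB

def chunk_offsets(filesize, chunk_size=CHUNK_SIZE):
    """Byte offsets at which each chunk of the file starts."""
    return list(range(0, filesize, chunk_size))

def upload_to_stash(session, api_url, path, token):
    """Upload `path` chunk by chunk; returns the stash filekey."""
    filesize = os.path.getsize(path)
    filekey = None
    with open(path, "rb") as f:
        for offset in chunk_offsets(filesize):
            chunk = f.read(CHUNK_SIZE)
            params = {
                "action": "upload", "format": "json", "stash": 1,
                "filename": os.path.basename(path),
                "filesize": filesize, "offset": offset, "token": token,
            }
            if filekey:  # continue the same stash entry after the first chunk
                params["filekey"] = filekey
            r = session.post(api_url, data=params,
                             files={"chunk": ("chunk.bin", chunk)}).json()
            filekey = r["upload"]["filekey"]
    return filekey
```

The chunk uploads all succeed; it's only the final publish call on the returned filekey that times out.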

Excellent, that's very helpful in narrowing it down -- the final publishing step is indeed the most likely to fail because it has the most moving parts. I'll see if I can pull some more error logs and see if I can figure out exactly how it's failing and how to get it to recover better. This absolutely should *not* be failing on people, it *should* be more reliable, but it hasn't gotten enough attention as file sizes keep growing thanks to 4K video. :)

OK, here are a couple of examples from the backend failing on one part or another of the upload stash updates:

[42cc620f-0263-4c35-8089-af75b6979329] /w/api.php   Wikimedia\Rdbms\DBQueryError: Error 1205: Lock wait timeout exceeded; try restarting transaction
Function: UploadFromChunks::updateChunkStatus
Query: UPDATE  `uploadstash` SET us_status = 'chunks',us_chunk_inx = 248,us_size = 5218804874 WHERE us_key = '1boycbidkth0.xdulzl.3678332.webm'
Error 1205: Lock wait timeout exceeded; try restarting transaction
Function: UploadStash::stashFile
Query: INSERT INTO `uploadstash` (us_user,us_key,us_orig_path,us_path,us_props,us_size,us_sha1,us_mime,us_media_type,us_image_width,us_image_height,us_image_bits,us_source_type,us_timestamp,us_status) VALUES (12402743,'1bojseh2z1mg.51xn5j.12402743.png','/tmp/phpp2Oqz5','mwrepo://local/temp/6/69/20250330093204!phpp2Oqz5.png','a:12:{s:5:\"width\";i:557;s:6:\"height\";i:818;s:4:\"bits\";i:1;s:8:\"metadata\";a:6:{s:10:\"frameCount\";i:0;s:9:\"loopCount\";i:1;s:8:\"duration\";d:0;s:8:\"bitDepth\";i:1;s:9:\"colorType\";s:9:\"greyscale\";s:8:\"metadata\";a:2:{s:8:\"DateTime\";s:19:\"2025:03:30 09:30:35\";s:15:\"_MW_PNG_VERSION\";i:1;}}s:10:\"fileExists\";b:1;s:4:\"size\";i:9257;s:9:\"file-mime\";s:9:\"image/png\";s:10:\"major_mime\";s:5:\"image\";s:10:\"minor_mime\";s:3:\"png\";s:4:\"mime\";s:9:\"image/png\";s:4:\"sha1\";s:31:\"ce379dqi28la3jifa1cyubrb7h2o15c\";s:10:\"media_type\";s:6:\"BITMAP\";}',9257,'ce379dqi28la3jifa1cyubrb7h2o15c','image/png','BITMAP',557,818,1,'file','20250330093216','finished')

Looks like bog-standard timeouts: it's just taking a while, and a connection times out. This can probably be worked around by being more careful about transactions so we can safely disconnect/reconnect, as is handled in some other places in the video handling code.
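The idea would be something like this (a hypothetical Python sketch, not the actual MediaWiki PHP code: keep each uploadstash write in its own short transaction so nothing is held open across a disconnect, and retry on lock-wait timeouts):

```python
import random
import time

LOCK_WAIT_ERRNO = 1205  # MySQL "Lock wait timeout exceeded"

class LockWaitTimeout(Exception):
    """Stand-in for the DBQueryError raised on MySQL error 1205."""
    errno = LOCK_WAIT_ERRNO

def run_in_short_txn(execute, attempts=3, base_delay=0.5):
    """Run one short transaction, retrying on lock-wait timeouts.

    `execute` is a callable that opens its own transaction, runs a single
    statement (e.g. the uploadstash UPDATE), and commits. Because no
    transaction stays open between attempts, a dropped connection can be
    re-established safely before the retry.
    """
    for attempt in range(attempts):
        try:
            return execute()
        except LockWaitTimeout:
            if attempt == attempts - 1:
                raise
            # Back off with jitter so concurrent publishers don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```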

I'll make a note to consolidate some related bug reports and dive into it shortly.

(I swear this used to be more reliable, but now I realize we increased the max significantly since last time I was seriously testing it so I think we made it fail more often for large files. :D)

Anecdotally (from uploading some large videos), I find that somewhere between 4 and 5 GB, the failure rate goes from pretty low to ~100%. Like, I have no issue uploading a 3.85 GB video. I can't upload a 4.85 GB video. Basically, this is really just a problem when I am uploading my AV1 encodes of feature films, which can really benefit from getting as close to 5 GB as possible.

Really, can't this be fixed by just making the relevant timeout longer?

I also frequently get

{
    "error": {
        "code": "stashfailed",
        "info": "Could not acquire lock. Somebody else is doing something to this file.",
        "docref": "See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
    },
    "servedby": "mw-api-ext.eqiad.main-865c7549cc-r9zlv"
}

Ok, let's follow up with the bug details on T391473 :)

For The Cocoanuts.webm, we have the following errors:

During the direct upload attempt (reqid: 447122a2-9bf1-41e0-b3c9-651a7dcc6bd0; note there are multiple requests with the same ID running concurrently here, which makes the logs confusing to read):

Apr 4, 2025 @ 07:22:06.588 - publish job starts
Apr 4, 2025 @ 07:30:18.084 cannot reconnect to db1244 silently: session state loss (explicit transaction)
Apr 4, 2025 @ 07:30:18.092  [447122a2-9bf1-41e0-b3c9-651a7dcc6bd0] /rpc/RunSingleJob.php   Wikimedia\Rdbms\DBQueryDisconnectedError: A connection error occurred during a query. 
Query: SELECT  actor_user,actor_name,actor_id  FROM `actor`    WHERE actor_name = 'D. Benjamin Miller'  LIMIT 1  
Function: MediaWiki\User\ActorStore::findActorIdInternal
Error: 2006 MySQL server has gone away
Apr 4, 2025 @ 07:30:18.092 Failed executing job: PublishStashedFile Special: filename=The_Cocoanuts_(1929).webm filekey=1boytgnd3d8k.9livva.3678332.webm...

So I guess we are holding an explicit transaction open for too long. It goes away. MW refuses to reconnect since it's explicit. Bad things happen.

This would be fine if it were an implicit transaction, since then it would automatically reconnect. I don't think we used to hold transactions open like this, so maybe this is a regression.
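The distinction can be shown with a toy wrapper (hypothetical Python, just to illustrate why silent reconnection is only safe outside an explicit transaction: the session state, including any uncommitted writes, dies with the connection):

```python
class ConnectionLost(Exception):
    """Stand-in for DBQueryDisconnectedError (MySQL error 2006)."""

class Conn:
    """Toy DB connection illustrating the reconnect rule discussed above."""

    def __init__(self):
        self.alive = True
        self.in_explicit_txn = False

    def begin(self):
        self.in_explicit_txn = True

    def commit(self):
        self.in_explicit_txn = False

    def query(self, sql):
        if not self.alive:
            if self.in_explicit_txn:
                # Earlier statements in this transaction are gone server-side;
                # silently reconnecting would commit only a partial write set,
                # so the safe move is to fail the whole operation.
                raise ConnectionLost("session state loss (explicit transaction)")
            self.alive = True  # implicit transaction: safe to reconnect and retry
        return f"ran: {sql}"
```

This matches the two log lines: Assemble hits the "reconnected" path, Publish hits the "session state loss" path.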


During the upload by url attempt (req id 16d74272-7d31-4e08-b285-466020fee6da ):

Looking further in the logs, it appears that the assemble job also loses the DB connection, but there is a Wikimedia\Rdbms\Database::handleErroredQuery: lost connection to db1227 with error 2006; reconnected log line, so I guess no explicit transaction is open there and there is no issue. It's kind of odd that Publish has an open transaction but Assemble does not.

[Edit: See T391473 for debugging of this.]

Filed T391755 for increasing the upload by url time limit.