[epic] large file uploads to commons
Open, Needs Triage, Public

Description

This is a tracking task to collect all subtasks related to issues with large file uploads to commons.

Event Timeline

Neverending tracking tasks should not be created as it means that they'll never get resolved. Dedicated project tags should be used instead. (In case this is Tracking-Neverending; if it is only Epic and can be resolved at some point please ignore this comment.)

right... it was so nice to see all the subtasks in the graph :-(

Feel free to close!

Ah. "Large uploads to Commons are broken and need to be fixed" seems like a good Epic, when it manifests in multiple ways (see subtasks) and will require multiple interventions to resolve (moving imginfo to bulk storage, moving OCR out of metadata, "do something" so the job queue doesn't choke on publishing these out of stash, etc.). That satisfies the "can be fixed, even if it takes time" criterion.

I presume the intent is to actually try to get this fixed, and not just document the problems' existence?

Technically these issues can be resolved, but my impression is that nobody is both capable of and interested in working on them. They affect quite a few people directly (the uploaders) and ultimately hurt the wikis, as educational content (like a big PDF, a very high resolution image or a video) can be difficult or impossible to upload. Perhaps this will never be resolved, and these subtasks have been open for way too long, but that's not because it's unresolvable. The goals are clear.

Reporting on behalf of my colleagues from the WMDE Policies team.
WMDE is also affected by the large-file problem. It may be related to some of the existing subtasks, but we'd leave that for the experts at the WMF to identify.
In our cooperation with the German public broadcaster Terra X (see the session at Wikimania here for more information: https://www.youtube.com/watch?v=XOrcvJjoOwI), the community asked for clips of about 15 seconds up to 2 minutes. The ZDF (broadcaster) team has uploaded about 200 files so far, all high quality, and would continue to provide content in 2K, or in as high a quality as possible.
You can see the files and file usage here: https://mvc.toolforge.org/index.php?category=Videos_by_Terra_X&timespan=now-30&rangestart=&rangeend=&limit=200
The community feedback is very good. The files have been used across multiple Wikipedia language versions and have repeatedly been featured as media of the day. We therefore intend to continue the cooperation with the broadcaster, hoping to also include longer videos in higher quality.
At the moment the Terra X editors tend to send clips only, so there is still time to solve the problem. But if they switch to UHD quality and produce longer interview sequences, we would face a real problem in our cooperation, as no uploads would really be possible.

A few observations from our side, also based on input from Wikipedia and Commons editors who work with us on our project:

  • Uploads of about 600 MiB or more still allow entering information about the file, but fail later on
  • For files larger than 1 GiB, the error appears directly after the actual upload step (does the upload not even start?)
  • Uploaded video quality: 4K, at a data rate of 100 Mbit/s. We would reach the critical file size limit (of 600 MiB) after a film length of about 75 seconds, and we certainly intend to include longer videos too.
  • The error has appeared in every one of our manual attempts (since approximately April 2021)
  • The result and error are the same regardless of upload method, operating system and browser
  • Error can be reproduced with any sufficiently large file
  • We’ve heard from Wikipedia/Commons volunteers that PDFs are affected as well as pictures and videos
  • Apparently only the file size is the problem: gigapixel images, which are small enough in file size, can be uploaded and published
  • The "local-swift-codfw" storage backend seems to be central to the problem
  • Before (earlier this year?), large file uploads seemed to be problematic only via the "usual" upload interface (Upload Wizard). We're told that system admins have recently had problems with server-side uploads as well
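
For anyone who wants to check the size estimate in the list above, here is a minimal sketch of the arithmetic. It assumes a constant bitrate, decimal megabits (1 Mbit = 1,000,000 bits) and binary mebibytes; the function names are purely illustrative. Under those assumptions, a steady 100 Mbit/s reaches 600 MiB after roughly 50 seconds, while about 75 seconds would correspond to an average of roughly 67 Mbit/s, so the real clips presumably average somewhat below the nominal rate.

```python
MIB = 1024 * 1024  # bytes in one mebibyte (binary unit)
MBIT = 1_000_000   # bits in one megabit (decimal unit, an assumption)


def seconds_until_size(size_mib: float, bitrate_mbit_s: float) -> float:
    """Seconds of constant-bitrate video needed to reach size_mib."""
    size_bits = size_mib * MIB * 8
    return size_bits / (bitrate_mbit_s * MBIT)


def average_bitrate_mbit_s(size_mib: float, duration_s: float) -> float:
    """Average bitrate (Mbit/s) of a file of size_mib lasting duration_s."""
    size_bits = size_mib * MIB * 8
    return size_bits / duration_s / MBIT


if __name__ == "__main__":
    # Time to hit the ~600 MiB and ~1 GiB thresholds at a steady 100 Mbit/s
    print(round(seconds_until_size(600, 100), 1))
    print(round(seconds_until_size(1024, 100), 1))
    # Average bitrate implied by reaching 600 MiB after 75 seconds
    print(round(average_bitrate_mbit_s(600, 75), 1))
```

Either way, anything longer than about a minute at this quality ends up above the size at which uploads currently fail.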

Functional uploads are essential for WMDE’s pilot programme with the German public broadcaster, so we’d appreciate it if someone could look into this problem.