
video2commons general failure
Closed, ResolvedPublic

Assigned To
Authored By
Yann
May 16 2024, 2:19 PM

Description

All the tasks failed. Clearly V2C is not ready for use, except for beta-testers. Trying to convert a file locally doesn't help either: I can't find software that produces a file ready for upload without inside knowledge of video codecs and engineering skills. VLC produces an 8.77 GB WEBM file from a 1.252 GB MP4 file.

V2C failed.jpg (882×1 px, 659 KB)

V2C failed, 2.png (833×1 px, 176 KB)

Event Timeline


Hi! Mine is also stuck and won't process anything. This doesn't seem to be an issue affecting only me, but I would like to add it to the example pile.

Screenshot 2024-05-28 135743.png (927×2 px, 50 KB)

Sdkb triaged this task as Unbreak Now! priority. May 29 2024, 5:26 PM
Sdkb subscribed.

I am likewise unable to use the tool, with the task stuck on pending (lmk how to retrieve further info if you need it).

Until we have the ability to upload MP4s, this tool is essential infrastructure for Commons to process videos. Really unfortunate situation.

JJMC89 lowered the priority of this task from Unbreak Now! to Needs Triage. May 29 2024, 5:30 PM
JJMC89 subscribed.

Priority should normally be set by product managers, maintainers, community liaisons, or developers who plan to work on the task, or by the bugwrangler or experienced community members, not by the reporter filing the bug report or by outside observers. When in doubt, do not change the Priority field value, but add a comment suggesting the change and convincing reasons for it.

@JJMC89, per the text you quoted, I would fall into the "experienced community members" bucket, not the "reporter filing the bug report" or "outside observers" buckets that you bolded. Next time, please ensure that boilerplate is applicable to the situation before quoting it.

This task should be prioritized because video upload to Commons is essentially broken without it (the only alternative being external conversion tools that are either too complex for most users or charge fees for the file sizes of most videos being uploaded). If people find it annoying to have a high-priority task sitting around, they should consider resolving the task.

I would fall into the "experienced community members" bucket

Then you know that it shouldn't be UBN given that a maintainer is aware and is not working on immediately fixing the issue. Raising the priority isn't going to change that.

Thank you for all your work. It is a shame that the WMF does not see fit to pay you for it.

Concurred. Broader discussion raised here.

Hi @Chicocvenancio, I worked on a similar project called VideoCutTool (for Commons)
and I feel I can help you out here. Let's connect; meanwhile I'll clone the repo and try to reproduce this error.

I would fall into the "experienced community members" bucket

Then you know that it shouldn't be UBN given that a maintainer is aware and is not working on immediately fixing the issue. Raising the priority isn't going to change that.

This should be at least priority "High". I agree that it is an essential tool until MP4 is allowed on Commons. Actually I have advocated in several places that maintenance of this tool should be taken over by the WMF, given its importance and complexity, and the lack of available volunteers with the required competence. I appreciate a lot that @Chicocvenancio and other volunteers work on this, but they obviously do not have the time to fix it. I am available for testing, but my coding skill is unfortunately lacking.

This should be at least priority "High"

Wanting something doesn't make it true. Priority is determined exclusively by the people working on the bug. Changing the field to High doesn't magically make people work on the bug more; at most it might just cause the bug to not reflect reality.

In other words, arguing about priority is counterproductive and meaningless.

This should be at least priority "High"

Wanting something doesn't make it true. Priority is determined exclusively by the people working on the bug. Changing the field to High doesn't magically make people work on the bug more; at most it might just cause the bug to not reflect reality.

In other words, arguing about priority is counterproductive and meaningless.

Well, I expect that bugs with a high priority are taken seriously. I don't see that this is the case here. As shown by the diff, and reported by several people, the tool has been completely broken for 2 weeks, and I haven't seen anyone working on it.

Well, I expect that bugs with a high priority are taken seriously.

Generally this is not true (except maybe for UBN). The Priority field at best documents how maintainers feel about the bug; it does not change how they feel or cause them to work on the bug more. Changing the field just makes the field incorrect; it doesn't make the bug any higher priority to the people fixing it.

Fighting over priority only serves to distract from the issue at hand; it doesn't really help anything.



Clearly V2C is not ready for use, except for beta-testers.

@Yann I'm maintaining this on my free time and have made a grand total of 0 from all this work. This comment, as written, does not motivate me to work on this at all.

@Chicocvenancio - respectfully, it sounds like you are a little burnt out maintaining this tool. That's totally understandable, being a tool maintainer is not supposed to be a death sentence; there is no obligation to be on-call for this tool for the rest of your life, nor should there be. Would you be supportive of adding additional maintainers to the tool to spread out the burden? If so, maybe we could advertise on commons for new maintainers. The tool seems popular on commons, maybe someone is willing to step up.

I'm willing to step up as a maintainer to help until the WMF takes over (I agree with everyone here: the tool is too crucial to be maintained by the community, it should be part of MediaWiki itself).
I have good knowledge of Toolforge and Cloud VPS but not so much of video2commons itself yet, and the problem I face is that it's difficult to step in: I don't yet know how to simply set up a local development environment. Clearly it will take weeks until I'm able to really help solve this problem.

What I end up doing when video2commons doesn't work is manually converting to WebM on my own device using HandBrake. It isn't a foolproof alternative, but I guess it can work for a bit.

What I end up doing when video2commons doesn't work is manually converting to WebM on my own device using HandBrake. It isn't a foolproof alternative, but I guess it can work for a bit.

@SDudley: Can you try with https://archive.org/details/dixiana-4-k when you have time?

This is a key, widely used tool, and as long as there is no equivalent alternative it really needs to work. Downloading videos manually with yt-dlp and converting them with ffmpeg or other tools is not a good or sufficient alternative. The tool saves a lot of time and makes it easy to copy the file description and so on. It's completely incomprehensible why the WMF doesn't make sure this tool keeps working and improve it (e.g. see my 3 proposals here).

Moreover, here I noticed that when uploading a video that was manually downloaded and converted (some people doing so may reduce file quality in the process), the subtitles in the video are not displayed on WMC (they are displayed when the video is downloaded) – maybe this has to do with how subtitles are imported.

When trying to upload something, it's always pending. Aborted tasks still only show "Your task is being aborted..." so the cancellations don't seem to have worked.

PLEASE help sort this out. 'Your task is pending...' has now been stuck for the last 48 hours (2 vids: 2.7 GB + 1.5 GB; each 3840×2160 VP9; subtitles off). I have around 100 4K aerial videos to upload; any suggestions please?

What I end up doing when video2commons doesn't work is manually converting to WebM on my own device using HandBrake. It isn't a foolproof alternative, but I guess it can work for a bit.

May we suggest you to try out https://videocuttool.wmcloud.org/

What I end up doing when video2commons doesn't work is manually converting to WebM on my own device using HandBrake. It isn't a foolproof alternative, but I guess it can work for a bit.

May we suggest you to try out https://videocuttool.wmcloud.org/

Are you suggesting that the 5 GB maximum is no longer applicable? If so, what is the maximum size, please? PS: It doesn't even accept 1.5 GB at the moment!

Two of the files I submitted on 7th May popped up on Commons in the last three days:

Has anyone else seen one of their old submissions recently go live?

The weird thing is that the file appears as if it had been effectively imported on 7th May, which is not the case.

Two of the files I submitted on 7th May popped up on Commons in the last three days:

Has anyone else seen one of their old submissions recently go live?

The weird thing is that the file appears as if it had been effectively imported on 7th May, which is not the case.

Whatever is going on here, is probably unrelated to video2commons.

Logs indicate that writes to the database on May 7th took much longer than normal. Not sure how that's related, but definitely odd; maybe the PHP process got killed before it was done(?). Maybe some sort of race condition with the cache. The fact that this happened basically 30 days later strongly suggests some sort of bad cache value that fell out after 30 days.

Expectation (writeQueryTime <= 1) by ApiMain::setRequestExpectations not met (actual: 62.707350969315) in trx #a90cb44384: INSERT INTO revision (rev_page,rev_parent_id,rev_actor,rev_minor_edit,rev_timestamp,rev_deleted,rev_len,rev_sha1,rev_comment_id) VALUES '?'
Expectation (writeQueryTime <= 1) by ApiMain::setRequestExpectations not met (actual: 14.840336799622) in trx #a90cb44384: INSERT IGNORE INTO page (page_namespace,page_title,page_is_redirect,page_is_new,page_random,page_touched,page_latest,page_len) VALUES '?'
Expectation (writeQueryTime <= 1) by ApiMain::setRequestExpectations not met (actual: 32.338083982468) in trx #a90cb44384: UPDATE page SET page_latest = '?',page_touched = '?',page_is_new = '?',page_is_redirect = '?',page_len = '?',page_content_model = '?' WHERE page_id = '?' AND page_latest = '?'
[And so on]

We also have some eventbus errors: Unable to deliver all events: 503: Service Unavailable

Another odd thing about the file is the cl_timestamp on all the categories are 2024-06-05 13:18:35 despite some of them allegedly being added on 2024-05-07T16:32:24, suggesting that linksupdate did not run on the original revision.

Anyways, this definitely has nothing to do with video2commons, although at this point it is a bit unlikely we'll ever be able to figure out what happened. If more things like this happen, I would suggest filing a separate bug.


Are you suggesting that the 5GB maximum is no longer applicable?

The 5 GB limit is the max. No tool on Toolforge is able to get around it.

Hi. I've been trying for two days to convert several videos from YouTube; all of the tasks display "Your task is pending". I don't know how to produce logs, but I'd be happy to offer more information if that can help with the issue. Are there alternatives I could try? Many thanks in advance.

@Jamez42: Hi, see previous comments - thanks.

All the pending tasks disappeared (more than a dozen) without any warning or error message. New task (325 MB video) still shows ''Your task is pending...''

Two days later, the same 325 MB video still shows ''Your task is pending...'' V2C is still broken.

It seems to work now. It would be useful to know why it stopped working, and how it was fixed (and why it took so long). Thanks for the fix.

Thanks for fixing it. However, there is still a problem: uploading is often very slow and some tasks fail. See the screenshot (I guess this issue can be closed and this should go into a new issue or issues).

screenshotv2c.jpg (1×1 px, 378 KB)

Note that the Grafana graph is in your local time zone by default. The video2commons encoder instances were rebooted as part of https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/message/IYVYMGLPNOU6JON52PV6R6NKX2XHMK6R/ starting at 13:45 UTC, so them starting to pick up load at about 13:50 UTC matches that very closely.

Note that the Grafana graph is in your local time zone by default. The video2commons encoder instances were rebooted as part of https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/message/IYVYMGLPNOU6JON52PV6R6NKX2XHMK6R/ starting at 13:45 UTC, so them starting to pick up load at about 13:50 UTC matches that very closely.

Thanks for the information. This explains why it started working again. This should have been done one month ago.

Not sure how long the tool will remain functional; we have already lost two of the four encoding instances:

image.png (121×842 px, 20 KB)

@taavi any way to have some help from WMCS team to understand what's going on with video2commons?

Hi, I managed to upload 2 films, but the third one (1.27 GB) failed with

An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(\\\'/mnt/nfs/labstore-secondary-project/gentoo-prefix/usr/bin/ffmpeg -y -i /srv/v2c/output/01f579767d0ef32c/dl.mp4 -max_muxing_queue_size 4096 -threads 16 -row-mt 1 -crf 20 -qmin 1 -qmax 51 -b:v 0 -vcodec libvpx-vp9 -tile-columns 4 -auto-alt-ref 1 -lag-in-frames 25 -f webm -ss 0 -an -pass 2 -passlogfile /srv/v2c/output/01f579767d0ef32c/dl.mp4.an.vp9.webm.log /srv/v2c/output/01f579767d0ef32c/dl.mp4.an.vp9.webm\\\\\\\\nExitcode: 137\\\')>, None)"\')'

So better, but not there yet.

I now realize that V2C (-crf 20) produces very high quality videos, which is good for recent digital videos but may be overkill for old films. It would be useful to be able to specify this parameter. For old films, -crf 30 is certainly sufficient. This would reduce the load and greatly speed up the process.
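To illustrate the suggestion, here is a hypothetical helper (not the actual V2C code) that rebuilds the VP9 ffmpeg invocation visible in the error messages above, with the CRF exposed as a parameter so old films could be encoded at a lighter quality setting:

```python
# Sketch: the ffmpeg VP9 options seen in the V2C error logs, with CRF
# made selectable. Function name and defaults are illustrative only.
def build_vp9_command(src, dst, crf=20, threads=16):
    """Return an ffmpeg argument list; crf=30 would suit old films."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-max_muxing_queue_size", "4096",
        "-threads", str(threads), "-row-mt", "1",
        "-crf", str(crf), "-qmin", "1", "-qmax", "51", "-b:v", "0",
        "-vcodec", "libvpx-vp9",
        "-f", "webm", dst,
    ]

cmd = build_vp9_command("dl.mp4", "dl.webm", crf=30)
print(" ".join(cmd))
```

Higher CRF means smaller files and faster encodes at lower quality, which is why a per-task setting would reduce load.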

Many thanks for the latest troubleshooting. I've been able to upload some files, but I also receive the following messages, the latter being more common:

An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(\\\'/mnt/nfs/labstore-secondary-project/gentoo-prefix/usr/bin/ffmpeg -y -i /srv/v2c/output/5f18bb370d6a4483/dl.mkv -max_muxing_queue_size 4096 -threads 16 -row-mt 1 -crf 20 -qmin 1 -qmax 51 -b:v 0 -vcodec libvpx-vp9 -tile-columns 4 -auto-alt-ref 1 -lag-in-frames 25 -f webm -ss 0 -acodec copy -pass 2 -passlogfile /srv/v2c/output/5f18bb370d6a4483/dl.mkv.vp9.webm.log /srv/v2c/output/5f18bb370d6a4483/dl.mkv.vp9.webm\\\\\\\\nExitcode: 137\\\')>, None)"\')'
An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskAbort\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskAbort(\\\'The task has been aborted.\\\')>, None)"\')'

I wonder if the tool/system crashed on May 15th because of the number of videos submitted at the same time (more than a dozen feature films). Do you have any idea of the reasonable number of such videos the system can process at one time? Is there a queue/waiting list if too many videos are requested at once? I have more than 100 feature films to import...

Now it seems V2C crashed again. I got

Sorry, something went wrong connecting this application. Go back and try to connect your account again, or contact the application author.

even when trying to log again.

Of course, the more contemporaneous errors continue to be reported on Commons.

I wonder if the tool/system crashed on May 15th because of the number of videos submitted at the same time (more than a dozen feature films). Do you have any idea of the reasonable number of such videos the system can process at one time? Is there a queue/waiting list if too many videos are requested at once? I have more than 100 feature films to import...

Basically all the error reports have been variants of out-of-memory errors, which basically means trying to do too much at once (not to mention that a reboot briefly fixed it, which is also consistent), so that is a likely hypothesis. I don't know how this tool is implemented, but normally a tool like this would use some sort of queue system to prevent things like this. Different video types have different memory requirements, and these things can change between software versions, so it's plausible that the tool has some sort of internal limiter that is tuned improperly. All this is pure speculation as I'm not familiar with the implementation.
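A minimal sketch of the kind of internal limiter described above, assuming a fixed per-host budget of concurrent encodes (all names are illustrative; this is not the actual V2C implementation):

```python
# Cap concurrent encodes with a semaphore so submitting many jobs at once
# queues them instead of exhausting memory. Budget value is assumed.
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_ENCODES = 2          # hypothetical per-host budget
_slots = threading.Semaphore(MAX_CONCURRENT_ENCODES)

def encode(task_id):
    with _slots:                    # blocks until an encode slot is free
        # real code would run ffmpeg here; we just record completion
        return f"done:{task_id}"

# even with 8 pool threads, at most 2 encodes run at any moment
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(encode, range(6)))
print(results)
```

In a Celery-based deployment the same effect is usually achieved by limiting worker concurrency rather than with an explicit semaphore.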

OK, thanks. I only got

An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(\\\'Sorry, but files larger than 4GB can not be uploaded even with server-side uploading. This task may need manual intervention.\\\')>, None)"\')'

for https://archive.org/details/where-the-north-begins-1923-by-chester-m.-franklin which is weird since the file is only 1.2 GB. At least there is a meaningful error message for once.

I think that getting this fully resolved is important, since Commons will definitely be hosting more video files over the coming years. As more and more films enter the public domain, we will see large influxes of uploads through V2C.

Don-vip triaged this task as High priority.

@Don-vip Thanks a lot for taking care of this. I have noticed https://github.com/toolforge/video2commons/pull/193 Does it apply to files currently processed?

I don't know. I'm just discovering how video2commons works, and trying to not break anything.

First issue solved: the Puppet alert that had been firing continuously on the video-redis-buster.video.eqiad.wmflabs instance for many months (prior to the reboot of 20th May).

The Puppet CA certificate (/var/lib/puppet/ssl/certs/ca.pem) was created in 2019 and expired in April 2024. I copied the CA certificate from another instance, ran the Puppet agent again, and the alert is gone in Grafana. I checked that Redis is still able to pick up new tasks:

{F56082033}

Second issue, on the encoding06 instance: the Puppet agent is failing. When launching it manually:

don-vip@encoding06:~$ sudo run-puppet-agent
...
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install ripgrep' returned 100: Reading package lists...
...
E: You don't have enough free space in /var/cache/apt/archives/.
...

Indeed the instance disk is full:

image.png (296×818 px, 19 KB)

It's not great either for the two other instances, encoding04 and encoding05:

image.png (296×818 px, 23 KB)

image.png (296×818 px, 22 KB)

EDIT: the disk is filled by Celery logs in /var/log/v2ccelery

EDIT2: it seems that logrotate does nothing for the Celery logs, although it should:

don-vip@encoding06:/var/log/v2ccelery$ cat /var/lib/logrotate/status | grep v2c
"/var/log/v2ccelery/celery1.log" 2023-12-23-0:0:0
"/var/log/v2ccelery/celery2.log" 2023-12-23-0:0:0

don-vip@encoding06:/var/log/v2ccelery$ cat /etc/logrotate.d/v2ccelery
# THIS FILE IS MANAGED BY MANUAL PUPPET
/var/log/v2ccelery/*.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    copytruncate
}

EDIT3: Found the reason:

don-vip@encoding06:/var/log/v2ccelery$ sudo grep v2c /var/log/syslog | grep logrotate
Jun 30 00:00:02 encoding06 logrotate[19088]: error: skipping "/var/log/v2ccelery/celery1.log" because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation.
Jun 30 00:00:02 encoding06 logrotate[19088]: error: skipping "/var/log/v2ccelery/celery2.log" because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation.
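Based on that error message, a plausible fix (a sketch, not the change actually deployed) is to add the `su` directive logrotate asks for to the existing config:

```
# /etc/logrotate.d/v2ccelery -- with the su directive logrotate demands
# when the parent directory is group/world writable
/var/log/v2ccelery/*.log {
    su root root
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    copytruncate
}
```

Alternatively, tightening the permissions on /var/log/v2ccelery itself would also satisfy logrotate's check.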

Status update:

  • Puppet configuration problem on video-redis-buster: fixed
  • Logrotate configuration problem that filled the disk of all encoders: fixed
  • Workers are still able to pick up tasks and transcode videos

BUT I made a mistake in the process and temporarily lost the OAuth credentials of the service account on the encoder instances. As a result all tasks fail with this error:

An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(b\\\'pywikibot.Error: NoUsernameError: Failed OAuth authentication for commons:commons: The authorization headers in your request are not valid: Invalid consumer\\\')>, None)"\')'

I need access to video2commons toolforge project to get back the credentials and fix the encoding instances. I asked @Andrew in T367599.
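For reference, pywikibot reads owner-only OAuth credentials from `user-config.py` roughly like this (a config fragment with hypothetical values, not the actual service-account setup):

```python
# user-config.py sketch -- all values are placeholders.
# `usernames` and `authenticate` are dicts that pywikibot injects
# when it evaluates this file; it is not runnable standalone.
family = 'commons'
mylang = 'commons'
usernames['commons']['commons'] = 'Video2commonsBot'  # hypothetical account
authenticate['commons.wikimedia.org'] = (
    'consumer_key', 'consumer_secret', 'access_key', 'access_secret',
)
```

An "Invalid consumer" error like the one above typically means the consumer key/secret pair no longer matches a registered OAuth consumer.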

Credentials mistake fixed (thanks to Andrew). Now video2commons should be fully up again.

To complete my post above about V2C being overloaded, I checked https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=video&var-instance=encoding04&var-instance=encoding05&var-instance=encoding06&var-instance=encoding07&from=now-3h&to=now&refresh=30s and I see that the total load jumped from 0 to over 80% after restarting 4 videos (10:10 UTC). It seems that the system can easily be overloaded, intentionally or not. And at 10:50 UTC, 3 instances are down.

Found a big problem. After connecting to the Redis instance, it appears the db1 database contains 3.6 GB of data for ~6000 keys, which is a lot for a database that only contains the current status of each task (the text displayed in the frontend).
If we look at the 15 biggest keys, they are between 86 MB and 435 MB, which is absolutely not expected (they usually contain only a few bytes/kilobytes of text).
The (truncated) contents of the biggest key are:

{
"status": "FAILURE",
"result":
    {
    "exc_type": "TaskError",
    "exc_message":
        [
        "b'pywikibot.Error: APIError: titleblacklist-custom-space: \\xe2\\xa7\\xbctitleblacklist-custom-space\\xe2\\xa7\\xbd\\n[param: b\\'--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"action\"\\\\n\\\\nupload\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"token\"\\\\n\\\\nbdc93f336bc28ceddbd274eb6d826e1a6676af8c+\\\\\\\\\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"text\"\\\\n\\\\n=={{int:filedesc}}==\\\\n{{Information\\\\n|description={{en|1=Investigate Europe research lays bare Europe\\\\\\'s problem with plastics. The investigation exposes failings in EU efforts to achieve a circular economy and details how unchecked production and use is creating a plastic waste crisis across the continent.\\\\n\\\\nRead the #Wasteland investigation: https://www.investigate-europe.eu/en/2023/wasteland-plastic-recycling/\\\\n\\\\nArt Direction & Motion Graphics Design: Alexia Barakou \\\\nSound design: Panagiotis Papagiannopoulos\\\\nNarration: Daphne Kouma\\\\nProduced by Reporters United (Athens, Greece)}}\\\\n|date=2023-04-27\\\\n|source={{From YouTube|1=FAzGr7d76VE|2=Wasteland - Europe\\\\\\'s plastic disaster}}\\\\n|author=[https://www.youtube.com/@InvestigateEurope Investigate Europe]\\\\n|permission=\\\\n|other_versions=\\\\n|other_fields=\\\\n}}\\\\n\\\\n=={{int:license-header}}==\\\\n{{YouTube CC-BY|Investigate Europe}}\\\\n{{LicenseReview}}\\\\n\\\\n[[Category:Uploaded with video2commons]]\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"filename\"\\\\n\\\\nWasteland - Europe\\\\\\'s plastic disaster .webm\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 
1.0\\\\nContent-disposition: form-data; name=\"comment\"\\\\n\\\\nImported media from https://www.youtube.com/watch?v=FAzGr7d76VE\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"assert\"\\\\n\\\\nuser\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"maxlag\"\\\\n\\\\n5\\\\n--===============3378181793368291637==\\\\nContent-Type: text/plain\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"format\"\\\\n\\\\njson\\\\n--===============3378181793368291637==\\\\nContent-Type: video/webm\\\\nMIME-Version: 1.0\\\\nContent-disposition: form-data; name=\"file\"; filename=\"FAKE-NAME\"\\\\nContent-Transfer-Encoding: binary\\\\n\\\\n\\\\x1aE\\\\xdf\\\\xa3\\\\x9fB\\\\x86\\\\x81\\\\x01B\\\\xf7\\\\x81\\\\x01B\\\\xf2\\\\x81\\\\x04B\\\\xf3\\\\x81\\\\x08B\\\\x82\\\\x84webmB\\\\x87\\\\x81\\\\x04B\\\\x85\\\\x81\\\\x02\\\\x18S\\\\x80g\\\\x01\\\\x00\\\\x00\\\\x00\\\\x02\\\\x11$\\\\xdb\\\\x11M\\\\x9bt\\\\xbcM\\\\xbb\\\\x8bS\\\\xab\\\\x84\\\\x15I\\\\xa9fS\\\\xac\\\\x81\\\\xa1M\\\\xbb\\\\x8bS\\\\xab\\\\x84\\\\x16T\\\\xaekS\\\\xac\\\\x81\\\\xd8M\\\\xbb\\\\x8cS\\\\xab\\\\x84\\\\x12T\\\\xc3gS\\\\xac\\\\x82\\\\x01\\\\xa2M\\\\xbb\\\\x8eS\\\\xab\\\\x84\\\\x1cS\\\\xbbkS\\\\xac\\\\x84\\\\x02\\\\x11\"&\\\\xec\\\\x01\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00W\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\
\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x15I\\\\xa9f\\\\xb2*\\\\xd7\\\\xb1\\\\x83\\\\x0fB@M\\\\x80\\\\x8dLavf58.76.100WA\\\\x8dLavf58.76.100D\\\\x89\\\\x88A\\\\x00\\\\x11(\\\\x00\\\\x00\\\\x00\\\\x00\\\\x16T\\\\xaek@\\\\xc4\\\\xae\\\\x01\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00Y\\\\xd7\\\\x81\\\\x01s\\\\xc5\\\\x88W\\\\x01E\\\\xefb\\\\xadE\\\\xa4\\\\x9c\\\\x81\\\\x00\"\\\\xb5\\\\x9c\\\\x83und\\\\x86\\\\x85V_VP9\\\\x83\\\\x81\\\\x01#\\\\xe3\\\\x83\\\\x84\\\\x02bZ\\\\x00\\\\xe0\\\\x01\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00&\\\\xb0\\\\x82\\\\x07\\\\x80\\\\xba\\\\x82\\\\x048\\\\x9a\\\\x81\\\\x02U\\\\xb0\\\\x98U\\\\xba\\\\x81\\\\x01U\\\\xb1\\\\x81\\\\x01U\\\\xbb\\\\x81\\\\x01U\\\\xb9\\\\x81\\\\x01U\\\\xb7\\\\x81\\\\x01U\\\\xb8\\\\x81\\\\x02\\\\xae\\\\x01\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00\\\\x00Y\\\\xd7\\\\x81\\\\x02s\\\\xc5\\\\x88\\\\xf3\\\\x04\\\\xd4|UPj0\\\\x9c\\\\x81\\\\x00\"\\\\xb5\\\\x9c\\\\x83eng\\\\x86\\\\x86A_OPUSV\\\\xaa..."
        ]
    }
}

It looks like upload errors result in the whole binary content of the video being serialized into the error message stored in the Redis database!
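One common guard against this class of bug (a sketch, not the actual V2C code) is to cap the length of any error message before it is written to the status store; a plain dict stands in for Redis here:

```python
# Truncate exception text before storing task status, so a failed upload
# can never push megabytes of raw video bytes into the status database.
# The cap value and function name are assumptions for illustration.
MAX_ERROR_LEN = 2000  # a few KB of text at most

def safe_status(task_id, exc, store):
    msg = repr(exc)
    if len(msg) > MAX_ERROR_LEN:
        msg = msg[:MAX_ERROR_LEN] + "...[truncated]"
    store[task_id] = {"status": "FAILURE", "error": msg}

store = {}
# simulate an exception whose message embeds a huge payload
safe_status("t1", Exception("x" * 1_000_000), store)
print(len(store["t1"]["error"]))
```

With a cap like this, the ~6000 status keys could never have grown to gigabytes.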

Mentioned in SAL (#wikimedia-cloud) [2024-07-02T20:16:32Z] <don-vip> Fixed puppet configuration on video-redis-buster, logrotate configuration on all encoding instances as per T365154

7 videos completed, but 2 new errors:

An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(b\\\'pywikibot.Error: APIError: missingresult: No result in status data.\\\\\\\\n[param: action=upload&filekey=1b1h202q66gc.hsty98.1.webm&checkstatus=&assert=user&maxlag=5&format=json&token=617927d6baf819142dcdcf278bcd593f6683750c%2B%5C;\\\\\\\\n servedby: mw-api-ext.eqiad.main-b98b849b7-rx6rs;\\\\\\\\n help: See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes.]\\\')>, None)"\')'

for https://archive.org/details/tartuffe-1926-by-f.-w.-murnau

and

An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(\\\'Upload failed!\\\')>, None)"\')'

for https://archive.org/details/behind-the-door-film-by-irvin-willat

@Everyone: I wasn't able to work on v2c for the past two weeks; starting right now I'm going to update the whole infra to hopefully get rid of this problem for good.

I will try to minimize the unavailability/downtime of this operation, but please remember that I work on this in my volunteer time and this is the first time I'm doing it, so I can't guarantee that no problems will happen in the next hours/days.

You can track the update progress here: T360711

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T20:40:35Z] <don-vip> drop gfg and encoding07 instances (unused) as per T360711 + T365154

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T20:41:24Z] <don-vip> migrated redis database from video-redis-buster to video-redis-bookworm as per T360711 + T365154

Mentioned in SAL (#wikimedia-cloud) [2024-07-24T23:39:45Z] <don-vip> updated and restarted video2commons frontend to use the bookworm redis instance. Setup encoding01 as the first new bookworm instance as per T360711 + T365154

Mentioned in SAL (#wikimedia-cloud) [2024-07-25T19:05:36Z] <don-vip> setup new instances encoding02 and encoding03 as per T360711 + T365154

Status update: the system has been completely updated and restructured as follows:

  • Reduced the number of celery workers to 4 workers per encoding instance
  • Doubled the number of instances from 3 to 6

Thus video2commons is able to process 6*4 = 24 video encoding requests at once, hopefully without crashing. Let's see how it goes...

Thanks a lot to @Don-vip for fixing this. For me, the main issue is fixed. There are some remaining issues, but the tool is generally working as expected. I have reported the errors appearing now at Github: https://github.com/toolforge/video2commons/issues

OK, thank you, let's consider it fixed then. We'll try to solve the remaining issues one by one.