V2C should be integrated into MW
Open, Needs Triage, Public, Feature

Description

Feature summary:

Video2Commons ( https://video2commons.toolforge.org/ ) is an essential tool for uploading videos to Commons. It is more often broken than working properly. It should be integrated into MediaWiki, and maintained by the WMF. See https://commons.wikimedia.org/wiki/Commons_talk:Video2commons for a long list of issues.

Use case(s):

This tool is mainly used to upload videos from various websites (YouTube, Dailymotion) and for converting videos from a non-free format to WEBM.

Benefits:

This would benefit many people, mostly non-tech-savvy people and those with low bandwidth, for whom uploading videos is very difficult.

Event Timeline

The MediaWiki Upload Wizard should have a general "Upload from URL" option; importing from (free) websites is a long and tedious job. The Wikimedia Foundation (WMF) could try to maintain two (2) separate MediaWiki Upload Wizards, one (1) for "Own work" files and one (1) for imported files. That, or it could try to make the selection clear at the beginning of the upload process (like it already does for Flickr files for certain users).

Community-made tools should be adopted and integrated directly into the software; it seems like the Foundation has neglected this software for years. This is also related to the "built-in video converter tool": the MediaWiki Upload Wizard should be able to directly convert non-free file formats into free file formats (.MP4 → .webm).

[…]
the MediaWiki Upload Wizard should be able to directly convert non-free file formats into free file formats (.MP4 → .webm).

I strongly support this suggestion.

I am not sure whether this task is about adding video transcoding to the Upload Wizard or about adding upload-from-URL to the Upload Wizard.

If it is about transcoding video files to webm, I propose to transcode to AV1 instead of VP9 as v2c does. AV1 now works in the MediaWiki software and it is better than VP9. For example you can keep HDR10 when transcoding to AV1.
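For illustration only, an AV1/Opus WebM transcode could be driven from Python roughly as follows. This is a minimal sketch, not what v2c or MediaWiki actually runs: the function name `build_av1_command` is hypothetical, it assumes an ffmpeg build with the `libaom-av1` and `libopus` encoders, the CRF value is an arbitrary placeholder, and it makes no attempt to handle HDR metadata.

```python
from typing import List


def build_av1_command(src: str, dst: str, crf: int = 30) -> List[str]:
    """Build an ffmpeg argument list for an AV1/Opus WebM transcode.

    Assumptions (not taken from the thread): ffmpeg is installed with the
    libaom-av1 and libopus encoders, and CRF 30 is only a placeholder
    quality setting.
    """
    if not dst.endswith(".webm"):
        raise ValueError("destination should be a .webm file")
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libaom-av1",   # AV1 video encoder
        "-crf", str(crf),
        "-b:v", "0",            # constant-quality mode for libaom-av1
        "-c:a", "libopus",      # Opus audio encoder
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ffmpeg.
    print(" ".join(build_av1_command("input.mp4", "output.webm")))
```

The list could be passed to `subprocess.run(...)` on a machine where ffmpeg is available; it is only printed here so the sketch stays self-contained.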

Video transcoding takes large amounts of time. The Upload Wizard is basically a "watch while you upload a file and do some more input from time to time" tool. Video2commons is an "enter a video then log out and come back maybe later while v2c does its magic as a batch job that will take time but does not need user interaction" tool. Different approaches to UI/UX.

... Video2commons is an "enter a video then log out and come back maybe later while v2c does its magic as a batch job that will take time but does not need user interaction" tool. Different approaches to UI/UX.

Yes, and IMO, the V2C approach is better. First fill in the information, and then upload the image/video.

Sdkb renamed this task from V2C should integrated into MW to V2C should be integrated into MW. Dec 24 2023, 7:40 AM

TimedMediaHandler has $wgTmhEnableMp4Uploads. If you want to allow MP4 uploads direct to Commons, all you have to do is turn that on.

Unfortunately the Commons community declined support for MP4 uploads of any kind in 2014. I would love to revisit this decision and provide good integrated support for MPEG-4 and H.264 input files instead of providing just "nothing" as we have for the last decade, assuming we can pass legal review for the patent license issues and the community is willing to overturn the RFC.

https://commons.wikimedia.org/wiki/Commons:Requests_for_comment/MP4_Video

Yes, exactly. The requirement to upload MP4 videos via an unsupported tool is the result of the Commons community's commitment to patent purity. We can't just integrate V2C as Yann requests.

Interesting! Surprising nobody reached out to ask about or propose implementation details.

Note that both of those proposals are incompatible with how MediaWiki uploads are handled at present (where the original file is always available for download), and would require additional work to hide or delete the original file and manage the temporary placeholder for the file during conversion. We can't enable those proposals right now by flipping a switch, because no code exists to implement them.

I wouldn't recommend either of these proposals, though, as they attempt to hide the original source files, which feels strongly against our principles. It would be much better for end-users as well as for devs to just enable direct upload of .mp4 files that contain acceptable codecs; this would preserve the original file (important for quality as well as generally avoiding destroying original data) and would be implementable very quickly.

Destroying or hiding the source data seems like an anti-pattern and should definitely be avoided if there's no legal reason not to allow download of original files.

We'll need to run anything proposed about MPEG-4 Visual, AAC, AC3, and H.264 codecs past legal first in any case, so this'll take some time.

Yes, exactly. The requirement to upload MP4 videos via an unsupported tool is the result of the Commons community's commitment to patent purity. We can't just integrate V2C as Yann requests.

How so – I think integrating a tool that converts mp4 to webm is perfectly aligned with the current stance and policy on MP4 files. Video2Commons is about more than converting mp4 to webm though, it's about making it easy and accessible to upload videos from sites like YouTube.

I have to ask -- is that compatible with the terms of service of YouTube? We probably can't implement anything like that.

Note I've started a specific task for collecting current upload/stash failures with large video files and getting those finally fixed: T391473 -- folks having problems with their own uploads, please check if the error responses you're seeing are in there already or if you have another one I can help track down. :)

As far as I understand it, uploading MP4 is not permitted today because OGG, WEBM, MOV and MP4 are container formats, and only OGG and WEBM guarantee that the container holds only free formats. MP4 can contain the currently free H264 format, but also the currently non-free H265 or a new non-free format in the future. It is considered too confusing for users that some files with the extension MP4 would be allowed and others not. In fact, most people (and most people who would upload a video to Commons) do not know that video formats are container formats, let alone which formats a given container holds. On the other hand, it can be assumed that very large videos in particular use modern (non-free) video encoding, because they are then not quite as large as they would be with H264 and because smartphones now use modern (non-free) encoding by default.
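To make the container-versus-codec point concrete, here is a small sketch that reads the codec names out of the JSON that `ffprobe -show_streams -of json` produces. The two samples are hand-written stand-ins rather than real probe output, and the function name `codecs_by_stream_type` is hypothetical. Both samples could carry the extension .mp4, yet they hold different video codecs.

```python
import json
from typing import Dict, List


def codecs_by_stream_type(ffprobe_json: str) -> Dict[str, List[str]]:
    """Group codec names by stream type from ffprobe's JSON stream listing."""
    data = json.loads(ffprobe_json)
    result: Dict[str, List[str]] = {}
    for stream in data.get("streams", []):
        kind = stream.get("codec_type", "unknown")
        name = stream.get("codec_name", "unknown")
        result.setdefault(kind, []).append(name)
    return result


# Hand-written samples standing in for real ffprobe output (an assumption of
# this sketch): same container, different video codec.
SAMPLE_A = (
    '{"streams": [{"codec_type": "video", "codec_name": "h264"},'
    ' {"codec_type": "audio", "codec_name": "aac"}]}'
)
SAMPLE_B = (
    '{"streams": [{"codec_type": "video", "codec_name": "hevc"},'
    ' {"codec_type": "audio", "codec_name": "aac"}]}'
)

if __name__ == "__main__":
    print(codecs_by_stream_type(SAMPLE_A))
    print(codecs_by_stream_type(SAMPLE_B))
```

The file extension never enters the function; any accept-or-reject decision would have to be made on the codec names it returns.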

About YouTube: The existing v2c works on the basis that transcoding mp4 to webm for YouTube files is OK. Should that not be the case, v2c would either have to exclude YouTube or be deactivated altogether.

It looks like YouTube's terms of use do not allow automated downloads. This is unrelated to codecs completely! This is about access to a computer system to download files.

https://www.youtube.com/static?template=terms

The following restrictions apply to your use of the Service. You are not allowed to:
access, reproduce, download, distribute, transmit, broadcast, display, sell, license, alter, modify or otherwise use any part of the Service or any Content except: (a) as expressly authorized by the Service; or (b) with prior written permission from YouTube and, if applicable, the respective rights holders;
circumvent, disable, fraudulently engage with, or otherwise interfere with any part of the Service (or attempt to do any of these things), including security-related features or features that (a) prevent or restrict the copying or other use of Content or (b) limit the use of the Service or Content;
access the Service using any automated means (such as robots, botnets or scrapers) except (a) in the case of public search engines, in accordance with YouTube’s robots.txt file; or (b) with YouTube’s prior written permission;

[snip]

use the Service to view or listen to Content other than for personal, non-commercial use (for example, you may not publicly screen videos or stream music from the Service); or

[snip]

@C.Suthorn, I think you're conflating two issues here (and this makes some sense, because V2C does two separate things which might be problematic).

One of the issues is transcoding. The other is, as @bvibber mentions, downloading from YouTube (and other sites).

V2C actually will try to avoid transcoding video if the original linked is in a format it recognizes as free. (One problem, which Yann and I have now raised at the V2C GitHub, is that it does not actually list AV1 as one of its acceptable output formats, and thus will feed any AV1 WebM that it receives into ffmpeg to transcode to VP9, which isn't ideal for... all the reasons you can imagine. That should be fairly trivial to fix, however.)

YouTube actually serves its videos in free formats (well, also in non-free ones, but the best quality ones are generally VP9/opus, which is perfectly acceptable as a format on Commons). So pulling a video from YouTube on V2C doesn't require transcoding anything at all; Google offers vp9/opus already as their preferred format. V2C uses the free program yt-dlp to (try to) download videos from YouTube. (I say "try to" because, as mentioned on V2C's page, it is currently getting flagged for bot verification by YT, and so the downloads are getting blocked from that IP.) Anyway, the thing about yt-dlp — which you can also use on your own computer — is that its main purpose, downloading YouTube videos, might be against the YouTube TOS (though this is not enforced in reality, if it is indeed against some enforceable provision of the YouTube TOS). Some videos on YouTube are marked with YouTube's CC BY 3.0 license tag (and these, I think, can also be downloaded another way); other videos are marked as free in the description or are known to be freely licensed/PD for other reasons (but aren't marked as such within YouTube's system). Downloading any of these videos is trivial — but that might be against YouTube's TOS.

The second thing that V2C does is transcode video. Whenever something isn't in one of its specifically listed free formats (which doesn't yet include AV1), it'll transcode (using ffmpeg, IIRC) into a free format (generally vp9/opus). ffmpeg reads tons of codecs that may be patent-encumbered, and can encode in tons of non-encumbered codecs. So ffmpeg can read H264 or H265 or whatever and pop out AV1 or VP9 or the like. Patent-encumbered formats are used all the time online, of course; the issue is that the WMF doesn't want to host these. What you want to do with uploading/transcoding from them is an open question.
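The "transcode only when the codec is not on the free list" behaviour described above can be sketched as a simple allow-list check. This is an illustration under stated assumptions, not V2C's actual code: the set contents and the name `needs_transcode` are hypothetical, and AV1 is included here only to show what the proposed fix would amount to.

```python
from typing import Optional

# Hypothetical allow-lists for illustration; not V2C's actual configuration.
FREE_VIDEO_CODECS = {"vp8", "vp9", "av1", "theora"}
FREE_AUDIO_CODECS = {"opus", "vorbis"}


def needs_transcode(video_codec: Optional[str], audio_codec: Optional[str]) -> bool:
    """Return True if any stream that is present uses a codec outside the allow-lists.

    A value of None means the file has no stream of that type.
    """
    if video_codec is not None and video_codec.lower() not in FREE_VIDEO_CODECS:
        return True
    if audio_codec is not None and audio_codec.lower() not in FREE_AUDIO_CODECS:
        return True
    return False


if __name__ == "__main__":
    for pair in [("vp9", "opus"), ("av1", "opus"), ("h264", "aac")]:
        print(pair, "-> transcode" if needs_transcode(*pair) else "-> keep as is")
```

With "av1" removed from `FREE_VIDEO_CODECS`, an AV1/Opus file would be sent to the transcoder, which is the situation described for the current tool. The sketch ignores the container, which a real check would also need to consider.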

BTW, H264 is not a free (as in, not encumbered by patents) format. MPEG-4 Part 2 video is, but H264 isn't quite yet. Indeed, MP4 can contain various different codecs, though — as can, say, MKV (of which WebM is a restricted subset). There are various free codec combinations that we don't support — most notably, we don't support AV1+FLAC, which is unfortunate. Also, I'm pretty sure that the ogg container can actually be used to store some non-free codecs (although in practice it almost never is).

I've opened a spike task T391529 for myself to sketch out more details of what it would look like to add a transform/transcode stage to the uploadstash pipeline, so we can do automatic conversions of HEIC/HEIF images and MP4 videos with H.264 or HEVC codecs _and_ retain the original file within MediaWiki's access _and_ keep it from being directly downloadable if we're shy about exposing the source files for now due to patents/format wars/etc.

If that looks reasonably feasible I'm going to try to get a little more work time to implement this, as I think it could bring a lot of the basic conversion needs for video2commons "in-house".

I'm not going to touch the downloader stuff for now though, as that's hairy from a terms of use standpoint.

In a decision on 21 November 2024, the Higher Regional Court of Hamburg (OLG Hamburg, not the highest German court, but the second highest) made a decision (blocking a website that links to yt-dlp) which effectively means that the use of yt-dlp as well as any other downloader to download YouTube videos (in Germany, but according to the same reasoning almost everywhere) is illegal.

What is the background? YouTube disguises the URLs of the individual parts that make up a video (stream). According to the OLG, YouTube does this so effectively that it constitutes secure copy protection, the circumvention of which is punishable under the laws on secure copy protection. The court also assumes that Google uses this procedure precisely to prevent the downloading of videos (and audio - the music industry was the plaintiff).

As a result, downloading videos from YouTube is always illegal, but so is downloading videos from other sites that use comparable URL obfuscation (of which there are very many, and which are the reason why software like yt-dlp exists in the first place).

This also means that Google is breaking the law. Some of the videos on Google are Creative Commons-licensed. This means that their use (e.g. downloading) may not be prevented or made considerably more difficult by technical measures. According to the reasoning of the OLG, Google makes it so difficult to download (also Creative Commons) videos that videos cannot be downloaded in compliance with the licence (the OLG does not go into Creative Commons, it is only about yt-dlp, it is my interpretation that the ruling has this meaning).

https://netzpolitik.org/2024/entscheidung-des-olg-hamburg-youtube-dl-org-bleibt-gesperrt/

The "according to the same reasoning almost everywhere" is doing a lot of work, because it is far from established that any other court (in a different country, with different laws) would necessarily come to the same conclusions.

I hope that not even a higher German or European court would come to the same conclusion. But as far as I know, the ruling has not been challenged, and it is in line with the ideas of Trump and Musk.

In a decision on 21 November 2024, the Higher Regional Court of Hamburg (OLG Hamburg, not the highest German court, but the second highest) made a decision (blocking a website that links to yt-dlp) which effectively means that the use of yt-dlp as well as any other downloader to download YouTube videos (in Germany, but according to the same reasoning almost everywhere) is illegal.

That is not what your link says (it's about blocking a website, not a software).
And "which effectively means" is only one possible interpretation.
And in any case, YouTube's Terms of Service is off-topic for this task ("V2C should be integrated into MW") and was already covered in T236446, thus no news. :)

That is not what your link says (it's about blocking a website, not a software).

As I said: It is about blocking a website that links to the software. The music industry sued the German ISP that hosts the website that links to the software. The software is hosted on a site (GitHub) outside of Germany, so the music industry would have needed to sue outside of Germany to block the software. Nevertheless, the intention was to block access to the software with the argument that the software is illegal (at least its use to download YouTube videos). And the music industry succeeded in "proving" that yt-dlp is illegal.

As I said: It's off-topic for this task.

So what we can probably *never* do is integrate the stream-ripping feature that uses yt-dlp. While some sites may not forbid manual or automated downloads circumventing the browser session, YouTube certainly does. If we were to ever integrate something like this, it would need to be very strictly allow-listed to sites that explicitly allow downloads or at least do not explicitly forbid them, and that probably means bureaucracy and legal/partnership checks around allowing. And it won't include YouTube.

I don't think WMF can "recommend" any workflow using yt-dlp but you can always upload YouTube VP9/Opus .webm videos you downloaded at home without additional conversion. :)

I've opened a spike research task T391529 to experiment with something in the conversion-on-upload area, which is probably a lot easier for us to arrange and would allow uploading other files you might ... obtain legally in your country ... that didn't happen to be VP9/Opus or AV1/Opus.

My recommendation is to decline this task (T353659) but I won't force it closed if folks still want to comment.

What we need is MediaWiki doing the conversion from MP4 to a free format. For downloading and uploading, we can use upload-by-url, which already exists. It allows uploading from a white list, so we can exclude websites that forbid using yt-dlp.
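As a rough illustration of the white-list idea, a source URL can be checked against a set of permitted hosts before anything is fetched. The hosts below and the name `is_allowed_source` are hypothetical; which sites would actually qualify is a policy and legal question, and in MediaWiki itself the list is, as far as I know, a matter of configuration (`$wgCopyUploadsDomains`) rather than code like this.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list for illustration only.
ALLOWED_HOSTS = frozenset({"example.org", "media.example.net"})


def is_allowed_source(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True if the URL is HTTPS and its host is an allowed host or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


if __name__ == "__main__":
    for candidate in (
        "https://example.org/video.webm",
        "https://www.youtube.com/watch?v=abc",
    ):
        print(candidate, is_allowed_source(candidate))
```

Matching on the parsed hostname, rather than on a substring of the URL, keeps look-alike hosts such as `example.org.evil.test` from slipping through.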