
Allow media embedding from Wikimedia Commons using oEmbed
Open, Low · Public · Feature

Description

Feature summary: Currently it is impossible to share media from Wikimedia Commons directly on external sites (social media, WordPress blogs...), partly because we do not support oEmbed. If we want to be "the central repository of free knowledge", we must be able to share media directly, without the need for third-party sites.

Use case(s) :

  • Quickly share/embed videos or audio from Commons in external sites.
  • Embed videos in WordPress blogs directly (as already works with Vimeo or YouTube links) without having to touch HTML code

Benefits (why should this be implemented?):

  • Being the central repository of free knowledge
  • Making Commons more prominent and a real site to share media
  • Making our activities less dependent on YouTube (and its advertisements)
  • Improving the process, as we only need to upload the media once if we want to share it
  • Wikimedia Commons contributors will be credited more accurately across the Internet, reducing copyright violations

Resources

See this schema by @valerio.bozzolan (feel free to replace and update - visit the file to see its source code):

Schema of oembed interaction between a blog and a generic MediaWiki.png (1×2 px, 200 KB)

Non-Resources

Triage

(↓ Small triage done during wmhack 2025 by @valerio.bozzolan)

To make this actionable, this can be logically divided into three parts (I think all of them can be done in parallel):

  1. Rendering: have more support for important content models (difficult)
    • ✅ videos: seems covered
    • 🔶 images: ...?
      • ❓ is there something already implemented to embed images in an HTML page? (e.g. with title and license credits)
      • 🔶 to implement something, we may need the PageImages extension, which knows thumbnail dimensions (?)
    • 🔶 text pages: ...?
      • ❓ is there something already implemented to embed a normal page in an HTML page? (e.g. with title and first paragraph and image)
      • 🔶 to implement something, we may need the TextExtracts extension to produce short introductions
  2. API: add an oEmbed endpoint (seems easy)
    • it should be implemented as a REST API, not as an action API, so it can serve JSON (the simplest required format) and potentially support XML in the future as a bonus without rewriting everything (action APIs are not really intended to let a module take over the format parameter)
  3. Backend: embed provider abstraction
    • 🔶 NICE TO HAVE: a generic MediaWiki core REST API should be able to easily ask whether embedding is available for a given page, ideally without hardcoding "is TimedMediaHandler installed?". A new REST API could ask "is there anything giving embed support for this page (which is a video)?", and TimedMediaHandler could register to that new hook and reply with something that embeds https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm?embedplayer=1 with the desired dimensions, etc. (this is what generalizing embed support means here).
    • ✅ This is not a blocking point, since in a first prototype we can just check "is TimedMediaHandler installed?" and that would work to know that videos have ?embedplayer=1, to add the "oembed discovery" tag in the HTML there.

Pitfalls

  • it's still not clear where this feature should be introduced, since it seems it should rely on multiple existing extensions. I would personally encourage the future lazy hacker to go ahead with a quick and dirty early prototype in MediaWiki core: a prototype that may end up in the trash bin later, but veeery useful to quickly expose absurdities and rabbit holes along this path
  • a first release should probably concentrate on videos only (so, relying on the ?embedplayer=1 thing), but this does not seem a good reason to expand TimedMediaHandler directly, IMVHO, since the overall goal is to also support normal images and pages, and it would be confusing to do that in TimedMediaHandler

In short

The page (e.g. the video page) should have this in the HTML (or very similar), pointing to our new REST API to be created:

<link rel="alternate"
  type="application/json+oembed"
  href="https://example.com/w/rest.php/oembed/?format=json&url=https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3AThe_Mechanical_Cow_%281927%29.webm"
/>

So, this REST API should exist:

https://example.com/w/rest.php/oembed/?format=json&url=

Example REST API call:

https://example.com/w/rest.php/oembed/?format=json&url=https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3AThe_Mechanical_Cow_%281927%29.webm
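The percent-encoding in that call is easy to get wrong by hand. As a minimal sketch (Python standard library only; the helper names `oembed_request_url` and `discovery_link` are made up for illustration, and the endpoint is the placeholder used above), both the API URL and the discovery tag could be generated like this:

```python
from html import escape
from urllib.parse import urlencode

def oembed_request_url(endpoint: str, page_url: str, fmt: str = "json") -> str:
    """Return the oEmbed API URL for page_url, percent-encoding the query string."""
    return endpoint + "?" + urlencode({"format": fmt, "url": page_url})

def discovery_link(endpoint: str, page_url: str) -> str:
    """Return the <link> discovery tag that the page would carry in its <head>."""
    href = oembed_request_url(endpoint, page_url)
    # escape() turns "&" into "&amp;" so the URL is safe inside an HTML attribute.
    return (
        '<link rel="alternate" type="application/json+oembed" '
        f'href="{escape(href, quote=True)}" />'
    )

# Placeholder endpoint from the description above (not an existing API).
ENDPOINT = "https://example.com/w/rest.php/oembed/"
PAGE = "https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm"

print(oembed_request_url(ENDPOINT, PAGE))
print(discovery_link(ENDPOINT, PAGE))
```

The first line printed should match the example call above character for character; the second differs from the hand-written tag only in writing `&` as `&amp;` inside the attribute.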

And that REST API should return something like this:

{
  "version": "1.0",
  "type": "video",
  "provider_name": "Wikimedia Commons",
  "provider_url": "https://commons.wikimedia.org",
  "html": "<iframe src=\"https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm?embedplayer=yes\" width=\"512\" height=\"385\" frameborder=\"0\" loading=\"lazy\" allow=\"autoplay; picture-in-picture\" allowfullscreen></iframe>",
  "width": 512,
  "height": 385,
  "title": "The Mechanical Cow (1927)",
  "url": "https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm"
}

So the client can consume the proposed HTML, that is:

<iframe src="https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm?embedplayer=yes" width="512" height="385" frameborder="0" loading="lazy" allow="autoplay; picture-in-picture" allowfullscreen></iframe>

Then, profit $$$.
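To make the response shape concrete, here is a rough provider-side sketch in Python (not MediaWiki code). The function names, the native size passed in, and the choice to round to whole pixels are all assumptions; `maxwidth` and `maxheight` are the optional limits a consumer may send. HTML dimension attributes expect whole numbers, so the sketch rounds after scaling.

```python
import json
from html import escape

def fit_within(width, height, maxwidth=None, maxheight=None):
    """Scale (width, height) down to fit the optional limits, keeping the aspect ratio."""
    scale = 1.0
    if maxwidth is not None:
        scale = min(scale, maxwidth / width)
    if maxheight is not None:
        scale = min(scale, maxheight / height)
    # Whole pixels only, and never a zero-sized player.
    return max(1, round(width * scale)), max(1, round(height * scale))

def video_response(page_url, title, width, height, maxwidth=None, maxheight=None):
    """Build an oEmbed-style dict for a video page (hypothetical helper)."""
    w, h = fit_within(width, height, maxwidth, maxheight)
    src = escape(page_url + "?embedplayer=yes", quote=True)
    iframe = (
        f'<iframe src="{src}" width="{w}" height="{h}" frameborder="0" '
        'loading="lazy" allow="autoplay; picture-in-picture" allowfullscreen></iframe>'
    )
    return {
        "version": "1.0",
        "type": "video",
        "provider_name": "Wikimedia Commons",
        "provider_url": "https://commons.wikimedia.org",
        "html": iframe,
        "width": w,
        "height": h,
        "title": title,
    }

# Assumed native size of 512x385, scaled for a consumer that allows at most 300x400.
response = video_response(
    "https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm",
    "The Mechanical Cow (1927)",
    512, 385, maxwidth=300, maxheight=400,
)
print(json.dumps(response, indent=2))
```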

Event Timeline

Thanks. Very useful.

To my knowledge, WMF has a WordPress VIP account, so they should have privileged contact with upstream for evaluating such features. It might make sense to get this budgeted or, at least, to try to get some implementation tips.

Also because there is a "contact us" section here exactly for the above goal:

https://wordpress.com/support/wordpress-editor/blocks/embed-block/#supported-services

Commons is turning 20 in September. Imagine that, after 20 years, we are able to share media externally. That would be something interesting to have.

Mentioning T27854 as the oldest task I can find related to this task. This might even be a duplicate. ¯\_(ツ)_/¯

I don't think this is a duplicate, as that was an intent to do an extension, which might not be the solution we need.

Oh, and I can't forget this plugin developed by Sam that might be useful.

https://wordpress.org/plugins/embed-wikimedia/

Indeed, this is proof that it is needed and unsolved. Asking volunteers to knock on doors, only for them to find that the door was closed, that it belonged to another department, or that it was the right door but nobody behind it has plans to solve the issue, is circular, and it is what creates frustration and a disconnect between our written strategy and reality.

Well, the feature request is open. I hope it is resolved before 2030, when we will be "the central infrastructure of free knowledge".

The priority is the Universal Code of Conduct

Those are not exclusive: both can be done, because they should be developed by different departments.

T31242 asks for a consumer, not a provider. It shouldn't be a subtask.

T27854 asks for a provider and is apparently identical to this task.

I don't think this is a duplicate, as that was an intent to do an extension, which might not be the solution we need.

The task description doesn't specify such details.

Small question. Has anybody including the kind @CKoerner_WMF already tried to contact WordPress VIP to get a cost estimation of this feature from them?

If "probably no", well, no problem, but I would then probably proceed with contacting them (not on behalf of WMF, of course, just on my own behalf), reporting here what I'm allowed to report.

It doesn't seem that the issue is with WordPress, but with Commons not providing the oEmbed API. If it did, then WordPress, Mastodon, Discourse, and many other applications could embed Commons links without any changes required.

I think so... I've found a couple of useful links. Added in description. Having said that this is weird and I hope they are wrong in 2025 lol: https://github.com/Cyken-Zeraux/Mediawiki-oEmbed

Edited: it was about an oEmbed consumer, not oEmbed API

Apparently it should be very simple to add oEmbed support to MediaWiki, because it already supports this embed mode for videos:

https://commons.wikimedia.org/wiki/File:Swallowtail_Jig_-_Irish_Fiddle_Tune!.webm?embedplayer=yes

Example usage:

<iframe src="https://commons.wikimedia.org/wiki/File:Swallowtail_Jig_-_Irish_Fiddle_Tune!.webm?embedplayer=yes" width="512" height="288" frameborder="0" loading="lazy" allow="autoplay; picture-in-picture" allowfullscreen></iframe>

I'll bring this to the attention of an upcoming local event. Maybe a patch can be proposed for TimedMediaHandler, with an opt-in option to keep backward compatibility.

Things you would have to do:

  1. Get preliminary support by WMF to get support on deploying a new extension as described below (because someone needs to maintain it)
  2. Create an extension to provide a new REST (or action) API endpoint "oembed"
  3. This endpoint should allow anonymous access
  4. This endpoint should return JSON (and XML?) responses
  5. When this endpoint is called, collect and return the minimal required information to comply with the oEmbed provider specification
  6. Always return the 'rich' type.
  7. Collect media metadata information for the file (retrieve it from the CommonsMetadata extension, just as MultimediaViewer does)
  8. Add 'html generators' for the blob that should be embedded. This should include crediting information, a link back to Commons and the media itself (in the case of audio/video, this should use embedplayer mode, to return an embeddable player).
  9. Get everything security reviewed and WMF approved
  10. Deploy to labs
  11. Deploy to prod
  12. Get all wikipedia and wikimedia domains added to the oembed provider index.
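As a rough illustration of step 8, an "html generator" could wrap the media markup in a caption carrying the credit and a link back. This is only a sketch in Python, not extension code: the function name and the `<figure>` layout are assumptions, and the author and license values below are placeholders standing in for what the metadata lookup in step 7 would return.

```python
from html import escape

def credited_embed_html(media_html, title, page_url, author, license_name):
    """Wrap trusted media markup in a <figure> with title, credit and a link back.

    media_html is assumed to be generated by the wiki itself (an <img>, or an
    <iframe> in embedplayer mode); every other value is escaped because it
    comes from user-editable metadata.
    """
    caption = (
        f'<a href="{escape(page_url, quote=True)}">{escape(title)}</a>'
        f" by {escape(author)}, {escape(license_name)}, via Wikimedia Commons"
    )
    return f"<figure>{media_html}<figcaption>{caption}</figcaption></figure>"

# Placeholder values only; real ones would come from the file's metadata.
blob = credited_embed_html(
    '<img src="https://upload.example/Lol.JPG" alt="">',
    "File:Lol.JPG",
    "https://commons.wikimedia.org/wiki/File:Lol.JPG",
    "Example Author <example>",
    "CC BY-SA 4.0",
)
print(blob)
```

Escaping the metadata matters here because the blob ends up in third-party pages, where unescaped author or title text could inject markup.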

Small question. Has anybody including the kind @CKoerner_WMF already tried to contact WordPress VIP to get a cost estimation of this feature from them?

If we are able to do what TheDJ outlines above, I would be able to work with WordPress folks to get support for Wikimedia oEmbed added to WordPress core. I'd probably be able to make the argument that we could spend a little of the budget we have for Diff (which is itself a little budget!) to help get this implemented.

I am not a developer, but I believe the change to WordPress is more social than technical. It's adding an entry in their allowed list of oEmbed providers (Step 12 in the outline above).

https://github.com/WordPress/WordPress/blob/2fda1e4d8c14ac20ef016cd1ef5fae159d4e3181/wp-includes/class-wp-oembed.php#L53

I'm trying to boost this during the cute small event wmhack Palermo 2025, Italy.

I've created a small schema to visualize how MediaWiki should interact with a generic Blog/Website, when a BlogAuthor would like to embed a generic media file available in a generic MediaWiki (including Wikimedia Commons), so, pasting a URL like https://commons.wikimedia.org/wiki/File:Lol.JPG in their blog, and expecting to embed it in that blog quickly. The flow seems to be this:

mermaid-diagram-2025-03-15-145407.png (795×1 px, 78 KB)

P.S. The above schema was generated with Mermaid.js. Here is the source:

sequenceDiagram
    BlogAuthor->>+Blog: «Bla bla see this File:Lol.JPG»
    Blog->>+MediaWiki: oEmbed Discovery<br/>https://commons.wikimedia.org/wiki/File:Lol.JPG
    MediaWiki->>+Blog: oEmbed API discovered from<br/><link rel="alternate" type="application/json+oembed" href="..."><br/><br/>where the href=<br/>https://commons.wikimedia.org/w/api.php?action=oembed&page=Lol.JPG
    Blog->>+MediaWiki: oEmbed API call<br/>https://commons.wikimedia.org/w/api.php?action=oembed&page=Lol.JPG
    MediaWiki->>+Blog: oEmbed result, with:<br/><iframe>
    Blog->>+BlogAuthor: media Lol.JPG embedded \o/

So, here is what we need, very generally:

  1. a new dedicated MediaWiki API for oEmbed
    • good news: it can have whatever URL, really
  2. a URI suitable for iframe embedding
  3. a new HTML tag <link rel="alternate" type="application/json+oembed" href="{{ $THAT API }}" >
    • good news: very simple
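For reference, the discovery step on the blog side is small too. The following Python sketch (standard library only; the class and function names are invented for this example, and the sample page uses a placeholder endpoint) shows how a consumer might pull the oEmbed endpoint out of a page's HTML:

```python
from html.parser import HTMLParser

class OEmbedDiscovery(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/json+oembed">."""

    def __init__(self):
        super().__init__()
        self.endpoints = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are also routed here by HTMLParser.
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime == "application/json+oembed" and attr.get("href"):
            self.endpoints.append(attr["href"])

def discover_oembed(page_html):
    """Return every oEmbed JSON endpoint announced in page_html."""
    parser = OEmbedDiscovery()
    parser.feed(page_html)
    return parser.endpoints

# A made-up page head; the endpoint is a placeholder, not an existing API.
sample = """<html><head><title>File:Lol.JPG</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/json+oembed"
  href="https://example.com/w/rest.php/oembed/?format=json&amp;url=https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3ALol.JPG" />
</head><body></body></html>"""

print(discover_oembed(sample))
```

The consumer would then request the discovered URL and insert the `html` field of the JSON reply into the post.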

Working on...

The REST API is better for endpoints that need to control the full response including the content type. The action API is opinionated about output formats. There are ways to override it, like getCustomPrinter(), but in new code it's better to use the REST API which was made for this kind of thing.

OK thanks. Given that it just needs to return a very simple and small JSON, I will take a look at designing a new REST API...

@tstarling Do you still personally recommend this REST API approach, given that it would then look something like this?

http://localhost/w/rest.php/v1/oembed/?url=http%3A%2F%2Flocalhost%2Fwiki%2FFile%3AAsd.gif&maxwidth=300&maxheight=400&format=json

I ask because, from my limited perspective, such HTTP GET API calls seem cache-unfriendly, since the parameters must be named that way to follow the oEmbed spec.

An action=oembed response would be something like the following wouldn't it? i.e. not just provide an iframe with the ?embedplayer=yes but the actual HTML that needs to be embedded?

The spec says that "it is recommended that consumers display the HTML in an iframe, hosted from another domain." But that is meant to be implemented by the consumer, not the provider, I think.

{
	"version": "1.0",
	"type": "video",
	"provider_name": "Wikimedia Commons",
	"provider_url": "https://commons.wikimedia.org/",
	"width": 3840,
	"height": 2160,
	"title": "File:Landwasserviadukt, aerial video.webm",
	"author_name": "Video: Capricorn4049 Audio: Kevin MacLeod",
	"html":
		"<div id=\"videoContainer\"><span><video id=\"mwe_player_0\" poster=\"https://upload.wikimedia.org/wikipedia/commons/thumb/7/72/Landwasserviadukt%2C_aerial_video.webm/3840px--Landwasserviadukt%2C_aerial_video.webm.jpg\" controls=\"\" preload=\"auto\" data-mw-tmh=\"\" class=\"mw-tmh-inline\" width=\"3840\" height=\"2160\" data-player=\"fillwindow\" playsinline=\"\" data-durationhint=\"170\" data-mwtitle=\"Landwasserviadukt,_aerial_video.webm\" data-mwprovider=\"local\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.480p.vp9.webm\" type=\"video/webm; codecs=&quot;vp9, opus&quot;\" data-transcodekey=\"480p.vp9.webm\" data-width=\"854\" data-height=\"480\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.720p.vp9.webm\" type=\"video/webm; codecs=&quot;vp9, opus&quot;\" data-transcodekey=\"720p.vp9.webm\" data-width=\"1280\" data-height=\"720\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.1080p.vp9.webm\" type=\"video/webm; codecs=&quot;vp9, opus&quot;\" data-transcodekey=\"1080p.vp9.webm\" data-width=\"1920\" data-height=\"1080\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/7/72/Landwasserviadukt%2C_aerial_video.webm\" type=\"video/webm; codecs=&quot;vp8, vorbis&quot;\" data-width=\"3840\" data-height=\"2160\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.144p.mjpeg.mov\" type=\"video/quicktime\" data-transcodekey=\"144p.mjpeg.mov\" data-width=\"256\" data-height=\"144\"><source 
src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.240p.vp9.webm\" type=\"video/webm; codecs=&quot;vp9, opus&quot;\" data-transcodekey=\"240p.vp9.webm\" data-width=\"426\" data-height=\"240\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.360p.webm\" type=\"video/webm; codecs=&quot;vp8, vorbis&quot;\" data-transcodekey=\"360p.webm\" data-width=\"640\" data-height=\"360\"><source src=\"https://upload.wikimedia.org/wikipedia/commons/transcoded/7/72/Landwasserviadukt%2C_aerial_video.webm/Landwasserviadukt%2C_aerial_video.webm.360p.vp9.webm\" type=\"video/webm; codecs=&quot;vp9, opus&quot;\" data-transcodekey=\"360p.vp9.webm\" data-width=\"640\" data-height=\"360\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=de-at&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"de-AT\" label=\"Österreichisches Deutsch ‪(de-at)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=de-ch&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"de-CH\" label=\"Schweizer Hochdeutsch ‪(de-ch)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=de-formal&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"de-x-formal\" label=\"Deutsch (Sie-Form) ‪(de-formal)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=de&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"de\" label=\"Deutsch ‪(de)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=en-ca&amp;trackformat=vtt\" 
kind=\"subtitles\" type=\"text/vtt\" srclang=\"en-CA\" label=\"Canadian English ‪(en-ca)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=en-gb&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"en-GB\" label=\"British English ‪(en-gb)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=en&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"en\" label=\"English ‪(en)‬\" data-dir=\"ltr\"><track src=\"/w/api.php?action=timedtext&amp;title=File%3ALandwasserviadukt%2C_aerial_video.webm&amp;lang=rm&amp;trackformat=vtt\" kind=\"subtitles\" type=\"text/vtt\" srclang=\"rm\" label=\"rumantsch ‪(rm)‬\" data-dir=\"ltr\"></video></span></div>",
}

(It looks like there are a few unqualified URLs in there, not sure if that's an issue.)
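If the unqualified URLs do turn out to be a problem (they would resolve against the embedding site rather than Commons), one possible fix is to qualify them before returning the HTML. The sketch below is Python and deliberately crude: it uses a regular expression on double-quoted `src`, `href` and `poster` attributes rather than a real HTML parser, and the function name is made up.

```python
import re
from urllib.parse import urljoin

# Matches src/href/poster attributes with double-quoted, non-empty values.
# The lookbehind skips attributes such as data-src.
_URL_ATTR = re.compile(r'(?<![\w-])(src|href|poster)="([^"]+)"')

def qualify_urls(html_text, base):
    """Rewrite relative URL attributes in html_text so they are absolute against base."""
    def replace(match):
        name, value = match.group(1), match.group(2)
        return f'{name}="{urljoin(base, value)}"'
    return _URL_ATTR.sub(replace, html_text)

track = '<track src="/w/api.php?action=timedtext&amp;lang=en&amp;trackformat=vtt" kind="subtitles">'
print(qualify_urls(track, "https://commons.wikimedia.org/"))
```

URLs that are already absolute, like the upload.wikimedia.org sources above, pass through unchanged.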

@Samwilson I think that's exactly what we should do, yep!

Do you have any particular recommendation between "traditional" mw API, vs "more" REST API?

Given this context, I think a "traditional" mw API would be better, since I guess that REST APIs sit behind a somewhat more aggressive cache layer - https://wikitech.wikimedia.org/wiki/File:MediaWiki_infrastructure_2022.png

If it were just JSON (as above) I'd say the Action API would be good, but there's also a need to return the same structure as XML with a root <oembed> element… I don't think that's possible with the Action API (the top level will be an <api> element). So maybe it has to be REST?

OK. That said, I don't think we really need to expose XML. Having just JSON fits our known use cases, AFAIK.

Edited: I mean, we can return HTTP status code 501 Not Implemented for XML, and never announce XML in our discovery.

@tstarling Do you still personally recommend this REST API approach, given that it would then look something like this?

http://localhost/w/rest.php/v1/oembed/?url=http%3A%2F%2Flocalhost%2Fwiki%2FFile%3AAsd.gif&maxwidth=300&maxheight=400&format=json

Yes.

I ask because, from my limited perspective, such HTTP GET API calls seem cache-unfriendly, since the parameters must be named that way to follow the oEmbed spec.

I think the main drawbacks of the REST API are: less documentation, no self-documentation, no ApiSandbox and fewer examples to follow. Caching is not significantly different for your purposes. In the action API, the format query parameter belongs to ApiMain, you're not meant to interpret it yourself. For this URL format, it's better to use the REST API where handlers own all the query parameters and can interpret them freely.
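To illustrate what "handlers own all the query parameters" could look like for this endpoint, here is a language-neutral sketch written in Python rather than as a MediaWiki REST handler. The function name, the error payloads and the use of 400 for malformed input are assumptions; answering 501 for XML follows the suggestion made earlier in this thread.

```python
from urllib.parse import parse_qs, urlsplit

def parse_oembed_query(request_url):
    """Validate an oEmbed request URL and return (http_status, payload)."""
    query = parse_qs(urlsplit(request_url).query)

    def first(name):
        # parse_qs returns lists and drops blank values; take the first or None.
        return query.get(name, [None])[0]

    fmt = first("format") or "json"
    if fmt != "json":
        # Only JSON is produced; XML is never announced in discovery.
        return 501, {"error": "format not implemented"}

    page_url = first("url")
    if not page_url:
        return 400, {"error": "missing url parameter"}

    limits = {}
    for name in ("maxwidth", "maxheight"):
        raw = first(name)
        if raw is None:
            continue
        if not (raw.isascii() and raw.isdigit()) or int(raw) == 0:
            return 400, {"error": "invalid " + name}
        limits[name] = int(raw)

    return 200, {"url": page_url, "format": fmt, **limits}

status, params = parse_oembed_query(
    "http://localhost/w/rest.php/v1/oembed/"
    "?url=http%3A%2F%2Flocalhost%2Fwiki%2FFile%3AAsd.gif"
    "&maxwidth=300&maxheight=400&format=json"
)
print(status, params)
```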

Ouch, wmhack 2025 is already over and I've spent 70% of my time on Phorge, and the rest sleeping and eating and recovering lol, so I've run out of free time for this too. Un-claiming the task to attract new hackers.

Hoping to be useful to newcomers (but also to myself in the future), here is what I know:

  • for content pages, oembed MAY need the TextExtracts extension - e.g. to produce a short page introduction
  • for videos, oembed MAY need the TimedMediaHandler extension - e.g. it supports the ?embedplayer=1 parameter
  • for images, oembed MAY need the PageImages extension (which provides stuff for the OpenGraph protocol) - e.g. to get thumbnail dimensions
  • as already said, oembed SHOULD be implemented with REST APIs, not action APIs, to serve JSON (the simplest required format), but also to potentially support XML in the future as a bonus without rewriting everything

Pitfalls:

  • it's still not clear where this feature should be introduced, since it seems it should rely on multiple existing extensions. I would personally encourage the future lazy hacker to go ahead with a quick and dirty early prototype in MediaWiki core: a prototype that may end up in the trash bin later, but veeery useful to quickly expose absurdities and rabbit holes along this path
  • a first release should probably concentrate on videos only (so, relying on the ?embedplayer=1 thing), but this does not seem a good reason to expand TimedMediaHandler directly, IMVHO, since the overall goal is to also support normal images and pages, and it would be confusing to do that in TimedMediaHandler.

P.S. ouch, the wmhack has concluded and I prioritized Phorge lol (T392411), and I was barely able to survive the ~11 Phorge tasks I've claimed ( https://we.phorge.it/tag/wikimedia_hackathon_2025/ ), true fun, but this means this oembed thing still needs a hero, and I'm not that hero loool :D

TLDR new hacker welcome on oembed, or new hackathon needed lol

P.S. it may be good to introduce a very abstract oembed provider in the core REST APIs (that does nothing by default), so that extensions can implement their own embed provider(s). E.g. so that if you install TimedMediaHandler, it can give oembed support to its videos.
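A minimal sketch of that abstraction, in Python for brevity (in MediaWiki it would presumably be a hook or service): core keeps an empty list of providers and asks each in turn, and an extension registers one. All names are hypothetical, and the file-extension check is only a stand-in for whatever TimedMediaHandler would really use to recognise its media.

```python
# Core side: a registry that does nothing until someone registers a provider.
_providers = []

def register_embed_provider(provider):
    """Register a callable taking a page URL and returning an embed URL or None."""
    _providers.append(provider)
    return provider

def find_embed_url(page_url):
    """Ask each registered provider in turn; None means the page is not embeddable."""
    for provider in _providers:
        embed_url = provider(page_url)
        if embed_url is not None:
            return embed_url
    return None

# Extension side: roughly what a video extension might register (assumption).
@register_embed_provider
def video_embed_provider(page_url):
    if page_url.lower().endswith((".webm", ".ogv")):
        return page_url + "?embedplayer=yes"
    return None

print(find_embed_url("https://commons.wikimedia.org/wiki/File:The_Mechanical_Cow_(1927).webm"))
print(find_embed_url("https://commons.wikimedia.org/wiki/File:Lol.JPG"))
```

With this shape, the oEmbed endpoint and the discovery tag only need to call `find_embed_url` and never check which extensions are installed.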