
Commons videos not indexed by Google
Closed, Resolved | Public | 5 Estimated Story Points

Description

Google Search Console says that of 482K crawled file description pages for videos, only 74 are indexed as videos. Indexing almost always fails with the error message "Video isn't on a watch page".

The linked documentation says:

The video doesn't seem to be on a watch page. A watch page's main purpose is to show a user a single video; only videos that are on a watch page are eligible for indexing. Here are some examples of page types where the video is supplementary to the textual content, and not a watch page:

  • A blog post where the video is complementary to the text rather than the primary content of the page
  • A product details page with a complementary video
  • A video category page that lists multiple videos of equal prominence

For more details, learn how to create a dedicated watch page.

If you're sure your page is designed to focus on a single video, use the URL inspection tool to check to make sure the video is showing up in the rendered HTML. Try moving the video container to a higher position in the HTML.
If you’re using a paywall, add structured data for paywalled content to prevent crawl issues.

I inspected a few of the 74 indexed videos, but they seem unremarkable. Like other Commons videos, they abuse the hProduct microformat and so Google detects them as product pages. Like other Commons videos, Google detects two videos on each page, because the file history thumbnail counts as a video. Some have non-canonical URLs, despite the rel=canonical tag being present in the crawled HTML. Some of them are not even videos, like this image of an S-Video socket.

Solutions can be tested on a separate domain; there's no indication that Google is treating Commons specially.

More specific docs:

Tests on people.wikimedia.org:

| File | Description | Watch page? |
| --- | --- | --- |
| rendered.html | post-load outerHTML of File:Wikipedia User Name MEDIUM.ogv with some resource locations adjusted | No |
| strip-filetoc.html | As above with filetoc box removed | No |
| strip-namespaces.html | As above with all tabs removed | No |
| strip-header.html | As above with search bar etc. removed | No |
| rich.html | rendered.html with valid JSON-LD VideoObject | No |
| source.html | As above but <video> has <source> | No |
| rs-strip-extreme.html | As above with almost all content and navigation removed | Yes |
| strip-extreme.html | As above without <source> | Yes |
| strip-content.html | rendered.html with content area heavily stripped, nav areas intact | No |

Event Timeline

The CommonsMetadata extension outputs ImageObject JSON-LD for images, but no VideoObject for our video pages. Adding one might be an improvement. Additionally, the Open Graph metadata for all our pages designates type:website, which I'm guessing might also reduce the chance that Google recognizes this as a video watch page.

they abuse the hProduct microformat

Do they? Or does Google fail to understand it?

Each media file is a product that Commons offers to the world, albeit for a price of $0. It is a product of the person or organisation who made it.

There is nothing in the microformat specification that required the product to have a price; or to be sold by a commercial organisation.

they abuse the hProduct microformat

Do they? Or does Google fail to understand it?

I was just saying I don't think it's relevant to this task. Let's keep the discussion on that at T54647 or some other more relevant place.

The CommonsMetadata extension outputs ImageObject JSON-LD for images, but no VideoObject for our video pages. Adding one might be an improvement. Additionally, the Open Graph metadata for all our pages designates type:website, which I'm guessing might also reduce the chance that Google recognizes this as a video watch page.

WikibaseMediaInfo is generating a VideoObject on video pages, in a link tag rather than a script tag. But it contains none of the three properties required by Google, namely name, thumbnailUrl and uploadDate.
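
For illustration, here is a minimal Python sketch of an embedded VideoObject carrying the three properties named above. The helper names, the example title, and the example.org URLs are hypothetical. This is not the WikibaseMediaInfo or CommonsMetadata code, only a sketch of the shape of the output.

```python
import json


def video_object_jsonld(name, thumbnail_url, upload_date):
    """Build a schema.org VideoObject with the three properties named above."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
    }


def as_script_tag(data):
    """Serialize the object for inline embedding in the page.

    "</" is escaped as "<\\/" (a valid JSON escape) so that a value containing
    "</script>" cannot terminate the element early.
    """
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = video_object_jsonld(
        "File:Example.webm",
        "https://example.org/thumb/Example.webm.jpg",
        "2025-01-01T00:00:00Z",
    )
    print(as_script_tag(example))
```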

three properties required by Google, namely name, thumbnailUrl and uploadDate.

Sounds like more trouble, as Wikibase of course doesn't track some of those specific properties, specifically the thumbnail URL. I also doubt that the Google indexer actually descends into a linked JSON-LD document: their documentation pages don't describe it anywhere, and validator.schema.org doesn't find it either.

Flickr ImageObject

Sounds like more trouble, as Wikibase of course doesn't track some of those specific properties, specifically the thumbnail URL.

Look at MediaInfoSpecificComponentsRdfBuilder::addFileMetadataFromFile(). It's basically a reimplementation of CommonsMetadata. It has a File object and uses it to get things like width and height, those aren't coming from Wikibase. It uses File::getMediaType() to decide what object type to use, just like CommonsMetadata. It could get the thumbnail URL.

The CommonsMetadata extension page says it's a temporary solution, to be replaced by "Wikidata on Commons".

I also doubt that the Google indexer actually descends into a linked JSON-LD document: their documentation pages don't describe it anywhere, and validator.schema.org doesn't find it either.

Confirmed. I used Google's rich results test on an editable copy of a Commons page. With appropriate JSON-LD embedded in the page in the manner recommended by the spec, the video is detected. With the same JSON linked with <script src=...> or <link>, the video is not detected.
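
The embedded-versus-linked distinction can be sketched in Python with the standard-library HTML parser: collect JSON-LD that is embedded inline and, separately, JSON-LD that is only referenced by URL. This is a rough approximation for checking one's own markup, not a description of how Google's indexer or the rich results test works. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser

REQUIRED = ("name", "thumbnailUrl", "uploadDate")


class JsonLdCollector(HTMLParser):
    """Separate inline JSON-LD documents from ones that are only linked."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.embedded = []  # parsed inline JSON-LD documents
        self.linked = []    # URLs of JSON-LD referenced via src/href

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("type") != "application/ld+json":
            return
        if tag == "script" and "src" not in attrs:
            self._in_jsonld = True
            self._buffer = []
        elif tag == "script":
            self.linked.append(attrs["src"])
        elif tag == "link" and "href" in attrs:
            self.linked.append(attrs["href"])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.embedded.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed inline JSON-LD


def collect(html):
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector


def missing_video_properties(html):
    """Return the missing required properties of the first embedded
    VideoObject, or None if no VideoObject is embedded inline."""
    for doc in collect(html).embedded:
        if isinstance(doc, dict) and doc.get("@type") == "VideoObject":
            return [prop for prop in REQUIRED if prop not in doc]
    return None
```

A page whose VideoObject is only linked yields `None` here, matching the observation that linked JSON is not picked up.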

Unfortunately, the live test tool doesn't tell you whether the page is a watch page; I think you would have to wait for indexing for that. The documentation doesn't say that structured data affects watch page detection, so I'm skeptical about that.

I copied a Commons video file description page, made a few variants, uploaded them to people.wikimedia.org, and requested indexing. I got the error "Video isn't on a watch page" on all variants, including the one with valid JSON-LD (rich.html).

My test pages are based on the outerHTML of the Commons page after it has finished loading. Based on Google's documentation, I think this is roughly what the indexer would see for a Commons page. The scripts aren't functional, so it doesn't actually have a functioning video player, but the indexing stage doesn't run scripts or click on things so I think that is realistic.

I'll submit a few more variants.

Per the latest tests reported in the task description, it would seem that Vector navigation alone is enough to disqualify the page from being a watch page. We could have a separate watch page, like [[File:Video.webm/watch]] or [[Special:Watch/Video.webm]], but it would need to override the skin to strip out the navigation.

I don't know how anything on YouTube could be classified as a watch page with such rules.

tstarling changed the task status from Open to Stalled. Jul 24 2025, 9:26 PM

I'm not working on this anymore.

FYI: I mentioned this task at Mobile domain sunsetting/status § 2025-07-25:

[…] We met with Google Search folks on Thursday, where we learned that T396168 (Google Search not indexing Commons videos in "Videos" search) is likely a bug in Google instead of an issue with our markup, and Google is now looking into it.

tstarling changed the task status from Stalled to Open. Aug 10 2025, 11:38 PM

It looks like Google has made a tweak and videos are now being indexed. Let's wait a couple of weeks and then check the stats.

tstarling claimed this task.
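| Date | Videos indexed by Google |
| --- | --- |
| 2025-06-02 | 74 |
| 2025-06-09 | 70 |
| 2025-06-16 | 74 |
| 2025-06-23 | 74 |
| 2025-06-30 | 362 |
| 2025-07-07 | 475 |
| 2025-07-14 | 546 |
| 2025-07-21 | 630 |
| 2025-07-28 | 625 |
| 2025-08-04 | 17171 |
| 2025-08-11 | 54258 |
| 2025-08-18 | 67418 |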

The number of videos (img_media_type='VIDEO') on Commons is about 364k.

There are 112k pages with the status "Video not processed yet", so the numbers should continue to improve.

There are still 524k pages with the status "Video isn't on a watch page", but scrolling through the first 500, it looks like most are not file description pages; they are mostly categories.

A 1000x improvement is good enough. This task is resolved. Thanks @Krinkle for raising it with Google.
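
As a quick check of the "1000x" figure, the weekly counts reported in this task can be compared directly. A short Python sketch (variable names are arbitrary; the numbers are the ones listed above) gives a ratio of about 911 between the first and last data points, which is consistent with the rounded claim.

```python
from datetime import date

# Weekly counts of Commons videos indexed by Google, as reported in this task.
indexed = {
    date(2025, 6, 2): 74,
    date(2025, 6, 9): 70,
    date(2025, 6, 16): 74,
    date(2025, 6, 23): 74,
    date(2025, 6, 30): 362,
    date(2025, 7, 7): 475,
    date(2025, 7, 14): 546,
    date(2025, 7, 21): 630,
    date(2025, 7, 28): 625,
    date(2025, 8, 4): 17171,
    date(2025, 8, 11): 54258,
    date(2025, 8, 18): 67418,
}

dates = sorted(indexed)

# Overall improvement between the first and last data points.
factor = indexed[dates[-1]] / indexed[dates[0]]

# Week-over-week change, keyed by the later date of each pair.
gains = {
    later: indexed[later] - indexed[earlier]
    for earlier, later in zip(dates, dates[1:])
}
biggest_week = max(gains, key=gains.get)

print(f"overall factor: {factor:.0f}x")
print(f"largest weekly gain: {gains[biggest_week]} (week ending {biggest_week})")
```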

Yes, good to see this improved so much! Amazing work.

It might be worth putting out a User-notice for this.

Something like: "Wikimedia discovered that Commons videos were not indexed by Google. This was reported to and subsequently fixed by Google"?

https://meta.wikimedia.org/w/index.php?title=Tech/News/2025/35&diff=prev&oldid=29165848

Shall we make a new subticket for the JSON-LD part of this, so that this ticket is fully wrapped up?
This part is now split off into T411108: Add VideoObject JSON-LD to video File pages

Shall we make a new subticket for the JSON-LD part of this, so that this ticket is fully wrapped up?
This part is now split off into T411108: Add VideoObject JSON-LD to video File pages

This task is closed already. JSON-LD turned out to be unrelated.