Structured data not visible in structured data tab on a lot of files
Closed, Resolved | Public | BUG REPORT

Description

I noticed when browsing through https://commons.wikimedia.org/w/index.php?title=Special:ListFiles/GeographBot&ilshowall=1 that the structured data tab is empty.

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
The structured data statements are not visible, the interface is empty

What should have happened instead?:
All the added statements should be visible so a user can update

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

Event Timeline

I was able to reproduce this on https://commons.wikimedia.org/wiki/File:Pheasant_in_the_stubble_-_geograph.org.uk_-_3185682.jpg, and a purge fixed it. Sounds like T299356: Depicts statements are shown in WCQS but not in SDC is more widespread than it might have seemed at first.

Plenty of other files at https://commons.wikimedia.org/wiki/Special:ListFiles/GeographBot that have the same problem. This bot uploads the file, adds the structured data and does a null edit to make it show up in the template (the wikitext is just one template).

@Multichill is the bot just using wbsetclaim then a null edit? Are you getting this with any of your other bots?

No, it uses wbeditentity, see https://github.com/multichill/toollabs/blob/master/bot/commons/geograph_uploader.py#L119. I did notice it on other files too. Looking at the recent contributions of my other bot, I found https://commons.wikimedia.org/wiki/File:Casco_Hist%C3%B3rico_de_Santiago_28IX2007_8.JPG, which had its first structured data added in 2020. I do see the structured data in the source HTML.
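For readers unfamiliar with the call, here is a rough Python sketch of what a wbeditentity request body for a single "depicts" (P180) statement could look like. It only builds the payload and sends nothing. The helper names, the page ID and the token value are made up for illustration, and this is not taken from the bot's actual code.

```python
import json


def depicts_claim(qid: str) -> dict:
    """Build one 'depicts' (P180) statement pointing at the item `qid`."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid[1:]),  # "Q36023" -> 36023
                    "id": qid,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def wbeditentity_body(page_id: int, qids: list[str], csrf_token: str) -> dict:
    """POST body for action=wbeditentity on a file's MediaInfo entity.

    The MediaInfo entity ID is "M" followed by the file page's page ID.
    """
    return {
        "action": "wbeditentity",
        "id": f"M{page_id}",
        "data": json.dumps({"claims": [depicts_claim(q) for q in qids]}),
        "token": csrf_token,
        "format": "json",
    }


# Hypothetical page ID and placeholder token, for illustration only.
body = wbeditentity_body(12345, ["Q36023"], "PLACEHOLDER_TOKEN")
print(body["id"], body["data"])
```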

@LucasWerkmeister is this something that might be on the wikibase side?

It might be, but it wouldn’t be my top guess at this point… if the bug is that edits to non-main slots don’t (always) purge the parser cache, then that feels more like a MediaWiki core bug. But I don’t know enough about WikibaseMediaInfo to be sure that the parser cache is even relevant (I don’t know where this empty UI comes from).

This also affects statements added by the upload wizard – I accidentally added another “depicts” to this file because I didn’t see the existing one (so I thought it hadn’t been added, as a result of a different bug, because I didn’t remember this bug in time).

Actually, no, it’s probably not the upload wizard directly. On this photo, the depicts statements were visible initially (i.e. after UploadWizard was done) – but then I added a category with HotCat, and then the structured data suddenly disappeared.

The issue did not happen on this photo, where I used HotCat’s “modify several categories” mode, which made the edit through action=submit. So it’s looking like it’s limited to API wikitext edits.

Hello. This issue is very troublesome for me (at least I think it is the same issue). It happens just about every time I add a "depicts" property to Commons using QuickStatements. I add a caption at the same time - that works fine. The file page history is correct. Then I look interactively at the "Structured data" tab in Commons and there is nothing there under "depicts" or anything (really there are many claims). A current example showing the problem is File:Baltazaria octopodites (10.3897-mycokeys.37.26303) Figure 3.jpg.
I don't know how to make a null edit interactively, but I find that if I next add a "depicts" claim interactively, it suddenly starts working and I can see all the old claims. So I suppose it must be a bug in the "Structured data" display page(?).

AIUI, this has thus far been observed to occur via:

  • HotCat (API wikitext edits)
  • GeographBot & DPLA bot (wbeditentity)

It has not yet been observed via direct edits on the "Structured data" tab of file pages (via wbsetclaim), but it's entirely possible that the issue happens there as well and has gone unnoticed because the interface doesn't refresh. We do know that, if it does happen there, it's certainly not consistent: it works fine at least sometimes, and certainly every time I've manually checked.

Has anyone witnessed this happen when there was pre-existing structured data?
If yes, did it then render a version with the previous data, or did that also blank out the entire page (until purge)?

I have tried a couple dozen edits locally, in various setups, via wbeditentity (like GeographBot), but failed to reproduce this issue (for reference, somewhere between 10-20% of GeographBot's uploads seem to exhibit this in prod).

This kind of reminds me of T237991, where MW core fires a hook pre-save, which via Extension:TemplateData (or potentially any other) triggers a render of the unsaved revision, which could result in lookups for (and caching of those results) data that has not yet been persisted. I failed to reproduce this particular issue under those circumstances, though, so it's probably not directly related.

It is almost certainly a (parser) cache issue, because a simple purge seems to consistently fix the problem.
Post-save null edits don't seem to adequately remedy this issue, though. Has anyone tried running those null edits a minute or so after the edit rather than immediately after? Did that change anything?

I'm out of ideas for now.

I wasn't using HotCat, GeographBot or DPLA bot. I think it happens to me consistently when I use AutoWikiBot.

In my case there was pre-existing structured data and the page was ENTIRELY blanked, the older data were not visible. Now File:Baltazaria octopodites (10.3897-mycokeys.37.26303) Figure 3.jpg is OK, I don't know why.

I would be interested to know how to do a simple purge (apart from adding a dummy claim and then deleting it, which always works for me).
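On the purge question: MediaWiki offers purging both as a page action (append action=purge to the page URL) and through the Action API (action=purge, sent as a POST). A minimal Python sketch that only builds the requests and sends nothing; the helper names are made up for illustration.

```python
from urllib.parse import urlencode

COMMONS = "https://commons.wikimedia.org/w"


def purge_page_url(title: str) -> str:
    """URL that asks MediaWiki to purge `title` when opened in a browser."""
    return f"{COMMONS}/index.php?" + urlencode({"title": title, "action": "purge"})


def purge_api_request(titles: list[str]) -> tuple[str, dict[str, str]]:
    """Endpoint and POST body for purging several pages via the Action API."""
    body = {
        "action": "purge",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "forcelinkupdate": "1",      # optional: also refresh links tables
        "format": "json",
    }
    return f"{COMMONS}/api.php", body


print(purge_page_url("File:Sw-ke-angaza.flac"))
```

Opening the first URL while logged in is the interactive route; the second form is what a script or bot would POST.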

Has anyone witnessed this happen when there was pre-existing structured data?
If yes, did it then render a version with the previous data, or did that also blank out the entire page (until purge)?

Yes, and it rendered a blank structured data tab, not a previous version:

On this photo, the depicts statements were visible initially (i.e. after UploadWizard was done) – but then I added a category with HotCat, and then the structured data suddenly disappeared.

I've noticed this as well. I edited the structured data on https://commons.wikimedia.org/wiki/File:GLASER-DIRKS_100G.jpg with QuickStatements, and everything looked fine. I then edited the wikitext with the normal editor (though it was opened by https://add-information.toolforge.org) and the structured data disappeared. Purging and null editing didn't help, and another editor could see it fine. Only after saving another edit to the same page did it reappear.

I also noticed this issue on files I uploaded, for example at https://commons.wikimedia.org/wiki/File:Sw-ke-angaza.flac

If I go on "History", I see the structured data on previous revisions, but it no longer shows up in the latest revision by "VRTS Migration Bot", even though that bot did not change the structured data at all.

Previous revision (shows structured data): https://commons.wikimedia.org/w/index.php?title=File:Sw-ke-angaza.flac&oldid=532398159
Current revision (no more structured data): https://commons.wikimedia.org/w/index.php?title=File:Sw-ke-angaza.flac&oldid=565770723

Interestingly, when I asked about this issue on Discord, someone else could see the structured data on that file. But that user ran into the same issue on another file later (see AntiCompositeNumber's comment above).

As a note, I would like to add that I am seeing this occur with AC DC (https://commons.wikimedia.org/wiki/Help:Gadget-ACDC). After adding a depicts statement, the structured data tab on the file is empty. An example is (https://commons.wikimedia.org/wiki/File:Calvin_Coolidge,_head-and-shoulders_portrait,_right_profile_LCCN2005676159.jpg). The file had a date added to structured data in April 2020, and the depicts added by myself in April 2022.

As a note, the data appears to be there, as the file is found when using the search function "haswbstatement:P180=Q36023" (Q36023 is the value of the depicts statement that was added).
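The same check can be scripted against the search API. Below is a small Python sketch that builds the query URL without sending it. The function names are illustrative; namespace 6 is the File namespace.

```python
from urllib.parse import urlencode


def haswbstatement_query(prop: str, value: str) -> str:
    """Search keyword matching files that carry the statement prop=value."""
    return f"haswbstatement:{prop}={value}"


def search_api_url(prop: str, value: str, limit: int = 10) -> str:
    """Action API URL running the haswbstatement search in the File namespace."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": haswbstatement_query(prop, value),
        "srnamespace": "6",  # File namespace
        "srlimit": str(limit),
        "format": "json",
    }
    return "https://commons.wikimedia.org/w/api.php?" + urlencode(params)


print(search_api_url("P180", "Q36023"))
```

A file that appears in these results but shows an empty "Structured data" tab has its statements stored and indexed, which points at rendering or caching rather than at data loss.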

I also just ran into it today with an edit from the default SDC interface.
We're "suddenly" seeing this so frequently and from so many sources now, that this must be a recent-ish regression somewhere.

The only change in MediaInfo that I can find that comes anywhere near to touching how data is handled or rendered, would be support for references. It seems implausible that that broke it, though.

The raw ParserOutput for affected pages doesn't contain any content. It has a <h1 class="mw-slot-header"><mw:slotheader>mediainfo</mw:slotheader></h1> (so the slot is known to exist), but no content to follow it (we would expect a <mediainfoview style="display: none"> tag where all content is wrapped inside)
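To make that symptom checkable, here is a rough Python sketch that classifies a raw HTML string as "blank mediainfo slot" when the slot header is present but no <mediainfoview> wrapper follows it. The function name and the sample strings are made up, and the regexes are a loose approximation rather than a proper HTML parse.

```python
import re

# The slot header as quoted above; whitespace inside the tag is tolerated.
SLOT_HEADER = re.compile(r"<mw:slotheader>\s*mediainfo\s*</mw:slotheader>")
# Opening tag of the wrapper that should contain the rendered statements.
WRAPPER = re.compile(r"<mediainfoview\b")


def mediainfo_slot_is_blank(raw_html: str) -> bool:
    """True if the mediainfo slot header exists but no wrapper follows it."""
    header = SLOT_HEADER.search(raw_html)
    if header is None:
        # No mediainfo slot rendered at all; that is a different situation.
        return False
    return WRAPPER.search(raw_html, header.end()) is None


affected = '<h1 class="mw-slot-header"><mw:slotheader>mediainfo</mw:slotheader></h1>'
healthy = affected + '<mediainfoview style="display: none">...</mediainfoview>'
print(mediainfo_slot_is_blank(affected), mediainfo_slot_is_blank(healthy))
```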

That missing content is supposed to be provided via MediaInfoView::getContent(), so this one either:

  • is not called,
  • throws an InvalidArgumentException because !( $entity instanceof MediaInfo ), or
  • ends up failing hard (not simply producing empty content, because we'd still see the wrapper node in that case) somewhere in the code that is supposed to render the content

I can't find any traces of the last 2 cases in logstash (and they would probably also affect the slotheader node anyway, which isn't rendered until after the content rendering has executed)

MediaInfoView is a VIEW_FACTORY_CALLBACK, and that getContent method is supposed to be called from inside FullEntityParserOutputGenerator::addHtmlToParserOutput().
This call does not happen if:

  • it's a redirect, which is not the case here,
  • !$content->getEntityHolder() (in EntityHandler::fillParserOutput()),
  • $generateHtml is false (in FullEntityParserOutputGenerator::getParserOutput(), where it has been passed down from EntityHandler::fillParserOutput()). This seems rather unlikely, since I believe it originates from RevisionRenderer::combineSlotOutput(), where that same value is used to generate the mw:slotheader tag that does exist.

Trying to render the mediainfo slot now certainly does work (but we already knew that: purge the page and it rerenders just fine...)
Trying to render a content object without holder indeed results in a completely empty string, like we're seeing here.

// Render the mediainfo slot of a page by hand and inspect the raw output.
$page = WikiPage::newFromId(...);
$revisionRecord = $page->getRevisionRecord();
$parserOptions = $page->makeParserOptions( 'canonical' );
MediaWiki\MediaWikiServices::getInstance()->getContentRenderer()->getParserOutput(
    // Content without an entity holder; this renders as an empty string.
    // Swap in $revisionRecord->getContent( 'mediainfo' ) for the actual content, or
    // Wikibase\MediaInfo\Content\MediaInfoContent::emptyContent() for empty content with a holder.
    new Wikibase\MediaInfo\Content\MediaInfoContent(),
    $page,
    $revisionRecord->getId(),
    $parserOptions,
    true // generate HTML
)->getRawText();

AFAICT, $content->getEntityHolder() can only be null when it was explicitly created as such via MediaInfoHandler::makeEmptyContent(), which could happen from multiple places. Those all seem to target the main slot, though, so should be irrelevant...

A lot of this ContentHandler/ParserOutput-related code was refactored (T287158) in Oct-Nov 2021, which isn't too long before this issue started becoming apparent. I suspect this may be related to this issue (either directly, or indirectly via now-wrong assumptions elsewhere)

Quick recap:

  • RevisionRenderer::combineSlotOutput() gets called since it is able to generate the <mw:slotheader> node
    • That means that the mediainfo role is known to exist on the page and part of $slots
    • That also means it was called with $hints['generate-html'] set to true (or with the attribute missing), both of which will cause $withHtml to be true (which is required to render the above node)
  • In addition to that slotheader node, combineSlotOutput is also meant to append the slot content, which appears to be an empty string (or at the very least something that casts to it)
    • That content is supposed to come from RenderedRevision::getSlotParserOutput() (which gets the same $withHtml value, i.e. true)
    • Unless there already is a $this->slotsOutput[ $role ] with empty string (not null!) getText() result, this will get its content from RenderedRevision::getSlotParserOutputUncached()
      • NOTE: it is possible to generate empty string $this->slotsOutput[ $role ] via generate-html => false; subsequent calls (even with different hints) will result in the same empty string result! Need to investigate further to see whether this also actually happens (seems unlikely because RevisionRenderer::getRenderedRevision creates a new RenderedRevision instance every time; though then again it might via $hints['known-revision-output'])
    • That gets its result from ContentRenderer::getParserOutput()
    • That gets it from MediaInfoHandler extends EntityHandler extends ContentHandler::getParserOutput()
    • That gets it from MediaInfoHandler extends EntityHandler::fillParserOutput()
    • Unless the page is a redirect or has no entity holder, that'll get its thing from MediaInfoHandler extends EntityHandler::getParserOutputFromEntityView()
    • Which will get it from FullEntityParserOutputGenerator::getParserOutput()
    • Unless $generateHtml is false, this one will get it from FullEntityParserOutputGenerator::addHtmlToParserOutput()
    • That'll invoke MediaInfoView::getContent() (via DispatchingEntityViewFactory::newEntityView()), which is guaranteed to return *some* content (wrapper div)

To be continued.

I have been seeing this issue happen on most of my Wikimedia Commons uploads; this is my first time trying to add structured data manually so at first I thought this might be normal and there was some kind of waiting period before you could enter it. Seeing this made me realize it really is a bug.

I suspect a common factor in the ones which do not show the structured data page is that they are ones for which I elected to "skip" entry at first through the initial upload interface, then tried to add using the tab on the image page after upload. I have not tested this explicitly yet though. Some of the bot additions of structured data show up in the tab for images I have uploaded, but some do not.

Structured data is supposed to be the replacement for wikitext editing and the future. It has been very broken for months now. Is nobody responsible for Commons anymore these days?

Looks like this could have the same root cause as T299896.

I am hopeful that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/785247 (cc @LucasWerkmeister) will soon resolve this.
That, in combination with presumptuous object caching in RenderedRevision would lead to html-less content being stored in cache.
Here's a quick breakdown of how this would happen:

  • PageUpdater::saveRevision calls onMultiContentSave with $renderedRevision (= $this->derivedDataUpdater->getRenderedRevision())
  • PageUpdater::saveRevision calls PageUpdater::doModify, which creates a deferred that'll execute PageUpdater::getAtomicSectionUpdate, which calls DerivedPageDataUpdater::doUpdates, which in turn calls DerivedPageDataUpdater::triggerParserCacheUpdate, which then calls DerivedPageDataUpdater::doParserCacheUpdate, then DerivedPageDataUpdater::getCanonicalParserOutput, which is the result of $this->getRenderedRevision()->getRevisionParserOutput
    • This one is supposed to default to generate-html === true, but because it already had cached output and that $this->revisionOutput->hasText() returned true, it would end up using the output that was initially generated through 'generate-html' => false
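The sequence above can be illustrated with a toy model. This is a simplified Python sketch, not MediaWiki code: the class and method names are invented, and it assumes (as one reading of this thread) that an HTML-less render stores an empty string rather than "no text", so a later has-text check passes and the empty output is reused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    text: Optional[str]  # None would mean "no HTML was generated"

    def has_text(self) -> bool:
        # An empty string still counts as "has text"; that is the trap.
        return self.text is not None


class ToyRenderedRevision:
    """Invented stand-in for an object that caches its rendered output."""

    def __init__(self, statements: list[str]):
        self.statements = statements
        self._cached: Optional[Output] = None

    def _render(self, generate_html: bool) -> Output:
        # ASSUMPTION: the HTML-less render yields "" instead of None.
        if not generate_html:
            return Output(text="")
        body = ", ".join(self.statements)
        return Output(text=f"<mediainfoview>{body}</mediainfoview>")

    def get_output(self, generate_html: bool = True) -> Output:
        cached = self._cached
        if cached is not None and (not generate_html or cached.has_text()):
            return cached  # reused even if it was rendered without HTML
        self._cached = self._render(generate_html)
        return self._cached


rev = ToyRenderedRevision(["P180=Q36023"])
rev.get_output(generate_html=False)          # e.g. a pre-save hook
stored = rev.get_output(generate_html=True)  # later parser cache update
print(repr(stored.text))

fresh = ToyRenderedRevision(["P180=Q36023"])  # comparable to a purge
print(fresh.get_output().text)
```

In this model the second call gets the empty output back, which would then be written to the parser cache, while a fresh object (comparable to a purge) renders the statements as expected.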

Since an above comment stated that logs were not available, linking to another occurrence where a "normal" (non-data) edit led to existing data being seemingly lost, just two days ago: https://commons.wikimedia.org/wiki/File_talk:Lauren_Holly_-_FEBRUARY_*World_Premiere*.jpg

The issue seems to be fixed in commons beta. The fix is scheduled to be deployed wmf.11 (wmf.12) next week.

Testing notes to check in production:

  • go to any file
  • switch to Structured data - add, delete, modify any of the statements
  • Save
  • click to reload the page - the Structured data will disappear

File:Cape Blanco - JUNE *Lighthouse fog*.jpg shows that SD info was added, but it is not displayed.

I was not yet able to reproduce the issue with non-Structured data changes.

The issue seems to be fixed in commons beta. The fix is scheduled to be deployed wmf.11 (wmf.12) next week.

Is this "fixed" in the sense that it will no longer occur, or is there a way to also make sure all files being affected at the time of deployment are also fixed?

@Dominicbm as I read it, this issue occurs because a 'bad' render state is being stored in the parser cache. The parser cache should automatically expire after 3 weeks, so any affected file page should get updated, at the very latest, 3 weeks after the deploy (which should normally happen within 24 hours from now). New uploads should no longer show this problem directly after the deploy (because they are not yet in a cache).

In the meantime, any sort of template change or a manual purge should also clear the cache of an affected file within those 3 weeks (same as right now).
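As a small worked example of that timeline, here is a Python sketch computing the latest day on which a stale cached render should still be around. The 3-week figure is the one quoted in this thread and was not checked against the actual configuration; the deploy date used below is hypothetical.

```python
from datetime import date, timedelta

# Parser cache lifetime as quoted in this thread (not verified against config).
PARSER_CACHE_TTL = timedelta(weeks=3)


def latest_self_heal(deploy_day: date) -> date:
    """Day by which every entry cached before the deploy has expired."""
    return deploy_day + PARSER_CACHE_TTL


# Hypothetical deploy date, for illustration only.
print(latest_self_heal(date(2022, 5, 1)))
```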

@TheDJ Thanks for the info. That is helpful just to know this shouldn't persist beyond 3 weeks after a fix!