File captions sometimes do not display
Closed, Resolved (Public)

Description

On some files with German captions, the caption is displayed (as a collapsed language):

Screen Shot 2019-01-16 at 4.24.38 PM.png (726×645 px, 230 KB)

On other files with a German caption, it is not:

Screen Shot 2019-01-16 at 4.24.12 PM.png (680×651 px, 345 KB)

Event Timeline

Strangely, I now do not see a German caption for the image in my original screenshot:

Screen Shot 2019-01-16 at 4.32.22 PM.png (679×665 px, 225 KB)

But I still do in another image:
Screen Shot 2019-01-16 at 4.32.39 PM.png (727×668 px, 194 KB)

Abit renamed this task from "I see German captions on some but not all files" to "I see German captions on some but not all files with German captions". Jan 17 2019, 12:55 AM

Initial testing shows this one to be kind of odd. It seems sometimes this works and sometimes it doesn't, and it's not entirely clear why, even with files that were uploaded by the same user around the same time.

Perhaps the best way to test this is to look at all the files uploaded by GPSLeo on Jan 13 (https://commons.wikimedia.org/wiki/Special:Log?type=upload&user=GPSLeo&page=&wpdate=2019-01-13&tagfilter=&subtype=upload)

All of those files display the problem in question EXCEPT the Lippendorf Power Station XX series. For all of those, the German caption appears just fine. Only difference I can initially see between the sets is that the Lippendorf files are used by a page, and the others are not.

As far as I can tell, this problem does not occur with newly uploaded files.

Pinging @Cparle and @Jdforrester-WMF for thoughts?

There's something up with the parser cache, afaics - if you purge the cache on any page with the caption missing then it shows up. Was actually talking to Matthias about this earlier - it's the same for captions I uploaded last week, in some cases the captions I uploaded are missing, but they show up when you do ?action=purge
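For anyone who wants to try the purge on a single page, here is a minimal TypeScript sketch that builds the `?action=purge` URL. The `/w/index.php?title=...` form is taken from the Commons URLs in this thread. `purgeUrl`, `COMMONS_INDEX`, and the `File:Example.jpg` title are hypothetical names used for illustration only, and MediaWiki may ask for confirmation before it actually purges.

```typescript
// Assumption: index.php lives under /w/, as in the Commons URLs quoted in this thread.
const COMMONS_INDEX = "https://commons.wikimedia.org/w/index.php";

// Build the URL that asks MediaWiki to purge the cached render of one page.
// Spaces in the title are turned into underscores, then the title is URL-encoded.
function purgeUrl(title: string, indexUrl: string = COMMONS_INDEX): string {
  const normalized = title.trim().replace(/ /g, "_");
  return `${indexUrl}?title=${encodeURIComponent(normalized)}&action=purge`;
}

// Hypothetical title; do not use the _10 example page mentioned in this thread.
const exampleUrl = purgeUrl("File:Example.jpg");
```

Opening the resulting URL in a browser is the same as appending `?action=purge` by hand.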

Here's an example (please don't purge the cache on this page, or nobody else will be able to see what's wrong): https://commons.wikimedia.org/wiki/File:Landscape_around_surface_mine_Schleenhain_10.jpg - if you click on 'history' you can see that a caption was uploaded, but it's not getting shown. The fact that it works for new captions means the parser cache is invalidated when a caption is first added, which suggests that the cache is later being overwritten by some process that is not MCR-aware.

We had a similar problem before with RefreshLinksJob, which @daniel fixed with this patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/465157

Could there be some other job that's not MCR-aware?

I think it seems fixed now?
Just as additional information (I do not know if it is relevant): when I uploaded the files, the caption field in UploadWizard was the same type as the description field, not the one-line field it is now.

Ok I purged all the Landscape_around_surface_mine_Schleenhain pages *except* for _10, and they all display properly now.

Let's see if they stay in an ok state - if not then there's something weird going on

Cparle renamed this task from "I see German captions on some but not all files with German captions" to "File captions sometimes do not display". Feb 7 2019, 1:51 PM
Cparle updated the task description.

Another thing - I wonder could @GPSLeo 's action on Jan 18 (see here https://commons.wikimedia.org/w/index.php?title=File:Lippendorf_power_station_04.jpg&action=history - according to @Abit 's comment above this image used to be ok) have triggered a non-MCR-aware job that corrupted the parser cache? @Jdforrester-WMF @MarkTraceur @matthiasmullie do you know of any jobs that might be triggered by such an action?

Not off the top of my head. I'm minded to mark this as Resolved unless we're getting on-going issues.

I don't think this is a matter of parser cache getting corrupted.

Right now, I'm getting a JS notice that's essentially caused by node .wbmi-entityview-content not being found.
We recently changed that classname: that used to be .filepage-mediainfo-entityview.
.filepage-mediainfo-entityview is what is still being output, because that's still in cache.

Rewinding to a few weeks ago, this is the diff of what got deployed.
Part of those changes were the .caption -> .filepage-mediainfo- renames. So, exact same scenario.

I believe it's just a matter of our new JavaScript (expecting new classnames) being served immediately after deploy, while ignoring that we have caches that won't expire for another while (24 days, apparently)
We should 1/ try to minimize markup changes (in PHP output), and 2/ when we do, make our relevant JS backwards-compatible for a little while longer.
And ideally, to be able to close this ticket, 3/ purge all existing pages with captions, just to make sure they're good right now.
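
Suggestion 2/ (keeping the JS backwards-compatible while old markup is still in cache) could look roughly like the sketch below. It is a minimal TypeScript illustration, not the extension's actual code: the two class names are the ones mentioned above, while `QueryRoot`, `ENTITY_VIEW_SELECTORS`, and `findEntityView` are hypothetical names.

```typescript
// Minimal shape of anything that can be queried by selector.
// Assumption: in a browser, `document` or a container element would be passed in.
interface QueryRoot<T> {
  querySelector(selector: string): T | null;
}

// Current class name first, then the name that may still appear in cached HTML.
const ENTITY_VIEW_SELECTORS: readonly string[] = [
  ".wbmi-entityview-content",
  ".filepage-mediainfo-entityview",
];

// Return the first node matched by any known selector, or null if none match.
function findEntityView<T>(
  root: QueryRoot<T>,
  selectors: readonly string[] = ENTITY_VIEW_SELECTORS
): T | null {
  for (const selector of selectors) {
    const node = root.querySelector(selector);
    if (node !== null) {
      return node;
    }
  }
  return null;
}
```

The old class name can be dropped from the list once the longest cache lifetime has passed since the deploy that renamed it.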

We changed the name in d1738c877111e273c2c6e33b4223c640fb66a82a, which just missed wmf.13. wmf.14 was deployed to Commons on Wednesday, 23 January, so the Varnish cache should theoretically be emptying by now and being replaced with new renders whose DOM uses wbmi-entityview-content rather than filepage-mediainfo-entityview, unless our transforms are entering the parser cache (which I thought we avoided with the hooks we were using).

Agreed about the DOM changes, I didn't think about the way the JS interacts with stale content.

All existing caches have been purged, which should solve the effects we were seeing here.
It appears there may not have been an actual issue at play, so there's nothing left to do (just need to be aware of different versions of markup in cache)

Unfortunately, we're still getting sporadic instances of this bug on some pages. Either not all the caches actually got purged, or this is a more insidious bug.

I think we'll need to do another purge - markup has changed again because of depicts, and then when that stuff was deployed for the CC0 issue it'll have caused this all over again

(sighs)

we're gonna have to manage markup changes a bit better than this ...

@matthiasmullie could you kick off another purge?

Ran purge again on the same dataset - a few pages from between when that dataset was generated and when the deploy was executed might be missed. I suggest manually purging those, on the off-chance you stumble upon one (before it falls out of cache anyway).
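
For stragglers found later, a list of titles can also be purged in batches. The TypeScript sketch below is only an illustration under assumptions: this thread does not say how the purge above was actually run. It prepares POST bodies for the MediaWiki Action API's `action=purge` module, which takes pipe-separated `titles`. The batch size of 50, the helper names `chunk` and `purgeRequestBody`, and the example titles are all hypothetical.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (Math.floor(size) !== size || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Form-encoded body for one purge request; titles are joined with "|".
function purgeRequestBody(titles: readonly string[]): string {
  return "action=purge&format=json&titles=" + encodeURIComponent(titles.join("|"));
}

// Hypothetical titles; each body would be POSTed to the wiki's api.php.
const missedTitles = ["File:Example A.jpg", "File:Example B.jpg", "File:Example C.jpg"];
const bodies = chunk(missedTitles, 50).map(purgeRequestBody);
```

Sending the requests is left out on purpose, so the sketch has no network or authentication assumptions.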