
hascaption includes files that have had their captions removed
Closed, Resolved · Public · BUG REPORT

Description

User story: N/A

We have this:
hascaption (including hascaption:*) currently returns all files that ever had a caption, even if that caption has been removed via reversion or edit.

As an example, see the history of https://commons.wikimedia.org/wiki/File:ESRB_2013_Mature.svg. Despite no longer having captions, it still shows up in hascaption search results.

We want this:
The index needs to be properly updated when data is removed, and hascaption/inlabel/incaption need to reflect those changes.

Screenshots (if possible):

Acceptance Criteria:

During development, please test the following:

  • Test this feature while logged in AND logged out
  • Test this feature on at least one mobile browser
  • Test that this feature works on the file page AND the Add Data step on UploadWizard (if applicable, some features only exist on one or the other)

Details

Related Gerrit Patches:
mediawiki/extensions/WikibaseMediaInfo : master · Always report indexable fields on NS_FILE

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 22 2019, 6:33 PM
Ramsey-WMF triaged this task as High priority. · Aug 22 2019, 6:33 PM
EBernhardson added a comment. · Edited · Aug 29 2019, 5:06 PM

Not sure the right way to go about it, but the problem is essentially here:

https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/src/WikibaseMediaInfoHooks.php#L597

After the label was removed there is no longer a MediaInfo slot on the page, so the MediaInfo indexing code doesn't run. That code needs to provide CirrusSearch with an empty array as the value for the stored document to empty out the field.

A reasonable way forward might be to always provide the MediaInfo data for appropriate namespaces, giving a sane empty-value when no MediaInfo exists.
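
To illustrate the idea, here is a minimal Python sketch of the failure and the proposed fix. It is a toy model, not the actual WikibaseMediaInfo or CirrusSearch code: the function names, the `captions` field name, and the `always_report` flag are all hypothetical, and it assumes that an index update leaves any field it does not mention untouched.

```python
NS_FILE = 6  # MediaWiki's numeric id for the File namespace


def build_search_fields(namespace, mediainfo_labels, always_report=True):
    """Return the fields to send to the search index for one page.

    mediainfo_labels is None when the page has no MediaInfo slot.
    "captions" is a hypothetical field name used only for illustration.
    """
    if mediainfo_labels is not None:
        return {"captions": dict(mediainfo_labels)}
    if always_report and namespace == NS_FILE:
        # No MediaInfo slot: still report the field, with an empty value,
        # so that a previously stored value gets overwritten.
        return {"captions": {}}
    # Behaviour described in the bug: nothing is reported at all.
    return {}


def apply_partial_update(stored_doc, fields):
    """Toy model of an index update: only supplied fields are overwritten."""
    updated = dict(stored_doc)
    updated.update(fields)
    return updated


def has_caption(doc):
    """Toy stand-in for a hascaption match: true if any caption is stored."""
    return bool(doc.get("captions"))


if __name__ == "__main__":
    stored = {"title": "Example.svg", "captions": {"en": "a caption"}}

    # Caption removed, field not reported: the stale value survives.
    stale = apply_partial_update(
        stored, build_search_fields(NS_FILE, None, always_report=False)
    )
    print(has_caption(stale))

    # Caption removed, empty value reported: the field is cleared.
    fixed = apply_partial_update(stored, build_search_fields(NS_FILE, None))
    print(has_caption(fixed))
```

In this model the first call keeps matching because the update never mentions the field, while the second clears it by sending an explicit empty value. Pages outside the file namespace still report nothing.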

@dcausse thoughts?

@EBernhardson sorry missed your ping.
Yes, we need to do what we do for GeoData: always set an empty array when the data is not available.
My main concern is to avoid triggering a reindex for all the current Pages. Perhaps the super noop has some feature to help with this?

Change 538066 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikibaseMediaInfo@master] Always report indexable fields on NS_FILE

https://gerrit.wikimedia.org/r/538066

Change 538066 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Always report indexable fields on NS_FILE

https://gerrit.wikimedia.org/r/538066

debt closed this task as Resolved. · Sep 30 2019, 4:20 PM
debt added a subscriber: debt.

closing as this will go out into production this week, yay! :)