Usage and linking of a page or file through a redirect is not reported by API query for linkshere and fileusage
Closed, Invalid | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
A page that clearly uses and links to the file isn't reported in an API query for linkshere and fileusage.

What should have happened instead?:
Pretty sure the API usually reports pages that link to or use the file.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:
https://commons.wikimedia.beta.wmflabs.org/

https://commons.wikimedia.beta.wmflabs.org/w/api.php?action=query&prop=fileusage&fuprop=title&fulimit=500&format=json&titles=File%3AGg76f2dbef4cc9_24_43.jpg:

{"batchcomplete":"","query":{"normalized":[{"from":"File:Gg76f2dbef4cc9_24_43.jpg","to":"File:Gg76f2dbef4cc9 24 43.jpg"}],"pages":{"100000":{"pageid":100000,"ns":6,"title":"File:Gg76f2dbef4cc9 24 43.jpg"}}}}

https://commons.wikimedia.beta.wmflabs.org/w/api.php?action=query&prop=linkshere&lhlimit=500&format=json&titles=File%3AGg76f2dbef4cc9_24_43.jpg:

{"batchcomplete":"","query":{"normalized":[{"from":"File:Gg76f2dbef4cc9_24_43.jpg","to":"File:Gg76f2dbef4cc9 24 43.jpg"}],"pages":{"100000":{"pageid":100000,"ns":6,"title":"File:Gg76f2dbef4cc9 24 43.jpg","linkshere":[{"pageid":104137,"ns":6,"title":"File:Gg76f2dbef4cc9 24 39.jpg","redirect":""},{"pageid":104138,"ns":6,"title":"File:Gg76f2dbef4cc9 24 40.jpg","redirect":""},{"pageid":104139,"ns":6,"title":"File:Gg76f2dbef4cc9 24 41.jpg","redirect":""},{"pageid":104168,"ns":6,"title":"File:Gg76f2dbef4cc9 24 42.jpg","redirect":""}]}}}}

The normalization doesn't seem to affect this.
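
For anyone who wants to reproduce these queries from a script, here is a minimal Python sketch. It builds the same kind of query URL and pulls the backlink titles out of a response shaped like the ones above. The helper names (`build_query_url`, `backlink_titles`) are made up for illustration, the sample data is a trimmed copy of the linkshere response quoted above, and no network request is made.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.beta.wmflabs.org/w/api.php"


def build_query_url(api, prop, title, limit=500):
    """Build an action=query URL for prop=linkshere or prop=fileusage."""
    prefix = {"linkshere": "lh", "fileusage": "fu"}[prop]
    params = {
        "action": "query",
        "prop": prop,
        f"{prefix}limit": limit,
        "format": "json",
        "titles": title,
    }
    return f"{api}?{urlencode(params)}"


def backlink_titles(response, prop):
    """Return (title, is_redirect) pairs for one prop.

    When the API reports nothing, the prop key is simply absent from the
    page object, so a missing key is treated as an empty list.
    """
    found = []
    for page in response["query"]["pages"].values():
        for entry in page.get(prop, []):
            found.append((entry["title"], "redirect" in entry))
    return found


# Trimmed copy of the linkshere response quoted above (two of four entries).
sample = {
    "batchcomplete": "",
    "query": {"pages": {"100000": {
        "pageid": 100000, "ns": 6,
        "title": "File:Gg76f2dbef4cc9 24 43.jpg",
        "linkshere": [
            {"pageid": 104137, "ns": 6,
             "title": "File:Gg76f2dbef4cc9 24 39.jpg", "redirect": ""},
            {"pageid": 104168, "ns": 6,
             "title": "File:Gg76f2dbef4cc9 24 42.jpg", "redirect": ""},
        ],
    }}},
}

url = build_query_url(API, "linkshere", "File:Gg76f2dbef4cc9_24_43.jpg")
links = backlink_titles(sample, "linkshere")
usage = backlink_titles(sample, "fileusage")  # key absent, so empty
```

With this sample, every linkshere entry carries the `redirect` flag and none of them is User:AJ/Sandbox4, which is the symptom described in this report.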

Event Timeline

AlexisJazz renamed this task from Usage and linking of a particular file on a particular page is not reported by API query for linkshere and fileusage to Usage and linking of a file through a redirect is not reported by API query for linkshere and fileusage. Aug 26 2021, 4:41 PM
AlexisJazz updated the task description.

This appears to affect only files that are linked/used through a redirect.

Here's an example from production:

https://en.wikipedia.org/w/api.php?action=query&prop=fileusage&fuprop=title&fulimit=500&format=json&titles=File%3AGDS3003_FAM83A_(microarray_expression_data_for_FAM83A_when_exposed_to_house_dust_mite_extract).png

{"batchcomplete":"","query":{"normalized":[{"from":"File:GDS3003_FAM83A_(microarray_expression_data_for_FAM83A_when_exposed_to_house_dust_mite_extract).png","to":"File:GDS3003 FAM83A (microarray expression data for FAM83A when exposed to house dust mite extract).png"}],"pages":{"35733101":{"pageid":35733101,"ns":6,"title":"File:GDS3003 FAM83A (microarray expression data for FAM83A when exposed to house dust mite extract).png","fileusage":[{"ns":0,"title":"FAM83A"},{"ns":2,"title":"User:Alexis Jazz/sandbox2"}]}}}}

Here the usage through a redirect (User:Alexis Jazz/sandbox2) is reported, unlike on the beta cluster.

AlexisJazz renamed this task from Usage and linking of a file through a redirect is not reported by API query for linkshere and fileusage to Usage and linking of a page or file through a redirect is not reported by API query for linkshere and fileusage. Aug 26 2021, 5:19 PM
AlexisJazz updated the task description.

Multiple levels of redirects are not supported.

Note how File:Gg76f2dbef4cc9 24 37.jpg redirects to File:Gg76f2dbef4cc9 24 38.jpg, which redirects to File:Gg76f2dbef4cc9 24 39.jpg, which again redirects to File:Gg76f2dbef4cc9 24 43.jpg.

I think this is the expected behavior.

That's not what's happening here. https://commons.wikimedia.beta.wmflabs.org/wiki/User:AJ/Sandbox4 uses https://commons.wikimedia.beta.wmflabs.org/w/index.php?title=File:Gg76f2dbef4cc9_24_42.jpg&redirect=no which is a redirect to the actual file. No double redirects. I am aware double redirects don't work, if I used those the image wouldn't load on Sandbox4. Since it does load, I'm clearly not using a double redirect.

For a less convoluted example, see the last bullet point: Special:WhatLinksHere/Bert reports the link on User:AJ/sandbox while the API ignores the use through the Berta redirect.

brennen triaged this task as Unbreak Now! priority. Aug 26 2021, 8:14 PM
brennen subscribed.

This is currently marked as a train blocker. I'm not equipped to evaluate it. Does this warrant a rollback to group0?

I don't think so. At worst this will be corruption of the links tables that'll fix themselves over time.

brennen lowered the priority of this task from Unbreak Now! to Needs Triage. Aug 26 2021, 8:45 PM

Thanks!

I think missing backlinks are train-blocker-worthy because of their impact on deletion-related processes, especially in cases where usage is a factor (e.g. fair use images).

That said, I can't reproduce the issue on real Commons, suggesting this is just a Beta Cluster issue (job queue??). https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Mediawiki&namespace= shows the newly created User:Legoktm/sandbox page, which links to [[Mediawiki]]. Its usage also appears on https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=MediaWiki&namespace=

Then I added an image that was recently moved by linking to the redirect and https://commons.wikimedia.org/wiki/File:McLaren_765LT_in_B%C3%B6blingen_02.jpg shows its file usage properly.

I encountered this while testing a new function for LuckyRename which resolves file redirects to the current filename on pages that still use the redirect. This would generally be the result of using Special:Move without updating usage. I have been testing this stuff rigorously on both production and betacommons in the past few months. Only now did I suddenly notice a change in behavior. This will screw with things as existing usage won't be reliably replaced by LuckyRename. I can't quite predict what exactly will happen in every possible circumstance. I'm also not sure about the impact on the Commons global replacement tool. Any bots that utilize linkshere may also get confused and no longer work reliably.

This doesn't fix itself. If a bot would erroneously tag a page as being orphaned because the linkage through a redirect can't be detected, that'll create a mess. If we suddenly have to rely on file redirects because we can't trust existing usage to be reliably replaced it creates a maintenance job (for humans) for the future. If this somehow results in image usage breaking, there will be a lot of ranting.

Can you clarify as to whether you have a reproduction case in production? (I don't mean to be difficult, but as so often happens this is not an area of the code that we're familiar with in RelEng.) It sounds from Legoktm's comment above that this is not currently happening live.

I never claimed this problem exists on production. On the contrary, in T289792#7312582 I gave an example from production to demonstrate that it does work on production.

On 19 August I resolved a redirect on betacommons, replacing File:Gg76f2dbef4cc9 24 39.jpg with File:Gg76f2dbef4cc9 24 40.jpg. That means I ran LuckyRename from File:Gg76f2dbef4cc9 24 40.jpg, it requested linkshere and fileusage and obtained User:AJ/Sandbox4 from that. So it was working last week, but now it's not.

The idea of a train blocker is that you don't introduce breaking bugs on production, right? Last time I reported a breaking bug (T278579) it was assumed to be a problem with the beta environment. The result was a Mediawiki rollback.

Please see https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Issues_that_hold_the_train
What criteria is this under?

The 1.37.0-wmf.20 train is the version deployed in production for all wikis as of today; T281161 is the task for that deployment. If this is an issue present on master that was introduced after 1.37.0-wmf.20 was branched, it should block next week's 1.37.0-wmf.21 (T281162).

Sorry I misunderstood the issue at first. In the future please clarify if you know that it was working correctly before, and is broken now.

Data loss

The idea of a train blocker is that you don't introduce breaking bugs on production, right? Last time I reported a breaking bug (T278579) it was assumed to be a problem with the beta environment. The result was a Mediawiki rollback.

Yes, although if this is reproducible on beta cluster but not on production group0 wikis, then it would be a blocker for the next train.

Does the issue occur on group0 wikis, like test.wikipedia.org?

It sounds like we're clear here for this week. @AlexisJazz, thanks for the report, and apologies for not parsing out sooner that this is likely not an issue affecting wmf.20. The deployment train is a somewhat opaque process at best for people not involved in it day-to-day.

The versions always confuse me. I'm not sure how all the versioning works. I just added it as a subtask to the oldest open deployment blocker task. Before T278579 I didn't touch deployment blocker tasks at all. Usually I'd just comment and let a developer add the subtask. As it seemed a new version was going to be rolled out today or tomorrow, I decided to add the task myself to prevent a repeat of the experience of T278579.

Thanks for connecting this to the right task.

https://test.wikipedia.org/w/api.php?action=query&prop=linkshere&lhlimit=500&format=json&titles=Bert fails to report https://test.wikipedia.org/wiki/Erno which links to Berta, a redirect to Bert. So the API fails while on Special:WhatLinksHere/Bert the usage is visible, same as beta. So yes, reproducible on test.wikipedia.org.

So https://en.wikipedia.org/w/api.php?action=query&prop=fileusage&fuprop=title&fulimit=500&format=json&titles=File:Cumulus%20Networks%20(computer%20network%20business)%20logo.png shows the usage on https://en.wikipedia.org/wiki/User:Alexis_Jazz/sandbox2 through a redirect, as expected. (fileusage on enwiki production)

However, https://en.wikipedia.org/w/api.php?action=query&prop=linkshere&lhlimit=500&format=json&titles=File:Roman%20Wall%20Blues%20(1969%20Alex%20Harvey%20album).jpg doesn't show the link to a redirect on https://en.wikipedia.org/wiki/User:Alexis_Jazz/sandbox3. (linkshere on enwiki production)

https://test.wikipedia.org/w/api.php?action=query&prop=linkshere&lhlimit=500&format=json&titles=Bert doesn't show the link to a redirect on https://test.wikipedia.org/wiki/Erno. (linkshere on testwiki)

https://test.wikipedia.org/w/api.php?action=query&prop=fileusage&fuprop=title&fulimit=500&format=json&titles=File:(Pennsylvania)%20Mershon,%20John%20H%20-%20124th%20Infantry,%20Company%20C%20-%20DPLA%20-%203f4f43309eefa6df44f34965f9c618a5.jpg does report the usage on Erno through a redirect. (fileusage on testwiki)

Maybe the listing of a redirect on linkshere in the API was never a thing and this bug is restricted to fileusage? In that case it's not reproducible on testwiki.

I am an idiot

Aw crap.

https://commons.wikimedia.beta.wmflabs.org/wiki/User:AJ/Sandbox4 still showed an old version of the page, parsed when the file actually existed at https://commons.wikimedia.beta.wmflabs.org/wiki/File:Gg76f2dbef4cc9_24_42.jpg . While https://commons.wikimedia.beta.wmflabs.org/wiki/Special:WhatLinksHere/File:Gg76f2dbef4cc9_24_43.jpg reports it correctly as being used through the redirect at 42.jpg, API fileusage is apparently based on whatever pages looked like when they were last parsed. So after a quick null edit User:AJ/Sandbox4 showed right up.

Mea culpa.
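
For scripts that hit the same stale state, the null edit mentioned above can probably be approximated through the API. A hedged sketch: MediaWiki's `action=purge` module accepts a `forcelinkupdate` parameter that asks the wiki to refresh the links tables for the given pages. Whether that behaves exactly like a null edit on a given wiki is an assumption here, and the request has to be sent as a POST. The code below only builds the POST body; `purge_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def purge_params(titles):
    """POST parameters for action=purge that also request a links update."""
    return {
        "action": "purge",
        "titles": "|".join(titles),  # the API separates multiple titles with "|"
        "forcelinkupdate": 1,
        "format": "json",
    }


# Body for a POST to the wiki's api.php; sending it is left to the caller.
body = urlencode(purge_params(["User:AJ/Sandbox4"]))
```

After such a purge, repeating the fileusage query should show whether the missing entry was only a matter of stale link data.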

Aklapper renamed this task from Usage and linking of a page or file through a redirect is not reported by API query for linkshere and fileusage to Usage and linking of a page or file through multiple levels of redirects is not reported by API query for linkshere and fileusage. Aug 29 2021, 11:19 AM

@Aklapper can you revert the title change? This was never about double redirects, I am well aware those aren't supported. The issue ended up being that fileusage is based on parsed pages and the page had to be re-parsed (did a null edit) to get it to show up in the API fileusage. Interestingly, Special:WhatLinksHere isn't based on the parsed state of a page, causing a discrepancy between the fileusage API and Special:WhatLinksHere and causing me to think it had to be a bug. While strange and possibly something that could be improved, it's not strictly a bug and is unrelated to double redirects.

Almost forgot, in addition there is a difference between fileusage and linkshere. Fileusage will report usage through redirects. Linkshere on the other hand will not report links to redirects to the target, though Special:WhatLinksHere will report links to redirects to the target. It's because of this that I made my tool go through all existing redirects to query linkshere for each one.
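
The workaround described above (query linkshere for the target, then again for every redirect to it) could look roughly like this. It is a sketch under stated assumptions, not LuckyRename's actual code: `fetch` is a hypothetical callable that takes a dict of API parameters and returns the decoded JSON, and continuation of long result sets is left out. `prop=redirects` and `prop=linkshere` are existing query modules; `fake_fetch` is an offline stand-in modelled on the Bert/Berta/Erno example from test.wikipedia.org.

```python
def titles_from(response, prop):
    """Collect entry titles for one prop across all pages in a response."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        titles.extend(entry["title"] for entry in page.get(prop, []))
    return titles


def links_including_redirects(fetch, target):
    """Union of linkshere results for target and for each redirect to it."""
    base = {"action": "query", "format": "json"}
    redirects = titles_from(
        fetch({**base, "prop": "redirects", "rdlimit": 500, "titles": target}),
        "redirects",
    )
    seen = []
    for title in [target, *redirects]:
        response = fetch(
            {**base, "prop": "linkshere", "lhlimit": 500, "titles": title}
        )
        for linker in titles_from(response, "linkshere"):
            # Skip the redirects themselves and anything already recorded.
            if linker not in redirects and linker not in seen:
                seen.append(linker)
    return seen


def fake_fetch(params):
    """Offline stand-in for the API (hypothetical data, no network)."""
    data = {
        ("redirects", "Bert"): [{"title": "Berta"}],
        ("linkshere", "Bert"): [{"title": "Berta"}],
        ("linkshere", "Berta"): [{"title": "Erno"}],
    }
    entries = data.get((params["prop"], params["titles"]), [])
    return {"query": {"pages": {"1": {params["prop"]: entries}}}}


result = links_including_redirects(fake_fetch, "Bert")
```

Passing the fetcher in as an argument keeps the traversal testable without network access; a real fetcher would also need to follow the API's continuation values when a page has more backlinks than the limit.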

AlexisJazz renamed this task from Usage and linking of a page or file through multiple levels of redirects is not reported by API query for linkshere and fileusage to Usage and linking of a page or file through a redirect is not reported by API query for linkshere and fileusage. Aug 30 2021, 12:00 PM