GETting a non-existing or deleted page with action=raw returns a 404 error
Closed, DeclinedPublic

Description

On itwiki, it was reported that a tool stopped working because of a 404 error shown in the console. The error is triggered in this script (getMessage function) by a request to https://it.wikipedia.org/w/index.php?title=MediaWiki%3Anextdiff&action=raw&usemsgcache=true
I suspect this has to do with https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/463378/ (T193271), but I guess this is not intended behaviour. Also, I read that the usemsgcache parameter has been removed, and I guess it could be a good idea to inform users via Tech/News.

Event Timeline

How to reproduce the problem? (Which tool do you use, and how?)

Also, I read that the usemsgcache parameter has been removed, and I guess it could be a good idea to inform users via Tech/News.

Is that https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/463378/ ? Adding User-notice. (Other docs probably also welcome updates...)

Since the problem is a 404 error, clicking the link above is enough to reproduce it. The tool using it is the linked one, but again, given the nature of the error, it is not related to the script itself.
And yep, thanks for the user-notice part. I'm not writing it myself because I wouldn't really know what to write about it.

Aklapper renamed this task from GETting a page with action=raw returns a 404 error to GETting a non-existing or deleted page with action=raw returns a 404 error. Oct 5 2018, 2:17 PM

URLs such as https://en.wikipedia.org/w/index.php?title=Berlin&action=raw work.
https://it.wikipedia.org/wiki/MediaWiki:Nextdiff used in the example above is a deleted page.

Which behavior is expected for action=raw when used to get pages that do not exist?

And yep, thanks for the user-notice part. I'm not writing it myself because I wouldn't really know what to write about it.

Me neither, to be honest, right now, so I'm thinking we might want to wait until next week to make sure we actually have something to tell the Tech News readers, even though faster news is better news.

I would expect it to be consistent with what's shown without any parameters, and it used to work this way when action=raw was paired with usemsgcache=true. The MediaWiki NS currently shows the localized message from the i18n cache if the custom one is deleted. I.e., https://it.wikipedia.org/wiki/MediaWiki:Nextdiff shows "Differenza successiva →".

Yay, now I got it. AFAICS, action=raw for deleted pages in the MediaWiki ns (i.e. messages) only used to work with usemsgcache=true, since the content was grabbed from the message cache. Removing this option (patch linked in task description) makes the original problem (parent task) re-appear.
As for Tech/News, I guess we can wait, this doesn't seem to be that urgent.

Krinkle claimed this task.
Krinkle triaged this task as High priority.
Declining

The usemsgcache parameter was an internal feature that predates both 2010 and ResourceLoader. Its purpose was to efficiently expose CSS and JS pages via action=raw&ctype=text/javascript.

The way in which it is used by Gadget-MultiDiffConsecutive.js was not officially supported. This means that the feature was not documented at doc.wikimedia.org, and not described in Manuals we verified on www.mediawiki.org.

As a script author you have the flexibility and unlimited power to still make the tools you want to build, that rely on internals we don't officially support.

The cost of doing that is that when the internals change, in most cases they do not follow a deprecation policy or Tech-News announcement. We make hundreds of such internal changes almost every day. I hope you understand :)

If you find this breaks a tool you wrote and are unsure how to address it, there are various avenues to ask for help, such as https://meta.wikimedia.org/wiki/Tech and https://www.mediawiki.org/wiki/Talk:ResourceLoader/Migration_guide_(users).

It is also fine to report to Phabricator, but note that it does not mean it will be considered a regression. Hence, I am declining this report, but now that I'm here, I've looked into this particular gadget, and below is my suggestion for how to make it work again. I hope that helps!

Repair

By default, action=raw returns HTTP 200 OK when the page you ask for exists on the wiki, and HTTP 404 Not Found when the page does not exist.

The exception to this rule exists for compatibility with script imports. When using ctype=text/javascript or ctype=text/css, it will always return HTTP 200, even if the page does not exist.

This has been true for many years, has not changed recently, and will not change without announcement.
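As a rough illustration of the rule above, here is a minimal sketch. The helper names buildRawUrl and expectedRawStatus are hypothetical (they are not part of MediaWiki); the second function only encodes the status behaviour described in this comment, it does not query the server.

```javascript
// Hypothetical helper: builds an action=raw URL for a page title.
// wikiBase is assumed to be the wiki origin, e.g. 'https://it.wikipedia.org'.
function buildRawUrl(wikiBase, title, ctype) {
    const url = new URL('/w/index.php', wikiBase);
    url.searchParams.set('title', title);
    url.searchParams.set('action', 'raw');
    if (ctype) {
        url.searchParams.set('ctype', ctype);
    }
    return url.toString();
}

// Hypothetical helper: the HTTP status to expect according to the rule
// described above. Script imports (JS/CSS ctype) always get 200;
// otherwise a missing page gets 404.
function expectedRawStatus(pageExists, ctype) {
    const isScriptImport = ctype === 'text/javascript' || ctype === 'text/css';
    return (pageExists || isScriptImport) ? 200 : 404;
}
```

For example, a deleted message page requested without ctype would be expected to give 404, while the same request with ctype=text/css would be expected to give 200.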

In your code, $.ajax() returns a Promise-like object that resolves for HTTP 200 and rejects for HTTP 404. You can use jQuery's then() or always() to handle the rejection case.

var mytext;

function getMessage() {
    return $.ajax( ... );
}

getMessage()
    .then( function resolved( text ) {
        // HTTP 200: the page exists, use its text
        mytext = text;
    }, function rejected() {
        // HTTP 404: the page does not exist, fall back to a default
        mytext = ''; // default?
    } )
    .then( proceedWithMytext );
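The same idea can be sketched without jQuery, using the standard fetch() API. This is only a sketch under assumptions: getRawText and its fetchImpl parameter are hypothetical names introduced here (fetchImpl lets you pass a stand-in for fetch when trying it out), and the gadget itself may be structured differently.

```javascript
// Hypothetical helper: resolves with the page text, or with `fallback`
// when the server answers 404 (page does not exist).
// fetchImpl is optional; the global fetch() is used when it is omitted.
function getRawText(url, fallback, fetchImpl) {
    const doFetch = fetchImpl || fetch;
    return doFetch(url).then(function (response) {
        if (response.status === 404) {
            // Page does not exist: use the caller's default instead of failing.
            return fallback;
        }
        if (!response.ok) {
            // Any other error status is treated as a real failure.
            throw new Error('Unexpected HTTP status ' + response.status);
        }
        return response.text();
    });
}
```

Unlike the jQuery version, fetch() does not reject on a 404 response, so the status has to be checked explicitly before reading the body.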

and not described in Manuals we verified on www.mediawiki.org.

Thanks for looking into this! Does that mean it should be removed from https://www.mediawiki.org/wiki/Manual:Parameters_to_index.php#Raw ?

Thanks (I've updated it now.)

This page documented the parameter controlling the source of the content (message cache vs database). It did not specify a variance in HTTP status behaviour.

The server responded with the same content both with and without this parameter. The reason it existed was to reduce load on WMF database servers and to make pages load faster. The most common use of action=raw was loading scripts from the MediaWiki namespace, so MW was given a special cache just for that. Pages in the User namespace and old revisions of pages were not part of this special cache. Rather than automatically detecting when the cache could be used, the old code created this parameter to switch between the two modes.

This parameter is no longer needed. Today, all revisions are cached dynamically, and this additional layer of caching has become redundant. Any old URLs that still specify it continue to work, and they still receive the same content.

@Krinkle thanks for your help! When I reported this bug, I thought it could be due to an actual regression. Then I realized, and yeah, it makes sense. There are lots of scripts to be updated, and using APIs as suggested in the parent task would really be the best thing to do.
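For scripts that only need the text of an interface message, one possible API-based approach is the action API's meta=allmessages query. The sketch below is an assumption about what "using APIs" could look like, not what the parent task prescribes: buildAllMessagesUrl and extractMessage are hypothetical helper names, and the response shape (formatversion=2, with a content field and a missing flag) should be checked against the API documentation before relying on it.

```javascript
// Hypothetical helper: builds an api.php URL that asks for one interface message.
function buildAllMessagesUrl(wikiBase, messageKey, lang) {
    const url = new URL('/w/api.php', wikiBase);
    url.searchParams.set('action', 'query');
    url.searchParams.set('meta', 'allmessages');
    url.searchParams.set('ammessages', messageKey);
    if (lang) {
        url.searchParams.set('amlang', lang);
    }
    url.searchParams.set('format', 'json');
    url.searchParams.set('formatversion', '2');
    return url.toString();
}

// Hypothetical helper: pulls the message text out of a parsed JSON response.
// Assumes the formatversion=2 shape; returns `fallback` if the message is
// absent or flagged as missing.
function extractMessage(apiResponse, fallback) {
    const list = (apiResponse && apiResponse.query && apiResponse.query.allmessages) || [];
    const entry = list[0];
    if (!entry || entry.missing || typeof entry.content !== 'string') {
        return fallback;
    }
    return entry.content;
}
```

With this approach the script no longer depends on the HTTP status of action=raw, since a missing message is reported inside the JSON response rather than as a 404.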