Page MenuHomePhabricator

Special:Export broken: always generates an empty file
Closed, ResolvedPublic

Description

Expected behavior: Valid XML is presented, as per production: https://en.wikipedia.org/wiki/Special:Export/Main_Page (works)

Actual behaviour: XML Parsing error on beta: https://en.wikipedia.beta.wmflabs.org/wiki/Special:Export/Main_Page

XML Parsing Error: no root element found
Location: https://en.wikipedia.beta.wmflabs.org/wiki/Special:Export/Main_Page
Line Number 1, Column 1:

It yields a 0 byte output!

Event Timeline

Reedy subscribed.

This might just be a beta issue in some form or another. WFM locally on master on my dev wiki

Downloading as a file is similarly broken. dumpBackup.php on CLI seems fine

Downloading as a file is similarly broken. dumpBackup.php on CLI seems fine

I think the XML error is generated by the browser because the response from the server is empty. (hence no root element)

< Content-Length: 0
< Content-Type: application/xml; charset=utf-8
< Age: 0
< X-Cache: deployment-cache-text06 miss, deployment-cache-text06 pass

I've tried this for User:ArielGlenn over there, current revs only, no templates, linked pages to depth of 0, with the same result: a 0-byte xml file saved.
Erm, there is no such page. Woops. Trying again with a page that exists: Badger

Entries in the log for the request below (with mwsession and ip addr redacted), and an empty file:

ariel@deployment-mwlog01:/srv/mw-log$ grep YF7nAKwQBHcAAAmCzR4AAAAF *log
CentralAuthVerbose.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha CentralAuthVerbose INFO: Loading CentralAuthUser for user ArielGlenn from cache object {"user":"ArielGlenn"} 
export.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha export DEBUG: Special:Export POST, dir: [], offset: [], limit: [] {"dir":null,"offset":null,"limit":null} 
MessageCache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha MessageCache DEBUG: MessageCache using store APCUBagOStuff {"class":"APCUBagOStuff"} 
MessageCache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha MessageCache DEBUG: MessageCache::load: Loading en... got from local cache  
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to fetch enwiki:MWSession:XXXX : (500)  {"code":500,"error":""} 
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to fetch enwiki:MWSession:XXXX : (500)  {"code":500,"error":""} 
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to delete enwiki:MWSessionXXXX : (500)  {"code":500,"error":""} 
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to store enwiki:MWSession:XXXX : (500)  {"code":500,"error":""} 
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to store enwiki:MWSession:XXXX : (500)  {"code":500,"error":""} 
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to store enwiki:MWSession:XXXX : (500)  {"code":500,"error":""} 
objectcache.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha objectcache ERROR: Failed to store enwiki:MWSession:XXXX : (500)  {"code":500,"error":""} 
xff.log:2021-03-27 08:04:16 [YF7nAKwQBHcAAAmCzR4AAAAF] deployment-mediawiki-07 enwiki 1.36.0-alpha xff INFO: Sat, 27 Mar 2021 08:04:16 +0000	https://en.wikipedia.beta.wmflabs.org/wiki/Special:Export	X.X.X.X, 127.0.0.1, 172.16.1.181	 {"date":"Sat, 27 Mar 2021 08:04:16 +0000","uri":"https://en.wikipedia.beta.wmflabs.org/wiki/Special:Export","xff":"X.X.X.X, 127.0.0.1","remoteaddr":"172.16.1.181","wpSave":""}

Downloading as a file is similarly broken. dumpBackup.php on CLI seems fine

I think the XML error is generated by the browser because the response from the server is empty. (hence no root element)

< Content-Length: 0
< Content-Type: application/xml; charset=utf-8
< Age: 0
< X-Cache: deployment-cache-text06 miss, deployment-cache-text06 pass

Yes. I was just confirming that XML export wasn't completely, but apparently silently broken on the mw app servers (like missing packages)

There's been various hosts replaced etc in beta in the last week or two, so it's possible it was fallout from that. Unfortunately, due to T278599 I can't easily do a local HTTP request to the server from itself

hashar triaged this task as Unbreak Now! priority.Apr 2 2021, 9:34 AM
hashar added subscribers: Tizio, hashar.

That should have been flagged as a train blocker for T278343 since that has now hit production and broke Special:Export.

I gave it a try on https://en.wikipedia.org/wiki/Special:Export using:

Mathematics
Philosophy

And that yields a 0 byte XML file

taavi renamed this task from Special:Export broken on beta cluster: XML Parsing Error: no root element found to Special:Export broken: XML Parsing Error: no root element found.Apr 2 2021, 9:36 AM
hashar renamed this task from Special:Export broken: XML Parsing Error: no root element found to Special:Export broken: always generates an empty file.Apr 2 2021, 9:37 AM
hashar updated the task description. (Show Details)

Change 676557 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] Rollback group1 and group2 wikis to 1.36.0-wmf.36

https://gerrit.wikimedia.org/r/676557

Change 676557 merged by jenkins-bot:

[operations/mediawiki-config@master] Rollback group1 and group2 wikis to 1.36.0-wmf.36

https://gerrit.wikimedia.org/r/676557

I have rolled Mediawiki deployment from 1.36.0-wmf.37 to 1.36.0-wmf.36 and Special:Export works again: https://en.wikipedia.org/wiki/Special:Export/Main_Page

Good news: it is reproducible on Beta-Cluster-Infrastructure which should help with the investigation.

Change 676559 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/mediawiki-config@master] Revert "group0 wikis to 1.36.0-wmf.37"

https://gerrit.wikimedia.org/r/676559

Change 676559 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "group0 wikis to 1.36.0-wmf.37"

https://gerrit.wikimedia.org/r/676559

Update: removing wfResetOutputBuffers from SpecialExport fixes the issue. Don't know why it has broken in wmf.37 the function itself hasn't changed for years.

It's caused by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/674196 and it has also broken using gzip in api requests (as reported in IRC "since 1.36/wmf.37 I can't use gzip compression in api requests"). The faulty patch has been backported to 1.35 (cc @Reedy), I'm not comfortable reverting that massive patch but at least I can tell it has broken wfResetOutputBuffers (and possibly more).

That should have been flagged as a train blocker for T278343 since that has now hit production and broke Special:Export.

Unfortunately it was working locally so it seemed like it might've just been a beta only issue :/. When in fact it sounds like it's potentially WMF setup/config of php/webserver/cache ontop of MW that's upset...

It's caused by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/674196 and it has also broken using gzip in api requests (as reported in IRC "since 1.36/wmf.37 I can't use gzip compression in api requests"). The faulty patch has been backported to 1.35 (cc @Reedy), I'm not comfortable reverting that massive patch but at least I can tell it has broken wfResetOutputBuffers (and possibly more).

I'll potentially revert it out of 1.35 for 1.35.2 if it's not fixed quickly then

Change 676567 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@master] Revert "Move logDataPageOutputOnly() call to outputResponsePayload()"

https://gerrit.wikimedia.org/r/676567

Change 676568 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@master] Revert "Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable"

https://gerrit.wikimedia.org/r/676568

I thought the revert would be worse than it was as I thought https://gerrit.wikimedia.org/r/c/mediawiki/core/+/661452 had landed

Change 676569 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_35] Revert "Move logDataPageOutputOnly() call to outputResponsePayload()"

https://gerrit.wikimedia.org/r/676569

Change 676570 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_35] Revert "Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable"

https://gerrit.wikimedia.org/r/676570

Change 676571 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@wmf/1.36.0-wmf.37] Revert "Move logDataPageOutputOnly() call to outputResponsePayload()"

https://gerrit.wikimedia.org/r/676571

Change 676572 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@wmf/1.36.0-wmf.37] Revert "Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable"

https://gerrit.wikimedia.org/r/676572

Change 676569 merged by jenkins-bot:

[mediawiki/core@REL1_35] Revert "Move logDataPageOutputOnly() call to outputResponsePayload()"

https://gerrit.wikimedia.org/r/676569

Change 676570 merged by jenkins-bot:

[mediawiki/core@REL1_35] Revert "Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable"

https://gerrit.wikimedia.org/r/676570

Change 676571 merged by jenkins-bot:

[mediawiki/core@wmf/1.36.0-wmf.37] Revert "Move logDataPageOutputOnly() call to outputResponsePayload()"

https://gerrit.wikimedia.org/r/676571

Change 676572 merged by jenkins-bot:

[mediawiki/core@wmf/1.36.0-wmf.37] Revert "Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable"

https://gerrit.wikimedia.org/r/676572

Mentioned in SAL (#wikimedia-operations) [2021-04-02T13:08:54Z] <reedy@deploy1002> Synchronized php-1.36.0-wmf.37/includes/MediaWiki.php: T278579 (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2021-04-02T13:10:07Z] <reedy@deploy1002> Synchronized php-1.36.0-wmf.37/includes/OutputHandler.php: T278579 (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-04-02T13:11:17Z] <reedy@deploy1002> Synchronized php-1.36.0-wmf.37/load.php: T278579 (duration: 00m 58s)

Change 676567 merged by jenkins-bot:

[mediawiki/core@master] Revert "Move logDataPageOutputOnly() call to outputResponsePayload()"

https://gerrit.wikimedia.org/r/676567

Change 676568 merged by jenkins-bot:

[mediawiki/core@master] Revert "Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable"

https://gerrit.wikimedia.org/r/676568

Confirmed to be fixed on production as well. Congratulations and thank you!

That should have been flagged as a train blocker for T278343 since that has now hit production and broke Special:Export.

There's nothing I could have done about this, is there? I've learned the hard way that I'm not allowed to set the priority of anything, I doubt I can do much beyond reporting.

That should have been flagged as a train blocker for T278343 since that has now hit production and broke Special:Export.

There's nothing I could have done about this, is there? I've learned the hard way that I'm not allowed to set the priority of anything, I doubt I can do much beyond reporting.

If you feel it's relevant to the train, you can certainly leave a comment on the train blocker bug mentioning the task in question might be a potential issue

That should have been flagged as a train blocker for T278343 since that has now hit production and broke Special:Export.

There's nothing I could have done about this, is there? I've learned the hard way that I'm not allowed to set the priority of anything, I doubt I can do much beyond reporting.

If you feel it's relevant to the train, you can certainly leave a comment on the train blocker bug mentioning the task in question might be a potential issue

Thanks, I'll try to remember that next time.

I've tried this for User:ArielGlenn over there, current revs only, no templates, linked pages to depth of 0, with the same result: a 0-byte xml file saved.
Erm, there is no such page. Woops. Trying again with a page that exists: Badger

Actually, even if the page doesn't exist you should get valid XML. For example https://en.wikipedia.beta.wmflabs.org/wiki/Special:Export/This_will_return_valid_XML_output. You just only get <siteinfo> and no <page>.

A wholesale revert is not really a resolution. Maybe we can reopen with lower priority and without the train blocker parent task?

tstarling lowered the priority of this task from Unbreak Now! to Medium.

The problem is that the ob_end_clean() in wfResetOutputBuffers() results in a call to OutputHandler::callback(), so Content-Length: 0 is sent. Previously the buggy code was only activated if the request was HTTP/1.0, but Aaron's patch exposed it for other kinds of requests.

The Content-Length handling in OutputHandler seems to duplicate MediaWiki::outputResponsePayload(), and it's unclear to me why we need both.

The problem is that the ob_end_clean() in wfResetOutputBuffers() results in a call to OutputHandler::callback(), so Content-Length: 0 is sent. Previously the buggy code was only activated if the request was HTTP/1.0, but Aaron's patch exposed it for other kinds of requests.

The Content-Length handling in OutputHandler seems to duplicate MediaWiki::outputResponsePayload(), and it's unclear to me why we need both.

Maybe that functionality used to be in OutputHandler but was at some point relocated to outputResponsePayload. Just guessing.

Change 677684 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Don't send headers on ob_end_clean()

https://gerrit.wikimedia.org/r/677684

Change 677684 merged by jenkins-bot:

[mediawiki/core@master] Don't send headers on ob_end_clean()

https://gerrit.wikimedia.org/r/677684

Change 699286 had a related patch set uploaded (by Reedy; author: Tim Starling):

[mediawiki/core@REL1_36] Don't send headers on ob_end_clean()

https://gerrit.wikimedia.org/r/699286

Change 699287 had a related patch set uploaded (by Reedy; author: Tim Starling):

[mediawiki/core@REL1_35] Don't send headers on ob_end_clean()

https://gerrit.wikimedia.org/r/699287

Change 699287 merged by jenkins-bot:

[mediawiki/core@REL1_35] Don't send headers on ob_end_clean()

https://gerrit.wikimedia.org/r/699287

Change 699286 merged by jenkins-bot:

[mediawiki/core@REL1_36] Don't send headers on ob_end_clean()

https://gerrit.wikimedia.org/r/699286

Krinkle assigned this task to Reedy.