
Special:Undelete viewing large deleted files OOM and 500 Internal Server Error
Closed, Resolved · Public

Description

@Jdx suspected that two files on Commons, Pdfkwl.pdf and Kwl-file2.pdf, are part of WP0 abuse (T129845) due to their unexpectedly large size. I tried to download them to see what they are about, but the download fails with a 500 Internal Server Error. Inspection of the returned data shows:

<!DOCTYPE html>
<html lang="en" dir="ltr">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645AD; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: inherit; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf.png" srcset="https://www.wikimedia.org/static/images/wmf-2x.png 2x" alt=Wikimedia width=135 height=135></a>
<h1>Error</h1>
<p>Our servers are currently under maintenance or experiencing a technical problem. Please <a href="" title="Reload this page" onclick="location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p><p>See the error message at the bottom of this page for more&nbsp;information.</p>
</div>
<div class="footer">
<p>If you report this error to the Wikimedia System Administrators, please include the details below.</p>
<p class="text-muted"><code>
  PHP fatal error: <br/>
  request has exceeded memory limit</code></p></div>
</html>

Event Timeline

Restricted Application added a subscriber: Aklapper.

I've just tried to download them and got the same error.

What is their size? "Download" = in the browser I assume?

My assumption about the OOM: PHP is reading the entire file contents into memory, due to buffering or some other reason.
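As a minimal sketch of the suspected mechanism (not MediaWiki code; the path is just a placeholder): when an output buffer is active and is never flushed, readfile() appends every streamed byte to that buffer, so PHP ends up holding the whole file in memory before sending anything to the client:

$path = '/tmp/gigantic.png';                // placeholder path for a large file
ob_start();                                 // unbounded buffer, never flushed mid-stream
readfile( $path );                          // the whole file accumulates in the buffer
echo ob_get_length(), " bytes buffered\n";  // roughly the full file size
ob_end_flush();                             // only now is anything actually written out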

I set up a Vagrant instance at commonsarchive-testmw.wmflabs.org and generated a 1-gigabyte PNG with:

vagrant@mediawiki-vagrant:~$ < /dev/urandom rawtoppm -rgb 1024 1024 | pnmtopng > out.png
vagrant@mediawiki-vagrant:~$ cat out.png /dev/urandom | head -c 1073741824 > gigantic.png

then imported the PNG with:

vagrant@mediawiki-vagrant:~$ mkdir temp
vagrant@mediawiki-vagrant:~$ mv gigantic.png temp/
vagrant@mediawiki-vagrant:~$ mwscript importImages.php --wiki=wiki --extensions=png ~/temp/
Import Images

Importing gigantic.png...done.

Found: 1
Added: 1

and finally deleted http://commonsarchive-testmw.wmflabs.org/w/index.php?title=File:Gigantic.png&action=edit via the web UI.

Googling a bit, I found PHP output buffering (http://php.net/manual/en/function.ob-start.php) and attempted to use a small, non-zero output buffer chunk size, but it did not work.
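Presumably the attempt looked roughly like the following (the exact code is not preserved in this task; the chunk size and path are placeholders): give the userland output buffer a small chunk size so that PHP flushes it to the client in small pieces instead of accumulating the whole file:

$path = '/tmp/gigantic.png';   // placeholder path
ob_start( null, 8192 );        // second argument is the buffer chunk size in bytes
readfile( $path );             // stream the file; ideally flushed every ~8 KiB
ob_end_flush();
// On HHVM this alone did not prevent the OOM, which prompted the checks below.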

I then checked whether output buffering is actually enabled:

diff --git a/includes/libs/filebackend/HTTPFileStreamer.php b/includes/libs/filebackend/HTTPFileStreamer.php
index a7d064b6e3..8386554bb5 100644
--- a/includes/libs/filebackend/HTTPFileStreamer.php
+++ b/includes/libs/filebackend/HTTPFileStreamer.php
@@ -169,7 +169,7 @@ class HTTPFileStreamer {
                                return false;
                        }
                } else {
-                       return readfile( $this->path ) !== false; // faster
+                       $headerFunc( 'X-Debug-Info: ' . ob_get_level() );
                }
 
                return true;

The HTTP headers show X-Debug-Info: 0, indicating that no userland output buffering is active, contrary to my initial assumption.

I then found flush() (http://php.net/manual/en/function.flush.php), which suggests another, lower-level layer of buffering (the system/SAPI write buffers), so I tried flushing after every read, with:

diff --git a/includes/libs/filebackend/HTTPFileStreamer.php b/includes/libs/filebackend/HTTPFileStreamer.php
index a7d064b6e3..d8d0abf33c 100644
--- a/includes/libs/filebackend/HTTPFileStreamer.php
+++ b/includes/libs/filebackend/HTTPFileStreamer.php
@@ -169,7 +169,18 @@ class HTTPFileStreamer {
                                return false;
                        }
                } else {
-                       return readfile( $this->path ) !== false; // faster
+                       $handle = fopen( $this->path, 'rb' );
+                       if ( $handle ) {
+                               $ok = true;
+                               while ( $ok ) {
+                                       $data = fread( $handle, 8 * 1024 );
+                                       $ok = ( $data !== false );
+                                       print $data;
+                                       flush();
+                               }
+                       } else {
+                               return false;
+                       }
                }
 
                return true;

and the file successfully started downloading (I had no interest in downloading the full 1 gigabyte of randomness).

Continuing to Google, I found the upstream HHVM bug https://github.com/facebook/hhvm/issues/7444, which indicates that HHVM's buffer handling is somewhat broken and that it will not flush until the response ends. I then tested the ob_implicit_flush workaround mentioned in https://github.com/facebook/hhvm/issues/7444#issuecomment-256753321, with:

diff --git a/includes/libs/filebackend/HTTPFileStreamer.php b/includes/libs/filebackend/HTTPFileStreamer.php
index a7d064b6e3..be83e05a0b 100644
--- a/includes/libs/filebackend/HTTPFileStreamer.php
+++ b/includes/libs/filebackend/HTTPFileStreamer.php
@@ -169,7 +169,11 @@ class HTTPFileStreamer {
                                return false;
                        }
                } else {
-                       return readfile( $this->path ) !== false; // faster
+                       ob_start( NULL, 1048576 );
+                       ob_implicit_flush( true );
+                       $ret = readfile( $this->path ) !== false; // faster
+                       ob_end_flush();
+                       return $ret;
                }
 
                return true;

It continued to work. I'll file a patch to call ob_implicit_flush for all file backends.
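As a hedged sketch of that approach (the authoritative change is the Gerrit patch linked below; the function name here is illustrative, not actual MediaWiki code), wrapping the streaming call in a chunked, implicitly flushed output buffer looks like:

function streamWithImplicitFlush( $path ) {
	// Buffer at most 1 MiB at a time and flush it automatically, so the
	// response never accumulates the whole file (works around HHVM not
	// flushing until the end of the request).
	ob_start( null, 1048576 );
	ob_implicit_flush( true );
	$ok = readfile( $path ) !== false;
	ob_end_flush();
	return $ok;
}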

What is their size?

1,902,426,188 bytes (~1.8 GiB) and 792,533,697 bytes (~756 MiB).

"Download" = in the browser I assume?

Yes, but that should be irrelevant; a 500 is a server-side issue.

Change 370792 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[mediawiki/core@master] FileRepo: create output buffer and set ob_implicit_flush for file streaming

https://gerrit.wikimedia.org/r/370792

Change 370792 merged by jenkins-bot:
[mediawiki/core@master] FileRepo: create output buffer and set ob_implicit_flush for file streaming

https://gerrit.wikimedia.org/r/370792

matmarex removed a project: Patch-For-Review.
matmarex subscribed.

This will be deployed next week with the train. If you need it sooner, we can backport it.