Page MenuHomePhabricator

Pygmentize times out on Windows
Open, Needs TriagePublic

Description

Running SyntaxHighlight on windows ends up with a timeout from Pygmentize, more or less as described here on MWwiki (the only difference is that I'm testing it with CACHE_NONE and still doesn't work). I found out this problem after T199789. Building the command via concatenation actually makes the process run, but the browser starts loading eternally without any error being returned.
I tried to run pygmentize via cmd directly, i.e.

path/to/pygmentize -l php -f html -O cssclass=mw-highlight,encoding=utf-8,startinline=1

Then added a sample string, like 1==1 via stdin, then Ctrl+Z to terminate input and the output is indeed correct (and returned instantly). Together with the fact that, at this point, the final command is the one I wrote above, this makes me think of some problem in Command::input; maybe it fails to properly terminate input sequence?

Also, I don't think it'll be really relevant but here are my specs: running XAMPP with PHP 7.2.2 on Windows 7, python 3.6.3.

Event Timeline

Daimona created this task.Jul 19 2018, 10:57 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 19 2018, 10:57 AM

I guess this isn't easy at all. I could experimentally determine what causes the script to hang: it all happens on this line, when reading the first pipe (should be stdin). I also read on the web that stream management is quite bugged on windows from PHP's end, although it's not clear whether those bugs are actually fixed.

Wellll yes, sounds like this is due to the many bugs of streams on windows reported on the web (just search for something like "php windows proc_open" or similar stuff to read a bunch of them). Trying to read from the stream before entering the while loop results in eternal loading for pipes[1] and pipes[2] while pipes[1] is readable, whatever function is used. Instead, if I change $desc to use files instead of pipes for stdout and stderr, everything works fine, with the output being sent to a file as expected. Sounds like a proper solution isn't easy to implement; maybe some workaround would be possible?

Daimona renamed this task from Pygmentize times out on Windows (due to failure of Command::input?) to Pygmentize times out on Windows.Jul 20 2018, 1:59 PM

I think I figured out the reason: stream_set_blocking doesn't work for pipes on windows, see doc and a linked bug report. I also printed the return value for these calls, and indeed they're all false. Apparently this won't be fixed on PHP's end since there actually isn't a good solution available on windows. I'm wondering whether there's a workaround that we can apply locally.

I can reproduce this in eval.php using echo_333333_stars.php from https://gerrit.wikimedia.org/r/c/mediawiki/core/+/471885/ . The problem is that stream_select() returns 3, with all pipes supposedly ready, but they're not really ready. Reading from stdout will give you as many bytes as you ask for, but reading from stderr blocks forever. If you fully drain stdout, then reading from stderr gives you an empty string instead of blocking.

The converse is also true: if you run a command with a long stderr and no stdout, reading from stdout blocks unless you first fully drain stderr.

stream_set_blocking() shouldn't matter if the reads are small enough. All we need is for stream_select() to tell us the correct order to drain the streams.

The solution used in Phabricator, which I found via https://secure.phabricator.com/T11105 , is to redirect stdout and stderr to temporary files.

stream_select() is implemented using WaitForSingleObject, but pipes are not on the list of handle types that this function can wait for.

The anonymous pipe documentation says: "Anonymous pipes are implemented using a named pipe with a unique name. Therefore, you can often pass a handle to an anonymous pipe to a function that requires a handle to a named pipe." This is interesting since named pipes do in fact support async I/O. You can even set non-blocking mode on a named pipe instance after creation, although this comes with a deprecation warning.

There is also PeekNamedPipe, which says it can take a "handle to the read end of an anonymous pipe, as returned by the CreatePipe function." So one possibility for a fix inside stream_select() would be to poll PeekNamedPipe and to return pipes which have non-zero bytes available.

Change 656519 had a related patch set uploaded (by Skizzerz; owner: Skizzerz):
[mediawiki/core@master] Redirect shell output to temp files on Windows

https://gerrit.wikimedia.org/r/656519