Page MenuHomePhabricator

Add support for Pygments ran as CGI application via HTTP
Open, Needs TriagePublic

Description

Hello! On some shared hostings Pygments can't be run because the shell execution was limited by "security" reasons and I can't enable it. However, I have built my own workaround which turns Pygments into CGI-BIN application I can run via HTTP.

It's the patch to SyntaxHighlight.php which adds the flag $wgPygmentsUseCgiProxy. When you'll set it into true, the wgPygmentizePath will be interpreted as HTTP link and will be executed via cURL.

It's the CGI script which will execute Pygments and will forward input data into it, and will return back the result.

Latest patchset is here: https://gerrit.wikimedia.org/r/538404

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Hi @Wohlstand, thanks for taking a look at the code!

You are very welcome to use developer access to submit the proposed code changes as a Git branch directly into Gerrit which makes it easier to review them quickly and provide feedback.
If you don't want to set up Git/Gerrit, you can also use the Gerrit Patch Uploader. Thanks again!

Oh, nice! I have sent the raw patch because I hadn't any sort of understanding where to submit this, I'll try out this...

Sent just now:
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/474518/

Note: this https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/474517/ that I have sent previously, is junk, contains a typo in README I have fixed quickly and I have sent an update. Discard this, please.

Change 474518 had a related patch set uploaded (by TheDJ; owner: Wohlstand):
[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Allow Pygments to be run via CGI

https://gerrit.wikimedia.org/r/474518

Hello! I want to ask you how are you? I have seen your Jenkins bot said "Verify -1" because of some weird "quibble-vendor-mysql-hhvm-docker" failure... Can anyone poke that bot to re-check the patch?

Hello!
Almost half of the year has been passed. Are you too busy and can't review this patch and give a rate on it?

Change 474518 had a related patch set uploaded (by Aklapper; owner: Wohlstand):
[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Allow Pygments to be run via CGI

https://gerrit.wikimedia.org/r/474518

I rebased the patch, that's why there was another notification here for https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/474518/

Hi @Wohlstand, sorry that this has not seen a review yet. In theory, see https://www.mediawiki.org/wiki/Gerrit/Code_review/Getting_reviews for more information.
According to https://www.mediawiki.org/wiki/Developers/Maintainers , "SyntaxHighlight_GeSHi" is stewarded by the Editing Team so they'd be the one to ask if your proposed patch is in scope and could be accepted.

The reason for that V-1 was NOT my issue, it happened because of another issue that was in the state that was in a moment where I have submitted my patch. I think I'll try to re-poke it to trigger the re-check...

I see here some sort of code style / static analysis failure... Okay, will re-check this and will submit an update...

Sorry for waiting, I did the change, however, I can't submit it with no way:

via git review -R I getting next reply:

$ git review -R
remote: 
remote: Processing changes: refs: 1
remote: Processing changes: refs: 1, done            
remote: Pushing to refs/publish/* is deprecated, use refs/for/* instead.        
To ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/SyntaxHighlight_GeSHi.git
 ! [remote rejected] HEAD -> refs/publish/master/474518 (cannot add patch set to 474518.)
error: failed to push some refs to 'ssh://wohlstand@gerrit.wikimedia.org:29418/mediawiki/extensions/SyntaxHighlight_GeSHi.git'

// Idk how to fix the "remote: Pushing to refs/publish/* is deprecated, use refs/for/* instead." warning, is this a reason of failure?

I have checked everything, but I can't find a reason for this failure... :

git version 2.17.1
git-review version 1.26.0

Gerrit Patch uploader also fails, it won't log in and drops a 500 error on attempt to login:
https://tools.wmflabs.org/gerrit-patch-uploader/

So, I posting my updated patch here as a file:

Okay, I have installed git-review 1.28.0 by pip, however, it still fails:

$ git review -R
remote: 
remote: Processing changes: refs: 1
remote: Processing changes: refs: 1, done            
To ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/SyntaxHighlight_GeSHi.git
 ! [remote rejected] HEAD -> refs/for/master (cannot add patch set to 474518.)
error: failed to push some refs to 'ssh://wohlstand@gerrit.wikimedia.org:29418/mediawiki/extensions/SyntaxHighlight_GeSHi.git'
$ git branch
  master
* review/gerrit_patch_uploader/474518

Hi @Wohlstand, sorry that you run into problems. ;( I don't know which exact steps you performed before running git review -R or what created 474518, so it's hard to debug without a full list of steps. As this problem is unrelated to the topic that this task is about, I recommend to ask in a Wikimedia developer support forum, for example https://discourse-mediawiki.wmflabs.org/

Hi, @Aklapper , I did next:

# re-clonned repo from Gerrit
git clone ssh://wohlstand@gerrit.wikimedia.org:29418/mediawiki/extensions/SyntaxHighlight_GeSHi.git
cd SyntaxHighlight_GeSHi
# applied my patch with a change (added a new commit)
git am --signoff < 0001-Allow-Pygments-to-be-run-via-CGI.patch
# Then
git review -r origin -s
# and finally
git review -r origin -R

I'll try to ask the question on the forum, but this whole thing looks like a weird mess...

@Aklapper I think it could be due to security measures: the patch was authored by the uploader tool, and only people in the trusted contributors group can amend others' patches.

Thanks, @Daimona for help!
Anyway, I see it does a test of old errors I have fixed in an updated patch:

made some of those changes for you, but notice the raw curl usage, which instead should be https://doc.wikimedia.org/mediawiki-core/master/php/classMWHttpRequest.html

Ok, @TheDJ, will try to change the CURL with MWHttpRequest thing and then, will post an updated patch. I have pulled your minor fixes, thanks for them!

@Wohlstand if you're familiar with CLI git, I suggest you to upload a brand new patch owned by yourself, so that you can later amend it. I believe all you have to do is:

cd path/to/SyntaxHighlight
git review -d 474518
git commit

Then delete the Change Id line so that it'll be recognized as a new patch, and then git review -R. If the upload is successful, you can then abandon https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/474518/ using the "Abandon" button.

Ok, thanks for a tip, @Daimona. I'm pretty familiar with CLI git as I using it for a long time (since 2014) for my own projects. Once I'll make a proper thing, I'll try to upload it again.

Okay, I have finally found a time window to replace CURL with MWHttpRequest which I have tested on my end and it works:
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SyntaxHighlight_GeSHi/+/538404/

Just now I made a fix of notices:

  • added config option into "extension.json"
  • changed that new option's type into just bool with default "false"

P.S. Can someone nuke this garbage (these three reviews I have post), IDK how to do that myself, or I just have no permission to do that:

You should be able to hit "abandon" on the patches yourself

The problem I don't see any "abandon" at all... All patches I have uploaded by Gerrit Uploader, and that just probably that robot is only who can do that, but it just can't. Therefore is only privileged person can help me to do that...

Change 474518 abandoned by Aklapper:
Allow Pygments to be run via CGI

Reason:
Abandoned in favor of https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SyntaxHighlight_GeSHi/ /538404/

https://gerrit.wikimedia.org/r/474518

Okay, all 3 first tests are now passing, however, this, 4'th unit test, just looks odd: https://integration.wikimedia.org/ci/job/mwext-php72-phan-seccheck-docker/11181/console, it fails because of rsync:

16:57:24 rsync: failed to set times on "/cache/.": Operation not permitted (1)
16:57:30 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]

Okay, all 3 first tests are now passing, however, this, 4'th unit test, just looks odd: https://integration.wikimedia.org/ci/job/mwext-php72-phan-seccheck-docker/11181/console, it fails because of rsync:

16:57:24 rsync: failed to set times on "/cache/.": Operation not permitted (1)
16:57:30 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]

No it doesn't, you can ignore that. It fails because of this (click Show Details under setup quibble prepare and scroll down):

<?xml version="1.0" encoding="ISO-8859-15"?>
14:58:18 <checkstyle version="6.5">
14:58:18   <file name="includes/SyntaxHighlight.php">
14:58:18     <error line="366" severity="warning" message="Calling method \wfWarn() in \SyntaxHighlight::highlight that outputs using tainted argument $[arg #1]. (Caused by: ../../includes/GlobalFunctions.php +1066) (Caused by: includes/SyntaxHighlight.php +337; includes/SyntaxHighlight.php +355; includes/SyntaxHighlight.php +366)" source="SecurityCheck-DoubleEscaped"/>
14:58:18   </file>
14:58:18 </checkstyle>

I'm guessing possibly because of your line 337

line 337

So, the question is how I can give the proper message string instead of using $status->getMessage()->toString()?

It's complaining because $error is double escaped, hence wfWarn would show double-escaped entities. According to the error message, that's caused by lines 337 and 355. First of all, we'd have to figure out whether double escaping is really taking place.

355

is a condition "is Ok?" and when it's NOT ok, the 337 works

$error = 'Failed to invoke Pygments: ' . $status->getMessage()->toString();

Do you mean I should un-escape the output of $status->getMessage()->toString() to stop this thing blame? Or use a different function to get non-escaped string?

Okay, I have replaced the "toString()" with "text()", I hope this will work...

I tested it locally. The error is only caused by line 337; line 355 appears in the report just because of another assignment. Message::toString is a bit confusing for the plugin, so it's probably just a false positive.

Hence, in theory it should be enough to suppress the warning. However, calling toString() without parameters is deprecated, so I'd suggest an alternative, which is concatenating the whole Status object. Status objects have a __toString() method which will return a formatted list of errors, so it's definitely fine to do that (at line 337).

Change 538404 had a related patch set uploaded (by Krinkle; owner: Vitaly Novichkov):
[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Allow Pygments be run via CGI

https://gerrit.wikimedia.org/r/538404

pygmentize-cgi.py (Gerrit patch)
args = form.getvalue("params")
argsArr = args.split()
argsArr.insert(0, "./pygmentize")
argsArr.insert(0, "python")

p = Popen(argsArr, stdin=PIPE, stdout=PIPE, stderr=PIPE)
output, err = p.communicate(form.getvalue("indata"))

I'm can understand a hosting provider supporting CGI without allowing sub processes to be spawned. But, the above does the same as what the PHP did. It doesn't call the Python module directly but also spawns a sub process, using Popen.

It seems odd I think for a host to allow Popen in Python but not in PHP? Should we ship and maintain this as built-in functionality for that use case?

Also, it's worth thinking about abuse vectors and security. We would normally expose this although through the API so that we can rate limit it within a session or by other means. Not sure how to do that in this case.

It seems odd I think for a host to allow Popen in Python but not in PHP? Should we ship and maintain this as built-in functionality for that use case?

The fact, host provider just don't care about python/ruby/Perl/etc. which just runs under CGI as well as ran via SSH as most of the users are using PHP which doesn't need to use CGI. It's just the case of my hosting provider who is a http://masterhost.ru.
I don't think other providers doing the same for all other languages. The odd fact IS that provider does limiting of Popen on PHP but just takes no care about other languages are can work. At the same time, this workaround can be used to run the same pygments on a different server when python on cgi-bin here also can't work. However, I know no hostings are limiting python on cgi-bin with such of crap like this.

Also, it's worth thinking about abuse vectors and security. We would normally expose this although through the API so that we can rate limit it within a session or by other means. Not sure how to do that in this case.

I think that would be done on the side of this python script to check the source of input and don't allow everyone except self (a Wiki page) to use this. For example, using the IP address or... some sort of secret key that will be written on both sides: LocalProperties.php and that python script. The key can be accepted by script like one of the post keys (here are already POST data sent with keys). The output shouldn't be any sort of encoding, it's just raw output of pygments.

So, my idea is:

  • The extension sends the data to CGI and gives request data and secret key
  • Python reads arguments and gets the secret key.
  • If a key is matching, it works
  • If a key is not matching, printing an error message and don't run pygments.py

At the same time, the whole purpose of CGI-ran pygments is only a "Lifebuoy" for users of hosting providers where Popen is blocked in the PHP config. Users with own server can freely set up their PHP as intended and use Pygments normally.

Just my small trivia:
Anyway, it's just sad to drop the support for GeSHi as it worked anywhere without any Popen requirement. I understand that Pygments gives much better support and much more set of languages to highlight, however, it:

  • won't work if Popen is blocked (A reason why I made this feature)
  • it's python code which requires also python interpreter be installed in the system
  • Python can't work as fast-cgi to serve many queries, and every need runs an extra python instance.

The best way is would keep GeSHi as a fallback, however, it's too late as it was dropped away a long time ago.

Thanks, I understand. I've had success asking hosting providers to enable these functions in PHP. Especially if they are expert friendly (like yours) and have support for Cron and limited SSH, they might be willing to consider this.

Note that the extension is designed to gracefully fallback to showing the text pre-formatted without highlighting. If you find user-visible errors when popen is blocked, that should not be the case and reported as a high priority bug.

Providing this via CGI as option could work, but indeed will not be very fast, and should at least require a secret key and MW-side rate limiting of some sort.

MW-side rate-limiting of some sort.

The same fact of re-running of python via Popen is also slower than just using native PHP like GeSHi was. Isn't page caching already solves this? So, re-running of Pygments shouldn't be done until the page will be changed for any reason.

Note that the extension is designed to gracefully fallback to showing the text pre-formatted without highlighting. If you find user-visible errors when popen is blocked, that should not be the case and reported as a high priority bug.

The fact I have installed an extension (originally I have used GeSHi-equipped variant), and now it doesn't work - it DOESN'T highlights anything. Then, I have dug internals and I have found a reason why it won't work - Popen is blocked. So, it's just a sad: I wanna have syntax highlighting, but It's no more reason to stick ancient extension which is no more compatible with modern and the only way to workaround. So, when Popen is blocked and when an extension is installed and no effect shown, so, it's a BIG problem and panic of users who do not know why it won't work. Especially for the case, the user isn't a programmer like you or me, and the user just can't figure out for a reason.

a secret key

So, I'll provide that. The first copy of secret key will be stored in the LocalSettings.php, and second - inside that python script (require a user to edit it until begin to use it).

Okay, just now I have added the secret key requirement. Without a secret key, it will don't work.

  • In Wiki settings it's $wgPygmentsCgiSecret
  • On python side it's inline secret_key = "<your key here>"
This comment was removed by Wohlstand.

Can anyone poke a verifying again? https://gerrit.wikimedia.org/r/538404
A week ago I have sent a 7'th patch to that, and which isn't tested yet.

Just now I have sent the 8'th patch to fix an error that was reported by that bot.

Hello!
Do you have any news? Here is a patch I have sent more than one month ago https://gerrit.wikimedia.org/r/538404
Can you review and merge it, or comment it out if any questions are still?

Do you have any news? Here is a patch I have sent more than one month ago https://gerrit.wikimedia.org/r/538404
Can you review and merge it, or comment it out if any questions are still?

Thanks for the ping and sorry that this has not received a review. https://www.mediawiki.org/wiki/Developers/Maintainers states that SyntaxHighlight_GeSHi is stewarded by Editing-team hence explicitly adding their tag here to get attention.

Hello again!
What the current state of the patch review? I reminding again after two months waiting.

Change 538404 had a related patch set uploaded (by Aklapper; owner: Vitaly Novichkov):
[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Allow Pygments to be run via CGI

https://gerrit.wikimedia.org/r/538404

Meh, there was an empty line so the latest patch was never linked here. :(

Just now I had to resolve conflict and I rebased my commit to the top.

Just now I sent a new patch to resolve the given notices.

Esanders removed a project: Editing-team.
Esanders subscribed.

The editing team won’t be working on this.

@Wohlstand Which hosting provider is this? It it surprising to me to find there is a provider of shared hosting that supports PHP with popen blocked, but does alllow running separate CGI web servers in Python. Have you tried reaching out to them to ask if the disable_functions restriction may be lifted?

The fact I have installed an extension (originally I have used GeSHi-equipped variant), and now it doesn't work

Sometimes software changes. Things cannot remain the same forever, unfortunately. You are welcome to fork and support an alternate version of the original Gesh-php extension for MediaWiki, or to fork and support an alternate version with Pygments-CGI. These could also be hosted in Wikimedia Gerrit and promoted on mediawiki.org.

If you can find one other person to maintain it with you, then we could also consider providing these within the SyntaxHighlight extension as configuration options, like your current patch does. However some amount of testing and approval might be needed still (e.g. from Editing-team, or someone else).

@Krinkle, the shared hosting at the Masterhost company (https://masterhost.ru/), gives the system-wide PHP installation and doesn't allow to give custom php.ini completely.

Have you tried reaching out to them to ask if the disable_functions restriction may be lifted?

Had, many times, they won't do anything.

but does allow running separate CGI web servers in Python

For any raw CGI applications, if I will write the script in bash and will give the proper print, it will work. Anyway, if hosting won't allow running CGI apps, then, do the same as with the Parsoid - run it on a different server. The same is possible with my Pygments-CGI: it can work even at a different server (the HTTPS will be needed together with the secret key written on both sides).

Pygments-CGI

At me is not a mod of Pygments: it's the CGI overlay that passes input into regular Pygments. The Pygments itself is the same 1:1 unmodified.


I wanted to just send this patch into the mainstream to just stop re-apply the same patch every next release of MediaWiki (that also needs to re-tweak changes if the code majorly changes between versions).

Also, I didn't expect a long painful bureaucratic process for two years to just send a small patch into the little extension...

The editing team won’t be working on this.

@marcella, @Esanders: As https://www.mediawiki.org/wiki/Developers/Maintainers lists the Editing Team as code stewards, does that mean this ticket and its patch should be declined? Or how to proceed here?