Page MenuHomePhabricator

No syntax highlight for large JavaScript/CSS/Lua/API pages
Open, Needs TriagePublic

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
He7d3r set Security to None.Jun 28 2015, 1:10 PM
matmarex added a subscriber: ori.Jun 28 2015, 3:49 PM

This is actually not a Pygments limitation, but a new hardcoded limit in the SyntaxHighlight extension, added by Ori in 6484894497f59080eb3e62da4d11b658c84e19f4. The limit is 100 kB (102400 bytes), your page is 200290 bytes.

@ori, is Pygments known not to perform well for pages this large, or was this "just in case"? Would it be unwise to raise the limit? (Or possibly just remove it, leaving the length constrained only by the 2 MB page size limit?)

ori added a comment.Jun 28 2015, 9:16 PM

@ori, is Pygments known not to perform well for pages this large, or was this "just in case"? Would it be unwise to raise the limit? (Or possibly just remove it, leaving the length constrained only by the 2 MB page size limit?)

It was just in case. Would syntax highlighting be useful for https://meta.wikimedia.org/wiki/User:He7d3r/global.js ? Syntax highlighting is meant to make code more readable to humans, but it seems to me that a 200 kB blob of partially-minified JavaScript is basically forfeiting readability anyhow. Not to mention: the highlighted HTML for that page would be approximately 1.2 megabytes in size.

He7d3r added a comment.EditedJun 29 2015, 12:28 AM

My global.js page is like that (minified manually, and with tons of scripts copied in a single page) just because I needed a workaround for the current limitations of ResourceLoader and gadgets (mainly T36958: User-level gadget repositories). Otherwise, I would not minify anything, or copy anything every time I update some code.
So, no the highlighting is not useful there. But it is useful on other giant pages, such as
https://en.wikipedia.org/wiki/MediaWiki:Gadget-popups.js
https://commons.wikimedia.org/wiki/MediaWiki:Gadget-HotCat.js
https://pt.wikipedia.org/wiki/MediaWiki:Gadget-APC.js/List.js

He7d3r updated the task description. (Show Details)Jun 29 2015, 12:32 AM
ori added a comment.Jun 29 2015, 3:23 AM

The requirements for code that runs on the Wikimedia cluster are not merely that it behave well here and now, but that it continue to behave well even as it is ignored, misused, or substantially repurposed. The ratio of number of maintainers to code surface mandates being conservative in this way.

@He7d3r, don't take this the wrong way, but I am going to reject the report, even though your use-case is valid. I hope you understand.

ori added a comment.Jun 29 2015, 3:23 AM
This comment was removed by ori.
ori closed this task as Declined.Jun 29 2015, 3:24 AM
ori claimed this task.

Yeah, no problem. Maybe it is worth to add a note on Tech News?

MZMcBride reopened this task as Open.Jun 29 2015, 12:28 PM

Hmmm, I'm re-opening this for further consideration.

I don't like the current behavior as it's unintuitive to users and provides no feedback about why highlighting has stopped working. Why would any user know or expect that a page with a length greater than 102400 bytes suddenly stops being highlighted? I'd like to see better error reporting investigated here.

I vaguely remember a similar issue coming up when Scribunto/Lua was deployed. As I recall, we tweaked the lexer to have less verbose HTML output. I'd like to see solutions like that explored before we impose a hard and arbitrary limit.

Finally I'd like to know the impact of this change. How many pages will be affected by this new limit? It's a little embarrassing that pages such as https://en.wikipedia.org/wiki/MediaWiki:Gadget-popups.js with 71 page watchers (a relatively high number for an obscure MediaWiki page) no longer have syntax highlighting on the view action. (It's even weirder that CodeEditor still activates on the edit action.)

MZMcBride removed ori as the assignee of this task.Jun 29 2015, 12:28 PM
matmarex added a comment.EditedJun 29 2015, 12:29 PM

Another practical use case would be https://en.wikipedia.org/wiki/MediaWiki:Gadget-popups.js , which is 250k of code on one page or https://en.wikipedia.org/wiki/User:Cacycle/wikEd.js , which is a whooping 650k. I'm not sure if these are the master versions of these scripts, or built from some saner form. (One thing I expected to hit the limit but doesn't is Twinkle, which is reasonably spread over several files/pages.)

(edit: mid-air collision with MZMcBride)

Those are the master versions =/

The 100k limit also applies to large modules like https://en.wikipedia.org/wiki/Module:Citation/CS1 and https://en.wikipedia.org/wiki/Module:Convert. Both of these have also been copied to and are in use on other wikis. The old version of whatever highlighting code we had worked. The new and improved version should also work.

He7d3r renamed this task from No syntax highlight for large .js pages to No syntax highlight for large JavaScript/CSS/Lua pages.Jun 30 2015, 10:57 AM
TheDJ added a subscriber: TheDJ.Jun 30 2015, 2:13 PM

The new and improved version should also work.

While a nice thing to strive for, it's not a goal in itself. The goal of having a library that was being supported and in active development was much higher and if there are some small trade offs that we have to make, then so be it.

I vaguely remember that ori said that save times for large modules is significant higher with pygments. So removing this limit might increase save times for those pages beyond reasonable limits. Is that correct @ori ?

ori removed a subscriber: ori.Jun 30 2015, 4:03 PM
ori added a subscriber: ori.

I vaguely remember that ori said that save times for large modules is significant higher with pygments. So removing this limit might increase save times for those pages beyond reasonable limits. Is that correct @ori ?

Yep.

ori removed a subscriber: ori.Jun 30 2015, 4:04 PM

I vaguely remember that ori said that save times for large modules is significant higher with pygments. So removing this limit might increase save times for those pages beyond reasonable limits.

What kind of page save time increase are we talking about? Writing a large article with a lot of links and templates also takes more time (to save, parse, and render the page), but users are typically willing to make this trade-off and it's easily explained.

The error reporting here isn't adequate, in my opinion. Maybe auto-adding yet another tracking category would be a reasonable solution here? That would at least give some indication in the user interface of what's going on.

Izno added a subscriber: Izno.Jul 6 2015, 3:32 PM
matmarex added subscribers: ori, csteipp.EditedJul 6 2015, 4:28 PM

I did some unscientific measurements on a local installation of MediaWiki, looking at the value of wgBackendResponseTime after purging a .js page (curl -s -L "http://localhost/w/index.php/...?action=purge" | grep BackendResponseTime), repeating each test a few times. This was done inside a VM on a rather slow machine with unoptimized MediaWiki installation, so it's likely to be several times faster running on WMF infrastructure. For example, MediaWiki:Gadget-popups.js has wgBackendResponseTime of ~300 ms right now (with Pygments enabled, but not actually highlighting things).

I changed the Pygments version of the code not to separately cache the highlighting results, to make the purging actually redo the highlighting (otherwise I'd have to edit the page every time), and set $wgGroupPermissions['*']['purge'] = true; to allow anonymous purge.

Time (no highlighting)Time (GeSHi)Time (Pygments)
MediaWiki:Gadget-popups.js (250 kB)~700~1900~2800
User:Cacycle/wikEd.js (650 kB)~300~7100~11000

So, according to my unscientific measurements, Pygments is at worst two times slower for large pages than GeSHi. I'm fairly sure the servers can take it :) (especially with the aforementioned additional caching of the Pygments version, making it impossible to generate large load by doing action=purge repetitively like in this test – one would have to edit a page, which would be immediately noticeable to everyone that something is amiss).

Change 223053 had a related patch set uploaded (by Bartosz Dziewoński):
Remove the 100 kB code size limit for highlighting

https://gerrit.wikimedia.org/r/223053

So, according to my unscientific measurements, Pygments is at worst two times slower for large pages than GeSHi.

The Geshi project is no longer maintained. It's implementation encourages bad performance.

matmarex added a comment.EditedJul 6 2015, 8:05 PM

These statements do not necessarily contradict each other :) Pygments needs only one CSS stylesheet for all languages, while GeSHi needs one stylesheet per language. So while actual highlighting might be slower (the difference is probably small for most cases, only visible to naked eye on stupidly large pages like these), the Pygments version requires less code to be sent to the user and is lighter on ResourceLoader (only requires one module to be defined rather than hundreds), hopefully resulting in quicker-loading cached pages.

So long as pygments respects the timeouts that we set for external programs ($wgMaxShellTime, $wgMaxShellWallClockTime), then someone shouldn't be able to use this as a DoS beyond what we accept for normal pages.

Instead of https://gerrit.wikimedia.org/r/223053, I'd prefer that the code keeps enforcing a limit that's easy to configure, and we choose one that is big enough for most of our use cases. Then if we see it being abused, we can lower it quickly.

He7d3r renamed this task from No syntax highlight for large JavaScript/CSS/Lua pages to No syntax highlight for large JavaScript/CSS/Lua/API pages.Jul 7 2015, 12:09 PM
He7d3r updated the task description. (Show Details)

So long as pygments respects the timeouts that we set for external programs ($wgMaxShellTime, $wgMaxShellWallClockTime), then someone shouldn't be able to use this as a DoS beyond what we accept for normal pages.

Hmm, it currently doesn't. We use the Kzykhys Pygments wrapper, which uses Symfony ProcessBuilder for shelling out, which presumably has its own limits (at a quick glance there's some sort of 60 seconds timeout). It might be configurable.

Instead of https://gerrit.wikimedia.org/r/223053, I'd prefer that the code keeps enforcing a limit that's easy to configure, and we choose one that is big enough for most of our use cases. Then if we see it being abused, we can lower it quickly.

Done, amended the patch.

The error reporting here isn't adequate, in my opinion. Maybe auto-adding yet another tracking category would be a reasonable solution here? That would at least give some indication in the user interface of what's going on.

Auto-adding a tracking category is happening now when the input is "too large." However, a generic tracking category is being used to cover both invalid language selections and overly large inputs, making the user experience even more awful.

It took me a few minutes to figure out that this edit was made not because <source lang="text"> is invalid markup, but because the input is larger than the secret and unreasonable threshold of 102,400 bytes.

jayvdb added a subscriber: jayvdb.Jul 15 2015, 9:51 PM

Change 223053 abandoned by Bartosz Dziewoński:
Raise the code size limit for highlighting from 100 kB to 2000 kB

Reason:
I would like to pursue this at some point in the future.

https://gerrit.wikimedia.org/r/223053

Restricted Application added a subscriber: StudiesWorld. · View Herald TranscriptNov 15 2015, 1:06 AM
He7d3r updated the task description. (Show Details)Apr 11 2016, 12:01 PM
Krinkle removed a subscriber: Krinkle.May 10 2016, 4:11 PM
Rical added a subscriber: Rical.EditedApr 10 2017, 8:12 AM

I am not sure to be in the right task.
For some days the edit panel to change Lua modules has lost its highlight style and its lines numbers.
Perhaps in fr.wikisource.Mediawiki:
1.29.0-wmf.18 (rMW9ab639f85dc6) UTC-20170403-15:20
1.29/wmf.19 (e4784c6) UTC-20170407-20:52
1.29.0-wmf.19 (e4784c6) UTC-20170407-20:52
1.29.0-wmf.19 (6765327) UTC-20170408-00:13
My code size is 1.2 Mo on disk, 580 k UTF8, stable for 3 months.
Also when I reduce it at 70% the result is the same: no lines numbers, no highlight.

To reproduce, edit any module of any size in fr.wikisource.org or vi.wikipedia.org
use last Firefox 52.0.2 (64 bits) or Chromium 57.0.2987.98
on Ubuntu 16.04 LTS updated each day, and ASUS N56JR,
also in Firefox 52.0.2 (32 bits), Windows 8 on another ASUS type.

Rical added a comment.Apr 11 2017, 7:20 PM

Have you the same bug?
Where could this effect coming from?

  • My user:vector.css or common.css? But I have them only in fr.wikisource.org and not in vi.wikipedia.org
  • A step to prepare T121470 Central Global Repository?
  • A step to prepare T156048 syntax highlighting?
  • A change in my Module:Central itself, which forms many viewers, always inside <div>...</div> ? But I edit with the edit-panel first, up in the page, and the result of the module below, later. Also these <div> tags are, in the page, in other tags which protect them to interfere.
  • A mix of these origins?
Rical added a comment.EditedApr 13 2017, 7:08 AM

Small or large modules edit panels lost their syntax highlighting and line numbers.
Tech News: 2017-15 could explain that by replacement of HTML_Tidy or Extension:ParserMigration, see https://meta.wikimedia.org/wiki/Tech/News/2017/15

@Rical Are you talking about the edit mode or the read mode. Those are two completely different highlighters. This ticket is about the read mode.

Rical added a comment.EditedApr 13 2017, 4:38 PM

From my first post here, I always talk about the edit state.

In this state, the content of the edit panel is always in continuity with other tags.
In other words, a search at page level searches in the title, in the edit panel, in options of edit and in the content of the preview page where the module insert its results.

This happens before the first edit and between succesive edits without record.

I see also that Anomie merged the task T162908 which could be linked with my posts.

Rical added a comment.Apr 26 2017, 8:40 PM

The MW version 1.29.0-wmf.21 (e826acd) UTC-20170425-15:31 repair this bug for Lua modules.
I don't close it because I don't know what happens for JavaScript/CSS/API pages.

TheDJ added a comment.Apr 27 2017, 6:21 PM

@Rical in that case, then this ticket was not your problem, but T162877: CodeEditor default changed if you have never used the toggle was your problem.

matmarex removed a subscriber: matmarex.Apr 27 2017, 6:55 PM
Restricted Application added a subscriber: PokestarFan. · View Herald TranscriptAug 2 2017, 10:07 AM

@ori created https://people.wikimedia.org/~ori/bad.js as a test case, and 'Python' failed badly (likely Pygments with all of its regexes, but maybe one of its dependencies). That is an Upstream failure. See his comment on https://gerrit.wikimedia.org/r/#/c/223053/ . But that was two years ago. Perhaps this should be retested, and a bug files Upstream if there still a problem.

Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptJan 1 2018, 1:19 PM
rafidaslam added a comment.EditedJan 6 2018, 2:41 AM

I've just retested this bug.

By using mediawiki-vagrant:
MediaWiki 1.31.0-alpha (827c6bf) 23:57, 5 January 2018
SyntaxHighlight 2.0 (578bb79) 11:38, 2 January 2018

Steps to reproduce:

  1. Create a page in the mediawiki, that contains the same wikitext from https://fr.wiktionary.org/wiki/Utilisateur:LmaltierBot/pt1.py .
  2. Submit the page.
  3. The code doesn't get highlighted.

And yeah, like have said at the previous comments, the reason is const HIGHLIGHT_MAX_LINES = 1000; and const HIGHLIGHT_MAX_BYTES = 102400; are defined, this causes codes that contain more than 1000 lines or have size than 100kB don't get highlighted.

Increasing the limit and resubmit the page makes the code get syntaxhighlighted again.

And about https://people.wikimedia.org/~ori/bad.js created by @ori , pygments can syntaxhighlight it under 10 seconds now, at http://pygments.org/demo/6689276/ .

vagrant@vagrant:/vagrant$ ./mediawiki/extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -V
Pygments version 2.2.0, (c) 2006-2017 by Georg Brandl.
vagrant@vagrant:/vagrant$ wc bad.js
      0   17390 1000000 bad.js
vagrant@vagrant:/vagrant$ time { ./mediawiki/extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -o bad.html bad.js; }

real	0m8.078s
user	0m7.380s
sys	0m0.136s
jayvdb added a comment.Jan 6 2018, 4:13 AM

The suggested limit on https://gerrit.wikimedia.org/r/#/c/223053/2/SyntaxHighlight_GeSHi.class.php was 2000 kb.
Could you try similar JS file of that size?

rafidaslam added a comment.EditedJan 6 2018, 4:37 AM

@jayvdb yes I can.

I've copied and pasted the bad.js so the filesize increases 2 times from before.

vagrant@vagrant:/vagrant$ du bad.js
1956	bad.js
vagrant@vagrant:/vagrant$ time { ./mediawiki/extensions/SyntaxHighlight_GeSHi/pygments/pygmentize -o bad.html bad.js; } | mpstat 1 12
Linux 3.16.0-4-amd64 (vagrant) 	01/06/2018 	_x86_64_	(4 CPU)

07:56:16 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:56:17 AM  all   29.60    0.00    0.27    0.27    0.00    0.53    0.00    0.00    0.00   69.33
07:56:18 AM  all   20.11    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   79.62
07:56:19 AM  all   18.26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   81.74
07:56:20 AM  all   32.32    0.00    0.28    0.00    0.00    0.28    0.00    0.00    0.00   67.13
07:56:21 AM  all   16.67    0.00    0.56    0.00    0.00    0.00    0.00    0.00    0.00   82.78
07:56:22 AM  all   30.06    0.00    0.28    0.00    0.00    0.28    0.00    0.00    0.00   69.38
07:56:23 AM  all   20.27    0.00    0.53    0.00    0.00    0.00    0.00    0.00    0.00   79.20
07:56:24 AM  all   17.13    0.00    1.93    0.00    0.00    0.55    0.00    0.00    0.00   80.39
07:56:25 AM  all   35.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   65.00
07:56:26 AM  all   19.41    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   80.32
07:56:27 AM  all   34.14    0.00    0.54    0.00    0.00    0.00    0.00    0.00    0.00   65.32
07:56:28 AM  all   17.31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   82.69
Average:     all   24.23    0.00    0.41    0.02    0.00    0.14    0.00    0.00    0.00   75.20

real	0m14.305s
user	0m13.680s
sys	0m0.188s

But when I try to submit it to mediawiki, I got Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 40466569 bytes) in /vagrant/mediawiki/includes/parser/Sanitizer.php on line 1553.

When the file size was still around 1MB, I was still able to submit it to mediawiki.

This is the vagrant CPU usage when I trying to submit bad.js (with 2MB size) to mediawiki:

vagrant@vagrant:/vagrant$ mpstat 1 50
Linux 3.16.0-4-amd64 (vagrant) 	01/06/2018 	_x86_64_	(4 CPU)

05:53:25 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
05:53:27 AM  all   17.76    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   81.97
05:53:28 AM  all   32.08    0.00    2.96    1.62    0.00    0.27    0.00    0.00    0.00   63.07
05:53:29 AM  all   35.08    0.00    0.52    0.00    0.00    0.00    0.00    0.00    0.00   64.40
05:53:30 AM  all   18.70    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   81.03
05:53:31 AM  all   35.96    0.00    0.26    0.26    0.00    0.26    0.00    0.00    0.00   63.25
05:53:32 AM  all   27.32    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   72.41
05:53:33 AM  all   18.48    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   81.25
05:53:34 AM  all   38.08    0.00    0.27    0.00    0.00    0.00    0.00    0.00    0.00   61.64
05:53:35 AM  all   18.46    0.00    0.28    0.00    0.00    0.00    0.00    0.00    0.00   81.27
05:53:36 AM  all   20.74    0.00    3.46    1.60    0.00    0.27    0.00    0.00    0.00   73.94
05:53:37 AM  all   29.57    0.00    2.69    1.34    0.00    0.00    0.00    0.00    0.00   66.40
05:53:38 AM  all    2.80    0.00    1.02    1.78    0.00    0.00    0.00    0.00    0.00   94.40
^C
Average:     all   24.54    0.00    1.05    0.56    0.00    0.07    0.00    0.00    0.00   73.79