
Consider switching to TextMate-based language grammars
Closed, Declined, Public

Description

Geshi seems undermaintained. Its overall model is not very suitable for integration in modern web applications, and its performance is questionable.

Most other popular, industry-standard syntax highlighters use a generic parser. They rely on grammars to map characters to language-neutral concepts, and the output HTML uses generic class names that themes can target in their stylesheets. Some languages only implement some of these concepts, and other concepts apply only to a small subset of languages. Once everything is abstracted into what needs (different) colouring, only a few distinct entities remain.
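A minimal sketch of that collapse, assuming scopes follow TextMate's dotted naming conventions (top-level roots such as `comment`, `constant`, `entity`, `keyword`, `string`, `support`, `variable`). The `hl-` class prefix and both helper functions are hypothetical; the point is only that scopes from different languages end up on the same handful of classes.

```python
from typing import Iterable, Optional, Set

# Top-level scope roots from TextMate's naming conventions. "meta" is left out
# on purpose (hypothetical design choice): it usually marks structure, not colour.
GENERIC_ROOTS = {"comment", "constant", "entity", "invalid", "keyword",
                 "markup", "storage", "string", "support", "variable"}


def css_class_for(scope: str) -> Optional[str]:
    """Map e.g. 'keyword.control.php' to the generic class 'hl-keyword'."""
    root = scope.split(".", 1)[0]
    return f"hl-{root}" if root in GENERIC_ROOTS else None


def classes_needed(scopes: Iterable[str]) -> Set[str]:
    """Collect the distinct generic classes a theme must style for these scopes."""
    return {cls for cls in map(css_class_for, scopes) if cls}


scopes = ["keyword.control.php", "keyword.control.python",
          "string.quoted.double.js", "string.quoted.single.php",
          "comment.line.double-slash.js", "meta.function.python"]
print(sorted(classes_needed(scopes)))
```

Six language-specific scopes from three languages reduce to three classes, so one stylesheet can serve all of them.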

TextMate provides formats for language grammars and for themes, which map onto CSS classes. Lots of software has adopted this standard, including editors such as TextMate, Panic Coda, Sublime Text, and Atom, as well as ports to other languages for server-side use in web applications (GitHub uses it as well).
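To make the grammar format concrete, here is a deliberately tiny grammar in the TextMate JSON shape (`scopeName` plus `patterns` whose entries carry `match` and `name`), with a Python sketch of applying it to one line. Assumptions: only top-level `match` rules are handled (no `begin`/`end`, `captures`, `include` or `repository`), and Python's `re` stands in for the Oniguruma regex engine real grammars target. The `source.demo` grammar and `tokenize_line` are hypothetical.

```python
import json
import re
from typing import List, Optional, Tuple

GRAMMAR_JSON = r'''
{
  "scopeName": "source.demo",
  "patterns": [
    {"match": "#.*$", "name": "comment.line.number-sign.demo"},
    {"match": "\"[^\"]*\"", "name": "string.quoted.double.demo"},
    {"match": "\\b(if|else|return)\\b", "name": "keyword.control.demo"},
    {"match": "\\b[0-9]+\\b", "name": "constant.numeric.demo"}
  ]
}
'''
grammar = json.loads(GRAMMAR_JSON)

Token = Tuple[str, Optional[str]]  # (text, scope name or None for plain text)


def tokenize_line(line: str, grammar: dict) -> List[Token]:
    rules = [(re.compile(p["match"]), p["name"]) for p in grammar["patterns"]]
    tokens: List[Token] = []
    pos = 0
    while pos < len(line):
        # Earliest non-empty match wins; ties go to the rule listed first.
        best = None
        for regex, name in rules:
            m = regex.search(line, pos)
            if m and m.end() > m.start() and (best is None or m.start() < best[0].start()):
                best = (m, name)
        if best is None:
            break
        m, name = best
        if m.start() > pos:
            tokens.append((line[pos:m.start()], None))  # unscoped gap
        tokens.append((m.group(0), name))
        pos = m.end()
    if pos < len(line):
        tokens.append((line[pos:], None))
    return tokens


print(tokenize_line('return "if" # done', grammar))
```

Because the string rule matches at an earlier position than the keyword rule, `"if"` comes out as a string rather than a keyword, which is the ordering behaviour TextMate grammars depend on.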

Themes:

Language bundles:

See also T85794: Convert SyntaxHighlight_Geshi from Geshi to Pygments (was: Don't register 250+ modules)

Event Timeline

Krinkle set the priority of this task to Medium.
Krinkle updated the task description.
Krinkle added subscribers: Aklapper, Krinkle.

@Krinkle, is this a project that a seasoned contributor could finish in 2-3 weeks?

Very important: are you still aiming to have a GSoC / Outreachy intern working on this in the next round (the deadline for applicants is in 7-9 days), or are you proposing it for the round after that, in 6 months? Assuming that you would be the mentor, we would need a co-mentor, a basic description of the skills required, and some microtasks (good first tasks) to be completed as part of the evaluation.

I note that we also have CodeMirror and CodeEditor, which both already use TextMate-like language grammars but are unfortunately fully client-side solutions.

Doesn't have to be unfortunate. @ori and I were also evaluating making ours fully client-side. Come to think of it, though, that wouldn't help T85794 (Don't register 250+ modules client side). The reason for using TextMate language grammars is that we'd be able to use a single unified stylesheet and only need the per-language logic server-side. If we do the language handling client-side, we need to have "everything" client-side again, which would lead to the same module bloat (but this time with JS instead of CSS). There are ways to make this work, though, e.g. by combining languages into larger buckets.
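One hypothetical way to combine languages into larger buckets is to pack per-language grammar sizes into a small number of modules under a size budget. The sketch below uses first-fit decreasing, just one simple bin-packing heuristic; the language sizes and budget are invented for illustration, not measured.

```python
from typing import Dict, List


def bucket_languages(sizes: Dict[str, int], budget: int) -> List[List[str]]:
    """First-fit decreasing: place each language (largest first) into the first
    bucket with enough remaining room, opening a new bucket when none fits."""
    buckets: List[List[str]] = []
    remaining: List[int] = []
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):
        size = sizes[name]
        for i, room in enumerate(remaining):
            if size <= room:
                buckets[i].append(name)
                remaining[i] -= size
                break
        else:
            # No bucket has room; a grammar larger than the budget still gets its own.
            buckets.append([name])
            remaining.append(budget - size)
    return buckets


# Hypothetical grammar sizes (arbitrary units) and a budget of 60 per module.
print(bucket_languages({"php": 40, "python": 30, "css": 20, "html": 25, "lua": 10}, 60))
```

Five languages end up in three modules instead of five; with hundreds of languages the reduction in registered modules would be correspondingly larger, at the cost of some clients downloading grammars they do not use.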

But yeah, for the purpose of this task I'd like to see if we can process TextMate language grammars server-side in PHP and provide a theme on the client. This will likely be a lot easier to implement as a new MediaWiki extension that wraps a reusable PHP library for parsing TextMate language grammars and then loads the relevant stylesheet (similar to how SyntaxHighlight_GeSHi is a relatively lightweight wrapper around Geshi).
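The split described above (per-language logic on the server, one shared theme on the client) can be sketched as follows. This is Python for brevity rather than the PHP the task envisions, and the tokenizer registry, the `hl`/`hl-*` class names and the `THEME_MODULE` name are all hypothetical. Whatever the language, the result asks the client for the same single stylesheet module.

```python
import html
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Token = Tuple[str, Optional[str]]         # (text, TextMate scope or None)
Tokenizer = Callable[[str], List[Token]]  # per-language logic, server-side only

# Hypothetical name of the one shared theme stylesheet module.
THEME_MODULE = "ext.hypotheticalHighlight.theme"


@dataclass
class Rendered:
    html: str
    modules: List[str] = field(default_factory=list)


def render(code: str, lang: str, tokenizers: Dict[str, Tokenizer]) -> Rendered:
    tokenize = tokenizers.get(lang)
    if tokenize is None:
        # Unknown language: plain escaped text, no stylesheet needed.
        return Rendered(f"<pre>{html.escape(code)}</pre>")
    parts = []
    for text, scope in tokenize(code):
        escaped = html.escape(text)
        if scope:
            # Only the scope root becomes a class, so every language shares one theme.
            root = scope.split(".", 1)[0]
            parts.append(f'<span class="hl-{root}">{escaped}</span>')
        else:
            parts.append(escaped)
    body = "".join(parts)
    return Rendered(f'<pre class="hl">{body}</pre>', [THEME_MODULE])
```

Adding a language then means registering one more server-side tokenizer; nothing new has to be registered or shipped on the client.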

Once stable, we may add a small boolean flag to support Geshi language codes in <source lang=".."> attributes, in case the identifiers are slightly different (e.g. Geshi's html4strict versus TextMate's html, or something like that). Then we can switch deployment to use that extension instead of SyntaxHighlight_GeSHi.
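A minimal sketch of that compatibility flag, assuming a hypothetical alias table. Only the html4strict to html pair comes from the comment above; the function name, the `geshi_compat` parameter and the lowercasing of input are illustrative assumptions.

```python
from typing import Dict, Optional, Set

# Hypothetical alias table: Geshi identifier -> identifier used by the new extension.
GESHI_ALIASES: Dict[str, str] = {
    "html4strict": "html",
}


def resolve_language(lang: str, supported: Set[str], geshi_compat: bool = True) -> Optional[str]:
    """Return the identifier to highlight with, or None if unsupported."""
    key = lang.strip().lower()
    if key in supported:
        return key  # native identifiers always win
    if geshi_compat:
        alias = GESHI_ALIASES.get(key)
        if alias in supported:
            return alias
    return None


print(resolve_language("html4strict", {"html", "php"}))
```

Checking native identifiers first means the alias table can only add accepted values, never shadow one the new extension already understands.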

Often client-side JS can also be run as a simple node service.

For now we've switched from Geshi to Pygments, which alleviates most of our immediate concerns. In the future we can consider migrating to a library that supports TextMate grammars.

Note that we did consider Linguist from GitHub, but that library doesn't actually contain the syntax highlighting logic. GitHub migrated from Pygments to a proprietary library that has not been open-sourced yet.
https://github.com/github/linguist/issues/1984#issuecomment-69497419
https://github.com/github/linguist/issues/1717#issuecomment-63681177

The Linguist library focuses on detecting the language a given file is written in; the syntax highlighting is handled elsewhere.

This is a message posted to all tasks under "Re-check in September 2015" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

@Krinkle - Is this still an appropriate Outreachy/GSoC project?

@Krinkle If you see this project as a potential project for Outreachy, kindly list the microtasks to be completed by applicants as part of evaluation.

Have we done analysis on language support of TextMate grammars vs Pygments?

TheDJ removed a project: Performance Issue.

We have since gone the route of pygments. We can always revisit, but it doesn't seem likely in the short term.