
Combine infrequently-used language modules into a single module
Closed, Resolved, Public

Description

Usage of SyntaxHighlight_GeSHi language modules follows a classic 80-20 pattern, with the top 20 languages accounting for 81.78% of usage. We could dramatically reduce the footprint of SyntaxHighlight_GeSHi by taking this into account.

The following statistics are based on two hours' worth of requests, collected on cp1056 (bits eqiad) via varnishncsa:

varnishncsa geshi_usage -F "%q" -m RxURL:ext.geshi.language 2>&1 | grep --line-buffered -Po '(?<=ext.geshi.language.)\w+' > geshi.log

  • The top 5 languages account for 45% of usage.
  • The top 10 languages account for 65.28% of usage.
  • The top 50 languages account for 94.50% of usage.
  • The top 100 languages account for 99.45% of usage.
  • The top 125 languages account for 99.83% of usage.
  • The top 142 languages account for 100% of usage.
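
The percentages above can be recomputed from geshi.log with a short script. The following is a minimal Python sketch, assuming the file holds one language name per line (which is what the grep in the pipeline emits). The function name cumulative_share is hypothetical and not part of SyntaxHighlight_GeSHi.

```python
from collections import Counter
from typing import Dict, Iterable


def cumulative_share(names: Iterable[str], top_ns: Iterable[int]) -> Dict[int, float]:
    """Map each N in top_ns to the percentage of requests covered by the N most-requested languages."""
    # Count one hit per non-empty line, ignoring surrounding whitespace and newlines.
    counts = Counter(line.strip() for line in names if line.strip())
    total = sum(counts.values())
    # Request counts in descending order; the order among ties does not affect the sums below.
    ranked = [count for _, count in counts.most_common()]
    shares: Dict[int, float] = {}
    for n in top_ns:
        # Slicing past the end simply takes every language, giving 100%.
        shares[n] = 100.0 * sum(ranked[:n]) / total if total else 0.0
    return shares
```

For example, `with open("geshi.log") as log: print(cumulative_share(log, [5, 10, 50, 100]))` would report the share covered by the top 5, 10, 50 and 100 languages.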

GeSHi supports 215 languages. I would like to

  1. drop support for the 73 languages that were not seen in the logs (done on WMF wikis in Gerrit change 197449)
  2. strategically group the remaining 142 languages, for example by combining the infrequently used ones into a single module.
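
One possible reading of these two steps, sketched in Python: drop supported languages with no recorded requests, rank the rest by request count, give the most-used ones their own module until a chosen coverage threshold is reached, and bundle everything else into one shared module. The 0.8 threshold and the function name partition_languages are illustrative assumptions, not anything decided in this task.

```python
from typing import Dict, List, Tuple


def partition_languages(
    counts: Dict[str, int], supported: List[str], coverage: float = 0.8
) -> Tuple[List[str], List[str], List[str]]:
    """Split supported languages into (dedicated, combined, dropped).

    dedicated: the most-requested languages, taken in rank order until they
               cover at least `coverage` (a fraction) of all requests; each
               would keep its own module.
    combined:  every other language seen in the logs; these would share one module.
    dropped:   supported languages that never appeared in the logs (step 1).
    """
    seen = [lang for lang in supported if counts.get(lang, 0) > 0]
    dropped = [lang for lang in supported if counts.get(lang, 0) == 0]
    # Most-requested first; list.sort() is stable, so ties keep their order in `supported`.
    seen.sort(key=lambda lang: counts[lang], reverse=True)
    total = sum(counts[lang] for lang in seen)

    dedicated: List[str] = []
    covered = 0
    for lang in seen:
        if covered >= coverage * total:
            break  # threshold reached; the remaining languages go into the shared module
        dedicated.append(lang)
        covered += counts[lang]
    combined = seen[len(dedicated):]
    return dedicated, combined, dropped
```

Since the top 20 languages already cover 81.78% of requests, a threshold of 0.8 applied to the real counts would give at most 20 dedicated modules, with every other seen language in the shared one.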

Languages ordered by usage:

c
cpp
bash
html4strict
text
java
latex
javascript
python
xml
csharp
php
css
asm
sql
pascal
matlab
html5
haskell
vb
lisp
ruby
ada
oracle11
dos
rsplus
fortran
d
bnf
ocaml
pcre
perl
vhdl
actionscript
lua
bibtex
go
bf
cobol
ini
delphi
arm
scheme
objc
prolog
actionscript3
mysql
qbasic
asp
algol68
groovy
erlang
abap
email
powershell
ecmascript
glsl
sas
apache
yaml
java5
vbnet
reg
cfm
fsharp
scala
applescript
gwbasic
clojure
pli
robots
tsql
whois
freebasic
verilog
llvm
visualfoxpro
sparql
tcl
plsql
coffeescript
scilab
dot
autoit
boo
mirc
lolcode
gnuplot
eiffel
j
teraterm
oorexx
diff
smalltalk
cmake
avisynth
perl6
xpp
typoscript
basic4gl
make
awk
e
gml
jquery
zxbasic
systemverilog
6502acme
properties
oracle8
q
purebasic
pic16
ldif
rexx
unicon
urbi
modula3
mpasm
locobasic
progress
visualprolog
vala
octave
winbatch
oz
autohotkey
cadlisp
euphoria
pycon
oobas
povray
thinbasic
68000devpac
mmix
modula2
cil
mxml
io
blitzbasic
parigp
oberon2

Event Timeline

ori assigned this task to matmarex.
ori set the priority of this task to Needs Triage.
ori updated the task description.
ori added a project: SyntaxHighlight.
ori subscribed.
matmarex triaged this task as Lowest priority (Apr 8 2015, 4:27 PM).
matmarex raised the priority of this task from Lowest to Low.
matmarex set Security to None.

Slightly related: in T94292 I found that we are actually missing a considerable number of languages, including a few new ones that might be useful.

matmarex subscribed.

Sorry, I'm not currently planning to work on this.

Change 217899 had a related patch set uploaded (by Ori.livneh):
Rewrite to use Pygments instead of GeSHi

https://gerrit.wikimedia.org/r/217899

Change 217899 abandoned by Ori.livneh:
Rewrite to use Pygments instead of GeSHi

Reason:
OK, doing this in SyntaxHighlight_GeSHi instead. See I07446ec98

https://gerrit.wikimedia.org/r/217899

Krinkle claimed this task.
Krinkle subscribed.

Obsolete with T85794.