I was concerned about the time MediaWiki spends in the Cite extension. So, on my server, I ran a profiler on the master branch of MediaWiki while it was parsing (with the core parser, not Parsoid) a copy of the page Falsifiability. Here is a graphical representation of the result.
For each node in the graph, the first number in parentheses is the time in milliseconds spent in the node itself, excluding the time spent in its children. The second number includes the time spent in its children. The colors are based on the exclusive time: darker colors mean larger exclusive times. They are assigned in such a way that the total exclusive time for each color is approximately the same. Many profilers remove nodes with very small execution times. This profiler groups them in ColorGroup nodes instead. To get a nicer graph, I used a ColorGroup that includes nodes with up to 70 ms, so there is some loss of information inside these groups.
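To make the two numbers and the coloring rule concrete, here is a minimal Python sketch. It is not the profiler's actual code: the function names, the sample labels, and the assumption that a node's exclusive time is its inclusive time minus the inclusive times of its direct children are all mine, and the banding is only one simple way to get roughly equal totals per color.

```python
def exclusive_times(inclusive, children):
    """Return label -> exclusive ms.

    inclusive: label -> inclusive time in ms (hypothetical input format)
    children:  label -> list of direct child labels
    Assumes a plain tree, so exclusive = inclusive - sum of children's inclusive.
    """
    return {
        label: total - sum(inclusive[c] for c in children.get(label, []))
        for label, total in inclusive.items()
    }


def assign_colors(exclusive, n_colors):
    """Return label -> color band (0 = lightest, n_colors - 1 = darkest).

    Nodes are sorted by exclusive time and cut into bands so that each band
    holds roughly the same total exclusive time.
    """
    target = sum(exclusive.values()) / n_colors
    colors = {}
    running = 0.0
    for label, ms in sorted(exclusive.items(), key=lambda kv: kv[1]):
        band = min(int(running / target), n_colors - 1) if target > 0 else 0
        colors[label] = band
        running += ms
    return colors


# Made-up sample data, not taken from the real profile.
inclusive = {"parse": 100.0, "a": 60.0, "b": 30.0, "c": 20.0}
children = {"parse": ["a", "b"], "a": ["c"]}

exclusive = exclusive_times(inclusive, children)
print(exclusive)
print(assign_colors(exclusive, n_colors=2))
```

With only a few nodes the bands cannot be perfectly balanced, which is why the totals per color are only approximately the same.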
We can see that approximately 1.5 seconds was spent exclusively on a total of 81 processes inside the node with the label "braceSubstitution_#invoke:citation/CS1". It is the node with the darkest color. We can also see that comparatively little time was spent in the Cite extension processes:
- 85.1 ms in a total of 106 "extensionSubstitution_ref" processes under "braceSubstitution_#invoke:Footnotes"
- 116.6 ms in a total of 2 "extensionSubstitution_references" processes under "internalParse" (indirectly through ColorGroup nodes).
- Not shown, hidden in a group, there is a 63.4 ms "extensionSubstitution_ref".
- There are also a lot of small ref processes hidden in the ColorGroup nodes, but they took less than 100 ms.
This tells me that the Cite extension is not where the overall performance can be improved.
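As a rough check of this comparison, the Cite figures listed above can be added up and set against the CS1 node. The sketch below uses only the numbers quoted in this post. The 1.5 seconds is approximate, and the small ref processes hidden in ColorGroup nodes are left out, so the ratio is an estimate. The variable names are mine.

```python
# Cite extension times quoted above, in ms.
cite_ms = [85.1, 116.6, 63.4]

# Approximate exclusive time of "braceSubstitution_#invoke:citation/CS1", in ms.
cs1_ms = 1500.0

cite_total = sum(cite_ms)
ratio = cs1_ms / cite_total

print(f"Cite total: {cite_total:.1f} ms")
print(f"CS1 / Cite: {ratio:.1f}x")
```

So the listed Cite processes add up to about 265 ms, and the CS1 node alone takes between five and six times as long.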
Each node in the graph is an aggregation of similar processes selected by profiling statements that I have inserted in the code: you decide which process segments are monitored. The label of each node is also chosen by the profiling statements. (There are more than 200 different labels, some hidden in ColorGroup nodes, but I only inserted 3 profiling statements to generate them.) For each node, there is one and only one arrow with a +n label pointing toward it, usually going downward. The +n means that there is this arrow plus n extra hidden arrows toward the node. Thus the target node is an aggregation of n + 1 process segments.
The other arrows, usually going upward, carry a number n without the "+". They indicate that n processes in the target are parts of processes in the source: the source indirectly calls the target.
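The aggregation by label and the +n arrow can be sketched as follows. This is only an illustration of the idea under my own assumptions, not the profiler's implementation: the (label, exclusive ms) input format, the function names, and the sample values are all hypothetical.

```python
from collections import defaultdict


def aggregate(segments):
    """Group process segments by label.

    segments: iterable of (label, exclusive_ms) pairs (hypothetical format).
    Returns label -> {"count": number of segments, "exclusive_ms": summed time}.
    """
    nodes = defaultdict(lambda: {"count": 0, "exclusive_ms": 0.0})
    for label, exclusive_ms in segments:
        nodes[label]["count"] += 1
        nodes[label]["exclusive_ms"] += exclusive_ms
    return dict(nodes)


def plus_label(count):
    """Label of the single '+n' arrow: one visible arrow plus n hidden ones."""
    return f"+{count - 1}"


# Made-up segments, not taken from the real profile.
segments = [
    ("extensionSubstitution_ref", 1.0),
    ("extensionSubstitution_ref", 2.0),
    ("extensionSubstitution_references", 5.0),
]

for label, node in aggregate(segments).items():
    print(label, plus_label(node["count"]), node["exclusive_ms"])
```

A node that aggregates a single segment gets the label +0, and a node that aggregates n + 1 segments gets +n.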