Add functionality (in an extension or MediaWiki) and implement to make English Wikipedia's [[Template:Cite]] work faster
Closed, Declined. Public

Description

Author: RSYQFIOJGWZA

Description:
From https://bugzilla.wikimedia.org/show_bug.cgi?id=26092#c19 by MZMcBride: "it takes over 30 _seconds_ to render [[Barack Obama]]
(http://en.wikipedia.org/w/index.php?title=Barack_Obama&action=purge),
largely due to the use of overblown and obnoxious templates like
[[Template:Cite]], from my understanding. I don't know of anyone who thinks
this is acceptably fast."


Version: unspecified
Severity: major
URL: https://bugzilla.wikimedia.org/show_bug.cgi?id=26092#c19

bzimport set Reference to bz26786.
bzimport created this task. Via Legacy, Jan 18 2011, 1:32 AM
Mr.Z-man added a comment. Via Conduit, Jan 18 2011, 6:43 AM

There's a lot that could be done to improve the citation templates that doesn't require modifying the software.

Most (probably 90% or more) citations don't need half of the fields offered by the citation templates. If a simple form of the common templates was created that only included the most commonly used fields (and didn't use the bloated {{citation/core}} meta-template), there would likely be a major increase in performance.

I'm actually working on this myself, but it'll be a while before I get anything done. I have some data on parameter usage, but it's still pretty raw and needs some cleanup and interpretation.

Basically, the problem is that they're just big. They don't do anything especially fancy. They don't even use the oft-maligned string manipulation templates. Citation/core is over 25 kB and has almost 250 "if" statements in it. Take that and multiply it by 300 or more for some articles.
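As a rough back-of-the-envelope illustration of the scale described above, the sketch below multiplies the figures quoted in this comment (about 25 kB and about 250 "if" statements per use, 300 citations per article). The constant and function names are made up for illustration, and the result is only an order-of-magnitude estimate, not a measurement of what the parser actually does.

```python
# Figures quoted in the comment above; all of them are approximate.
CORE_SIZE_KB = 25            # size of {{citation/core}}
IFS_PER_USE = 250            # "if" statements in {{citation/core}}
CITATIONS_PER_ARTICLE = 300  # "300 or more for some articles"


def estimate_load(citations=CITATIONS_PER_ARTICLE):
    """Return (conditionals evaluated, template text processed in MB)."""
    conditionals = IFS_PER_USE * citations
    text_mb = CORE_SIZE_KB * citations / 1000
    return conditionals, text_mb


if __name__ == "__main__":
    ifs, mb = estimate_load()
    print(f"{ifs} conditionals, about {mb} MB of template text")
```

Under these assumptions a single heavily cited article works through tens of thousands of conditionals, which is why the size of the meta-template matters more than any one clever construct inside it.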

The only thing that could really make the templates as-is work faster would be to somehow make the parser faster (extremely difficult) or just throw servers at the problem (extremely expensive). The options that I can see are:
A. Make the templates themselves more efficient (i.e. less bloated).
B. Implement the templates in PHP and replace the wikitext with an extension.
C. Use something other than wikitext to program templates, like a real programming language.

While personally I think C is the best, that currently doesn't look like it's ever going to happen. A is probably the easiest. B would likely produce a bigger improvement. The main downside to B is that it means any edit to a citation template will have to be done through Bugzilla and might take forever to get done.

MZMcBride added a comment. Via Conduit, Jan 18 2011, 6:51 AM

(In reply to comment #1)

Most (probably 90% or more) citations don't need half of the fields offered by
the citation templates. If a simple form of the common templates was created
that only included the most commonly used fields (and didn't use the bloated
{{citation/core}} meta-template), there would likely be a major increase in
performance.

Sure, but how are you going to actually solve anything? You can look at the usage section on the template description page to figure out what are probably the most commonly used parameters. It'll be things like url and first/last name. But are you going to make template forks and then replace instances of the templates? That seems like a pretty bad road to head down.

The only thing that could really make the templates as-is work faster would be
to somehow make the parser faster (extremely difficult) or just throw servers
at the problem (extremely expensive). The options that I can see are:
A. Make the templates themselves more efficient (i.e. less bloated).
B. Implement the templates in PHP and replace the wikitext with an extension.
C. Use something other than wikitext to program templates, like a real
programming language.

While personally I think C is the best, that currently doesn't look like it's
ever going to happen. A is probably the easiest. B would likely produce a
bigger improvement. The main downside to B is that it means any edit to a
citation template will have to be done through Bugzilla and might take forever
to get done.

Option B is probably the best solution right now. While making narrowly tailored extensions generally isn't the best idea for something like this, it's not as though citation formats change often and the number of uses here is big enough to have a very substantive impact on user experience. I'd personally be focusing my efforts on a MediaWiki extension instead of gathering stats about parameter usage. svip has already done some work in this area.

Catrope added a comment. Via Conduit, Jan 21 2011, 7:37 PM

CC Happy-Melon, who has been working on doing something similar for Template:Convert.

hashar added a comment. Via Conduit, Jan 21 2011, 7:42 PM

Raising severity since it impacts performance.

bzimport added a comment. Via Conduit, Jan 21 2011, 10:21 PM

happy.melon.wiki wrote:

There are fairly substantial differences between {{cite}} and {{convert}}. The latter is big and messy and expensive because it's doing something wikitext wasn't designed for (numerical conversions and string manipulation). It's immediately obvious how OOP, regular expressions, or even array structures and intrinsic maths, can help improve both the readability and performance of that code. I think I've largely finished refining [[Template:Convert]] and its three thousand subtemplates into ~700 lines of PHP, of which 200 are data constants. It doesn't even attempt to implement all the stupidly many features of {{convert}}, but it does the heavy lifting and lets templates put the shiny bells on, if desired, probably in a single moderately complicated wrapper template.

The {{cite}} templates are a different kettle of fish altogether. They're there to do what wikitext is *designed* to do: format plain text into pretty patterns and output it in a consistent way. The reason the templates are so cumbersome is that a) they try to cover every possible permutation of parameters for every type of source in one enormous metatemplate, and b) there's a huge amount of code that's included purely to keep various groups from tearing each other's throats out over trivial stylistic differences. The default solution when two groups have spent a year editwarring over the manual of style over whether authors should be separated by a wiggly line or a straight line, is to say "either style is acceptable, be consistent within an article, and don't convert articles from one style to another without local consensus". Hence a plethora of silly variations from one template to another and even within uses of the same template, all of which need to be supported in the metatemplate.

Moving this parsing logic from wikitext to PHP would aid in reducing that redundancy and bloat, but not by much; there is a genuine need for quite a lot of flexibility in citation layout, and we certainly can't be moving it into PHP if it's only going to get code updates pushed once a year; unless and until Wikimedia gets back onto a more vigorous scap schedule, it would be a recipe for disaster to require code updates for any change to reference formatting.

I can see a way forward for citations which takes them out of wikitext and into the software, but not as merely a string-processing widget. It would need to be a comprehensive change to citation style, making them more semantic and preferably out of wikitext altogether. Special pages to define field structures with clear semantic meaning, and separately define 'citation styles' which cherry-pick particular fields to be included in a given type of citation (web, news, journal, etc). AJAX-y out-of-edit-window editing of citations, and just a tiny XMLish tag <citation 123456/>, or more likely <ref name="usefulPerArticleTitle"><citation globallyUniqueName/></ref> left in the wikitext, that can easily be picked out by future WYSIWYG editors. Search listings to identify and merge identical references, and identify broken links, old accessdates, etc; and no doubt to allow our intrepid bot operators to run various exciting maintenance tasks on the entire corpus of citations en masse. That level of integration would be a proper justification for removing what is fundamentally a text presentation issue, from the medium which wikis generally use to present text -- wikitext -- and making it a software implementation. But taking it out of the wikitext purely to improve performance, while keeping the interface exactly the same, would actually be very difficult; writing a {{Cite: ... }} parser function which was flexible, adaptable and still usable, would be very challenging.
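To make the "semantic data" idea above a little more concrete, here is a minimal sketch, assuming citations are stored as named records of plain fields and that each citation type lists the fields it wants. Everything in it (the Citation class, the TYPE_FIELDS table, the "smith2010" key, the comma-joined output) is hypothetical and only illustrates the shape of the proposal, not any existing extension.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """One stored reference, identified by a globally unique name."""
    name: str
    fields: dict = field(default_factory=dict)


# Hypothetical definitions: each citation type cherry-picks the fields it shows.
TYPE_FIELDS = {
    "web": ["last", "first", "title", "url", "accessdate"],
    "journal": ["last", "first", "title", "journal", "volume", "pages"],
}


def render(citation, citation_type):
    """Join the fields the chosen type asks for, skipping any that are missing."""
    wanted = TYPE_FIELDS[citation_type]
    parts = [citation.fields[key] for key in wanted if key in citation.fields]
    return ", ".join(parts)


# The wikitext would only carry the key, e.g. <citation smith2010/>.
store = {
    "smith2010": Citation(
        "smith2010",
        {"last": "Smith", "first": "John", "title": "My Book",
         "publisher": "Random House"},
    ),
}

if __name__ == "__main__":
    print(render(store["smith2010"], "web"))
```

Because the record is data rather than markup, the same store could in principle feed search listings, duplicate detection and bot maintenance, which is the integration the comment argues would justify moving citations out of wikitext.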

Would giving it the whole nine yards like that improve performance? Very possibly. We could disable brace constructs in the input to citation fields, and cache the final output indefinitely. It would probably measurably increase database load and measurably decrease apache load, and would certainly need careful deployment. But I don't really think there's much value to be gained from half measures for this particular corner of wikitext.

bzimport added a comment. Via Conduit, Jan 23 2011, 5:45 PM

Ruslik00 wrote:

One reason why the citation templates cannot be implemented in PHP is that citation style heavily depends on language/project. Russian Wikipedia may have a very different citation style than the English one. Similarly, the citation style in Wikiquote may be very different from Wikipedia. It also would be unreasonable to install several extensions - one for each language/project.

Convert template is less project dependent, but still language dependent. Some conversions (mi <-> km) would be absolutely redundant in other languages. In addition, the desirable output format may also differ. So, any PHP implementation of the convert template should be flexible enough for different languages. It should be possible to configure it using MediaWiki pages.

MZMcBride added a comment. Via Conduit, Jan 23 2011, 9:51 PM

(In reply to comment #6)

One reason why the citation templates cannot be implemented in PHP is that
citation style heavily depends on language/project. Russian Wikipedia may have
a very different citation style than the English one. Similarly, the citation
style in Wikiquote may be very different from Wikipedia. It also would be
unreasonable to install several extensions - one for each language/project.

There isn't anything in what you describe that prevents this functionality from being implemented in PHP. It might require a separate interface (using, e.g., a separate Special page) or more customizability (using, e.g., $first_name variables in MediaWiki messages), but it certainly isn't impossible.
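The "variables in messages" idea can be sketched like this, assuming each wiki supplies its own citation pattern as an editable message and the software only substitutes values into it. The two patterns and the "xx" language code are invented for illustration and do not reflect any real project's citation style; Python's string.Template is used only because it happens to share the $name placeholder syntax.

```python
from string import Template

# Hypothetical per-wiki messages; on a real wiki these would be editable pages.
MESSAGES = {
    "en": Template("$last_name, $first_name. $title."),
    "xx": Template("$first_name $last_name: $title"),
}


def format_citation(lang, **values):
    """Fill the pattern for one language with the supplied field values."""
    return MESSAGES[lang].substitute(values)


if __name__ == "__main__":
    for lang in MESSAGES:
        print(format_citation(lang, last_name="Smith", first_name="John",
                              title="My Book"))
```

The formatting code stays the same for every project; only the message text differs, which is the point being made against the "one extension per language" objection.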

bzimport added a comment. Via Conduit, Jan 23 2011, 10:06 PM

happy.melon.wiki wrote:

(In reply to comment #6)

One reason why the citation templates cannot be implemented in PHP is that
citation style heavily depends on language/project. Russian Wikipedia may have
a very different citation style than the English one. Similarly, the citation
style in Wikiquote may be very different from Wikipedia. It also would be
unreasonable to install several extensions - one for each language/project.

That's essentially what I said. A PHP implementation wouldn't be doing
anything very much different to what the wikitext is doing; it probably
wouldn't even do it very much more efficiently. A function which is
essentially
take-a-load-of-text-strings-entered-into-wikitext-and-display-them-in-a-special-way
is not a good candidate for implementation in PHP rather than wikitext. In
order to achieve anything meaningful, you have to change the *nature* of the
content, and turn it into semantic data.

Convert template is...

This belongs on bug 235.

(In reply to comment #7)

There isn't anything in what you describe that prevents this functionality from
being implemented in PHP. It might require a separate interface (using, e.g., a
separate Special page) or more customizability (using, e.g., $first_name
variables in MediaWiki messages), but it certainly isn't impossible.

Indeed. It's more that it's not a nice simple quick fix; this would probably be as big a project as something like ResourceLoader, maybe even as big as AbuseFilter. Not something to be thrown together in an afternoon...

bzimport added a comment. Via Conduit, Apr 19 2011, 10:14 PM

svippy wrote:

I'm working on it, sheesh. >_>

http://www.mediawiki.org/wiki/Extension:TemplateAdventures

bzimport added a comment. Via Conduit, May 4 2011, 7:18 AM

cogden1970 wrote:

Proposed patch which adds infrastructure for multiple style formats and additional i18n

Svippong, I have been testing and tinkering with the existing source code. On
http://en.wikipedia.org/wiki/Template_talk:Citation/core, there was a brief discussion about the possibility of incorporating several alternative citation styles, such as APA, MLA, Bluebook, Chicago Manual of Style, etc. As a demonstration and to see if it would work, I have put together a patch that implements this functionality. It is attached as TA-patch-cogden.patch, and it appears to work without issue.

It differs from the existing code in one significant way: instead of the first unnamed parameter after the colon being a selector for the type of work cited, it is now a selector for the citation style. This adds flexibility and the possibility for multiple citation styles such as Bluebook for citing legal references, or unknown future citation styles we can't foresee. The user can still include an unnamed parameter for the type of work cited (i.e., "book", etc.), but it would come at least after the first pipe symbol.

Thus, an example function call might look like this: {{#citation:APA|book|last=Smith|first=John|title=My Book|publisher=Random House}}. Instead of "APA", the first unnamed parameter can be blank or "default", in which case the function follows a default style, which would be similar to what is now done by Citation/core. Alternatively, the user could use parameters such as "MLA", "Bluebook", "Chicago", etc. There will be separate classes deriving from TemplateAdventureBasic for each citation style. I have included in this patch the example file and class CitationChicago, which currently does not render Chicago style citations but could be made to do so as an example of a second citation style.

In my patch, un-named parameters may occur anywhere within the template call, and may be recognized by the TemplateAdventureBasic-derived class. Depending on the particular styling format, these parameters might, for example be "book", "journal", "news", "web", etc.

I've tested the patch, and it seems to work well, and it does use the CitationChicago class when a call such as {{#citation:Chicago|...}} is made, although CitationChicago for now is no different from the default class. Most of the time, I think users will use the default call with {{#citation:|...}} or {{#citation:|web|...}}. I haven't changed any of the details of rendering the citation in the Citation class.
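The call syntax described in this comment can be sketched roughly as below. This is a Python stand-in, not the extension's PHP: the class names echo the ones mentioned above, but the parsing is a naive split on "|" that ignores nested templates and links, and falling back to the default class for an unrecognised style is an assumption of this sketch.

```python
class CitationDefault:
    """Stand-in for the default citation class."""

    def __init__(self, unnamed, named):
        self.unnamed = unnamed  # e.g. ["book"]
        self.named = named      # e.g. {"last": "Smith"}


class CitationChicago(CitationDefault):
    """Stand-in for a second style; behaves like the default for now."""


STYLE_CLASSES = {"default": CitationDefault, "chicago": CitationChicago}
PREFIX = "{{#citation:"


def parse_citation_call(call):
    """Read each argument once: first the style, then named and unnamed ones."""
    text = call.strip()
    if not (text.startswith(PREFIX) and text.endswith("}}")):
        raise ValueError("not a #citation call")
    args = text[len(PREFIX):-2].split("|")
    style = args[0].strip().lower() or "default"
    unnamed, named = [], {}
    for arg in args[1:]:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            unnamed.append(arg.strip())
    style_class = STYLE_CLASSES.get(style, CitationDefault)  # assumed fallback
    return style_class(unnamed, named)


if __name__ == "__main__":
    cite = parse_citation_call(
        "{{#citation:Chicago|book|last=Smith|first=John|title=My Book}}")
    print(type(cite).__name__, cite.unnamed, cite.named)
```

The style is split off first and the remaining arguments are handed to the chosen class in a single pass, so no argument is read twice.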

Also, I have added a bit of additional internationalization.

Any thoughts? I don't currently have commit rights, but I'm excited about this project and would love to assist.

Attached: TA-patch-cogden.patch

bzimport added a comment. Via Conduit, May 4 2011, 1:43 PM

svippy wrote:

Your patch seems to assume I already have a CitationChicago.php file. Sure you created the diff correctly?

bzimport added a comment. Via Conduit, May 4 2011, 1:49 PM

svippy wrote:

(In reply to comment #11)

Your patch seems to assume I already have a CitationChicago.php file. Sure you
created the diff correctly?

Perhaps you can attach that file itself rather than a diff?

Bawolff added a comment. Via Conduit, May 4 2011, 6:19 PM

In reply to comment 10.

As a minor comment (I have not looked at your code, just your comment) - having constructs like {{#citation:|web|...}} where the first parameter is an empty string doesn't seem very pretty. I'd much prefer {{#citation:web|...}} for default type and {{#citation:web|...|type=MLA}} for non-default types. (This of course is a bikeshed issue and not important in the grand scheme of things.)

bzimport added a comment. Via Conduit, May 4 2011, 6:26 PM

svippy wrote:

Depends largely on which of these we consider most important: the type or the style. I am just mentioning that your example of |type=MLA should have been |style=MLA ;)

At this point I assume the type is what is likely to change more than style. In fact, if TA gets adopted on other wikis, they are likely to pick one style (global $wgParameter, obviously) and then change the type per usage rather than change the style. In fact, one might even suggest that some admins wish to limit people's ability to change the style of the citations on the fly (for consistency purposes).

I would even propose another bikeshed idea, of creating a SpecialPage that auto-generates documentation for #citation.
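A site-wide style setting with an optional lock could look roughly like this. The SiteConfig fields are hypothetical stand-ins for whatever global configuration variable the extension would define; no such setting is named in this thread beyond "global $wgParameter".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-wiki settings."""
    default_style: str = "default"
    allow_style_override: bool = True


def resolve_style(config, requested=None):
    """Use the requested style only if the wiki allows per-citation overrides."""
    if requested and config.allow_style_override:
        return requested
    return config.default_style


if __name__ == "__main__":
    locked = SiteConfig(default_style="MLA", allow_style_override=False)
    print(resolve_style(locked, "Chicago"))
```

With the override disabled, a per-citation style request is silently ignored in this sketch; whether it should instead produce a warning is a design choice the thread does not settle.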

bzimport added a comment. Via Conduit, May 4 2011, 7:22 PM

svippy wrote:

In reply to comment #10:

I took your code changes into consideration and updated the code more appropriately in r87436 and r87437. I apologise for the two revisions for one commit. :S

bzimport added a comment. Via Conduit, May 4 2011, 8:56 PM

cogden1970 wrote:

(In reply to comment #13)

As a minor comment (I have not looked at your code, just your comment) - having
constructs like {{#citation:|web|...}} where the first parameter is an empty
string doesn't seem very pretty. I'd much prefer {{#citation:web|...}} for
default type and {{#citation:web|...|type=MLA}} for non-default types. (This of
course is a bikeshed issue and not important in the grand scheme of things)

It would be easy to fix the code so that the first parameter could be *either* a style type *or* a work type like "book," "web," etc. If the latter, then the default style is used.

You should know that in the Citation/core template, the work type (i.e., "book", "journal", "web", etc.) is completely irrelevant. The Citation/core template does not even accept these terms as parameters. Rather, it determines what type of work is being cited by looking at what options are set. For example, if the "journal" option is set, the template knows it is citing a journal. If "contribution" is set, it knows that it is citing a chapter in a book, unless both "contribution" and "journal" are set, in which case it knows that it is citing a subsection with a unique author in a larger journal article. If "title" and "url" are set, but "publisher" is not, it knows that what is being cited is a website, etc. Thus, if the Citation class works anything like the Citation/core template, whether or not you include a "book" or "journal" parameter will not even make a difference in how the citation is rendered.
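The inference rules listed in this comment can be written down as a small sketch. Only the four rules named above are covered; the order in which they are checked, the returned labels and the "unknown" fallback are assumptions of this sketch, and the real Citation/core template handles more cases than this.

```python
def infer_work_type(options):
    """Guess the kind of work cited from which options are set."""
    present = {key for key, value in options.items() if value}
    if {"contribution", "journal"} <= present:
        return "journal subsection"   # section with its own author in an article
    if "journal" in present:
        return "journal"
    if "contribution" in present:
        return "book chapter"
    if {"title", "url"} <= present and "publisher" not in present:
        return "website"
    return "unknown"                  # fallback label is an assumption


if __name__ == "__main__":
    print(infer_work_type({"title": "My Page", "url": "http://example.org"}))
```

Since the type falls out of the options, an explicit "book" or "journal" argument adds no information, which is the argument for spending the first parameter on something the function cannot infer.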

In any event, it ought to be possible for a particular wiki to disallow all citation styles other than some default style. But I think the best way for that to work is within an infrastructure that allows for the creation of many alternate standard reference types, and extensibility for adding even more specialized, or foreign, reference types. A wiki administrator ought to be able to have several standard style formats to choose from, and there ought to be an infrastructure to add additional specialty styles if desired, or if the specialized subject matter of the wiki demands.

Having many style formats to choose from also makes it easier to maintain existing code. Once a standard styling format is coded according to academic conventions, then it is essentially done, and you don't have to keep going back to add features. You can't argue with the Chicago Manual of Style, but you can argue with the present ad-hoc style of Citation/core which is roughly APA, but can't be standard APA because it has to be everything to everybody. (Not that I'm knocking Citation/core, because I was the one that originally wrote it.)

bzimport added a comment. Via Conduit, May 4 2011, 11:06 PM

happy.melon.wiki wrote:

(In reply to comment #13)
It would be easy to fix the code so that the first parameter could be *either*
a style type *or* a work type like "book," "web," etc. If the latter, then the
default style is used.

I haven't looked at any of this code in detail, but this is a fundamentally bad idea, and the first reason is 'localisation'. "title", "url", "MLA", "Chicago", etc, all need to be localisable into foreign languages and foreign alphabets; you don't need to add support for that yourself, but you *do* need to design the syntax such that it can reasonably be added by someone else. "apa" is the word for "water" in Romanian. What might it be in other languages? Perhaps the word for "book" or "journal"? ;-)

Equally whenever you open the doors to users adding new features, you have to open them all the way: the other watchword is "extensibility". Other than the fact that the software might get confused, why shouldn't a wiki user create a citation style called "book"? More seriously, if a user creates a citation style with a name which is *not* confusing, but a developer subsequently makes that keyword a valid citation type, a much more subtle bug is introduced.

Every parameter should either be in a precisely-defined order ({{#if:<test>|<then>|<else>}} etc), or be identified by a unique keyword-and-equals-sign parameter name. Introducing 'shapeshifting' parameters, while convenient at the time, is a recipe for future problems.
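The hazard with a first parameter that may be either a style or a work type can be illustrated with a small check. The alias tables are invented: the extra "apa" alias for a work type is purely hypothetical and merely echoes the point above that a style keyword may turn out to be an ordinary word in some language.

```python
def ambiguous_keywords(style_aliases, type_aliases):
    """Return keywords claimed by both tables, compared case-insensitively."""
    style_words = {alias.lower()
                   for aliases in style_aliases.values() for alias in aliases}
    type_words = {alias.lower()
                  for aliases in type_aliases.values() for alias in aliases}
    return style_words & type_words


STYLE_ALIASES = {"APA": ["APA"], "MLA": ["MLA"], "Chicago": ["Chicago"]}
TYPE_ALIASES = {"book": ["book"], "web": ["web"]}

# A later, hypothetical localisation adds "apa" as an alias for a work type.
TYPE_ALIASES_LATER = {"book": ["book", "apa"], "web": ["web"]}

if __name__ == "__main__":
    print(ambiguous_keywords(STYLE_ALIASES, TYPE_ALIASES))
    print(ambiguous_keywords(STYLE_ALIASES, TYPE_ALIASES_LATER))
```

The two tables are disjoint today, but nothing keeps them disjoint once translations or new styles are added. Giving each role its own position or its own name=value key removes the need for this check altogether.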

bzimport added a comment. Via Conduit, May 5 2011, 1:49 AM

cogden1970 wrote:

Already taken care of. The terms "title" and "url" were already localized by svip. My patch provided localization for "MLA," and "APA," "Chicago," etc., as well. I also added localization options for "book", "web," etc.

I don't think anybody contemplates that a *user* would ever create their own citation style within this system. This would be done by developers, because it has to be done in php, or else you might as well just use a wiki template. There is no standard citation style called "book" or "web". If a developer or wiki administrator wanted to make up a style ad hoc, they should call it something that will not interfere with other magic words.

bzimport added a comment. Via Conduit, May 5 2011, 9:03 AM

happy.melon.wiki wrote:

You mean "they should call it something that will not interfere with other magic words, past, present or future, or any of their current or future translations in any language". I wasn't talking about implementing localisation -- although it's great that you have -- I was saying don't do this:

It would be easy to fix the code so that the first parameter could be *either*
a style type *or* a work type like "book," "web," etc. If the latter, then the
default style is used.

because whether or not you *think* you can separate out all the possible values for this parameter into two piles, one pile being citation styles and the other being work types, you can't. Pick one of the options for the first parameter, and give the other one a name=value tag.

bzimport added a comment. Via Conduit, May 5 2011, 10:19 AM

cogden1970 wrote:

I'd pick citation style, and document it as such. I only mentioned the other option as a way for the template to still work despite what you might more properly regard as a technical error. But graceful failure is certainly not a requirement, and I agree it can add complexity.

Another thought: in theory, the first parameter could be any kind of directive that tells the function what kind of citation to render. Parameters like "journal," "book," etc. are not necessary because the function should be able to infer that from the combination of options like Citation/core does. The most meaningful first parameter is one that tells the function something that it doesn't already know. Also, in Citation/core, editors have been gradually adding various options as parameters, like directives on whether to use periods or commas, or whether or not to italicize or bold. If the first parameter is a standard citation style, all these options are unnecessary, and editors will not continually be bugging the developers to add an option for bolding here, or optional brackets there.

@Svip, I don't want to back-seat drive on your excellent work, because I know how annoying that is. But I notice you added a new "getCitationStyle" function, set up so that you read the arguments twice, and note that this is inefficient and there should be a better way to do this. I had solved this problem in the working demo patch I submitted, in which the parameters were each only read once. The first parameter was read (and localized) and split off from the remaining parameters, which were passed to the constructor of the appropriate child class of TemplateAdventureBasic. If there is anything I can do to help, like put together a working CitationChicago or CitationAPA, just send me an email. And again, bravo for doing this much needed work.

kaldari added a comment. Via Conduit, Jun 24 2011, 12:11 AM

This seems to be a much needed feature. Editors are now resorting to untemplating citations in order to get decent page load times. Once this is in place it will also make it much easier to move RefToolbar into an extension (right now it is still on-wiki JavaScript).

Peachey88 added a comment. Via Conduit, Jun 24 2011, 12:12 AM

(In reply to comment #21)

This seems to be a much needed feature. Editors are now resorting to
untemplating citations in order to get decent page load times. Once this is in
place it will also make it much easier to move RefToolbar into an extension
(right now it is still on-wiki JavaScript).

This has nothing to do with moving that gadget/JS to an extension; it can be done right now if desired.

kaldari added a comment. Via Conduit, Jun 24 2011, 12:21 AM

This has nothing to do with moving that gadget/JS to an extension; it can be
done right now if desired.

It could, but it's not very practical right now. Since RefToolbar relies on templates, it needs to be easily editable to keep up with template changes. If we had a citation scheme built into the software, we could just use that scheme and not have to worry about it. I suppose it's not really a big issue, but I hate having template dependencies in extensions.

tstarling added a comment. Via Conduit, Aug 6 2011, 9:33 AM

With LuaJIT, it would be possible to allow users to write a {{Citation/core}} equivalent which runs much faster than a custom PHP extension, let alone the existing wikitext.

Lua allows memory usage and stack space to be precisely controlled by the host application. That is a compelling advantage over server-side JavaScript. It is also well-documented and easily embedded in the same process (and thus address space) as PHP. Thus the development time would be lower than embedding any JavaScript interpreter.

With Lua, we can control resource utilisation while maintaining a high degree of flexibility exposed to editors. Although most of our editors are not familiar with it, it is not a difficult language to learn.

Qgil added a comment. Via Conduit, Mar 25 2013, 4:51 AM

Looking at the summary and comment 0: FIXED?

The functionality:
http://www.mediawiki.org/wiki/Lua
http://en.wikipedia.org/wiki/Wikipedia:Lua

The Citation (was Cite) template:
http://en.wikipedia.org/wiki/Template:Citation/lua

MZMcBride added a comment. Via Conduit, Mar 25 2013, 5:16 AM

(In reply to comment #25)

Looking at the summary and comment 0: FIXED?

Not yet. I believe work is being done to resolve this bug, but I don't believe it's implemented yet (https://en.wikipedia.org/w/index.php?title=Template:Citation&action=history would be the relevant page history to watch, I suppose).

Bug 26092, on the other hand, is probably fixed now. I just left a comment there addressed to the bug filer.

tstarling added a comment. Via Conduit, Mar 25 2013, 5:46 AM

(In reply to comment #26)

(In reply to comment #25)
> Looking at the summary and comment 0: FIXED?

Not yet. I believe work is being done to resolve this bug, but I don't believe
it's implemented yet
(https://en.wikipedia.org/w/index.php?title=Template:Citation&action=history
would be the relevant page history to watch, I suppose).

You can see the migration status at
https://en.wikipedia.org/wiki/Module_talk:Citation/CS1

Currently they have done "phase 5", which is {{cite journal}}, and are discussing "phase 6", which is {{cite web}}.

MZMcBride added a comment. Via Conduit, May 6 2013, 7:38 PM

Scribunto/Lua has now been deployed to all Wikimedia wikis. Today, [[Barack Obama]] takes 12.207 seconds to parse. This reduction is presumably due to the implementation of [[Module:Citation/CS1]] and other related Scribunto modules.
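For a sense of scale, the two timings mentioned in this thread can be compared directly. The original report said "over 30 seconds", so using 30 as the starting figure gives a lower bound on the improvement; the two measurements were also taken more than two years apart and are not a controlled benchmark.

```python
BEFORE_SECONDS = 30.0    # "over 30 seconds" in the original report (lower bound)
AFTER_SECONDS = 12.207   # parse time quoted in this comment


def reduction_percent(before, after):
    """Percentage reduction in parse time going from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    print(f"about {reduction_percent(BEFORE_SECONDS, AFTER_SECONDS):.1f}% faster")
```

Under those caveats the page parses in well under half the time it took when the bug was filed.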

I'm inclined to mark this bug as resolved/fixed.

Aklapper added a comment. Via Conduit, May 9 2013, 1:37 PM

MZMcBride: +1, thanks for finding this. Closing as WORKSFORME.
