Pages with a high number of templates suffer extremely slow rendering or read timeout for logged in users
Closed, Resolved, Public

Description

Author: j.mccranie

Description:
I think I reported this once before, but the problem still exists. I checked other bug reports that talk about the slowness. They say they have been resolved, but this problem has not.

Long pages on the English Wikipedia that have a lot of links, such as [[List of chess books, A-L]], usually take 30-35 seconds to load. A couple of examples from today:

<!-- Served by srv137 in 34.009 secs. -->
<!-- Served by srv201 in 31.683 secs. -->

When I discussed this problem before (probably about three months ago), something related to caching was missing from the downloaded file (I forget what).

If the page has NOT changed and I am NOT logged in, it is fast. Otherwise it is slow. People suspected that something in my Preferences, perhaps a gadget, was causing it. I don't use many gadgets except Twinkle.

I've tested it under IE, Firefox, and Chrome. Chrome works best: it seems to be fast if the page has not changed, even when I am logged in. The others are slow whenever I'm logged in or the page has changed.


Version: unspecified
Severity: major
Whiteboard: aklapper-fixedbyLua?
URL: http://en.wikipedia.org/wiki/List_of_former_NTA_Film_Network_affiliates?action=purge

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz19262.
bzimport created this task. Via Legacy, Jun 17 2009, 3:55 PM
Chad added a comment. Via Conduit, Jun 17 2009, 8:38 PM

Updating platform to none, as this isn't a platform-dependent issue.
Switching component to Page rendering, as this is a parser issue.

MZMcBride added a comment. Via Conduit, Jan 14 2011, 11:23 PM

I don't believe this problem is related to the number of links. I believe it is due to the number of instances of particular templates such as [[Template:Cite book]]. A simple test should be sufficient to demonstrate this.

If I copy the current text of [[List of chess books, A-L]] (oldid: http://en.wikipedia.org/w/index.php?oldid=407667664) into a sandbox, it takes approximately 39.5 seconds to render according to the page source ("<!-- Served by srv195 in 39.529 secs. -->"), using ?action=purge (http://en.wikipedia.org/w/index.php?oldid=407920835&action=purge) while logged in.

If I take the same text, put it through [[Special:ExpandTemplates]] and save it to my sandbox, it takes approximately 5.3 seconds to render according to the page source ("<!-- Served by srv273 in 5.353 secs. -->"), using ?action=purge (http://en.wikipedia.org/w/index.php?oldid=407924303&action=purge) while logged in.

Special:ExpandTemplates, of course, fully expands the templates, their ParserFunctions parser functions, and other magic word variables, while leaving the links intact. This makes it fairly clear that the number of links is not to blame for the slow rendering time; the number of instances of particular templates is.
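The timing comparison above can be reproduced mechanically. A minimal sketch, assuming you have saved the HTML source of each purged sandbox revision; the helper `served_seconds` is hypothetical and only reads the `<!-- Served by ... -->` comment that appears in the page source:

```python
import re

# Matches the footer comment seen in the page source, e.g.
# "<!-- Served by srv195 in 39.529 secs. -->"
SERVED_BY = re.compile(r"<!-- Served by (\S+) in ([\d.]+) secs\. -->")


def served_seconds(html: str) -> tuple:
    """Return (server name, render seconds) from a page's HTML source."""
    match = SERVED_BY.search(html)
    if match is None:
        raise ValueError("no 'Served by' comment found")
    return match.group(1), float(match.group(2))


# The two timings reported for the sandbox test above.
templated = served_seconds("<!-- Served by srv195 in 39.529 secs. -->")
expanded = served_seconds("<!-- Served by srv273 in 5.353 secs. -->")

ratio = templated[1] / expanded[1]
print(f"templated/expanded = {ratio:.1f}x")
```

With the two timings quoted above, the templated revision takes roughly 7.4 times as long to render as the pre-expanded one.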

I'm updating the bug summary from "large pages with a lot of links still slow" to "Large pages with a high number of particular templates have unacceptably slow rendering time for logged in users" accordingly.

Chad added a comment. Via Conduit, Apr 29 2011, 3:39 PM

*** Bug 28744 has been marked as a duplicate of this bug. ***

kaldari added a comment. Via Conduit, Apr 29 2011, 5:25 PM

Changing description and severity per duped bug. Not being able to save articles means loss of new data. Also, I get read timeout errors when trying to view diffs of articles like:
http://en.wikipedia.org/wiki/List_of_former_NTA_Film_Network_affiliates

kaldari added a comment. Via Conduit, Apr 29 2011, 6:39 PM

Here are some reports for articles with lots of citation templates and extremely long load times:

List of former NTA Film Network affiliates (gives read timeout errors on save or view diff):
Document render time: 2-3 minutes
Preprocessor node count: 424325/1000000
Post-expand include size: 1481257/2048000 bytes
Template argument size: 353396/2048000 bytes
Expensive parser function count: 0/500

List of chess books, A-L:
Document render time: 50 seconds
Preprocessor node count: 376837/1000000
Post-expand include size: 1595542/2048000 bytes
Template argument size: 450957/2048000 bytes
Expensive parser function count: 1/500

World War II:
Document render time: 49 seconds
Preprocessor node count: 223394/1000000
Post-expand include size: 1599138/2048000 bytes
Template argument size: 563135/2048000 bytes
Expensive parser function count: 7/500

List of chess books, M-Z:
Document render time: 46 seconds
Preprocessor node count: 343867/1000000
Post-expand include size: 1682659/2048000 bytes
Template argument size: 462589/2048000 bytes
Expensive parser function count: 1/500

Barack Obama:
Document render time: 42 seconds
Preprocessor node count: 256842/1000000
Post-expand include size: 2026458/2048000 bytes
Template argument size: 823644/2048000 bytes
Expensive parser function count: 21/500

Virginia:
Document render time: 41 seconds
Preprocessor node count: 172228/1000000
Post-expand include size: 1679062/2048000 bytes
Template argument size: 833123/2048000 bytes
Expensive parser function count: 28/500
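Each of these reports uses the same `Name: used/limit` layout, so it is easy to see which counter is closest to its ceiling. A minimal sketch, assuming the report has been copied as plain text; `parse_limit_report` is a hypothetical helper, not part of MediaWiki:

```python
def parse_limit_report(text: str) -> dict:
    """Map each 'Name: used/limit [unit]' line to used/limit as a fraction."""
    usage = {}
    for line in text.strip().splitlines():
        name, sep, value = line.partition(":")
        fields = value.split()
        # Skip lines without a used/limit pair, e.g. "Document render time: 42 seconds".
        if not sep or not fields or "/" not in fields[0]:
            continue
        used, limit = fields[0].split("/")
        usage[name.strip()] = int(used) / int(limit)
    return usage


# The Barack Obama report from above.
obama = parse_limit_report("""
Document render time: 42 seconds
Preprocessor node count: 256842/1000000
Post-expand include size: 2026458/2048000 bytes
Template argument size: 823644/2048000 bytes
Expensive parser function count: 21/500
""")

for name, fraction in sorted(obama.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {fraction:.1%}")
```

For that article the post-expand include size sits at about 99% of its 2,048,000-byte limit, while the preprocessor node count is only around a quarter of its limit.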

bzimport added a comment. Via Conduit, May 4 2011, 4:45 AM

j.mccranie wrote:

I'm glad this problem is finally getting some attention. I don't know if it is the same problem, but articles like [[Stalemate]] on the English Wikipedia can take more than 20 seconds to load (and longer to do a diff or save an edit.)

bzimport added a comment. Via Conduit, May 4 2011, 6:44 PM

j.mccranie wrote:

[[Endgame tablebase]] on the English WP also takes a long time to load.

kaldari added a comment. Via Conduit, May 4 2011, 7:08 PM

There are undoubtedly thousands of articles on en.wiki that take over 30 seconds to load. If you find any that take a minute or longer, however, those might be useful for testing against and/or profiling.

bzimport added a comment. Via Conduit, May 4 2011, 7:13 PM

j.mccranie wrote:

Can something be done to get these down to 10 seconds or less?

Platonides added a comment. Via Conduit, May 5 2011, 8:21 PM

I'm glad this problem is finally getting some attention. I don't know if it is
the same problem, but articles like [[Stalemate]] on the English Wikipedia can
take more than 20 seconds to load (and longer to do a diff or save an edit.)

You don't need to wait for it to render if you just want a diff. Change your preferences to show content-less diffs, or append diffonly=1 to the URL.
Subsequent views should be cached. Do you have a cache-breaking preference enabled (e.g. marking stub links)?
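For the diff case, a minimal sketch of building such a URL, assuming the standard `index.php` parameters `title`, `diff` and `oldid` alongside `diffonly`; the helper `diff_url` is hypothetical, and `diff=prev` asks for the change made by revision `oldid`:

```python
from urllib.parse import urlencode


def diff_url(title: str, oldid: int, diffonly: bool = True,
             base: str = "https://en.wikipedia.org/w/index.php") -> str:
    """Build a URL for the diff introduced by revision `oldid`.

    With diffonly=1 the page content below the diff is not rendered,
    so the slow full parse of the page is skipped.
    """
    params = {"title": title, "diff": "prev", "oldid": oldid}
    if diffonly:
        params["diffonly"] = 1
    return f"{base}?{urlencode(params)}"


# 407667664 is the oldid of List of chess books, A-L quoted earlier in this thread.
url = diff_url("List of chess books, A-L", 407667664)
print(url)
```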

bzimport added a comment. Via Conduit, May 5 2011, 8:29 PM

j.mccranie wrote:

Do you mean the "do not show page content below diffs" option?

As far as cache-breaking preferences, not as far as I know, but I don't know the implications of all of the options.

Platonides added a comment. Via Conduit, May 5 2011, 9:18 PM

Yes. That should give you faster diffs, as the slow part is doing the rendering (but you need an additional click to see that content).

The worst cache offenders are 'never show cached page' and the stub threshold. Other user preferences isolate groups of users, so that you only get a cached page if someone (including yourself) with the same preference set has viewed it recently.
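The fragmentation described here can be illustrated with a toy model. This is not MediaWiki's actual parser cache: it simply assumes that the cache key combines the page, the revision and whichever preferences change the rendered HTML, with a hypothetical `stub_threshold` option standing in for such a preference.

```python
from dataclasses import dataclass, field


@dataclass
class ToyParserCache:
    """Toy cache keyed on (page, revision, rendering-relevant options).

    Illustrative model only; the option names are hypothetical.
    """
    entries: dict = field(default_factory=dict)
    renders: int = 0  # how many times the slow parse actually ran

    def get_html(self, page: str, rev: int, options: dict) -> str:
        key = (page, rev, tuple(sorted(options.items())))
        if key not in self.entries:
            self.renders += 1  # cache miss: pay the full parse cost
            self.entries[key] = f"<html for {page}@{rev} {options}>"
        return self.entries[key]


cache = ToyParserCache()
default_prefs = {"stub_threshold": 0}
cache.get_html("Stalemate", 1, default_prefs)             # first view: miss
cache.get_html("Stalemate", 1, dict(default_prefs))       # same options: hit
cache.get_html("Stalemate", 1, {"stub_threshold": 500})   # separate group: miss
print(cache.renders)  # 2
```

Two views with identical options share one cached render; the reader with a non-default threshold forces a second full parse, and later readers in that small group only benefit if that rarer entry is still cached.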

bzimport added a comment. Via Conduit, May 6 2011, 2:33 AM

j.mccranie wrote:

I can't find "never show cached page" or "stub threshold" under "My preferences" - where are they? Also, what other preferences isolate groups - that sounds like my problem.

MZMcBride added a comment. Via Conduit, May 6 2011, 2:48 AM

(In reply to comment #9)

Can something be done to get these down to 10 seconds or less?

You should follow bug 26786.

(In reply to comment #13)

I can't find "never show cached page" or "stub threshold" under "My
preferences" - where are they?

"Disable browser page caching" and "Threshold for stub link formatting (bytes)" under the "Appearance" tab.

(In reply to comment #13)

Also, what other preferences isolate groups - that sounds like my problem.

I don't really follow this. _Anything_ that requires a page to be parsed (purging, showing content under a diff, the parse API module, previewing a page, saving an edit to a page, etc.) is going to be slow with a lot of citation templates. A very limited number of user preferences might make this problem worse, but the underlying problem is going to remain a problem no matter what your user preferences are set to.

bzimport added a comment. Via Conduit, May 6 2011, 3:02 AM

j.mccranie wrote:

OK, I have "threshold for stub" disabled and "disable browser page caching" is not checked. These are the way they should be, right?

Bawolff added a comment. Via Conduit, May 6 2011, 3:06 AM

yes.

Note that "disable browser page caching" only affects how MediaWiki gives 304 Not Modified responses. As far as I know, it does not mess with the parser cache. (BTW, what's the point of that pref anyway? It seems pointless, but that's off topic.)
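To make the distinction concrete, here is a toy model of a conditional GET, assuming only that the preference forces a full 200 response instead of a 304 Not Modified; the function and parameter names are hypothetical. Even when it returns 200, the HTML could still come from the server-side parser cache.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def response_status(if_modified_since: Optional[str], page_touched: datetime,
                    disable_browser_cache: bool) -> int:
    """Toy model: choose between 304 Not Modified and a full 200 response.

    Only this browser-level decision depends on the preference; it says
    nothing about whether the 200 body is re-parsed or served from cache.
    """
    if disable_browser_cache or if_modified_since is None:
        return 200
    client_copy_time = parsedate_to_datetime(if_modified_since)
    # The browser's copy is still valid if the page has not been touched since.
    return 304 if page_touched <= client_copy_time else 200


touched = datetime(2011, 5, 6, 3, 0, tzinfo=timezone.utc)
header = format_datetime(touched, usegmt=True)
print(response_status(header, touched, disable_browser_cache=False))  # 304
print(response_status(header, touched, disable_browser_cache=True))   # 200
```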

Platonides added a comment. Via Conduit, May 7 2011, 10:26 PM

My bad. You are completely right, Bawolff (description + looking pointless).

MarkAHershberger added a comment. Via Conduit, May 9 2011, 10:20 PM

Per the May 2, 2010 bug triage: please let robla unassign himself, so that he is reminded about this when he has time to incorporate it into our future development.

kaldari added a comment. Via Conduit, Jun 23 2011, 11:41 PM

Any updates on this? Editors are now resorting to untemplating citations so that pages will load in a reasonable time. I just tested the Barack Obama article and got a 54 second load time (not counting the js).

MZMcBride added a comment. Via Conduit, Jun 23 2011, 11:50 PM

(In reply to comment #19)

Any updates on this? Editors are now resorting to untemplating citations so
that pages will load in a reasonable time. I just tested the Barack Obama
article and got a 54 second load time (not counting the js).

I think Tim did some tests regarding this problem by using HipHop instead of Zend. It's a band-aid, but it'll help for a while. HipHop dropped the parsing time down to 10ms or so on the Barack Obama article, I think? But MediaWiki isn't close to being able to switch to HipHop, as far as I'm aware. Tim started support on that, so a healthy framework exists, but the fine-grained support is all missing at this point.

These particular templates (the citation ones) could be converted into a PHP extension (and Svip has done some work on this in extensions/TemplateAdventures), but people disagree about whether that's the right approach, and perfect is the enemy of the done.

kaldari added a comment. Via Conduit, Jun 23 2011, 11:55 PM

Converting the citation system from templates to PHP is an interesting idea. We could add RefToolbar into it while we're at it (which is still on-wiki JavaScript).

MZMcBride added a comment. Via Conduit, Jun 24 2011, 12:00 AM

(In reply to comment #21)

Converting the citation system from templates to PHP is an interesting idea. We
could add RefToolbar into it while we're at it (which is still on-wiki
JavaScript).

bug 26786 — sorry, should've included this in my last comment. It has most of the discussion regarding this idea.

bzimport added a comment. Via Conduit, Jul 11 2011, 2:57 PM

j.mccranie wrote:

This bug has been getting worse on the English Wikipedia. Now, when I am logged in, articles that don't have nearly as many links are taking 35 seconds to load. Examples are [[Stalemate]] and [[Zugzwang]]. These use a moderate number of inline author/date (Harvard) references.

Doing a diff, editing, or comparing selected versions takes probably 2 minutes or longer - when it works.

This is becoming a severe problem when I am logged in.

bzimport added a comment. Via Conduit, Jul 11 2011, 3:13 PM

j.mccranie wrote:

Well, that is how it was yesterday. It isn't as bad today. The pages load quickly enough, but editing takes a while.

Bawolff added a comment. Via Conduit, Jul 11 2011, 3:29 PM

This is becoming a severe problem when I am logged in

If it's just when you're logged in, that probably means you are using a preference that interferes with page caching (like the stub threshold option).

bzimport added a comment. Via Conduit, Jul 11 2011, 3:32 PM

j.mccranie wrote:

I have the stub threshold disabled - what else can do it?

kaldari added a comment. Via Conduit, Aug 12 2011, 5:12 PM

It looks like this issue affects more than just citation templates. Articles with large numbers of Coord templates are also taking extremely long to load.

For example:
http://en.wikipedia.org/wiki/List_of_United_Kingdom_locations:_Am-Ar
took 59 seconds (excluding images and javascript)

RobLa-WMF added a comment. Via Conduit, Sep 2 2011, 8:58 PM

This isn't really a single issue. Every page is going to have a different specific reason for taking a long time to load. Generally, the problem will be some combination of the following problems:

  1. Our template language is too slow
  2. Our PHP interpreter is too slow
  3. The templates being used by the page are too complicated or inefficient

We have initiatives to solve the first two problems (#1: use a new template language like Wikiscript, Lua, or JavaScript; #2: use HipHop). However, if a page is taking over a minute to parse, chances are that the templates themselves need to be made more efficient. No matter how efficient we make the template language, it will always be possible to more than offset the efficiency gain with more complicated templates. The more efficient we make templates, the more complicated people will make templates.

I think more sandbox testing like what MZMcBride did (see comment #2) would be very valuable to isolate specific templates that are ripe for optimization.

I'm not sure if this particular bug is going to be valuable to keep open. It's not specific enough to ever be closed.

kaldari added a comment. Via Conduit, Sep 2 2011, 9:52 PM

Thanks for the enlightening post, Rob. If template complexity is really to blame, we need to make a concerted effort to communicate this to the community. For the past several years the community has been told the opposite: that they should not worry about template costs or server performance issues. For example:

"Generally, you should not worry much about little things like templates and 'server load' at a policy level. If they're expensive, we'll either fix it or restrict it at a technical level; that's our responsibility..." -- Brion Vibber, 2006

In fact, there's an entire essay on en.wiki: "Wikipedia:Don't worry about performance"

Clearly this mindset is now outdated. Perhaps you or Brion could post about this issue on the Wikimedia Blog so that we can start to change this mindset and get people working on template optimization.

bzimport added a comment. Via Conduit, Sep 2 2011, 11:16 PM

j.mccranie wrote:

When I first posted this, it was taking 35 seconds or longer. It got worse. But I checked it today on IE and Firefox and it was fast - about 2 seconds. Has something been fixed?

MZMcBride added a comment. Via Conduit, Sep 3 2011, 2:34 AM

(In reply to comment #29)

In fact, there's an entire essay on en.wiki: "Wikipedia:Don't worry about
performance"

Clearly this mindset is now outdated. Perhaps you or Brion could post about
this issue on the Wikimedia Blog so that we can start to change this mindset
and get people working on template optimization.

That's most certainly not the solution. This can't be stressed enough. Tim and I have discussed this (though he comes down on your side still, I think, or did at one point).

The scope of Wikimedia projects is the dissemination of free educational material. When you make it the job of wiki users to debug complex wiki-templates and try to fine-tune them, it's a very bad and very undesirable situation.

Users should not be worried about performance, by and large. They certainly shouldn't be concerned that they're using too many calls to citation templates (of all things!). We want users to be encouraged to cite information and build content. That's the goal. We want to discourage mindless "optimizations" (without any real debugging tools) that users will inevitably and invariably make in the name of fixing a system that they didn't break and that's not their responsibility to maintain.

(In reply to comment #28)

This isn't really a single issue. Every page is going to have a different
specific reason for taking a long time to load.

Err, prove it. The pages that I've seen that are slower all have the same root cause: too many calls to particular types of templates. Citation templates are the biggest issue, but the coord(inates) family and the convert family have also caused problems in the past.

I think more sandbox testing like what MZMcBride did (see comment #2) would be
very valuable to isolate specific templates that are ripe for optimization.

It's valuable when there's a dearth. But at the moment, finding large pages that take an excessive amount of time to load/render/parse is easy. And the solution(s) are already known (as Domas would say, this is very low-hanging fruit from an optimization standpoint). It's a matter of implementing the solutions (which I guess Tim and Victor are working on).

(And, going forward, users ideally won't even have a real concept of templates outside of "those things that make wiki-editing more standardized." We want to get users away from thinking about "{{cite web}}" or "{{coord}}" or anything like that. That's echoing what Brion and many others have said, especially as work on the new parser ramps up. Trying to get users to care about these templates and then trying to get them to make them faster is a step in the wrong direction.)

I'm not sure if this particular bug is going to be valuable to keep open. It's
not specific enough to ever be closed.

This bug is fine. When there is a better system (or systems) in place that make the pages load faster, this bug can be closed. Just because a bug is difficult or is going to likely remain open for a long time doesn't make it any less valid. There's certainly something problematic and actionable here.

(In reply to comment #30)

When I first posted this, it was taking 35 seconds or longer. It got worse.
But I checked it today on IE and Firefox and it was fast - about 2 seconds.
Has something been fixed?

Sounds like you just hit cache. (Or I suppose it's possible someone drastically reduced the number of template calls in the page you're looking at.) Do you have a particular example page/URL? Have you tried with ?action=purge appended?

bzimport added a comment. Via Conduit, Sep 3 2011, 2:43 AM

j.mccranie wrote:

But when it takes 35+ seconds to load a page, performance does matter! Many readers are not going to wait that long and will miss the content.

And when it takes 2 minutes to get a diff or an edit screen, performance does matter. Some editors (including me) are just not going to wait for that long of a time just to get to the edit screen or to check other's edits.

Catrope added a comment. Via Conduit, Sep 3 2011, 10:01 AM

(In reply to comment #32)

But when it takes 35+ seconds to load a page, performance does matter!

No one said it didn't matter. We all agree this is a problem, it's just that we're saying this is something for *us* (developers) to fix, not for template editors necessarily.

kaldari added a comment. Via Conduit, Sep 4 2011, 8:59 PM

Well, it seems Rob's comments have muddied the waters a bit. Correct me if I'm wrong, but Rob seems to be saying that no matter how much effort the developers put into back-end performance and optimization, the current degree of template complexity means that well-cited articles are always going to be slow. If that is the case, then I think we need to tell that to the community and have them work on template optimization. If that isn't the case, then we need to be clear about that as well, and make sure that this bug stays a high priority for the developers.

As for this bug being too vague to fix, I will personally consider it fixed when I no longer get read timeouts from trying to view diffs, which I still do as of today.

RobLa-WMF added a comment. Via Conduit, Sep 7 2011, 5:04 AM

Here's what I'm saying: current performance is too slow. We know it's too slow, and we have at least a couple initiatives that should make things significantly faster, along with other less dramatic improvements that we should also implement if we still have problems.

However, what I'm also saying is that there's no way to give people a general purpose programming environment, and then expect that it's going to perform well no matter what anyone throws at it. It's just not possible. It can perform well for most reasonable tasks, and we're not *aware* of any tasks that are unreasonable, but there's no guarantee that everything that every programmer does is going to be reasonable. The programmer may be trying to accomplish something reasonable, but I've seen even very good programmers make very poor performance choices in their code. On a wiki anyone can edit, there will almost always be someone(s) who is/are doing it wrong.

I believe that Brion's comment in 2006 was a reaction to the prevailing mood at the time. If I recall his account of things correctly, there was a lot of pseudoscientific "thou shalt not use the foobar template, for you will anger the performance gods, and they will smite the server kittehs". He saw that people were overreacting to advice about template performance, with no one actually doing any genuine profiling.

So, now the pendulum seems to have swung in the other direction. Yes, we need references in articles. Yes, there are plenty of other perfectly reasonable uses of templates. Don't stop doing those things. That said, if there are more efficient ways of achieving the same end using a more efficient template, please, for pete's sake, make the template more efficient. Also, please help us figure out which templates are expensive and why they're expensive. If we can actually narrow down which parts of templates suck, developers may have a better idea of what parts should be implemented directly in PHP or even C if need be.

My point is this: there's not a "problem". There are "problems". Having this all in a single bug suggests there is a single "problem", and that's what I have a problem with.

tstarling added a comment. Via Conduit, Oct 24 2011, 10:23 PM

Just removing the COINS metadata from {{Citation/core}} would speed up article rendering significantly.
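For context on what this metadata is: a COinS is an empty `<span class="Z3988">` whose `title` attribute carries URL-encoded OpenURL (Z39.88-2004) metadata, emitted once per citation. A minimal sketch of producing one for a book; `coins_span` and the sample values are hypothetical, and this is not the actual output of {{Citation/core}}:

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title: str, author: str, year: str) -> str:
    """Return a COinS <span> for a book citation (illustrative fields only)."""
    kev = urlencode({
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.btitle": title,
        "rft.au": author,
        "rft.date": year,
    })
    # COinS puts the OpenURL ContextObject in the title attribute of an
    # empty span with class "Z3988"; the '&' separators must be HTML-escaped.
    return f'<span class="Z3988" title="{escape(kev)}"></span>'


# Hypothetical citation values, for illustration only.
span = coins_span("An Example Chess Book", "Doe, Jane", "2009")
print(span)
```

On a heavily cited article, a string like this has to be assembled for every single citation during the parse.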

kaldari added a comment. Via Conduit, Nov 4 2012, 8:49 AM

It seems that it is currently very difficult to edit http://en.wiktionary.org/wiki/a due to this bug. It typically times out when trying to save. Here is the report for the page:

Preprocessor visited node count: 479524/1000000
Preprocessor generated node count: 132979/1500000
Post-expand include size: 1772116/2048000 bytes
Template argument size: 224175/2048000 bytes
Highest expansion depth: 31/40
Expensive parser function count: 219/500

Looking forward to the deployment of Scribunto :)

TheDJ added a comment. Via Conduit, Nov 8 2012, 9:18 AM

*** Bug 41863 has been marked as a duplicate of this bug. ***

bzimport added a comment. Via Conduit, Nov 10 2012, 12:43 AM

maysara.abdulhaq wrote:

Also it is impossible to edit article https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D9%84%D8%A7%D9%85 (arabic article about Islam)

Request: POST http://ar.wikipedia.org/w/index.php?title=%D8%A5%D8%B3%D9%84%D8%A7%D9%85&action=submit, from 41.43.16.246 via cp1006.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.141 (10.64.0.141)
Error: ERR_READ_TIMEOUT, errno [No Error] at Sat, 10 Nov 2012 00:09:00 GMT

Aklapper added a comment. Via Conduit, Nov 10 2012, 1:27 PM

*** Bug 41941 has been marked as a duplicate of this bug. ***

kaldari added a comment. Via Conduit, Nov 12 2012, 2:51 AM

Confirmed that it is no longer possible to edit the Gaddafi article without parser timeout (http://en.wikipedia.org/wiki/Muammar_Gaddafi). That makes 3 reports of significantly important articles suffering read timeout in the past week (on 3 different wikis). Since this is a more significant bug than any of the others currently assigned to Highest priority, I'm going to bump it to Highest as well.

Would it be possible for us to adjust the parser timeout time until Scribunto is deployed?

tstarling added a comment. Via Conduit, Nov 12 2012, 3:51 AM

(In reply to comment #41)

Confirmed that it is no longer possible to edit the Gaddafi article without
parser timeout (http://en.wikipedia.org/wiki/Muammar_Gaddafi). That makes 3
reports of significantly important articles suffering read timeout in the past
week (on 3 different wikis).

According to slow-parse.log on fluorine, parse times for [[Muammar Gaddafi]] have been stable at 30-35 seconds since the log began in May. The [[a]] article on en.wiktionary.org has been taking more than 30 seconds since June 4. This is not a new or rapidly-changing problem.

Since this is a more significant bug than any of
the others currently assigned to Highest priority, I'm going to bump it to
Highest as well.

Would it be possible for us to adjust the parser timeout time until Scribunto
is deployed?

I don't think that would be a good idea: I think it would worsen our exposure to DoS attacks and encourage template editors to make articles render even more slowly.

Aklapper added a comment. Via Conduit, Nov 12 2012, 2:40 PM

Ryan: As you bumped this back to highest priority, is anybody working on this? I'd like to have an assignee for this...

tstarling added a comment. Via Conduit, Nov 12 2012, 10:31 PM

(In reply to comment #43)

Ryan: As you bumped this back to highest priority, is anybody working on this?
I'd like to have an assignee for this...

Three members of the platform team are working on Lua support, and I removed the COINS metadata from {{Citation/core}} on the English Wikipedia, reducing the parse time for articles with many citations by about 25%. [[Muammar Gaddafi]] now takes only 23 seconds.
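As a rough consistency check on these figures: applying the quoted ~25% reduction to the 30-35 second parse times from slow-parse.log gives

```latex
t_{\text{after}} \approx (1 - 0.25)\,t_{\text{before}}, \qquad
t_{\text{before}} \in [30, 35]\ \text{s}
\;\Rightarrow\;
t_{\text{after}} \in [22.5, 26.25]\ \text{s},
```

which brackets the observed 23 seconds for [[Muammar Gaddafi]].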

kaldari added a comment. Via Conduit, Nov 13 2012, 12:06 AM

Thanks. I was considering doing that myself, but your edit+opinion carries a lot more weight :)

Moving priority back to High for now.

Aklapper added a comment. Via Conduit, Nov 13 2012, 12:14 AM

(In reply to comment #44)

I removed the COINS metadata from {{Citation/core}} on the English Wikipedia

Thanks for the workaround!

bzimport added a comment. Via Conduit, Nov 15 2012, 7:59 PM

libx.org wrote:

LibX (libx.org) is a COinS processor, used by over 200,000 users affiliated with over 1,000 libraries worldwide. We link users to their OpenURL resolvers to obtain referenced items - journal and newspaper articles and books.

We are in the middle of a project to greatly improve COinS processing, with Wikipedia as the primary beneficiary. Whereas the current implementation simply links users, our planned implementation would contact the user's library through such APIs as the Summon API and directly find links to where the user can get the item. This is of tremendous benefit, particularly to users of academic libraries with subscriptions to journal databases or newspaper archives.

Please restore this functionality, either by restoring COinS, ajaxing COinS, or using alternative microformats; please provide this functionality such that not only metadata extraction is facilitated (like Zotero needs), but also such that a user interface can be provided that alerts users that an agent has processed the metadata. LibX, for instance, places a 'cue' where a COinS appears; we would like to add a tooltip. See an example of our envisioned design here: http://libx.org/how-to-set-up-libx-with-the-summon-api/ (This shows what we currently do for ISBNs on a page; we are working on doing the same for COinS, though we would probably stop this project if Wikipedia drops COinS, since you are the major provider at this point.)

Thank you for your consideration.

Bawolff added a comment. Via Conduit, Nov 16 2012, 2:48 AM

(In reply to comment #49)

LibX (libx.org) is a COinS processor, used by over 200,000 users affiliated
with over 1,000 libraries worldwide. We link users to their OpenURL resolvers
to obtain referenced items - journal and newspaper articles and books.

We are in the middle of a project to greatly improve COinS processing, with
Wikipedia as the primary beneficiary. Whereas the current implementation
simply links users, our planned implementation would contact the user's library
through such APIs as the Summon API and directly find links to where the user
can get the item. This is of tremendous benefit, particularly to users of
academic libraries with subscriptions to journal databases or newspaper
archives.

Please restore this functionality, either by restoring COinS, ajaxing COinS, or
using alternative microformats; please provide this functionality such that not
only metadata extraction is facilitated (like Zotero needs), but also such that
a user interface can be provided that alerts users that an agent has processed
the metadata. LibX, for instance, places a 'cue' where a COinS appears; we
would like to add a tooltip. See an example of our envisioned design here:
http://libx.org/how-to-set-up-libx-with-the-summon-api/ (This shows what we
currently do for ISBNs on a page; we are working on doing the same for COinS,
though we would probably stop this project if Wikipedia drops COinS, since you
are the major provider at this point.)

Thank you for your consideration.

There's probably a good chance the Wikipedians will add the COinS metadata back once Scribunto is deployed, assuming the performance predictions hold true. At this point I'd recommend just waiting it out.

bzimport added a comment. Via Conduit, Nov 16 2012, 2:15 PM

libx.org wrote:

So I read through this thread, and I'm amazed, to put it politely.

There is a performance problem that affects only people logged into Wikipedia, which has got to be a small percentage of Wikipedia users, probably just contributors and editors. In response, you disable a crucial feature that allows average users to actually find the article Wikipedia cites. Not only do you disable it for editors, you disable it for everyone!

You know that people make fun of Wikipedia for its lack of reliable sources, and the circularity that sometimes results:
http://itst.net/wp-content/uploads/2009/06/informationsgesellschaft-wikipedia-presse-1024x768.jpg

I conclude a number of things. First, editors don't seem to be in the business of checking cited sources. Otherwise, clicking on a COinS to get the primary source would be a *frequent* operation for them, and they'd be clamoring for tools like LibX that streamline this process.

Second, why was this disabled both for editors (where, I'm guessing, the page is rendered every time a visit occurs), and ordinary users (who, I'm guessing, fetch a cached, prerendered page?) Why can't the COinS be in the cached page the majority of users sees?

Third, there doesn't seem to be any metadata in the page right now. See point #1 - how are editors checking primary sources efficiently? Why did you disable this feature *before* you had a replacement?

Peachey88 added a comment. Via Conduit, Nov 16 2012, 7:15 PM

(In reply to comment #51)

So I read through this thread, and I'm amazed, to put it politely.

There is a performance problem that affects only people logged into Wikipedia,
which has got to be a small percentage of Wikipedia users, probably just
contributors and editors.

[Citation Needed] The issue affects the ability to edit and save the pages, which in turn affects logged-out users, because the pages don't get edited and updated.

In response, you disable a crucial feature that
allows average users to actually find the article Wikipedia cites. Not only
do you disable it for editors, you disable it for everyone!

This isn't a crucial feature. The primary data (the references) are still in the page.

I conclude a number of things. First, editors don't seem to be in the business
of checking cited sources.

[Citation Needed]

Second, why was this disabled both for editors (where, I'm guessing, the page
is rendered every time a visit occurs), and ordinary users (who, I'm guessing,
fetch a cached, prerendered page?) Why can't the COinS be in the cached page
the majority of users sees?

Because we currently don't have a system where we can do that.

Third, there doesn't seem to be any metadata in the page right now. See point
#1 - how are editors checking primary sources efficiently? Why did you disable
this feature *before* you had a replacement?

Because most people would view actually being able to edit the page as more important than metadata that makes source checking easier.

Also, it would be nice if you changed your Bugzilla account from showing that you're a role account for a business/website to an individual account, so we know who we're actually talking to.

bzimport added a comment.Via ConduitNov 16 2012, 8:57 PM

libx.org wrote:

libx.org@gmail.com is backed by the LibX Team; I'm in charge of the technical aspects. LibX is not a business - it's open source; though we have received federal grants to employ some students, it's primarily community driven. Our key community is the thousands of librarians who have set it up for their own local communities.

Actually, I'm happy that this happened this week, and not 3 months from now, because I was just able to recruit one student to (finally) improve support for COinS - Wikipedia was our primary target. We were going to analyze the quality of the COinS (which btw wasn't good - I think that's because you had wiki markup in the metadata, like brackets), then decide which services we needed to use to make sure the user can get to the item cited. Note that libraries have been slow to provide services that expose their knowledge base of what they hold and how their users can get access to it, which is why it has taken so many years for such a project to become feasible at all. Today, it is. Discovery systems like Summon provide full-text indices that include not only the combined content of many traditional abstracting and indexing databases, but also newspaper archives, traditional library catalogs, and even local institutional sources like electronic theses and dissertations databases.

In any event, consider doing something - if the performance of your template structure is the issue, use other techniques. Provide an AJAX service, or embed the data in client-side JavaScript (like nytimes.com does), then put it together on the client. From our perspective, the goal is to show the user, upon a mouse gesture, whether they have access to an item that's cited in a Wikipedia article. If so, a single click of the mouse should get them there. This goal is difficult to achieve if only the unstructured, formatted data is present. But it's a worthwhile goal and, I'm convinced, would truly help editors if/when they check sources.

Godmar Back (libx.org@gmail.com)

Chad added a comment.Via ConduitNov 16 2012, 9:10 PM

(In reply to comment #53)

In any event, consider doing something - if the performance of your template
structure is the issue, use other techniques.

We are doing something different and it's under active development (and much further along than starting fresh with some AJAXy hacks). It's called Lua/Scribunto, and it was mentioned in comment 50.

bzimport added a comment.Via ConduitNov 16 2012, 9:26 PM

libx.org wrote:

I'm familiar with Lua (the programming language), and googling Scribunto leads to http://www.mediawiki.org/wiki/Extension:Scribunto which, upon a 10-second inspection, doesn't explain how you'll be providing metadata.

My use of the acronym 'AJAX' was referring to the asynchronous nature any service would need to have to avoid holding up the rendering of the page, which seemed to be your main concern. In other words, the page would be rendered and sent to the user without metadata, just containing a quickly-generated key for each item. Only when the user accesses it, such as by hovering over an item, would a separate service be accessed that provides the metadata in usable form. You can see this technique in action in many webpages, and it's not a hack at all.
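A minimal sketch of the deferred-lookup pattern described above, under stated assumptions: the store, the `data-ref-key` attribute, and the helpers `render_reference` and `lookup_metadata` are hypothetical illustrations, not a MediaWiki or LibX API. The rendered page carries only a cheap key per reference; structured metadata is returned by a separate call that the client would make asynchronously (for example on hover).

```python
import html
import json

# Hypothetical server-side store: citation key -> structured metadata.
# In practice this would be filled in when the page is parsed; the entry
# below is placeholder data for illustration only.
CITATION_STORE: dict[str, dict[str, str]] = {
    "ref-1": {"genre": "book", "btitle": "Example Chess Book", "au": "Doe, Jane", "date": "1950"},
}


def render_reference(key: str, formatted_text: str) -> str:
    """Emit the already-formatted citation plus a lightweight key only.

    No metadata is serialized into the page, so page rendering does not
    pay for it."""
    safe_key = html.escape(key, quote=True)
    return f'<span class="citation" data-ref-key="{safe_key}">{formatted_text}</span>'


def lookup_metadata(key: str) -> str:
    """Hypothetical endpoint body: return JSON metadata for one reference.

    A client script would call this asynchronously, after the page has
    already been delivered."""
    metadata = CITATION_STORE.get(key)
    if metadata is None:
        return json.dumps({"error": "unknown reference key"})
    return json.dumps(metadata)
```

The design choice is that the expensive part (assembling structured metadata) moves off the page-rendering path entirely and is paid only for the references a reader actually inspects.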

Chad added a comment.Via ConduitNov 16 2012, 10:18 PM

(In reply to comment #55)

I'm familiar with Lua (the programming language), and googling Scribunto leads
to http://www.mediawiki.org/wiki/Extension:Scribunto which, upon 10 second
inspection, doesn't explain how you'll be providing metadata.

It's not just about metadata... the point is that we'll be able to (re)introduce complex things to templates without causing them to take ages to render (which was the whole reason for removing it).

kaldari added a comment.Via ConduitNov 16 2012, 10:29 PM

The current plan is to deploy Scribunto to the production wikis in early 2013 (although I don't know personally if we are still on target for that). One of the first things that Scribunto will be used for is re-implementing the Citation/core template on English Wikipedia. Scribunto will allow our citation templates to be generated with a real programming language (Lua), rather than through a convoluted Turing machine of Wikitext. It is also expected that this conversion will dramatically improve page parsing time so that we are no longer teetering on the edge of the parser timeout abyss.

bzimport added a comment.Via ConduitNov 17 2012, 1:53 AM

libx.org wrote:

So does "reimplement" here mean that COinS will just show up again, or will you provide metadata in a different format?

If you do provide COinS again, it would be nice if you improved your implementation and made it compliant with NISO Z39.88's context object format. That would help tremendously in making cited items easier to find.
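For reference, a COinS is an HTML `<span class="Z3988">` whose `title` attribute holds a URL-encoded Z39.88-2004 KEV ContextObject. The sketch below builds one for a book citation, assuming plain-text input fields; `make_coins_span` and `strip_wiki_markup` are hypothetical helpers, not part of any Wikipedia template. It also strips leftover `[[...]]` link brackets and `''` italics, the kind of wiki markup LibX reported finding in the old metadata.

```python
import html
import re
from typing import Optional
from urllib.parse import urlencode


def strip_wiki_markup(value: str) -> str:
    """Reduce [[target]] and [[target|label]] to their visible text, drop '' italics."""
    value = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", value)
    return value.replace("''", "").strip()


def make_coins_span(title: str, author: str, date: str, isbn: Optional[str] = None) -> str:
    """Build a COinS span using the Z39.88-2004 KEV book format keys."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.genre", "book"),
        ("rft.btitle", strip_wiki_markup(title)),
        ("rft.au", strip_wiki_markup(author)),
        ("rft.date", date),
    ]
    if isbn:
        fields.append(("rft.isbn", isbn))
    # urlencode percent-encodes each value and joins the pairs with '&'.
    kev = urlencode(fields)
    # The '&' separators must themselves be escaped inside an HTML attribute.
    return f'<span class="Z3988" title="{html.escape(kev, quote=True)}"></span>'
```

For example, `make_coins_span("[[Example Chess Book]]", "[[Jane Doe|Doe, Jane]]", "1950")` yields a span whose title contains `rft.btitle=Example+Chess+Book` and `rft.au=Doe%2C+Jane`, with no brackets left in the metadata.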

Bawolff added a comment.Via ConduitNov 17 2012, 7:42 PM

(In reply to comment #58)

So - "reimplement" here means that COinS will just show up again, or will you
provide metadata in a different format.

If you provide again COinS, it would be nice if you improved your
implementation and made it compliant with NISO Z39.88's context object format.
That would help tremendously in making items findable more easily.

Well the "you" in that sentence is a bit ambiguous. "We" (MW devs) didn't have anything to do with the COinS metadata. Presumably when it gets re-added it will be done by the Wikipedians, so you would have to talk to them about different formats to use.

MarkAHershberger added a comment.Via ConduitNov 17 2012, 7:54 PM

(In reply to comment #59)

"We" (MW devs) didn't have
anything to do with the COinS metadata. Presumably when it gets re-added it
will be done by the Wikipedians, so you would have to talk to them about
different formats to use.

Note that you should talk to the wiki editors on-wiki. You probably need to post something about COinS on [[WP:VPT]]. They will at least be able to direct you to the right place.

Bawolff added a comment.Via ConduitNov 17 2012, 7:59 PM

There's conversation about the removal at [[template_talk:Citation/core]].

bzimport added a comment.Via ConduitNov 17 2012, 9:54 PM

libx.org wrote:

Thanks. At the URL you link to, there's talk about an existing API for metadata extraction. Is this true? We would be fine with an API, as long as it's REST so we can run it from the user's browser, and as long as it allows accessing the metadata for specific references on a page.

aude added a comment.Via ConduitFeb 14 2013, 11:56 AM
*** Bug 44982 has been marked as a duplicate of this bug. ***

bzimport added a comment.Via ConduitMar 13 2013, 1:14 AM

sumanah wrote:

Project LibX, now that we've deployed Scribunto to English Wikipedia, it's a good time to engage with the English Wikipedia template editors regarding COinS metadata at https://en.wikipedia.org/wiki/Template_talk:Citation/core#LUA_deployed , if you haven't already.

MZMcBride added a comment.Via ConduitMay 6 2013, 7:34 PM

Given the deployment of Scribunto/Lua to all Wikimedia wikis, I'm inclined to mark this bug as resolved/fixed. Certain pages such as [[wikt:a]] still take over 30 seconds to parse; however, in my opinion, these individual cases should be split out into separate bugs so that appropriate modules can be written on the specific wikis.

MarkAHershberger added a comment.Via ConduitMay 6 2013, 8:08 PM

(In reply to comment #65)

Given the deployment of Scribunto/Lua to all Wikimedia wikis, I'm inclined to
mark this bug as resolved/fixed.

Agreed for all the reasons MZ gave.
