Investigate and re-enable action=parse API module on Wikimedia wikis
Closed, Resolved (Public)

Description

The action=parse API module was disabled because it was suspected of causing server load (or possibly bandwidth) issues. It should be re-enabled at some point.

The relevant server admin log entry is here: http://wikitech.wikimedia.org/index.php?diff=28854&oldid=28853


Version: unspecified
Severity: normal

bzimport set Reference to bz25238.
MZMcBride created this task.Via LegacySep 21 2010, 6:57 PM
bzimport added a comment.Via ConduitSep 21 2010, 10:30 PM

justin wrote:

I sincerely hope it is re-enabled soon. Now my application === broken :(

Platonides added a comment.Via ConduitSep 21 2010, 10:36 PM

Dispenser noted in #wikimedia-tech that action=parse uses the parser cache, so only the cache misses should need disabling.

tstarling added a comment.Via ConduitSep 22 2010, 2:27 AM

If you're using action=parse then please describe your application and typical requests on this bug report.

Bawolff added a comment.Via ConduitSep 22 2010, 2:37 AM

We're using action=parse on Wikinews for the preview of a JavaScript tool that helps users change a template ([[n:WN:ML]]).

An example request would look something like the following data posted to the api:

action: parse
format: xml
prop: text
pst: true
title: Main Page
text: {{Lead 2.0 |id=3 <!-- do not change. Each lead must have its own unique ID --> |image=Kenny McKinley.JPG |width=100x100px |type=none |title=Denver Broncos player Kenny McKinley found dead aged 23 |short_title= |summary=Kenny McKinley, an american football player for Denver Broncos, has been found dead at the age of 23. }}
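A request like the one above could be encoded as a form POST body roughly as follows. This is only a sketch: the function name is made up for illustration, the wikitext is shortened, and no request is actually sent.

```python
from urllib.parse import urlencode, parse_qs


def build_preview_body(title: str, wikitext: str) -> bytes:
    """Encode an action=parse preview request as a form-encoded POST body."""
    params = {
        "action": "parse",
        "format": "xml",
        "prop": "text",
        "pst": "true",      # pre-save transform, as in the request above
        "title": title,     # context title used while parsing the text
        "text": wikitext,   # the wikitext to be previewed
    }
    return urlencode(params).encode("utf-8")


# Shortened stand-in for the template call shown above.
body = build_preview_body("Main Page", "{{Lead 2.0 |id=3 |type=none}}")

# Round-trip the body to check that nothing was lost in encoding.
decoded = parse_qs(body.decode("utf-8"))
```

Posting the body rather than putting it in the URL keeps long template calls out of the query string.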

bzimport added a comment.Via ConduitSep 22 2010, 2:55 AM

cbm.wikipedia wrote:

The WP 1.0 tools use it (http://toolserver.org/~enwp10 and subpages). The
results from the API are (supposed to be) cached in a local database on
toolserver with a 12 hour expiry, to lighten the load on the API.

The things that are parsed are:

  1. Templates such as [[en:Template:B-Class]], to get the formatting that has
     been specified inside them. This formatting is used to make the toolserver
     program give similar output to on-wiki tables. It would be silly to manually
     keep the web tool code in sync with the templates. And the overall collection
     of class templates is not predefined and can grow at the whim of a wikiproject.

  2. The page [[en:User:SelectionBot/HomePage]] is parsed to fill in the contents
     of http://toolserver.org/~enwp10 .

The goal of this was to prevent, as much as possible, hard-coding formatting
into the web program that could be or should be updatable by people other than
the tool's maintainer.

If there is a caching failure in this tool, it will be somewhat apparent in the
WMF server logs, because the requests would be coming from the toolserver's web
server. My logs show 4367 invocations in the last 24 hours.
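For readers wondering what a 12 hour expiry cache of this kind looks like, here is a minimal in-memory sketch. It is not the tool's actual code: the real cache lives in a toolserver database, and the class name, the fake fetch function and the injectable clock are all invented for the example.

```python
import time
from typing import Callable, Dict, Tuple

TWELVE_HOURS = 12 * 60 * 60  # expiry from the comment above, in seconds


class ExpiringCache:
    """In-memory stand-in (hypothetical) for a database-backed parse cache."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, fetch: Callable[[str], str]) -> str:
        """Return the cached value; call fetch(key) only on a miss or expiry."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch(key)
        self.entries[key] = (now, value)
        return value


# Demonstration with a fake clock and a fake fetch that records its calls.
calls = []


def fake_fetch(title: str) -> str:
    calls.append(title)
    return "<rendered %s>" % title


now = [0.0]
cache = ExpiringCache(TWELVE_HOURS, clock=lambda: now[0])
cache.get("Template:B-Class", fake_fetch)   # miss: fetched
cache.get("Template:B-Class", fake_fetch)   # hit: served from the cache
now[0] += TWELVE_HOURS + 1
cache.get("Template:B-Class", fake_fetch)   # expired: fetched again
```

With a cache like this, the API only sees one request per distinct title per expiry window, however often the web tool renders it.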

Mr.Z-man added a comment.Via ConduitSep 22 2010, 3:03 AM

I use it in [[Wikipedia:RefToolbar 2.0]] to do previews of citation templates. This is one of the most popular gadgets on the English Wikipedia, though I don't know how much this feature is used. It is only available with the new toolbar version, so not all users have access to it. Every parse request is manually triggered by the user.

Example request:

action: parse
title: wgPageName
prop: text
format: json
text: {{cite web|url=http://www.scu.edu.au/news/media.php?item_id=1023&action=show_item&type=M|title=Birds learn to eat cane toads safely |last=Marchant|first=Gillian|date=26 November 2007|work=Southern Cross University website|publisher=Southern Cross University|accessdate=2009-05-09}}

bzimport added a comment.Via ConduitSep 22 2010, 3:07 AM

cbm.wikipedia wrote:

A third use from the WP 1.0 bot: parsing tables such as [[en:User:WP_1.0_bot/Tables/Project/Libraries]] if people want to see them in the web tool instead of on the wiki.

tstarling added a comment.Via ConduitSep 22 2010, 3:33 AM

(In reply to comment #2)

Dispenser noted in #wikimedia-tech that action=parse uses the parser cache, so
only the cache misses should need disabling.

It appears that the problem was caused by squid cache hits, not parser cache misses. The byte hit ratio at sq33:3128 spiked from 18% to 92%.

tstarling added a comment.Via ConduitSep 22 2010, 3:34 AM

Created attachment 7695
sq33:3128 hit ratio

bzimport added a comment.Via ConduitSep 22 2010, 3:58 AM

pol wrote:

Our application is a free iPad application with more than 200,000 users. It was the #1 free app for a week during its launch a couple of months ago, and has never left the top 10 in the Lifestyle category. As you can see in its description, we're trying to create interesting alternative presentations of Wikipedia content to really make it look great on iPad: http://itunes.apple.com/us/app/id384224429?mt=8

Discover uses the parse action as a more efficient way to retrieve the contents of the pages from Wikipedia. It always uses the "page" argument, retrieving entire pages, which therefore should be in the cache (as indicated at the very end of the parsing result with the various timestamps). As far as I understand, the application should be a good citizen toward the Wikipedia servers, and it downloads less data this way.

Please re-enable "action=parse" for entire pages as soon as possible, as Discover is effectively completely broken right now.

For reference, here are all the exact parse API requests I could find in the source code:

  • action=parse&prop=displaytitle%%7Ctext%%7Ccategories%%7Cexternallinks&page=%@&redirects&format=xml
  • action=parse&prop=text%%7Cimages&page=%@&redirects&format=xml
  • action=parse&prop=text%%7Cimages&page=%@&redirects&format=xml
  • action=parse&prop=text&page=Wikipedia:Featured_articles&redirects&format=xml
  • action=parse&prop=images&page=%@&redirects&format=xml
  • action=parse&prop=links&page=Wikipedia:Featured_pictures&redirects&format=xml
  • action=parse&prop=text&page=Template:In_the_news&redirects&format=xml
MaxSem added a comment.Via ConduitSep 22 2010, 4:34 AM

[[WP:AWB]] uses action=parse for previews. This tool is used by thousands of Wikimedians.

Typical requests are action=parse&prop=headhtml before the first preview and then just ordinary action=parse&prop=text in GET with title=...&text=... in POST for every preview. While previews aren't displayed by default, this feature is extensively used.
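The split between GET and POST parameters described above could look roughly like this. It is a sketch only, not AWB's code (AWB is not written in Python); the endpoint, the format=xml parameter and the function name are assumptions, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; the comment above does not name one.
API_URL = "https://en.wikipedia.org/w/api.php"


def preview_request(title: str, wikitext: str) -> Request:
    """Build a preview request: short parameters in the URL, wikitext in the body."""
    # format=xml is an assumption; the comment does not say which format is used.
    query = urlencode({"action": "parse", "prop": "text", "format": "xml"})
    body = urlencode({"title": title, "text": wikitext}).encode("utf-8")
    # Supplying data makes urllib send the request as a POST.
    return Request(API_URL + "?" + query, data=body)


req = preview_request("Sandbox", "'''bold'''")
```

Keeping the page text in the POST body avoids URL length limits for large previews.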

bzimport added a comment.Via ConduitSep 22 2010, 4:36 AM

pol wrote:

Note that other APIs don't return the same info as action=parse, so replacing it is not easy.

For instance, links in the page retrieved from action=parse contain an extra attribute (exists="") indicating if the link exists. The alternative action=query&prop=links doesn't provide this info.
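To illustrate why that attribute matters, here is a small sketch that separates existing from missing link targets. The sample XML is hand-written for the example and only approximates the shape of a real reply; in particular the `pl` element name is an assumption and should be checked against actual API output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real API output. The <pl> element name is assumed;
# the exists="" attribute is the one described in the comment above.
SAMPLE = """<api><parse><links>
<pl ns="0" exists="">Art</pl>
<pl ns="0">No such page</pl>
</links></parse></api>"""


def split_links(xml_text: str):
    """Return (existing, missing) link titles based on the exists attribute."""
    root = ET.fromstring(xml_text)
    existing, missing = [], []
    for link in root.iter("pl"):
        # The attribute is present (with an empty value) only for existing pages.
        if "exists" in link.attrib:
            existing.append(link.text)
        else:
            missing.append(link.text)
    return existing, missing


existing, missing = split_links(SAMPLE)
```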

bzimport added a comment.Via ConduitSep 22 2010, 4:55 AM

it wrote:

I'm using the "parse" action for a site I'm developing, and I certainly don't want it suddenly disappearing from the API once we go live with it.

The site contains radio play lists and when you click on a music track it retrieves videos, images and information about each band. The information comes from our own D/B when we have it, but falls back on Wikipedia when we don't.

bzimport added a comment.Via ConduitSep 22 2010, 5:06 AM

mdale wrote:

Timed text pages:
http://commons.wikimedia.org/wiki/Commons:Timed_Text_Demo_Page?withJS=MediaWiki:MwEmbed.js
make use of the API to grab the subtitles for a given video, but they use page-title parse (which should use the cache). For good measure I have added maxage=3600 to page parse requests so that they ideally hit the squids instead of the apaches, but I don't think that it has anything to do with the load issues.

But most likely the issue is caused by some cache miss issue for the site wide enabled features like Image Annotator ?
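As a rough illustration of the maxage approach mentioned above, the sketch below builds a page parse URL that carries maxage=3600. The page title and the choice of format=json are hypothetical, and whether a given request is actually answered by a squid depends on the caching setup, which this code does not model.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_URL = "http://commons.wikimedia.org/w/api.php"


def cached_parse_url(page: str, max_age: int = 3600) -> str:
    """Build an action=parse URL for a whole page with a maxage caching hint."""
    # A fixed parameter order gives byte-identical URLs for the same page,
    # which is what a URL-keyed cache needs in order to reuse a stored reply.
    params = [
        ("action", "parse"),
        ("page", page),
        ("prop", "text"),
        ("format", "json"),   # assumed; the comment does not state a format
        ("maxage", str(max_age)),
    ]
    return API_URL + "?" + urlencode(params)


# Hypothetical page title, used only for the demonstration.
url = cached_parse_url("TimedText:Example.ogv.en.srt")
query = parse_qs(urlsplit(url).query)
```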

bzimport added a comment.Via ConduitSep 22 2010, 5:58 AM

pol wrote:

Not sure if related (my guess is yes though), but the iPad and iPhone versions of Wikipanion, likely the most popular Wikipedia app on iOS, are not working anymore either (page just loads endlessly but nothing shows up).

bzimport added a comment.Via ConduitSep 22 2010, 7:06 AM

kali wrote:

(In reply to comment #3)

If you're using action=parse then please describe your application and typical
requests on this bug report.

At fotopedia, we use action=parse in the following contexts:

  • server side for displayable text retrieval:

/w/api.php?action=parse&format=json&prop=text&page=....

These texts are cached on our side for 30 days, so a request only happens when somebody tries to display an article page on fotopedia for the first time in a month. Before the API endpoint was deactivated yesterday, we were only requesting about a dozen pages a minute.

Right now, this part of fotopedia works fine, as long as the users don't wander into unexplored pages.

  • client side for both article search and displayable text retrieval

These queries are triggered when a user adds a wikipedia article to a fotopedia page. The typical scenario is a search, followed by a series of:
/w/api.php?action=parse&format=xml&prop=text&page=....

We only have a handful of regular users of the client software, so I don't expect this to be a threat to wikipedia server stability either. On the other hand, the impact on their side is important for us business-wise.

tstarling added a comment.Via ConduitSep 22 2010, 8:01 AM

(In reply to comment #16)

But most likely the issue is caused by some cache miss issue for the site wide
enabled features like Image Annotator ?

I think that ImageAnnotator is indeed the most likely culprit. As I said in comment #9, we're looking for squid cache hits, which most likely means requests with a maxage parameter. Between 14:20 and 14:29, we logged 71 requests with a maxage parameter in the 1/1000 sampled log. 47 were from ImageAnnotator, the other 24 were for [[MediaWiki:Sitenotice-translation]]. And since the sitenotice requests all went to the same URL, it's unlikely they'd hit the disk of the squids, which is what we saw. None of the logged requests came from a site other than commons.

Between 13:00 and 14:00, we logged an average of 74 requests per second from ImageAnnotator. 14:00 to 15:00 saw a decline to 60 req/s, presumably because sq31-33 were toast for most of that period.

I think the best thing to do for now is to disable ImageAnnotator pending a performance review. Since certain administrators on Commons like to revert me when I change things there, I will leave action=parse disabled on Commons until a regular Commons administrator removes it from [[MediaWiki:Common.js]].

bzimport added a comment.Via ConduitSep 22 2010, 8:53 AM

justin wrote:

Thanks for re-enabling the parse method.

(In reply to comment #3)

If you're using action=parse then please describe your application and typical
requests on this bug report.

For the record, my application makes requests for single pages, one at a time, from Wikipedia. An example request would be:

/w/api.php?action=parse&page=Art&format=json&prop=text|revid|links|displaytitle&redirects

I need to traverse the page content client side so it is essential that the content is parsed first, unless I write / implement a reliable parser myself and use the query method instead, though I hope now this is not necessary.
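For anyone following along, here is a sketch of building that request and pulling the HTML out of the JSON reply. The sample reply is hand-written and trimmed, not real output; it assumes the JSON layout in which the rendered text sits under a "*" key, and the function names are made up for the example.

```python
import json
from urllib.parse import urlencode


def parse_url(page: str) -> str:
    """Build the relative request URL for a single page."""
    params = {
        "action": "parse",
        "page": page,
        "format": "json",
        "prop": "text|revid|links|displaytitle",
        "redirects": "",   # boolean flag: presence is what counts
    }
    # safe="|" keeps the pipe separators readable instead of %7C.
    return "/w/api.php?" + urlencode(params, safe="|")


def extract_html(reply: dict) -> str:
    """Return the rendered HTML from a decoded action=parse JSON reply."""
    return reply["parse"]["text"]["*"]


# Hand-written, trimmed sample reply (assumed layout, not real API output).
SAMPLE_REPLY = '{"parse": {"displaytitle": "Art", "text": {"*": "<p>Art</p>"}}}'

url = parse_url("Art")
html = extract_html(json.loads(SAMPLE_REPLY))
```

The resulting HTML string can then be handed to whatever DOM or HTML parser the client uses for traversal.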

Thanks

bzimport added a comment.Via ConduitSep 22 2010, 9:34 AM

archinf wrote:

Hi,
my web application is broken too.
I need the parse method to analyze image annotations (description & license) for inclusion in our online architecture database www.archinform.net
This information is cached on our server (only updated if the images are refreshed (manually initiated)). Shouldn't cause much traffic.

Please reenable the parse method soon,

thanks
Sascha

daniel added a comment.Via ConduitSep 22 2010, 9:38 AM

I'm using action=parse for the featured article feed provided by wmde at http://feeds.feedburner.com/wikimedia/wp-adt. The feed should contain the teaser text, as it appears on the main page, as html. That is what I grab using action=parse.

bzimport added a comment.Via ConduitSep 22 2010, 9:44 AM

archinf wrote:

Forgot to say, that a typical request looks like this:

http://commons.wikimedia.org/w/api.php?action=parse&format=xml&prop=text&title=TITLE&text=TEXT

bzimport added a comment.Via ConduitSep 22 2010, 11:58 AM

archinf wrote:

Cool, works again ;)

Thank You!

Catrope added a comment.Via ConduitSep 22 2010, 12:01 PM

Reenabled action=parse on Commons. Tim had previously reenabled it on all other wikis, so action=parse is now back across the board. I'll be keeping a close eye on the API Squids throughout the day.

bzimport added a comment.Via ConduitSep 22 2010, 12:03 PM

pol wrote:

Thanks, much appreciated!

Jarekt added a comment.Via ConduitSep 23 2010, 2:01 PM

Unfortunately the fix disabled the "Image Annotator" gadget, which was used to add localized descriptions to over 21k files. It would be great if some solution were found to re-enable this great tool.

He7d3r added a comment.Via ConduitSep 23 2010, 3:39 PM

The localized description is not added by the Image Annotator. It is added by Template:Information:
http://commons.wikimedia.org/wiki/Template:Information

Jarekt added a comment.Via ConduitSep 23 2010, 4:17 PM

(In reply to comment #29)

The localized description is not added by the Image Annotator. It is added by
Template:Information:
http://commons.wikimedia.org/wiki/Template:Information

I should have known better than to use a catch-all word like "localized". I agree the localized/internationalized descriptions (in the language of the user) provided by the Information, Book or Artwork templates will still be there, but descriptions linked to specific locations in the image ("localized"?) are gone. Those descriptions were used to annotate each face in a group image (replacing "5th head, with hat, in the 6th row" kind of descriptions), each building in a panorama, or a signature or other inscription in a painting. A lot of effort was put into annotating images to provide more information to the final user. For example, every known person in the famous http://commons.wikimedia.org/wiki/File:Stroop_Report_-_Warsaw_Ghetto_Uprising_06b.jpg was identified.

Tgr added a comment.Via ConduitSep 26 2010, 9:46 PM

ImageAnnotator is enabled by default on hu.wikipedia (though only actually used on a handful of images). Is that a problem, or is it OK to use on low-traffic sites?

bzimport added a comment.Via ConduitSep 27 2010, 4:47 AM

test5555 wrote:

The discussion at [[Commons:Commons:Administrators'_noticeboard#Stats]] suggests that this isn't related to ImageAnnotator.

The current fix for this problem broke a lot of pages on Commons. Please reexamine this problem.

tstarling added a comment.Via ConduitSep 27 2010, 5:29 AM

(In reply to comment #31)

ImageAnnotator is enabled by default on hu.wikipedia (though only actually used
on a handful of images). Is that a problem, or is it OK to use on low-traffic
sites?

Yes, it's OK to use it on hu.wikipedia.org for now. The problem was an overload, which is why it happened at the weekly peak time.

(In reply to comment #32)

The discussion at [[Commons:Commons:Administrators'_noticeboard#Stats]] suggests
that this isn't related to ImageAnnotator.

All I see there is one single person (Slomox) doing some wishful thinking.

MZMcBride added a comment.Via ConduitSep 27 2010, 6:12 AM

(In reply to comment #32)

The discussion at [[Commons:Commons:Administrators'_noticeboard#Stats]] suggests
that this isn't related to ImageAnnotator.

The current fix for this problem broke a lot of pages on Commons. Please
reexamine this problem.

I'm re-resolving this bug as "fixed."

This bug was about getting action=parse re-enabled on Wikimedia wikis. The bug summary and comment 0 both make this clear.

It's quite possible that other issues have been exposed subsequent to this bug. In particular, there should probably be a bug about ImageAnnotator being turned into an extension, if one hasn't been filed already. But that doesn't change the resolution of this bug. If there are new issues, file separate bugs. This issue (i.e., action=parse being disabled on Wikimedia wikis), as far as I'm aware, is completely resolved.

bzimport added a comment.Via ConduitSep 27 2010, 6:31 AM

test5555 wrote:

Ok, I re-opened it mainly because of the investigation part, but obviously if it's just the "re-enable" part that is important, no problem then.

tstarling added a comment.Via ConduitSep 28 2010, 2:14 AM

ImageAnnotator can be re-enabled for now.

Further testing indicates that the "byte hit ratio" figure in squid includes error messages, and a lot of them were probably being sent at the time in question. The error counters (server.http.errors and client_http.errors) are apparently broken and never incremented.

The issue occurred at peak time. Disabling action=parse probably reduced the server load to slightly below peak, bringing demand back under capacity. There are several things we could have disabled which would have had the same effect.
