
Fix tool kmlexport
Closed, Declined (Public)

Description

The tool kmlexport (http://tools.wmflabs.org/kmlexport/) is heavily used on both Wikipedia and Commons (see the templates in wikidata:Q5845436).
I guess there are hundreds of thousands of articles that link to this tool; the German Wikipedia alone already has more than 100,000 transclusions of the corresponding templates.

But the tool is down a lot ("No webservice"), and complaints are mounting (enwiki, dewiki, and Commons are the ones I know about).
After being restarted on the toolserver, it doesn't take long before the tool is down again. (See also wikitech:Moving kmlexport to trusty? -> ''kmlexport has been dying a lot recently''.)

Is there any way to make this tool run more stably?


See http://tools.freeside.sk/monitor/http-kmlexport.html for the current availability status.

Event Timeline


Have there been more outages since the move to trusty?

@coren can we increase the VMEM limit for kmlexport to something higher?

Yes, the tool was returning a 500 response for several hours today as well.

Isn't it possible to set up some quick-and-dirty "intermediate solution", e.g. a periodic check-and-restart script run from cron using wget, or an autorestart via a trigger script for some system monitoring tool?
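
A minimal sketch of what such a watchdog could look like (using Python's urllib in place of wget; the probe URL and the `webservice restart` command are assumptions on my part, not anything specified in this thread):

```python
#!/usr/bin/env python3
# Hypothetical watchdog sketch: probe the tool's HTTP endpoint and
# restart the webservice if it does not answer. Run from the tool's
# crontab, e.g.:  */5 * * * * /data/project/kmlexport/watchdog.py
import subprocess
import urllib.request

# Assumed probe URL; any cheap request that exercises the CGI would do.
URL = "http://tools.wmflabs.org/kmlexport/?article=Berlin"

def healthy(url, timeout=30):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

if __name__ == "__main__":
    if not healthy(URL):
        # Assumed restart command for a Tool Labs webservice.
        subprocess.run(["webservice", "restart"], check=False)
```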

I've increased its memory limit to 7G (from 4G). Let's see if that helps.

Right now it seems to be down again.

Is there any way to make this tool run more stably?

That might be a question for Para in https://wikitech.wikimedia.org/wiki/User_talk:Para#Moving_kmlexport_to_trusty.3F (as I don't know if Para has an account in Phabricator).

The talk page is protected and there is no unified login (SUL) for wikitech.wikimedia.org. I'm not up to creating an account there; maybe someone else can leave a message for Para.

After 4 days of continuous operation since the advertised limit update, there were some outages today, the longest one lasting ~90 min:

http://tools.freeside.sk/monitor/http-kmlexport.html (time is in CET, UTC+0100)

Have started it back again. Not sure why bigbrother didn't do that.

Hopefully we'll have a longer term solution in the next few weeks.

http://tools.freeside.sk/monitor/http-kmlexport.html looks quite bad for this morning: 4 breakdowns between 1 and 7 am (UTC+2), two of them lasting more than an hour.

Thanks in advance for your effort.

8+ hr downtime right now...

Isn't the monitoring-triggered autorestart a possible workaround for these lengthy outages? It should be quite straightforward to set up.

Interesting, the webservice itself was up but for some reason the proxy thought it wasn't... Our monitoring / restarting thing looks for the job being up rather than the HTTP endpoint responding, which is why it wasn't restarted automatically.

I'm going to be working on the proxy infrastructure today; maybe that'll make this less common?

Plenty of dropouts in the last two days: http://tools.freeside.sk/monitor/http-kmlexport.html

The overall availability is still very poor.

Nice, but it would be more relevant to monitor the actual availability of each particular service (tool) on its own. The availability of kmlexport was < 90% during the last 24 h, I guess, and even worse during week 17 (April 20-26).

So there are two things here: one is the availability of the underlying Tool Labs infrastructure, the other is that of the individual tool itself. If the infrastructure is up and the tool is 'down', I would think that's the tool's fault (once bigbrother / webservice issues are sorted out, which they now are for the most part). A tool that's timing out because it's doing too much or is too slow isn't something that we can help with. So any downtime while the infrastructure itself is up is probably not something that we (as Tool Labs admins) can do much about :(

However, individual tool monitoring is something on the radar and should hopefully be made available in the next few weeks. Details are still being worked out...

24+ hour outage. Can anyone have a look and resume normal operation, if possible?

http://tools.freeside.sk/monitor/http-kmlexport.html

ToolLabs httpd itself seems to be healthy.

I think this is clearly a problem with the tool itself, and the maintainers should look into it and see what they can do.

@yuvipanda: who actually is a maintainer of the tool?

I know of three people who may have contributed to it in the past:
https://en.wikipedia.org/wiki/User:Para
https://de.wikipedia.org/wiki/Benutzer:Kolossos
https://de.wikipedia.org/wiki/Benutzer:Plenz

Anyway, is it possible to restart it (as has been done several times recently), or is there some more serious problem now?

(I restarted it)

According to http://tools.wmflabs.org/ only user Para is listed as maintainer.

No change after the restart this time; a permanent 502 Bad Gateway response for all requests...

Someone has already contacted Para on his enwiki talk page, but his most recent contribs are from December/January.

Hello,
I (Kolossos) and Plenz are only developers of the frontend. We have no access to the backend, even though I have asked Para for access to it.
I believe he codes in Python, querying the database for categories and subcategories.
So I believe the subcategories are perhaps the performance killer, and someone should replace that part of the code with CatGraph or something else.
Sorry, I'm also very busy at the moment, so perhaps somebody else can help. Perhaps @dschwen has an idea.

Hi Kolossos, nice to hear from someone familiar with the thing... :)

Yes, even in times when the tool was operating, there seemed to be performance problems, caused both by the depth ("unlimited" category scans were possible) and the width (some category queries lead to huge results even with a "shallow" depth limit) of the accepted category queries.

Maybe it would be sufficient just to incorporate some very basic sanity checks on the input (max depth) and/or on result-set generation (max number of nodes/items to be fetched); see the sketch below.
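
For illustration, a minimal sketch of such checks; the names and limits are assumptions, not kmlexport's real interface:

```python
# Hypothetical input / result-set sanity checks; illustrative only.
MAX_DEPTH = 4        # deepest category recursion to allow
MAX_ITEMS = 10000    # hard cap on items collected per request

def clamp_depth(requested_depth):
    """Clamp a user-supplied recursion depth to a sane maximum."""
    try:
        depth = int(requested_depth)
    except (TypeError, ValueError):
        depth = 0
    return max(0, min(depth, MAX_DEPTH))

def collect_items(rows):
    """Accumulate result rows, aborting once the hard cap is reached."""
    items = []
    for row in rows:
        items.append(row)
        if len(items) >= MAX_ITEMS:
            break  # refuse to build a multi-hundred-MB response
    return items
```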

In case Para stays quiet, do you think it is possible to somehow get access to the code (become a new maintainer?) and try to make the thing operational again? It would be a pity if we lost the functionality now that it has become so widespread across many wikis.

Well, deep category scans are pretty cheap with FastCCI. Is there an API method to fetch coordinates? If so, we could immediately modify the frontend to use FastCCI (I planned to do that for WikiMiniAtlas anyway). If not, I need to add coordinate storage to FastCCI.


T99236: kmlexport perl script memory usage

Down a lot since yesterday. Please restart.

The tool hasn't worked for at least the last hour. It returns the message "internal error - The URI you have requested .. appears to be non-functional at this time".

Thgoiter set Security to None.

We should probably correlate the graph in http://tools.freeside.sk/monitor/http-kmlexport.html with the puppet failures we've seen to see if they are related (i.e., does kmlexport also die when puppet cannot fork()?)

Or the other way around: If kmlexport uses all the memory it requests, the calculation for memory shared between tools goes wrong, the host runs out of memory and Puppet fails (cf. also T107665).

According to @coren the puppet issue was not actually a memory issue (file system cache was not flushed, indicating the issue wasn't actually a lack of memory, if I remember correctly).

This hasn't been addressed in over a month. Given the demand for this tool, could someone please kick it back into life?

It needs an actual maintainer who can look at the code and figure out what is causing the issue - the webserver itself is running but just 'hung'. I'm not fully sure what we admins can do.

You can consider creating a proposal at https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey. We don't have the manpower to fix tools -- we barely have enough manpower to keep tool labs itself from falling over.

The problem with kmlexport is that it uses massive amounts of memory, which regularly brings the entire server to its knees. Currently, lighttpd is hoarding 800MB of memory and needs to allocate that when fork()ing, which happens for every .cgi request.

It's not clear to me /why/ lighttpd is hoarding this much memory, but it seems to be a result of very large responses (650/700MB of it is plain-text content).

Thanks, I have added my 2c worth there. Meanwhile, it is up again; thanks to whoever did that.

After discussing this issue with lighttpd people:

  • lighttpd does not release memory, so dynamic responses to big requests are a big issue
    • the current memory configuration on Tool Labs does not handle lighttpd taking lots of memory, which is something we're working on. This is also the reason kmlexport is working again... for now?
    • still, kmlexport should probably not try to serve several-hundred-MB KML files to clients.
  • the main suggestion from the people in #lighttpd is to write the data to a file and then let lighttpd serve the content as a static file (either with x-sendfile or just by redirecting); a sketch of the redirect variant is below.
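
A minimal sketch of the redirect variant; everything here is assumed: `generate_kml()` stands in for the real (Perl) generator, and the cache directory/URL presume the tool's public_html is served statically:

```python
#!/usr/bin/env python3
# Hypothetical CGI sketch: write the generated KML to a statically
# served cache directory, then redirect the client to it, so lighttpd
# never buffers the large dynamic response in memory.
import hashlib
import os
import sys

CACHE_DIR = "/data/project/kmlexport/public_html/cache"  # assumed path
CACHE_URL = "http://tools.wmflabs.org/kmlexport/cache"   # assumed URL

def generate_kml(query_string):
    # Stand-in for the real KML generation; hypothetical.
    return "<kml><!-- placeholder for %s --></kml>" % query_string

def serve(query_string):
    kml = generate_kml(query_string)
    name = hashlib.sha1(query_string.encode()).hexdigest() + ".kml"
    with open(os.path.join(CACHE_DIR, name), "w") as f:
        f.write(kml)
    # Redirect; lighttpd then serves the file itself as static content.
    sys.stdout.write("Status: 302 Found\r\n")
    sys.stdout.write("Location: %s/%s\r\n\r\n" % (CACHE_URL, name))
```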

Does lighttpd really hold the dynamic response in memory and not just pass it through? If the culprit were PHP, there is the setting PHP_FCGI_MAX_REQUESTS, which (IIRC) determines after how many requests the PHP backend gets restarted.

But, to concur with @valhallasw, it would be much better if there could be more eyes maintaining and refactoring the application.

Yes. Lighttpd buffers between the CGI app and the client, which allows the CGI app to finish earlier (because the limit is the pipe to lighttpd rather than the internet connection to the client). This then (in theory) allows the server to serve more requests in the same time span. Then, after the request is served, the memory is not free()d because of the risk of memory fragmentation.

I think this is comparable to the situation in http://redmine.lighttpd.net/issues/1264.

I've killed it again just now, since it was killing the other webservices on the webgrid node it was on.

If nobody else steps up to maintain / fix the issues with the application, I'll still try to bring it back up when we have containers working for webservices - at least that way the damage will be limited.

The kmlexport tool looks to be a poster child for the types of problems I was thinking of when I started describing Tool Labs support.

  • Widely used on-wiki
  • Single maintainer
  • No published source code
  • No published license
  • Stability problems
  • Resource consumption problems that affect all of Tool Labs

I'd love to work with community developers to fix some or all of these problems. I'm NOT volunteering to jump in and fix all the problems myself, but I would be very interested to help others work with Para to get the source code licensed and published as a first step to getting them some help in working on the stability and resource issues.

To clarify: until some additional maintainers for this tool are found or we set up our Kubernetes-based container infrastructure, this tool will stay down.

With the Wikimedia Maps project looking promising, I'm not particularly interested in spending more time on the kmlexport tool. If someone wants to keep working on it, the easiest next steps off my todo list would be to use Perl memory profiling tools to find out where the memory's going, build a simple job dispatcher and run the tool on the grid for each request, use FastCCI for the recursive queries, change it to use FastCGI as it did on the toolserver, and if you must, limit the size of the results.

The source code is here. I'm not licensing it in any way, but you can do what you will with it. Happy to add maintainers to the project if someone's interested in continuing to run it there.
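
For anyone picking this up, a rough sketch of the "simple job dispatcher" step mentioned above. It is purely illustrative: the paths, the jsub flags, and the script's --out option are assumptions, not the tool's real interface:

```python
#!/usr/bin/env python3
# Hypothetical dispatcher sketch: submit one short-lived grid job per
# request via jsub and poll for the output file, instead of doing the
# heavy work inside the shared webserver. All paths/flags are assumed.
import os
import subprocess
import time
import uuid

OUT_DIR = "/data/project/kmlexport/jobs"  # assumed scratch directory

def run_on_grid(args, timeout=60):
    job = "kml-" + uuid.uuid4().hex[:8]
    out = os.path.join(OUT_DIR, job + ".kml")
    # -mem caps the job's memory so one runaway request cannot take
    # down a shared webgrid node.
    subprocess.run(["jsub", "-N", job, "-mem", "512m",
                    "./kmlexport.pl", "--out", out] + list(args),
                   check=True)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(out):
            return out
        time.sleep(1)
    raise TimeoutError("grid job %s did not finish in time" % job)
```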

With the Wikimedia Maps project looking promising, I'm not particularly interested in spending more time on the kmlexport tool.

That's an unfortunate but understandable choice. The time, energy, and resources available for any developer to work on tools and other volunteer projects are a finite and precious resource. Thanks for having worked on the tool in the first place.

If someone wants to keep working on it, the easiest next steps off my todo list would be to use Perl memory profiling tools to find out where the memory's going, build a simple job dispatcher and run the tool on the grid for each request, use FastCCI for the recursive queries, change it to use FastCGI as it did on the toolserver, and if you must, limit the size of the results.

The source code is here. I'm not licensing it in any way, but you can do what you will with it. Happy to add maintainers to the project if someone's interested in continuing to run it there.

The refusal to license the software is problematic for attracting new developers. Without a clear license you are implicitly claiming copyright without providing an explanation of the rights you are willing to grant to others who wish to use or modify your software. This means that you retain all rights to your source code and that nobody else may reproduce, distribute, or create derivative works from your work until standard copyright lapses. In the US today that means until 70 years after your death. This is counter to the general principles of the Wikimedia movement and technically a violation of the Tool Labs terms of use.

I can understand the frustration of looking at the wide range of OSI-approved licenses and not wanting to have to figure out how to choose the "right" one. The two easiest choices are GPL-2.0+ if you want to ensure that all derivative works are made available under the same license terms (this is the license used for MediaWiki itself) or MIT if you only want to ensure that your original work is mentioned in a derivative project. This is a gross simplification of course. I would recommend http://choosealicense.com/ as a light primer on choosing a license.

@Para please take a minute to respond to T92963#2265474 by stating an OSI-approved license that you find acceptable. Until you do this, no one can actually step in to help solve the issues raised by @Thgoiter.

bd808 changed the task status from Open to Stalled. May 20 2016, 12:34 AM
bd808 changed the status of subtask T134416: Work with kmlexport author to license and publish source code from Open to Stalled.

Marking as stalled pending a response from @Para on OSI-approved license selection.

Closing as Declined, as the tool has basically been abandoned de facto, without a license. It's very unlikely that other developers will fix an unlicensed tool. Feel free to reopen this if the situation changes.

Why not just leave the task open as we do with thousands of other tasks?
This tool is important as it is massively used all over Wikimedia!

Why not just leave the task open as we do with thousands of other tasks?
This tool is important as it is massively used all over Wikimedia!

The tool is massively used, but it is also abandoned by its only maintainer (T92963#2265237) and unlicensed. @zhuyifei1999 is correct that there really is nothing that can be done here until @Para licenses the code. Reopening this task won't annoy me, but I don't think that having an open task will suddenly make @Para act responsibly and apply a source code license so that someone could reasonably fork or usurp the project. It would be in the best interest of the wikis that are linking to this tool to find a new tool to replace it.

… or rather define what they need the tool for. AFAIUI it generates a KML file from a page's coordinates, or from the coordinates of all pages linked from a page or in a category. That sounds like it would fit Extension:GeoData very well.
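
For illustration, a sketch of fetching per-page coordinates through GeoData's prop=coordinates API. The category name is just an example, and this assumes GeoData is enabled on the target wiki:

```python
#!/usr/bin/env python3
# Sketch: list coordinates of all pages in a category via the GeoData
# extension's prop=coordinates API, using a categorymembers generator.
import json
import urllib.parse
import urllib.request

params = {
    "action": "query",
    "format": "json",
    "generator": "categorymembers",
    "gcmtitle": "Category:Lighthouses in Germany",  # example category
    "gcmlimit": "50",
    "prop": "coordinates",
}
url = "https://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# GeoData attaches a "coordinates" list to each page that has any.
for page in data.get("query", {}).get("pages", {}).values():
    for coord in page.get("coordinates", []):
        print(page["title"], coord["lat"], coord["lon"])
```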

nothing that can be done here until @Para licenses the code.

This sounds pretty much like a WTFPL license:

The source code is here. I'm not licensing it in any way, but you can do what you will with it. Happy to add maintainers to the project if someone's interested in continuing to run it there.

Or call it public domain if you want.

@Thgoiter I can't license it as anything. Only @Para and his existing contributors can do that.

He did. "Do what you want" = public domain.

He did. "Do what you want" = public domain.

According to https://en.wikipedia.org/wiki/Public-domain_software (not an official legal opinion, but a reputable secondary source of information on many topics):

Under the Berne Convention, which most countries have signed, an author automatically obtains the exclusive copyright to anything they have written, and local law may similarly grant copyright, patent, or trademark rights by default. The Berne Convention also covers programs. Therefore, a program is automatically subject to a copyright, and if it is to be placed in the public domain, the author must explicitly disclaim the copyright and other rights on it in some way, e.g. by a waiver statement. In some jurisdictions, some rights (in particular moral rights) cannot be disclaimed: for instance, civil law tradition-based German law's "Urheberrecht" differs here from the Anglo-Saxon common law tradition's "copyright" concept.

I'm not licensing it in any way, but you can do what you will with it.

In that sentence, @Para explicitly refused to license the source code. I do not personally construe the "you can do what you will with it" statement as an informed and enforceable public domain dedication. If you want to take the personal risk of forking a license-free code base and carrying on the project, I will not stop you. I will not personally do so, and I will not advocate that anyone else do so.

I don't want to argue about licenses, just told my opinion.

My intention and wish is to ensure that the functionality of kmlexport won't be lost someday.
A tool that is linked from hundreds of thousands of articles should be worth caring about.
Unfortunately, I personally can't help with programming tools or the like.