Issues with 'webservice' kubernetes backend (tracking)
Open, Normal, Public

Description

Tracking ticket for collecting comments and issues about the Kubernetes backend for Tool Labs' webservice command.

See T129309: Goal: Allow using k8s instead of GridEngine as a backend for webservices as well

Event Timeline

Restricted Application added a project: Cloud-Services. Jun 30 2016, 8:22 PM
Restricted Application added subscribers: Zppix, Aklapper.
Magnus added a subscriber: Magnus. Jun 30 2016, 9:41 PM

Tried this with my tool "wikidata-todo". Result:

Warning: include_once(php/common.php): failed to open stream: No such file or directory in /data/project/wikidata-todo/public_html/index.php on line 6

Warning: include_once(): Failed opening 'php/common.php' for inclusion (include_path='.:/usr/share/php:/usr/share/pear') in /data/project/wikidata-todo/public_html/index.php on line 6

Fatal error: Call to undefined function get_request() in /data/project/wikidata-todo/public_html/index.php on line 90

It appears either the symlinks (php/common.php) are not mounted correctly, or the cwd is wrong.

@Magnus I was initially mounting only /data/project/$toolname inside the container, but since it looks like the symlink points to a *different* shared common tool, that won't work. I'm going to merge https://gerrit.wikimedia.org/r/#/c/296868/ that'll mount all of /data/project and fix this. I've already moved wikidata-todo and it seems to work fine now! \o/
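To make the mount problem concrete, here is a minimal sketch in Python (not the tool's own PHP) that checks whether a symlink resolves outside a given mount root. The helper names `escapes_mount` and `demo` and the directory names `shared-tool` and `my-tool` are hypothetical stand-ins; the layout only imitates one tool linking into another tool's directory.

```python
import os
import tempfile


def escapes_mount(path, mount_root):
    """Return True if `path` resolves (following symlinks) outside `mount_root`."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(mount_root)
    return os.path.commonpath([real_path, real_root]) != real_root


def demo():
    # Throwaway layout imitating two tools that share code (names are made up):
    #   <base>/shared-tool/php/
    #   <base>/my-tool/php -> <base>/shared-tool/php
    with tempfile.TemporaryDirectory() as base:
        shared = os.path.join(base, "shared-tool", "php")
        tool = os.path.join(base, "my-tool")
        os.makedirs(shared)
        os.makedirs(tool)
        link = os.path.join(tool, "php")
        os.symlink(shared, link)
        return (
            escapes_mount(link, tool),  # only the tool's own directory is mounted
            escapes_mount(link, base),  # the whole parent directory is mounted
        )
```

With only the tool's own directory mounted, the link target is unreachable inside the container; mounting the common parent makes it resolvable again, which is what the change above does for /data/project.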

Magnus added a comment. Jul 1 2016, 3:19 PM

OK, this works, but now another of the scripts in the same tool fails.

https://tools.wmflabs.org/wikidata-todo/duplicity.php?wiki=frwiki&mode=stats

Warning: mysqli::mysqli(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /data/project/magnustools/public_html/php/common.php on line 93

Warning: mysqli::mysqli(): (HY000/2002): php_network_getaddresses: getaddrinfo failed: Name or service not known in /data/project/magnustools/public_html/php/common.php on line 93

Fatal error: Call to a member function real_escape_string() on boolean in /data/project/wikidata-todo/public_html/duplicity.php on line 19

The line in question (first warning) is:

$db = new mysqli($server, $mysql_user, $mysql_password, $dbname);

$server is "tools-db", $user is "s51211", password is correct, $dbname is "s51211__duplicity_p". All good.
PHP version foobar? tools-db not available?

Restarted the webservice with default settings, works fine.

tools-db is a deprecated alias which is specified in /etc/hosts, and I think the containers only use DNS. tools.labsdb is the stable name of the host. (but we should probably add /etc/hosts to the containers as well)
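For tools that need to run on both backends, one option is to probe a list of candidate host names and use the first one that resolves. A minimal sketch in Python, assuming nothing about the real environment: `first_resolvable` is a hypothetical helper, and the resolver is passed in so the logic can be exercised without DNS access.

```python
import socket


def first_resolvable(hostnames, resolve=socket.gethostbyname):
    """Return the first hostname that resolves, or None if none of them do."""
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        return name
    return None
```

Calling `first_resolvable(["tools.labsdb", "tools-db"])` would prefer the stable name and fall back to the alias only where the stable name cannot be resolved.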

@Magnus do you think you can switch to tools.labsdb and try? I think we'd
like to not modify /etc/hosts inside the containers since that complicates
things a fair bit...

Actually, I'm just going to make tools-db work - this isn't going to be the only tool running into this issue.

@Magnus ok, 'tools-db' works now. I moved wikidata-todo over again and verified the link you provided works fine. I've moved it back to gridengine just now though, just in case there are other things that're broken and I just do not know where to look for them. Can you move it again and verify / report other broken stuff?

Thank you so much for your patience!

Yes, this seems to work! I also switched most of my tools to use "tools.labsdb" (this is when sharing code across tools comes in handy!). Thanks!

@Magnus awesome! Do you think you can switch some more of your tools once
you gain more confidence in the new system? :D

Magnus triaged this task as Unbreak Now! priority. Jul 3 2016, 11:59 AM

Tried a few. Had some issue with "magnustools", so tried to restart with "--backend=gridengine". Now it's running as an "unkillable" job (state "dr"), returning 503s. Most of my other tools depend on this one, please help ASAP!

Restricted Application added subscribers: Luke081515, TerraCodes, Urbanecm. Jul 3 2016, 11:59 AM
yuvipanda added a comment. Edited Jul 3 2016, 12:21 PM

@Magnus I just forced deletion of grid job 7029745 which was stuck and restarted it on gridengine - http://tools.wmflabs.org/magnustools/ is back up.

Magnus added a comment. Jul 3 2016, 2:14 PM

It's up, but qstat as magnustools doesn't see it. Odd.

@Magnus I see it is running under kubernetes now rather than gridengine - my plain 'webservice start' rather than 'webservice --backend=gridengine start' had started it under kubernetes rather than gridengine because I guess that was the last successful start of the webservice. I've explicitly moved it back to gridengine just now.

T138787 was the cause of the earlier dr, and I'm shuffling instances around to try to handle it.

I switched my tool and I'm having an issue with file uploads. I am not sure if this is due to changes in PHP or if this is due to a storage issue. Honestly any help would be appreciated.

I am providing my logs below in case anyone finds them useful.

`DEBUG 20160703 21:20:56 image:397: Uploading /data/project/magog/tmp/RedsRetired14.png#12#0 to File:RedsRetired14.png..
TRACE 20160703 21:20:56 image:398: File owner: tools.magog
TRACE 20160703 21:20:56 image:399: File permissions: 100644
DEBUG 20160703 21:20:56 image:400: Size of file: 15200
DEBUG 20160703 21:20:56 image:428: RedsRetired14.png
TRACE 20160703 21:20:56 Wiki:561: Running API query with params https://commons.wikimedia.org/w/api.php?action=upload&filename=RedsRetired14.png&comment=%28BOT%29%3A%20Uploading%20old%20version%20of%20file%20from%20en.wikipedia%3B%20originally%20uploaded%20on%202015-07-24%2023%3A52%3A32%20by%20%5B%5B%3Aen%3AUser%3AMB27|MB27%5D%5D&text=&token=[redacted]&ignorewarnings=1&file=%40%2Fdata%2Fproject%2Fmagog%2Ftmp%2FRedsRetired14.png%2312%230&format=php&servedby=&requestid=642075177

ERROR 20160703 21:20:56 Wiki:618: API Error...

Code: badupload_file
Text: File upload param file is not a file upload; be sure to use multipart/form-data for your POST and include a filename in the Content-Disposition header.

`

To clarify, is this about an upload that your web tool makes to Commons (i.e. not an upload from a user to your tool)?

After reading your log in some more detail, the issue is probably this:

&file=%40%2Fdata%2Fproject%2Fmagog%2Ftmp%2FRedsRetired14.png

You're not including the file contents, but the path to a file. In addition, it seems you're doing a GET rather than a POST request? Without the backend code, it's hard to say.
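The API error asks for multipart/form-data with a filename in the Content-Disposition header. As a language-neutral illustration (in Python, not the tool's PHP), the sketch below shows roughly what such a body contains: the file's bytes travel in the request, not its path. `build_multipart` is a hypothetical helper; the field names `action` and `file` are taken from the log above, and everything else is an assumption.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = 'Content-Disposition: form-data; name="{}"'.format(name)
        parts += [("--" + boundary).encode(), header.encode(), b"", str(value).encode()]
    # The file part carries a filename and the raw bytes of the file.
    file_header = 'Content-Disposition: form-data; name="{}"; filename="{}"'.format(
        file_field, filename
    )
    parts += [
        ("--" + boundary).encode(),
        file_header.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        ("--" + boundary + "--").encode(),
        b"",
    ]
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(parts)
```

The returned header value would go into the POST request's Content-Type header, and the body would be sent as the request payload.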

It may be coincidence, but the reason I wanted to switch back one of my tools to gridengine was that OAuth uploading of files to Commons stopped working. It works fine on gridengine.
I did not investigate the cause in detail. Could just be a different PHP version.

@Magnus can you tell me which tool it was?

https://secure.php.net/manual/en/migration56.incompatible.php#migration56.incompatible.curl

Uploads using the @file syntax now require CURLOPT_SAFE_UPLOAD to be set to FALSE. CURLFile should be used instead.

@Magnus hmm, so replacing '@' . $tmpfile with new CURLFile($tmpfile) should make it work, from my reading (I haven't written any PHP in forever...). Does that sound right to you? Do you think you'll have time to try that?

yuvipanda lowered the priority of this task from Unbreak Now! to Normal. Jul 4 2016, 11:47 AM

OK, magnustools now running on kubernetes, file upload seems to work

@Magog_the_Ogre can you try out the CurlFile fix as well?

Thanks. I'm quite sure that was the problem. I will make the change when I am able.

Updated, seems to be working, thanks all.

There are still several of my tools that won't start the kubernetes webservice because it thinks the gridengine one is still running. Sometimes it is (old gridengine webservice permanently in "dr" state), sometimes it isn't. @yuvipanda mentioned something about a manifest file being left behind.

Tools with this issue include:
fist
joanjoc
wlmuk
commonshelper
most-wanted
isin
geohack
wikisoba
costar
book2scroll
catscan2
wikidata-terminator
sourcemd
catnap

Danny_B updated the task description. Jul 24 2016, 2:27 PM
Danny_B renamed this task from Issues with 'webservice' kubernetes backend to Issues with 'webservice' kubernetes backend (tracking).

I'm not able to get my ws-search tool to run under Kubernetes:

2016-09-28 08:27:51: (mod_fastcgi.c.2569) unexpected end-of-file (perhaps the fastcgi process died): pid: 10 socket: unix:/var/run/lighttpd/php.socket.ws-search-1
2016-09-28 08:27:51: (mod_fastcgi.c.3353) response not received, request sent: 886 on socket: unix:/var/run/lighttpd/php.socket.ws-search-1 for /ws-search/index.php?, closing connection

There's no old gridengine webservice running (qstat returns empty), webservice status says it's running, and it isn't trying to do any curl calls.

Any ideas on what I should be looking for to fix it?

I'm not entirely sure how to debug these k8s issues either, but I've taken the liberty of restarting the webservice under SGE so that it is at least back online :-)

Thanks @valhallasw — that's strange though, because I had also restarted it under SGE and last I knew it was running! But I guess something else went amiss there. (Which I believe is one of the advantages of Kubernetes? That it'll restart stopped jobs?) That ws-search tool is rather unfunctional at the moment, for other reasons, so I'm not bothered if it's not always up. :-)

ricordisamoa, wikidipendenza and dewkin were converted without much hassle. However, ricordisamoa's and wikidipendenza's jobs seemed to restart when their service.manifest files were left behind. It seems that ricordisamoa's service.manifest had backend: gridengine in it, while dewkin's did not.
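A quick way to see which backend a leftover manifest names is to read its backend key. The sketch below is in Python and assumes the manifest is plain text with one `key: value` pair per line; the real file format may differ, and `manifest_backend` is a hypothetical helper, not part of the webservice command.

```python
def manifest_backend(text):
    """Return the value of the first `backend:` key in the text, or None if absent."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "backend":
            return value.strip()
    return None
```

Under that assumption, a result of "gridengine" for a tool that is meant to run on kubernetes would point at a stale manifest.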

scfc moved this task from Triage to Tracking on the Toolforge board. Dec 4 2016, 8:18 PM
bd808 moved this task from Backlog to Tracking on the Tools-Kubernetes board. Mar 26 2017, 7:09 PM
bd808 edited projects, added Kubernetes; removed Toolforge. Jul 28 2017, 11:02 PM