wikisource-penguin-classics tool's PHP code is broken
Closed, Resolved (Public)

Description

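While doing the 2020 Kubernetes forced migration, I discovered that the wikisource-penguin-classics tool is crashing on every request. The cause appears to be the User-Agent access controls on https://query.wikidata.org: the query service answers the tool's file_get_contents() call with HTTP 403 Forbidden, and the resulting empty string is then passed to SimpleXMLElement, which throws: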

2020-03-01 00:15:22: (mod_fastcgi.c.2702) FastCGI-stderr: PHP Fatal error:  Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /data/project/wikisource-penguin-classics/public_html/index.php:126
2020-03-01 00:15:22: (mod_fastcgi.c.2702) FastCGI-stderr: Stack trace:
2020-03-01 00:15:22: (mod_fastcgi.c.2702) FastCGI-stderr: #0 /data/project/wikisource-penguin-classics/public_html/index.php(126): SimpleXMLElement->__construct('')
2020-03-01 00:15:22: (mod_fastcgi.c.2702) FastCGI-stderr: #1 /data/project/wikisource-penguin-classics/public_html/index.php(21): getXml('SELECT ?work ?w...')
2020-03-01 00:15:22: (mod_fastcgi.c.2702) FastCGI-stderr: #2 {main}
2020-03-01 00:15:22: (mod_fastcgi.c.2702) FastCGI-stderr:   thrown in /data/project/wikisource-penguin-classics/public_html/index.php on line 126
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: PHP Warning:  file_get_contents(https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=SELECT+%3Fwork+%3FworkLabel+%3Ftitle+%3FauthorLabel+%3ForiginalPublicationDate+%3Fabout+%3FindexPage%0A++++WHERE+%7B%0A++++++++%3Fedition+wdt%3AP123+wd%3AQ11281443+.+%23+Published+by+the+desired+publisher%0A++++++++%3Fedition+wdt%3AP31+wd%3AQ3331189+.++++%23+Instance+of+Edition%0A++++++++%3Fedition+wdt%3AP629+%3Fwork+.+++++++++%23+Find+the+work+itself%0A++++++++OPTIONAL%7B+%3Fwork+wdt%3AP1476+%3Ftitle+.+FILTER%28LANG%28%3Ftitle%29+%3D+%27en%27%29+%7D+.+%23+Title+in+English%0A++++++++OPTIONAL%7B+%3Fwork+wdt%3AP50+%3Fauthor+%7D+.%0A++++++++OPTIONAL%7B+%3Fwork+wdt%3AP577+%3ForiginalPublicationDate+%7D+.%0A++++++++OPTIONAL%7B+%3Fwork+wdt%3AP1957+%3FindexPage+%7D+.%0A++++++++OPTIONAL%7B+%3Fabout+schema%3Aabout+%3Fwork+%7D+.+%23+Wikisource+page+name%3F%0A++++++++SERVICE+wikibase%3Alabel+%7B+bd%3AserviceParam+wikibase%3Alanguage+%27en%27+%7D%0A++++%7D): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr:  in /data/project/wikisource-penguin-classics/public_html/index.php on line 125
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: PHP Stack trace:
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: PHP   1. {main}() /data/project/wikisource-penguin-classics/public_html/index.php:0
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: PHP   2. getXml() /data/project/wikisource-penguin-classics/public_html/index.php:21
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: PHP   3. file_get_contents() /data/project/wikisource-penguin-classics/public_html/index.php:125
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: PHP Fatal error:  Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /data/project/wikisource-penguin-classics/public_html/index.php:126
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: Stack trace:
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: #0 /data/project/wikisource-penguin-classics/public_html/index.php(126): SimpleXMLElement->__construct('')
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: #1 /data/project/wikisource-penguin-classics/public_html/index.php(21): getXml('SELECT ?work ?w...')
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr: #2 {main}
2020-03-01 00:17:52: (mod_fastcgi.c.2702) FastCGI-stderr:   thrown in /data/project/wikisource-penguin-classics/public_html/index.php on line 126

I have shut down the tool's webservice to keep it from filling the filesystem with crash report stack traces.
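
For illustration, here is a minimal sketch of the two guards the log suggests are missing: sending a descriptive User-Agent with the query, and refusing to parse an empty or failed response. It is written in Python rather than the tool's PHP, and it is not the fix that was eventually merged. The User-Agent string, the Accept header, and the function names `build_request`, `parse_results`, and `get_xml` are assumptions made for this example.

```python
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint taken from the log above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# Hypothetical User-Agent: a real tool should name itself and give
# a way to contact its maintainer.
USER_AGENT = "wikisource-penguin-classics/0.0 (https://example.org/contact)"


def build_request(sparql, endpoint=ENDPOINT, user_agent=USER_AGENT):
    """Build a GET request that identifies the client explicitly."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": sparql})
    headers = {
        "User-Agent": user_agent,
        # Assumption: ask for SPARQL results as XML.
        "Accept": "application/sparql-results+xml",
    }
    return urllib.request.Request(url, headers=headers)


def parse_results(body):
    """Parse a response body as XML, rejecting empty bodies up front."""
    if not body or not body.strip():
        raise ValueError("empty response body; nothing to parse as XML")
    return ET.fromstring(body)


def get_xml(sparql, opener=urllib.request.urlopen):
    """Run the query and return the parsed XML root element.

    HTTP errors such as 403 are reported as such instead of
    surfacing later as an XML parsing failure.
    """
    request = build_request(sparql)
    try:
        with opener(request, timeout=30) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(f"query service returned HTTP {err.code}") from err
    return parse_results(body)
```

Keeping `build_request` and `parse_results` separate from the network call means both can be checked without contacting the query service. With these guards, a rejected request produces one error naming the HTTP status, rather than a warning followed by an unrelated-looking XML exception.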

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2020-03-01T00:23:03Z] <wm-bot> <root> Stopped PHP 5.6 webservice due to application crashes (T246556)

Change 576991 had a related patch set uploaded (by Samwilson; owner: Samwilson):
[labs/tools/wikisource-penguin-classics@master] Use Guzzle and other improvments

https://gerrit.wikimedia.org/r/576991

Change 576991 merged by Samwilson:
[labs/tools/wikisource-penguin-classics@master] Use Guzzle and other improvments

https://gerrit.wikimedia.org/r/576991

I've fixed this bug, moved the code to Gerrit, and restarted the web service. I should probably get on with adding Wikidata items for all Penguin Classics editions now... :-)