
Regular wikidata JSON dump scanning broken on analytics machine
Open, Needs Triage, Public

Description

addshore@stat1007:/srv/analytics-wmde/graphite/src/scripts$ sudo -u analytics-wmde cat /srv/analytics-wmde/graphite/log/toolkit-analyzer.log

Shows us:

Error getting data from https://query.wikidata.org/sparql
Connection timed out (Connection timed out)

It looks like this request needs to go through the webproxy but currently does not, so it has been broken since the extra firewalls were put in place.
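
If the missing webproxy is indeed the cause, one possible fix is to hand the proxy to the connection explicitly. The following is only a minimal sketch, not the analyzer's actual code: the class name `ProxySketch`, its helper methods, the timeouts, and the proxy host and port (`webproxy.example.org:8080`) are all placeholders assumed for illustration. Only the endpoint URL is taken from the log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

class ProxySketch {
    // Hypothetical placeholders: substitute the real webproxy host and port.
    static final String PROXY_HOST = "webproxy.example.org";
    static final int PROXY_PORT = 8080;

    // Describe an HTTP proxy without resolving its hostname up front.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Prepare (but do not yet open) a connection that will be routed through the proxy.
    static HttpURLConnection openViaProxy(String endpoint, Proxy proxy) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection(proxy);
            // Assumed values: fail fast instead of waiting for the OS-level connect timeout.
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(60_000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Proxy proxy = httpProxy(PROXY_HOST, PROXY_PORT);
        HttpURLConnection conn = openViaProxy("https://query.wikidata.org/sparql", proxy);
        // Nothing is sent until getInputStream() or connect() is called on conn.
        System.out.println("Prepared " + conn.getURL() + " via " + proxy);
    }
}
```

An alternative that needs no code change would be to set the standard Java networking properties `https.proxyHost` and `https.proxyPort`, for example by appending them to the `JAVA_TOOL_OPTIONS` value that the job already picks up.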

Full output:

addshore@stat1007:/srv/analytics-wmde/graphite/src/scripts$ sudo -u analytics-wmde cat /srv/analytics-wmde/graphite/log/toolkit-analyzer.log
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
****************************************************************************
***                       Wikidata Toolkit: ToolkitAnalyzer              ***
******************************* Data Directory Layout **********************
* Target storage directory : data/                                         *
* Downloaded dump locations: data/dumpfiles/json-<DATE>/<DATE>-all.json.gz *
* Processor output location: data/<DATE>/                                  *
****************************************************************************
Targeting latest dump: 20190311
Using data directory: /srv/analytics-wmde/graphite/data
MetricProcessor enabled
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error getting data from https://query.wikidata.org/sparql
Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
        at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
        at org.wikidata.analyzer.Fetcher.WikiDataFetcher.queryDataFromWikidata(WikiDataFetcher.java:49)
        at org.wikidata.analyzer.Fetcher.WikimediasFetcher.getWikimediasFromWikidata(WikimediasFetcher.java:64)
        at org.wikidata.analyzer.Fetcher.WikimediasFetcher.getMediawikis(WikimediasFetcher.java:35)
        at org.wikidata.analyzer.Processor.MetricProcessor.populateWikimedias(MetricProcessor.java:57)
        at org.wikidata.analyzer.Processor.MetricProcessor.<init>(MetricProcessor.java:25)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at org.wikidata.analyzer.WikidataAnalyzer.scan(WikidataAnalyzer.java:177)
        at org.wikidata.analyzer.WikidataAnalyzer.run(WikidataAnalyzer.java:145)
        at org.wikidata.analyzer.WikidataAnalyzer.init(WikidataAnalyzer.java:76)
        at org.wikidata.analyzer.WikidataAnalyzer.main(WikidataAnalyzer.java:38)
Command exited with non-zero status 1
2.00user 0.18system 2:12.42elapsed 1%CPU (0avgtext+0avgdata 81756maxresident)k
18816inputs+160outputs (0major+17586minor)pagefaults 0swaps
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
****************************************************************************
***                       Wikidata Toolkit: ToolkitAnalyzer              ***
******************************* Data Directory Layout **********************
* Target storage directory : data/                                         *
* Downloaded dump locations: data/dumpfiles/json-<DATE>/<DATE>-all.json.gz *
* Processor output location: data/<DATE>/                                  *
****************************************************************************
Targeting latest dump: 20190311
Using data directory: /srv/analytics-wmde/graphite/data
MetricProcessor enabled
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Error getting data from https://query.wikidata.org/sparql
Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
        at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
        at org.wikidata.analyzer.Fetcher.WikiDataFetcher.queryDataFromWikidata(WikiDataFetcher.java:49)
        at org.wikidata.analyzer.Fetcher.WikimediasFetcher.getWikimediasFromWikidata(WikimediasFetcher.java:64)
        at org.wikidata.analyzer.Fetcher.WikimediasFetcher.getMediawikis(WikimediasFetcher.java:35)
        at org.wikidata.analyzer.Processor.MetricProcessor.populateWikimedias(MetricProcessor.java:57)
        at org.wikidata.analyzer.Processor.MetricProcessor.<init>(MetricProcessor.java:25)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at org.wikidata.analyzer.WikidataAnalyzer.scan(WikidataAnalyzer.java:177)
        at org.wikidata.analyzer.WikidataAnalyzer.run(WikidataAnalyzer.java:145)
        at org.wikidata.analyzer.WikidataAnalyzer.init(WikidataAnalyzer.java:76)
        at org.wikidata.analyzer.WikidataAnalyzer.main(WikidataAnalyzer.java:38)
Command exited with non-zero status 1
1.79user 0.20system 2:10.99elapsed 1%CPU (0avgtext+0avgdata 80856maxresident)k
7040inputs+192outputs (1major+17343minor)pagefaults 0swaps

This is causing some of the data-model-related Grafana dashboards to not be generated:

2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/dumpfiles/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190311/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190304/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190225/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190218/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190211/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190204/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190128/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190121/metrics.json
2019-03-19 03:00:01 wikidata-dumpScanProcessing File not found: /srv/analytics-wmde/graphite/data/20190114/metrics.json