
Issue with runUpdate.sh with local instance of Wikibase and Wikidata Query Service
Closed, Resolved (Public)

Description

I'm trying to set up a local, standalone Wikibase with the Wikidata Query Service enabled (so that I can potentially teach the setup to others for educational purposes). I've been documenting everything up to this point to the best of my ability, but I've hit a wall. Because it's a standalone instance, I've had to combine aspects of other guides here and here. I hope to write a guide of my own once I get things up and running.

Basic info:

  • Operating System: Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-1022-aws x86-64)
  • MediaWiki Version: 1.37.2

I've established that all components of MediaWiki and Wikibase are working as intended using the various implementation tests. In order to dump the RDF data I've done the following (there's only one entity in the instance):

php /var/lib/mediawiki/extensions/Wikibase/repo/maintenance/dumpRdf.php --server http://localhost:400 --output /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/data/wikibase-05072022-all.ttl

Then I manually delete the schema:name or skos:prefLabel statements from the .ttl file, because the dump fails to load correctly otherwise. I'm not sure whether there is a workaround for this, but since there's only one entity it takes about two seconds, so it isn't a big deal right now.
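The manual deletion described above could be scripted. The following is only a minimal sketch, assuming the dump lists one predicate per line and that the line to drop ends in ";" (so removing it leaves the surrounding statement valid). The function name strip_predicate is hypothetical and not part of Wikibase or the query service; lines ending in "." are deliberately left alone, because deleting them would break the Turtle statement.

```shell
# Hypothetical helper: print the Turtle file $2 without lines whose predicate is $1.
# Only lines that start with the predicate (after indentation) AND end in ';'
# are dropped; everything else is passed through unchanged.
strip_predicate() {
    predicate=$1
    ttl_file=$2
    awk -v p="$predicate" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)    # ignore leading indentation
            sub(/[ \t]+$/, "", line)    # ignore trailing whitespace
            starts = (index(line, p " ") == 1 || index(line, p "\t") == 1)
            ends   = (substr(line, length(line)) == ";")
            if (starts && ends)
                next                    # drop this predicate line
            print
        }
    ' "$ttl_file"
}
```

Usage would look like strip_predicate schema:name dump.ttl > dump-filtered.ttl, where both file names are placeholders. Comparing the result against the original with diff before loading is a cheap sanity check.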

Next I start Blazegraph with:

sudo BLAZEGRAPH_OPTS="-DwikibaseConceptUri=http://localhost:400" bash /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/runBlazegraph.sh

This works with no errors. Then I preprocess the ttl using munge:

sudo bash /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/munge.sh -f /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/data/wikibase-05072022-all.ttl -d /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/data/split -- --conceptUri http://localhost:400

This works with no errors. Following that, I load the munged files using:

sudo bash /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/loadRestAPI.sh -n placeholdernamespace -d /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/data/split

Finally, I run:

sudo bash /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/runUpdate.sh -n placeholdernamespace -- --wikibaseUrl http://localhost:400 --conceptUri http://localhost:400

This crashes immediately with the following trace:

15:36:36.069 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
15:36:36.069 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
15:36:36.467 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the dump
15:36:36.580 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2022-05-08T02:41:03Z
15:36:36.649 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during updater run.
java.lang.RuntimeException: org.apache.http.conn.HttpHostConnectException: Connect to localhost:400 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
        at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRecentChanges(WikibaseRepository.java:244)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.doFetchRecentChanges(RecentChangesPoller.java:325)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.fetchRecentChanges(RecentChangesPoller.java:314)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.batch(RecentChangesPoller.java:338)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.firstBatch(RecentChangesPoller.java:162)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.firstBatch(RecentChangesPoller.java:38)
        at org.wikidata.query.rdf.tool.Updater.run(Updater.java:152)
        at org.wikidata.query.rdf.tool.Update.run(Update.java:174)
        at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to localhost:400 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
        at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.getJson(WikibaseRepository.java:439)
        at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRecentChanges(WikibaseRepository.java:241)
        ... 8 common frames omitted
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
        at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.base/java.net.Socket.connect(Socket.java:609)
        at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
        ... 20 common frames omitted
Exception in thread "main" java.lang.RuntimeException: org.apache.http.conn.HttpHostConnectException: Connect to localhost:400 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
        at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRecentChanges(WikibaseRepository.java:244)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.doFetchRecentChanges(RecentChangesPoller.java:325)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.fetchRecentChanges(RecentChangesPoller.java:314)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.batch(RecentChangesPoller.java:338)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.firstBatch(RecentChangesPoller.java:162)
        at org.wikidata.query.rdf.tool.change.RecentChangesPoller.firstBatch(RecentChangesPoller.java:38)
        at org.wikidata.query.rdf.tool.Updater.run(Updater.java:152)
        at org.wikidata.query.rdf.tool.Update.run(Update.java:174)
        at org.wikidata.query.rdf.tool.Update.main(Update.java:98)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to localhost:400 [localhost/127.0.0.1] failed: Connection refused (Connection refused)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
        at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.getJson(WikibaseRepository.java:439)
        at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRecentChanges(WikibaseRepository.java:241)
        ... 8 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
        at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.base/java.net.Socket.connect(Socket.java:609)
        at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
        ... 20 more

However, I have confirmed that http://localhost:400 is up and running as intended. I'm wondering whether I've done something wrong during the loading process, or whether something is not running correctly. Any help at all would be much appreciated, and I can provide more information if necessary.
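"Connection refused" in the trace means that nothing accepted the TCP connection on localhost:400 as seen from the process running the updater. A quick way to cross-check this from the same shell that runs runUpdate.sh could look like the sketch below. It assumes curl is installed; probe_url is a hypothetical helper name, and the URLs to pass are whatever the updater is being pointed at.

```shell
# Hypothetical helper: report whether anything answers HTTP at the given URL.
# Run it on the machine where runUpdate.sh runs, not through a tunnel or browser.
probe_url() {
    # --silent/--output discard the response body, --max-time bounds the wait,
    # --noproxy '*' bypasses any proxy configured in the environment.
    if curl --silent --output /dev/null --max-time 5 --noproxy '*' "$1"; then
        echo "reachable: $1"
    else
        echo "not reachable: $1"
        return 1
    fi
}
```

For example, probe_url http://localhost:400/ followed by probe_url http://localhost:80/ would show which of the two ports answers locally. Any HTTP response, including an error page, counts as reachable here, since the question is only whether the connection is accepted.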

Event Timeline

Superraptor123 claimed this task.

For the most part, this is now resolved. Because I was tunneling into the instance remotely, I needed to use:

sudo bash /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/runUpdate.sh -n placeholdernamespace -- --wikibaseUrl http://localhost:80 --conceptUri http://localhost:400

Instead of:

sudo bash /var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT/runUpdate.sh -n placeholdernamespace -- --wikibaseUrl http://localhost:400 --conceptUri http://localhost:400

In other words, only --wikibaseUrl changes, from port 400 to port 80; --conceptUri stays at http://localhost:400. I'm not sure whether this behavior is documented anywhere with respect to remote access; I stumbled on the working combination purely by accident.
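The two options play different roles: --wikibaseUrl is the address the updater actually connects to, while --conceptUri is the URI prefix used for entities in the RDF data. A plausible reading, not confirmed anywhere in this task, is that port 400 existed only on the client side of the tunnel while the web server on the instance itself listened on port 80. Keeping the two values in separate parameters makes that distinction harder to miss. The sketch below only prints the command for review; update_cmd and WDQS_DIR are hypothetical names, and the path is copied from the commands above.

```shell
# Install path copied from the commands in this task; adjust as needed.
WDQS_DIR=/var/lib/mediawiki/extensions/wikidata-query-rdf/dist/target/service-0.3.111-SNAPSHOT

# Hypothetical helper: print the runUpdate.sh invocation instead of running it.
#   $1 = Blazegraph namespace
#   $2 = URL the updater connects to (as reachable from the server itself)
#   $3 = concept URI used in the dumped and munged RDF
update_cmd() {
    printf 'sudo bash %s/runUpdate.sh -n %s -- --wikibaseUrl %s --conceptUri %s\n' \
        "$WDQS_DIR" "$1" "$2" "$3"
}
```

Calling update_cmd placeholdernamespace http://localhost:80 http://localhost:400 prints the working command from this comment.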

Additionally, I noted above that I had to manually delete schema:name or skos:prefLabel from the ttl file for it to load correctly. That was true for the RDF dump from the remote MediaWiki 1.37 instance. However, an RDF dump from a local MediaWiki 1.35 instance did not contain "schema:name" or "skos:prefLabel" and loaded fine without that manual step. This difference may also need to be documented for people who export and import RDF dumps between versions.
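Whether a given dump is affected can be checked before loading. This is a rough sketch only: has_label_predicates is a hypothetical helper, and the pattern is a plain text match, so a string literal that happens to contain "schema:name " would also trigger it.

```shell
# Hypothetical helper: succeed if the Turtle file $1 uses schema:name or
# skos:prefLabel as a prefixed name followed by whitespace. Prefix declarations
# such as "@prefix schema: <...>" do not match, because the full name is required.
has_label_predicates() {
    grep -E -q '(^|[[:space:]])(schema:name|skos:prefLabel)[[:space:]]' "$1"
}
```

A dump for which has_label_predicates succeeds would be a candidate for the manual deletion described earlier; one for which it fails should load without it, going by the 1.35 versus 1.37 behavior reported here.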