Page MenuHomePhabricator

wikibaseImport in the docker images does not add statements when importing an entity
Open, NormalPublic

Description

In a fresh bring-up of the docker images of wikibase, I tried to use the WikibaseImport tool to pull in a sample entity. It does seem to create all of the necessary properties and entities, however, it does not actually connect any of the statements to the imported entity.

I used a fresh checkout of https://github.com/wmde/wikibase-docker.git, and then ran:

epaulson:~/development/wikidata/wikibase-docker $ docker-compose pull
Pulling mysql           ... done
Pulling wdqs            ... done
Pulling wdqs-proxy      ... done
Pulling wdqs-frontend   ... done
Pulling elasticsearch   ... done
Pulling wikibase        ... done
Pulling wdqs-updater    ... done
Pulling quickstatements ... done
epaulson:~/development/wikidata/wikibase-docker $ docker images
REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE
mariadb                    10.3                95d6852bba5a        3 days ago          365MB
wikibase/wdqs-frontend     latest              2b432c865cf8        5 days ago          640MB
wikibase/quickstatements   latest              5c00fb868f14        5 days ago          432MB
wikibase/wikibase          1.30-bundle         8e24f36fc326        5 days ago          795MB
wikibase/wdqs-proxy        latest              dd22183101fa        5 days ago          17.7MB
wikibase/wdqs              0.3.0               0e25b5256dbe        5 days ago          209MB
elasticsearch              <none>              671bb2d7da44        6 months ago        486MB
epaulson:~/development/wikidata/wikibase-docker $ docker-compose up

After waiting a little bit to watch everything come up, I docker-exec'd into the wikibase container to run the import command, trying to load the mayor of Madison, WI:
https://www.wikidata.org/wiki/Q16107138

epaulson:~/development/wikidata/wikibase-docker $ docker ps
CONTAINER ID        IMAGE                             COMMAND                   CREATED             STATUS              PORTS                  NAMES
619f7000a11b        wikibase/quickstatements:latest   "/bin/bash /entrypoi…"    44 seconds ago      Up 42 seconds       0.0.0.0:9191->80/tcp   wikibase-docker_quickstatements_1
6bcefdf8b997        wikibase/wdqs:0.3.0               "/entrypoint.sh /run…"    44 seconds ago      Up 42 seconds                              wikibase-docker_wdqs-updater_1
f92b2b031020        wikibase/wdqs-frontend:latest     "/entrypoint.sh ngin…"    44 seconds ago      Up 42 seconds       0.0.0.0:8282->80/tcp   wikibase-docker_wdqs-frontend_1
cdb2d9134828        wikibase/wikibase:1.30-bundle     "/bin/bash /entrypoi…"    45 seconds ago      Up 44 seconds       0.0.0.0:8181->80/tcp   wikibase-docker_wikibase_1
7c64344942d2        wikibase/wdqs-proxy               "/bin/sh -c \"/entryp…"   45 seconds ago      Up 44 seconds       0.0.0.0:8989->80/tcp   wikibase-docker_wdqs-proxy_1
bf6466a38686        mariadb:10.3                      "docker-entrypoint.s…"    46 seconds ago      Up 45 seconds       3306/tcp               wikibase-docker_mysql_1
b94c4efffefa        elasticsearch                     "/docker-entrypoint.…"    46 seconds ago      Up 45 seconds       9200/tcp, 9300/tcp     wikibase-docker_elasticsearch_1
6a025a488001        wikibase/wdqs:0.3.0               "/entrypoint.sh /run…"    46 seconds ago      Up 45 seconds       9999/tcp               wikibase-docker_wdqs_1
epaulson:~/development/wikidata/wikibase-docker $ docker exec -it wikibase-docker_wikibase_1 /bin/bash
root@cdb2d9134828:/var/www/html# cd extensions/WikibaseImport/
root@cdb2d9134828:/var/www/html/extensions/WikibaseImport# php maintenance/importEntities.php --all-properties --entity Q16107138
[2018-11-19 01:22:43]: Creating Q16107138
[2018-11-19 01:22:47]: Creating Q17437796
[2018-11-19 01:22:54]: Creating Q17437798
[2018-11-19 01:22:54]: Q17437796 already imported
[2018-11-19 01:22:54]: Q17437798 already imported
[2018-11-19 01:22:54]: Creating Q17559452
[2018-11-19 01:22:56]: Q17437796 already imported
[2018-11-19 01:22:56]: Q17437798 already imported
[2018-11-19 01:22:56]: Q17559452 already imported
[2018-11-19 01:22:56]: Q17437796 already imported
[2018-11-19 01:22:56]: Q17437798 already imported
[2018-11-19 01:22:58]: Creating P31
[2018-11-19 01:23:06]: Creating Q5
[2018-11-19 01:23:16]: Creating P143
[2018-11-19 01:23:17]: Creating Q328
[2018-11-19 01:23:18]: Creating P21
[2018-11-19 01:23:18]: Creating Q6581097
[2018-11-19 01:23:19]: Creating P27
[2018-11-19 01:23:19]: Creating Q30
[2018-11-19 01:23:28]: Creating P569
[2018-11-19 01:23:28]: Creating P18
[2018-11-19 01:23:30]: Creating P69
[2018-11-19 01:23:31]: Creating Q152303
[2018-11-19 01:23:31]: Creating P735
[2018-11-19 01:23:31]: Creating Q4925623
[2018-11-19 01:23:41]: Creating P102
[2018-11-19 01:23:41]: Creating Q29552
[2018-11-19 01:23:42]: Creating P106
[2018-11-19 01:23:42]: Creating Q82955
[2018-11-19 01:23:42]: Creating P2267
[2018-11-19 01:23:42]: Creating P646
[2018-11-19 01:23:44]: Q17437796 already imported
[2018-11-19 01:23:44]: Q17437798 already imported
[2018-11-19 01:23:44]: Q17559452 already imported
[2018-11-19 01:23:45]: Q17437798 already imported
[2018-11-19 01:23:45]: Q17437796 already imported
[2018-11-19 01:23:46]: Creating Q648625
[2018-11-19 01:23:46]: Creating P19
[2018-11-19 01:23:46]: Creating Q1297
[2018-11-19 01:23:46]: Creating P1412
[2018-11-19 01:23:46]: Creating Q1860
[2018-11-19 01:23:47]: Creating P345
[2018-11-19 01:23:47]: Creating P3344
[2018-11-19 01:23:47]: Creating P2390
[2018-11-19 01:23:47]: Creating P373
[2018-11-19 01:23:48]: Adding statements: Q16107138
[2018-11-19 01:23:48]: DB connection was already closed or the connection dropped.
[2018-11-19 01:23:48]: Done
root@cdb2d9134828:/var/www/html/extensions/WikibaseImport#

All of the changes are listed in the wikibase instance, however, the new page for my imported entity has no statements attached:

I'm using Docker for Mac, with my memory for the Docker VM set to 10GB.

I looked through the logs and only saw one thing that jumped out at me, but I really don't know what I'm looking for. I'm happy to provide more logfiles - but here's a bit of the updater logfile:

01:22:38.121 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:22:38.122 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:22:48.155 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 1 changes, from Q2@2@20181119012245|2 to Q2@2@20181119012245|2
01:22:49.027 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:22:45Z at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
01:22:49.048 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:22:49.048 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:22:59.056 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 3 changes, from Q3@3@20181119012253|3 to Q5@5@20181119012255|5
01:22:59.658 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:22:55Z at (0.0, 0.0, 0.0) updates per second and (114406279.9, 25280675.7, 8568428.9) milliseconds per second
01:22:59.676 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:22:59.677 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:23:09.699 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 1 changes, from P2@6@20181119012305|6 to P2@6@20181119012305|6
01:23:10.470 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:23:05Z at (0.1, 0.0, 0.0) updates per second and (96842972.2, 24451909.1, 8473762.1) milliseconds per second
01:23:10.492 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:23:10.493 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:23:20.521 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 6 changes, from Q6@7@20181119012316|7 to P5@12@20181119012319|12
01:23:21.636 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:23:19Z at (0.1, 0.0, 0.0) updates per second and (81975953.3, 23650312.7, 8380141.3) milliseconds per second
01:23:21.665 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:23:21.666 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:23:31.653 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 6 changes, from Q9@13@20181119012327|13 to P9@18@20181119012331|18
01:23:32.493 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:23:31Z at (0.1, 0.0, 0.0) updates per second and (63843126.1, 22496918.1, 8241645.1) milliseconds per second
01:23:32.510 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:23:32.511 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:23:42.531 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 5 changes, from Q11@19@20181119012341|19 to Q13@23@20181119012342|23
01:23:43.203 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:23:42Z at (0.2, 0.1, 0.0) updates per second and (54042216.1, 21759420.4, 8150591.4) milliseconds per second
01:23:43.236 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 2 changes, from P12@24@20181119012342|24 to P13@25@20181119012342|25
01:23:43.338 [update 1] WARN  org.wikidata.query.rdf.tool.Updater - Contained error syncing.  Giving up on P13
org.wikidata.query.rdf.tool.exception.ContainedException: Unexpected status code fetching RDF for http://wikibase.svc/wiki/Special:EntityData/P13.ttl?nocache=1542590623299&flavor=dump:  500
	at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRdfForEntity(WikibaseRepository.java:300)
	at org.wikidata.query.rdf.tool.Updater.handleChange(Updater.java:302)
	at org.wikidata.query.rdf.tool.Updater.lambda$handleChanges$0(Updater.java:188)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
01:23:43.420 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:23:42Z at (0.2, 0.1, 0.0) updates per second and (54042216.1, 21759420.4, 8150591.4) milliseconds per second
01:23:43.437 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:23:43.437 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:23:53.428 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 9 changes, from Q14@26@20181119012346|26 to P19@34@20181119012347|34
01:23:54.299 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-11-19T01:23:47Z at (0.3, 0.1, 0.0) updates per second and (45745910.1, 21046097.5, 8060542.6) milliseconds per second
01:23:54.320 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:23:54.320 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
01:24:04.352 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
01:24:04.352 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs

Looking into the logs a bit then, I see this:

root@cdb2d9134828:/var/log/mediawiki# cat exception.log 
2018-11-19 01:23:43 cdb2d9134828 my_wiki: [8e83bcec6af5c9b2fff0976e] /wiki/Special:EntityData/P13.ttl?nocache=1542590623299&flavor=dump   Error from line 40 of /var/www/html/extensions/CirrusSearch/includes/Job/ElasticaWrite.php: Unsupported operand types

(For what it's worth, after I stopped and started the entire docker-compose stack, I was able to visit http://localhost:8181/wiki/Special:EntityData/P13.ttl?nocache=1542590623299&flavor=dump and see what I expect is the correct output with status 200)

Event Timeline

Restricted Application added a project: Wikidata. · View Herald TranscriptNov 19 2018, 2:09 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
dbs added a subscriber: dbs.Nov 19 2018, 2:48 AM

I reproduced this problem on a fresh Wikibase 1.31 Docker image. The end of my first attempt showed:

[2018-11-19 02:21:59]: Adding statements: Q16107138
[2018-11-19 02:22:01]: DB connection was already closed or the connection dropped.
[2018-11-19 02:22:01]: Done

There were no logs in /var/log/mediawiki/ to peruse. However, the database message made me wonder if there was an issue specifically connected with adding the statements.

I ran the importEntities.php command again. It reported that all of the entities and properties had already been created, but this time no database message followed the "Adding statements" log entry. And voila, now the statements were all visible at http://localhost:8181/wiki/Item:Q2 and visible in WDQS.

Subsequent importEntities.php commands on the same image were inconsistent:

  • Q2882604 and Q7532877 worked on the first try.
  • Q133116 failed with an illegal value related to Q25343 ("square metre"):
[2018-11-19 02:40:12]: Adding statements: Q133116
[2018-11-19 02:40:15]: Illegal value: http://www.wikidata.org/entity/Q25343
[2018-11-19 02:40:15]: Done
  • Q340 failed with an illegal value related to Q712226 ("square kilometre"). Hmm.
dbs added a comment.Nov 19 2018, 3:01 AM

Ah, the square metre and square kilometre issue have been known for a couple of years now: https://github.com/filbertkm/WikibaseImport/issues/16

I don't think the original author @aude will have time to review this change.
@Lydia_Pintscher this could be something that the Wikidata-Campsite picks up for review and followthrough?

Addshore triaged this task as Normal priority.Jun 21 2019, 12:15 AM

I have some other pull requests that I didn't do up to now because they seam not to be accepted:

  1. adds a property Wikidata ID so that one has a link between the new and the old entities
  2. some refactoring making the import faster
  3. a bug about the number of statements of entities

Salut
Dennis

PS the problem is that I'm not a php coder so the code might not have hight quality

I don't think the original author @aude will have time to review this change.
@Lydia_Pintscher this could be something that the Wikidata-Campsite picks up for review and followthrough?

Yes!

I have some other pull requests that I didn't do up to now because they seam not to be accepted:

  1. adds a property Wikidata ID so that one has a link between the new and the old entities
  2. some refactoring making the import faster
  3. a bug about the number of statements of entities

Salut
Dennis

Sorry about that. We didn't have this on our radar.

alaa_wmde added a subscriber: alaa_wmde.

We will open a separate task for moving this repo under wmde github org

Restricted Application added a project: User-Addshore. · View Herald TranscriptTue, Oct 22, 9:09 AM

We will open a separate task for moving this repo under wmde github org

Currently at https://github.com/wikidata/WikibaseImport
The docker images have been switched to read from this location https://github.com/wmde/wikibase-docker/commit/dc466bf858d30ccfddf26cded81a4acea2440bad

Ladsgroup removed Addshore as the assignee of this task.Mon, Oct 28, 11:06 AM