Page MenuHomePhabricator

java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time.
Closed, ResolvedPublic

Description

To reproduce:

  • docker-compose up # Starting some containers for the first time
  • docker-compose down # Stop the containers but keep the volumes
  • docker-compose ip # Restart the same containers using the same volumes
wdqs-updater_1   | 09:38:01.995 [main] INFO  org.wikidata.query.rdf.tool.Update - Checking where we left off
wdqs-updater_1   | 09:38:01.997 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
wdqs-updater_1   | 09:38:02.593 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the dump
wdqs-updater_1   | 09:38:02.622 [main] INFO  org.wikidata.query.rdf.tool.Update - Defaulting start time to 30 days ago:  2017-11-08T09:38:02Z
wdqs-updater_1   | 09:38:03.072 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
wdqs-updater_1   | 09:38:03.295 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2017-11-08T09:38:02Z at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
wdqs-updater_1   | 09:38:03.318 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
wdqs-updater_1   | 09:38:03.319 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
wdqs-updater_1   | 09:38:13.368 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
wdqs-updater_1   | 09:38:13.368 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
wdqs-updater_1   | 09:38:23.397 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
wdqs-updater_1   | 09:38:23.398 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
wdqs-updater_1   | 09:38:33.441 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
wdqs-updater_1   | 09:38:33.442 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
wdqs-updater_1   | 09:38:43.463 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got no real changes
wdqs-updater_1   | 09:38:43.464 [main] INFO  org.wikidata.query.rdf.tool.Updater - Sleeping for 10 secs
wdqs-updater_1   | wait-for-it.sh: waiting 60 seconds for wikibase.svc:80
wdqs-updater_1   | wait-for-it.sh: wikibase.svc:80 is available after 2 seconds
wdqs-updater_1   | wait-for-it.sh: waiting 60 seconds for wdqs.svc:9999
wdqs-updater_1   | wait-for-it.sh: wdqs.svc:9999 is available after 10 seconds
wdqs-updater_1   | Updating via http://wdqs.svc:9999/bigdata/namespace/wdq/sparql
wdqs-updater_1   | OpenJDK 64-Bit Server VM warning: Cannot open file /var/log/wdqs/wdqs-updater_jvm_gc.pid6.log due to No such file or directory
wdqs-updater_1   |
wdqs-updater_1   | I> No access restrictor found, access to all MBean is allowed
wdqs-updater_1   | Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
wdqs-updater_1   | #logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
wdqs-updater_1   | 09:39:32.630 [main] INFO  org.wikidata.query.rdf.tool.Update - Checking where we left off
wdqs-updater_1   | 09:39:32.633 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
wdqs-updater_1   | 09:39:33.298 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
wdqs-updater_1   | 09:39:33.302 [main] ERROR org.wikidata.query.rdf.tool.Update - Error during initialization.
wdqs-updater_1   | java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time.  You will have to reload from scratch or you might have missing data.
wdqs-updater_1   |      at org.wikidata.query.rdf.tool.Update.buildChangeSource(Update.java:185)
wdqs-updater_1   |      at org.wikidata.query.rdf.tool.Update.main(Update.java:60)
wdqs-updater_1   | Exception in thread "main" java.lang.IllegalStateException: RDF store reports the last update time is before the minimum safe poll time.  You will have to reload from scratch or you might have missing data.
wdqs-updater_1   |      at org.wikidata.query.rdf.tool.Update.buildChangeSource(Update.java:185)
wdqs-updater_1   |      at org.wikidata.query.rdf.tool.Update.main(Update.java:60)
wdqs-updater_1   | wait-for-it.sh: waiting 60 seconds for wikibase.svc:80
wdqs-updater_1   | wait-for-it.sh: wikibase.svc:80 is available after 0 seconds
wdqs-updater_1   | wait-for-it.sh: waiting 60 seconds for wdqs.svc:9999
wdqs-updater_1   | wait-for-it.sh: wdqs.svc:9999 is available after 0 seconds
wdqs-updater_1   | Updating via http://wdqs.svc:9999/bigdata/namespace/wdq/sparql

Event Timeline

On first start up of the example docker-compose file (running on 08/12/2017) the query service gives me:

<http://wikibase.svc>	schema:dateModified	2017-11-08T09:51:19.641Z

This is added by the updater on first run.
This date is 1 month off... and could explain the issue we are seeing when restarting the updater...

MediaWiki recent changes reports the last change as 2017 12 08

image.png (57×487 px, 6 KB)

When creating an item the updater correctly sets the update timestamp to 2017-12-08T09:54:28.000Z

Then stop the containers, and start them again and everything is fine...

The issue occurs when I create containers, then stop them, and start them again without making any edits.
TLDR Due to the updater setting a bad initial timestamp it can't run :(

Addshore added a subscriber: Smalyshev.

Adding @Smalyshev @ the query service project as as far as I can see this is an issue in the updater..

Looks like you're trying to run Updater on empty database without any incoming data. Probably not the use case it was intended for, but not sure what is supposed to happen in this case... Generally Updater would not attempt to catch up if it's more than 30 days behind, since for most active DBs that would be pointless - it would take many days to catch up. But this probably can be changed for testing purposes...

@Smalyshev what decides the first date the updater adds?
I'm trying to see where the first 2017-11-08T09:51:19.641Z timestamp comes from.?

If on an empty wikibase install this was actually the date of the first run of the updater then everything would work as expected.

Also, is the 30 days configurable? as the length of RecentChanges within mediawiki is configurable, and for third parties they may have this set to 90 days etc and be able to recover their query service by just using recent changes for much longer than in WMF production. Should I file a separate ticket about this?

what decides the first date the updater adds?

Dump date or last updated item date.

If on an empty wikibase install this was actually the date of the first run of the updater then everything would work as expected.

Updater does not mark its own runs if nothing was actually imported to the database.

Also, is the 30 days configurable?

No, currently it's hardcoded. But it's not hard to make it configurable if needed.

I'm still not sure what is the scenario though - what use case is this? Why DB is empty and what is supposed to happen?

If on an empty wikibase install this was actually the date of the first run of the updater then everything would work as expected.

Updater does not mark its own runs if nothing was actually imported to the database.

So this is what is happening, there are no edits in recent changes (expect perhaps the default MainPage creation by mediawiki, but the updater is adding a date to blazegraph, and the date seems rather wrong.

Also, is the 30 days configurable?

No, currently it's hardcoded. But it's not hard to make it configurable if needed.

Configurable would be great, should I create a seperate ticket?

I'm still not sure what is the scenario though - what use case is this? Why DB is empty and what is supposed to happen?

So the use case is a sample docker-compose system, that creates a fresh wikibase install, installs a query service and has an updater running.

If the user sets this system up, but doesn't make an edit, and then restarts their machine the updater will suddenly break because of the bad / old date being added to blazegraph by the initial run of the updater.

I would feel bad adding a note to the readme that says something like "NOTE: on the first run you MUST make an entity or the updater will break on subsequent runs."

@Addshore would it work well if I add an option (say --init) that sets the last update time to the same as start time (of course, requires start time to be specified)? This would have the same effect as creating an entity only without creating any entities.

Change 405826 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Enable to set up starting point in the Database.

https://gerrit.wikimedia.org/r/405826

@Smalyshev that sounds like that could work. Would you be able to pass in --init a second time? if the time was already previously set? and set it again?
It could also make sense to have this in a separate script rather than in the updater?

This in combination with the ability to configure the amount of time recent changes hangs around for would be amazing!

@Addshore yes, you can pass --init as many times as you want, provided you passed -s with it.

It could also make sense to have this in a separate script rather than in the updater?

Not very easily since this feature relies on existing infrastructure in Updater, which will need to be duplicated then. Of course, we could also make a script that would just update the database via SPARQL update, if you feel that would be better for your purposes.

This in combination with the ability to configure the amount of time recent changes hangs around

Not sure what you mean here?

This in combination with the ability to configure the amount of time recent changes hangs around

Not sure what you mean here?

So you said above that the limit that the updater uses for recent changes is 30 days, however this is configurable in mediawiki, so it would be great to have it configurable in the updater too / another option you can pass in.

Ok, got it. I've added property wikibaseMaxDaysBack to set the maximum timeframe.

Change 405826 merged by jenkins-bot:
[wikidata/query/rdf@master] Enable to set up starting point in the Database.

https://gerrit.wikimedia.org/r/405826

Smalyshev claimed this task.

In 0.3.0 more likely. I plan to release that soon, after T178712 is done.

great thanks, is there a tracking ticket for the release at all? (I can make a sub ticket then for making new docker images based on 0.3.0!)