I think I know the reason... Constraint violations are always fetched for the latest version, while revision fetches may load non-current revisions. This means that constraint violation statements can include statement IDs that are not present in the current revision... I wonder what the best way to fix it is. Possible solutions are:
Results of caching can be seen here:
Also, errors on different servers do not seem to match, even though the content is supposed to be exactly the same - this is the whole point of caching after all! Something weird is definitely going on.
The first error on wdq24 is from 2019-04-11 07:45:56.038. The patch was also merged on Apr 11, and the Updater was restarted with it at 07:42:21.097. I'm pretty sure it's connected, but still not sure how.
Curiously, this happens only on hosts where revision-fetch is enabled. I wonder whether it's related, though I am not sure how.
Thu, Apr 18
@aaron Are there any docs on how to use the client ID on the client side? I see there's support for the cpPosIndex cookie, which is $index@$time#$clientId, but if I have only the client ID that is not going to help me. So I am not sure how one would use it - I can't find any trace of a ChronologyClientId HTTP header.
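For reference, here is a minimal sketch of splitting a value laid out as $index@$time#$clientId into its parts. It assumes only the layout quoted above; the class CpPosIndexValue and its parse method are hypothetical names, not MediaWiki API, and the real cookie may carry other types or extra encoding.

```java
import java.util.Optional;

// Hypothetical helper, not part of MediaWiki: splits a cpPosIndex-style
// value of the assumed form index@time#clientId into its three parts.
final class CpPosIndexValue {
    final int index;
    final long time;
    final String clientId;

    private CpPosIndexValue(int index, long time, String clientId) {
        this.index = index;
        this.time = time;
        this.clientId = clientId;
    }

    // Returns an empty Optional when the value does not match the assumed layout.
    static Optional<CpPosIndexValue> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        int at = raw.indexOf('@');
        int hash = raw.indexOf('#', at + 1);
        // Require a non-empty index, a non-empty time and a non-empty client ID.
        if (at <= 0 || hash <= at + 1 || hash == raw.length() - 1) {
            return Optional.empty();
        }
        try {
            int index = Integer.parseInt(raw.substring(0, at));
            long time = Long.parseLong(raw.substring(at + 1, hash));
            return Optional.of(new CpPosIndexValue(index, time, raw.substring(hash + 1)));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A full value parses; the sample values here are made up.
        System.out.println(parse("3@1555600000#abc123").map(v -> v.clientId).orElse("(unparsable)"));
        // A bare client ID does not match the layout, which is the problem described above.
        System.out.println(parse("abc123").isPresent());
    }
}
```

The second call illustrates the point of the question: a bare client ID carries neither index nor time, so it cannot stand in for the cookie value.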
The data seems to be updated in the database (revision 915966700). I suspect it may be the same issue as with T197447 - Blazegraph conflates strings with some invisible characters and strings without them.
This is probably the request for @aaron, he knows the most about it.
Wed, Apr 17
The web UI frontend is just a bunch of HTML files, so there's not much to move. There is, however, an nginx frontend, which serves in a dual capacity as both the gateway to the database backend and the UI server. It is possible to run either function in a separate container if necessary, though it makes sense to keep them together. I am not sure whether there's any added value in separating the frontend from the backend.
Still getting failures, see: https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-java8-docker/855/console
Tue, Apr 16
So there is no difference between "failed because URL is too long" and "failed because so many people use the service that it's been blocked by rate limit".
Mon, Apr 15
@aaron I have started looking into it, and I am not sure how one can get an instance of ChronologyProtector in the code that needs it. The only place I can see that uses ChronologyProtector is the LBFactory class, but getChronologyProtector() is protected there, so it's impossible to get it from outside, and there is no other code anywhere that works with it. What would you advise doing in this case? How can one get ChronologyProtector::getClientId()?
@Ladsgroup which one? With the query (top left) or the results (down right in the menu)? In any case, it probably makes sense to open a new task with a screenshot.
Looks like URL shortener works without it, so we don't need this for now.
Thu, Apr 11
The latest patch doesn't seem to help, because the test appears to be run by pom.xml. Please see the full log at https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-java8-docker/850/console
@Gehel any input on this?
Wed, Apr 10
According to @jcrespo on IRC: "you can jump over ids with no problem, what you cannot do is insert values with lower than the last insert id, if you are doing that you should not use autoinc"
I don't expect them all to be used. This is mostly if we wanted to make shortcuts for 3-letter language codes, like arz. But if this is not desired then doing just 2 letters is fine.
Tue, Apr 9
The script, excluding the boilerplate, should be roughly this:
$dbw = self::getDB( DB_MASTER );
$url = 'https://meta.wikimedia.org/';
$rowData = [
	'usc_id' => $newId,
	'usc_url' => $url,
	'usc_url_hash' => md5( $url )
];
$dbw->insert( 'urlshortcodes', $rowData, __METHOD__, [ 'IGNORE' ] );
for example you need to explicitly set the table to accept PK on insert
can't jump over them and assign them later.
Just my two cents, but wouldn't it make sense to reserve all one-, two-, and three-letter URLs for future use?
Turns out Sesame 2.8 has a pretty big difference from 2.7 - in RDF 1.1/SPARQL 1.1 there are no "simple literals" anymore, i.e. literal "abc" is the same as literal "abc"^^xsd:string, and the only other kind of plain literal is the language-tagged one - "abc"@en. Updating for this may be a bit tricky.
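To make the RDF 1.1 rule concrete, here is a small self-contained sketch of a literal model in which a missing datatype defaults to xsd:string and a language tag implies rdf:langString. SimpleLiteral is a hypothetical stand-in written for illustration, not the Sesame LiteralImpl class, and it skips details such as language-tag case folding.

```java
import java.util.Objects;

// Hypothetical stand-in for an RDF 1.1 literal; not the Sesame API.
final class SimpleLiteral {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
    static final String RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    final String label;
    final String language; // null unless the literal is language-tagged
    final String datatype; // never null once constructed

    SimpleLiteral(String label, String language, String datatype) {
        this.label = Objects.requireNonNull(label);
        this.language = language;
        if (language != null) {
            // Language-tagged literals always have datatype rdf:langString.
            this.datatype = RDF_LANGSTRING;
        } else {
            // A literal without an explicit datatype is an xsd:string.
            this.datatype = datatype != null ? datatype : XSD_STRING;
        }
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SimpleLiteral)) {
            return false;
        }
        SimpleLiteral that = (SimpleLiteral) other;
        return label.equals(that.label)
            && Objects.equals(language, that.language)
            && datatype.equals(that.datatype);
    }

    @Override
    public int hashCode() {
        return Objects.hash(label, language, datatype);
    }

    public static void main(String[] args) {
        SimpleLiteral plain = new SimpleLiteral("abc", null, null);
        SimpleLiteral typed = new SimpleLiteral("abc", null, XSD_STRING);
        // Under RDF 1.1 these two are the same literal.
        System.out.println(plain.equals(typed));
    }
}
```

Code that still expects a null datatype for plain literals, or serializes literals without the datatype suffix, would need to account for this defaulting.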
Mon, Apr 8
This seems to be a change in Sesame - before, LiteralImpl(String) produced null as datatype, now it produces XMLSchema.STRING. It was changed in this commit: https://bitbucket.org/openrdf/sesame/commits/675015e6b996cc8609fa735730baa49edf27d2e7
Getting this in the CI run:
18:59:10 SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
18:59:10 SLF4J: Defaulting to no-operation (NOP) logger implementation
18:59:10 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
18:59:10 log4j:WARN No appenders could be found for logger (com.bigdata.rdf.ServiceProviderHook).
18:59:10 log4j:WARN Please initialize the log4j system properly.
18:59:10 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
18:59:10 Failed tests:
18:59:10 TestEncodeDecodeValue.test_encodeDecode_Literal:89->doTest:190 expected:<"abc"> but was:<"abc"[^^<http://www.w3.org/2001/XMLSchema#string>]>
18:59:10 TestEncodeDecodeValue.test_encodeDecode_Literal_escapeCodeSequence:170->doTest:190 expected:<"ab"c"> but was:<"ab"c"[^^<http://www.w3.org/2001/XMLSchema#string>]>
18:59:10 TestEncodeDecodeValue.test_encodeDecode_Literal_languageCode:128->doTest:190 expected:<"abc"@en> but was:<"abc"@en[^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#langString>]>
18:59:10 TestEncodeDecodeValue.test_encodeDecode_Literal_singleQuotes:123->doTest:190 expected:<"'ab'c'"> but was:<"'ab'c'"[^^<http://www.w3.org/2001/XMLSchema#string>]>
18:59:10
Fri, Apr 5
Thu, Apr 4
@Aklapper also, could you do the necessary magic to make Igor's phabricator account approved?
@Aklapper: I see there that:
Wed, Apr 3
I've also made a counter to check how many "forward skips" we get - i.e. cases where we load a later revision than the one the change asked for. The averages are between 0.1 and 0.5 per second, sometimes going up to 1 - i.e. we're saving up to one item fetch/update per second, and since we're processing about 10 updates per second, that's a speed improvement of 1% to 10%. 1% is low, but 10% is not, so we may not want to give up on skip-ahead just yet.
Looks like we have a problem with redirects - they cannot be fetched by revision. E.g.:
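The estimate above is just the ratio of skips to updates. A minimal sketch of the arithmetic, using the rough figures from the comment (0.1 to 1 skips per second, about 10 updates per second); the class and method names are made up for illustration:

```java
// Back-of-the-envelope estimate of how much work skip-ahead saves.
final class SkipAheadEstimate {
    // Share of updates saved, in percent: each forward skip avoids one fetch/update.
    static double savedPercent(double skipsPerSecond, double updatesPerSecond) {
        if (updatesPerSecond <= 0) {
            throw new IllegalArgumentException("updatesPerSecond must be positive");
        }
        return 100.0 * skipsPerSecond / updatesPerSecond;
    }

    public static void main(String[] args) {
        double updatesPerSecond = 10.0; // approximate figure from the comment
        for (double skips : new double[] {0.1, 0.5, 1.0}) {
            System.out.println(skips + " skips/s saves about "
                + Math.round(savedPercent(skips, updatesPerSecond)) + "% of updates");
        }
    }
}
```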
GUI is still not mergeable, so I think I'll disable running wdio test on CI for now.
Tue, Apr 2
It's a 4-year-old task, so I myself am not 100% clear which one it was back then. So I think having it here is fine.
OK, it seems to be a bit unclear whether this was asking for revision IDs on a particular entity or on the dump as a whole. I think that we need both, but the patch above seems to add the revision ID to entities. I think it makes sense.