The vast majority of remaining failures are from wdqs2003: https://logstash.wikimedia.org/goto/fe077467d39c2ee03ce8127bdca517ae
Of course, in that literature they also don't have multiple options: the user is autocompleting a query, not an item from the database. In our case, the name displayed should be chosen by the highlighter.
Mon, Sep 24
Session abandonment (typed something, but no final item selected).
Fri, Sep 21
This is about a PHP constant, not a data item for the lexeme namespace, right?
To add some context: some Vagrant setups (i.e. on development machines) have /vagrant set up in a way that does not allow other users to easily write there or change permissions. This causes a lot of hassle that could be avoided if the directory did not reside under /vagrant.
Thu, Sep 20
Right now the full lexeme dump is just 2.1M compressed, so adding it to the main dump would not be a big deal for dump size. However, absent a separate dump, you'd always have to download the huge one, of course. Which makes me still support the separate-dump route.
@Tpt this is done too, right?
I think this is done.
What do folks think about putting all the 'misc dump scripts' (puppet/modules/snapshot/files/cron) in their own repo?
Could you please be more specific as to what you mean by "contaminated with references to Wikidata", i.e. spell out specific things that need to be done?
Yep, according to @Gehel there were restarts going on at the same time (my luck... sigh, should have checked), so maybe it's caused by the restarts. Will try again and see. We should probably detect this case somehow anyway.
Thanks, and what about the spike in connection failures?
This is tricky, since Lexeme and Item use different search queries (because they have different structures).
Tue, Sep 18
OK, gotcha. It's going to take a bit of time as I need to remember the details of the whole setup there... ldfclient should be easy, but pole has some setup involved.
As the immediate problem has ceased, resetting to Normal priority.
Is Retry-After always provided?
Looking at logstash: https://logstash.wikimedia.org/goto/39a6fe9edd787798129b66ae9d61ed90 there's definitely a drop in timeouts, but they are still present, so I will keep monitoring this.
Looks like it's still in Cirrus. Probably a good idea to move it; I forgot about this one.
By "upgrade", you mean shut down these VMs and create new ones with Stretch, or is it possible to migrate an existing VM?
The API requests for recentchanges now seem to be faster, but I still get exceptions in the log :( I also get a bunch of errors for Wikidata URLs like: https://www.wikidata.org/wiki/Special:EntityData/Q33799921.ttl?nocache=1537250691109&flavor=dump
These are supposed to be pretty fast but still produce "no response" sometimes. I'll try to see what else could be causing those. Individual requests that I am testing seem to be fine, but I wonder if it's possible that the request still occasionally uses the DB host with the wrong index?
Mon, Sep 17
I am a bit confused by now: is the original problem that recentchanges is using the wrong host, or is it using the right host and the indexes there are wrong, or something else? And how can it be fixed? The WDQS poller depends on the RC API, and having it take 30+ seconds instead of the usual sub-second response time is a serious issue.
Rate limiting buckets requests by IP + User-Agent, so having a distinctive User-Agent for WBQC is certainly a good idea.
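Something like this, on the client side (an illustrative Java sketch only - WBQC itself is PHP, and the UA string here is made up for the example):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DistinctiveUserAgent {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // A distinctive User-Agent lets the IP+User-Agent buckets separate
        // WBQC traffic from other clients sharing the same IPs.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://query.wikidata.org/sparql?query=ASK%20%7B%7D"))
            .header("User-Agent",
                "WikibaseQualityConstraints/1.0 (constraint-check bot)")
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}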
But since August 10th, the SPARQL usage numbers are very small (even 0 on certain days).
Sat, Sep 15
I tried db2085:3318 and the result is the same as on the other codfw host. So if that's what the actual API is using, that could be the reason why it is so slow.
@Reedy I am not sure which host; I just logged in to the maintenance hosts for eqiad and codfw. Lookups show db2082.codfw.wmnet and db1092.eqiad.wmnet.
Looks like the codfw one does not use the index. @jcrespo do you have any idea why that could happen?
SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20180914110000') AND rc_namespace IN ('0','120') AND rc_type IN ('0','1','3','6') ORDER BY rc_timestamp ASC,rc_id ASC LIMIT 101 ;
On mwmaint1001 it takes 0.00 sec; on mwmaint2001 it takes 56.50 sec!
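(For reference, running the same query under EXPLAIN on both hosts should show which index, if any, each of them picks:)

EXPLAIN SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20180914110000') AND rc_namespace IN ('0','120') AND rc_type IN ('0','1','3','6') ORDER BY rc_timestamp ASC,rc_id ASC LIMIT 101 ;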
Fri, Sep 14
Doesn't seem to be entirely WDQS-related - e.g. if I call 'https://www.wikidata.org/w/api.php?format=json&action=query&list=recentchanges&rcdir=newer&rcprop=title%7Cids%7Ctimestamp&rcnamespace=0%7C120&rclimit=100&continue=&rcstart=2018-09-14T00%3A00%3A00Z' - i.e. try to load 100 items from the start of today - it takes 29 seconds.
A simple solution, I suppose, would be to completely skip prefixes for REGEX queries, which are (I believe) the most common queries we send out and never need any prefixes.
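As a sketch of the idea (hedged: the real query builder lives in the WBQC PHP code; PREFIXES and buildQuery are made-up names here, and the check for prefixed names is simplified):

public class PrefixSkipping {
    // Stand-in for the full standard prefix block currently prepended
    // to every outgoing query.
    private static final String PREFIXES =
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        + "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n";

    // Only prepend the prefix block when the query body actually uses
    // prefixed names; plain REGEX format checks don't, so they can go
    // out as-is.
    static String buildQuery(String body) {
        boolean needsPrefixes = body.contains("wd:") || body.contains("wdt:");
        return needsPrefixes ? PREFIXES + body : body;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(
            "SELECT (REGEX(\"abc\", \"^a\") AS ?matches) WHERE {}"));
    }
}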
All bans are temporary, so as soon as traffic returns to normal the bans will expire. It would be nice if there were a way for wbqc to respect the 429 throttling response (and its Retry-After header), which would avoid bans.
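A minimal sketch of that on the client side (Java, illustrative only since wbqc is PHP; because it's not clear Retry-After is always provided, the sketch falls back to a short default wait, and it only handles the delta-seconds form of the header, not the HTTP-date form):

import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ThrottleAwareSend {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // On HTTP 429, wait for Retry-After seconds (default 1s when the
    // header is absent) and retry once, instead of hammering the endpoint
    // until it bans us.
    static HttpResponse<String> send(HttpRequest request) throws Exception {
        HttpResponse<String> response =
            CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 429) {
            long waitSeconds = response.headers()
                .firstValue("Retry-After")
                .map(Long::parseLong)  // delta-seconds form only
                .orElse(1L);
            Thread.sleep(waitSeconds * 1000);
            response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        }
        return response;
    }
}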
So, weird thing: now that we've switched data centers, wdqs2003 is showing the same anomaly. Could it be that our load balancing is not distributing the load evenly across these hosts?
12,499,055 throttling events in the last 24 hours. This is definitely not good.
Any idea what is going on here?
Thu, Sep 13
I think the whitelist was not deployed yet; could you try again now and see if it works?
Wed, Sep 12
There is a possibility that we will need to provide dumps of all constraint violations in order to ease loading data into WDQS servers that are starting from scratch.
It doesn't seem possible to connect to Blazegraph because the Updater is already running?
This error means that the timestamp stored in the database is more than 30 days behind (the limit can be changed with the wikibaseMaxDaysBack property). In this case, you can:
- Load a dump that is reasonably recent
- Run the Updater with -s DATE --init (see the example invocation below)
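For the Updater step, the invocation would look something like this (assuming the stock runUpdate.sh launcher forwards these options to the Updater; the exact DATE format accepted by -s may differ between versions, so check your install):

./runUpdate.sh -s 20180901000000 --init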
Is anything happening with this, or was it shelved for now?
Tue, Sep 11
Applying the wikidata role does not create content, AFAIK; I just went to Special:NewItem and created a bunch of items manually. There is probably a way to load them from dumps etc. (WMDE folks, and particularly @Addshore, may know better ways).
@Lucas_Werkmeister_WMDE your query finds a lot of statements with wdno: claims, which do not have ps:.
Mon, Sep 10
So repackaging with a more recent jetty-http (or the whole Jetty stack) might not be that hard.
Something weird is definitely going on - out of 582,769 statements with P39, we have 2,334 that are missing rank (and possibly other clauses). As a statement should never be missing a rank, it's clearly some bug. I'll dig into it and see how it could happen.