Thu, Aug 15
Furthermore, https://query.wikidata.org/css/embed.style.min.fa3ff6a142279256ede4.css gives 404
Looks like Belgium is back to Q31 and cats are cats again. I will investigate what happened to the URIs (most likely the order in the dictionary switched somehow when the underlying storage class changed, and I missed it because the content of the dictionary is still the same).
Probably my fault: I've deployed a new WDQS with some URL refactoring, but it looks like something went wrong with the URI scheme. I will be rolling it back.
This should not be hard to do.
http://dcatap.wmflabs.org/ is now up and can be queried.
@WMDE-leszek your patch is still WIP, are you still working on it?
Wed, Aug 14
Another point: while WDQS does fetch data from both clusters (or at least is supposed to), it tracks its timestamp by only one topic: reportingTopic, which is currently eqiad.mediawiki.revision-create. Otherwise we would get weird jumps in the timestamps, since different topics can get different events in a different sequence. So if eqiad does not get any events, the updater seems to lag even though it is processing events from codfw. I am not sure whether the "nothing useful" message is related in any way.
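To illustrate the single-topic tracking described above, here is a minimal sketch in Python. It is not the updater's actual code: the function name, the event representation as (topic, timestamp) pairs, and the codfw topic name in the usage note are assumptions made for the example.

```python
from datetime import datetime, timezone

# Topic named in the comment above; everything else here is hypothetical.
REPORTING_TOPIC = "eqiad.mediawiki.revision-create"


def update_reported_timestamp(current, events, reporting_topic=REPORTING_TOPIC):
    """Return the newest timestamp seen on the reporting topic only.

    `events` is an iterable of (topic, timestamp) pairs. Events from other
    topics are still processed elsewhere, but they never move the reported
    timestamp, which avoids jumps caused by topics being out of sequence.
    """
    latest = current
    for topic, timestamp in events:
        if topic != reporting_topic:
            continue  # e.g. a codfw event: consumed, but not used for lag
        if latest is None or timestamp > latest:
            latest = timestamp
    return latest
```

With this scheme, a batch that contains only events from a (hypothetically named) codfw.mediawiki.revision-create topic leaves the reported timestamp unchanged, so the updater appears to lag even though it is doing work.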
Tue, Aug 13
We'll also probably need to update items edited in that timeframe manually, just in case. I'll do that a bit later (and also will add docs for doing this).
This is pretty weird, the updater should be able to consume from both eqiad and codfw. Maybe something between the brokers did not work, or we're not connecting to the right endpoint? The messages and the situation definitely look like it stopped getting events. In general, "Did not find anything useful in this batch, returning existing data" is normal if it happens occasionally (it means no new events), but if it happens all the time, that means there's trouble since we're not getting events. So we need to check whether we're missing something in our Kafka setup.
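The distinction between an occasional empty batch and a stalled stream could be checked roughly like this. This is a sketch only: the class name and the threshold value are assumptions, not anything taken from the updater.

```python
class EmptyBatchMonitor:
    """Flag a stall when too many consecutive batches contain no events."""

    def __init__(self, threshold=20):
        # Hypothetical threshold: how many empty batches in a row we
        # tolerate before treating it as "not getting events".
        self.threshold = threshold
        self.consecutive_empty = 0

    def record(self, event_count):
        """Record one batch; return True if the stream looks stalled."""
        if event_count == 0:
            self.consecutive_empty += 1
        else:
            # Any non-empty batch means events are flowing again.
            self.consecutive_empty = 0
        return self.consecutive_empty >= self.threshold
```

A single empty batch keeps the monitor quiet; only an uninterrupted run of them trips it, which matches the "normal occasionally, trouble if constant" reading above.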
Mon, Aug 12
I've just remembered we'd need a VPS for testing the SPARQL service for T141602, so we might allocate some quota for that as well.
I've already added a fix for one of the issues in T222497, but it doesn't fix everything. I think it would still be interesting to test what happens in production, maybe not with a full dump but just a partial one, to estimate what we're dealing with and how bad it is. Maybe, since we don't have too many mediainfo records yet and they're small, we could still be fine?
Sat, Aug 10
Fri, Aug 9
This may be moved to a separate endpoint, probably in Toolforge.
Setting it to High priority since it will start generating problems in production as soon as the train gets there, I imagine.
Thu, Aug 8
Wed, Aug 7
Note also that CheckConstraintsRdf.php contains mention of RdfVocabulary::NS_STATEMENT which can also possibly break after this patch.
So it's a bit tricky since in the new scheme the entity URI actually depends on the repository. So we probably have to check all possible entity prefixes really.
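Checking every possible entity prefix could look something like the sketch below. It is written in Python purely for illustration (the code under discussion is PHP), and the function name and the prefix list in the usage note are assumptions.

```python
def local_entity_id(uri, entity_prefixes):
    """Return the part of `uri` after the first matching entity prefix.

    Prefixes are tried longest first so that a prefix which is itself a
    prefix of another one cannot shadow the more specific match.
    Returns None if the URI does not belong to any known repository.
    """
    for prefix in sorted(entity_prefixes, key=len, reverse=True):
        if uri.startswith(prefix):
            return uri[len(prefix):]
    return None
```

For example, with an illustrative list of ["http://www.wikidata.org/entity/", "https://commons.wikimedia.org/entity/"], both a Q-id URI from the first repository and an M-id URI from the second resolve to their local ids, while a URI from any other base yields None.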
Another thing: do we even need $this->prefixes? WDQS already understands all of those by default.
Hmm, possibly this patch changes config requirements in some incompatible way? I'll try to check what's going on.
There's a patch suggestion: https://gist.github.com/dstogov/43a992d481f65ac16c454e1a292be38e
No recent instances of this, so I think it's fixed now.
Tue, Aug 6
@Gehel I think for now we need to reload the DB from another server and repool it. Let's keep an eye on this problem and see if it ever happens again.
Mon, Aug 5
The journal dump says: "There are 755560 commit points." Probably not a normal situation.
Sun, Aug 4
If the item has been edited since that time, it is probably not affected. If not, then it depends on whether the modification was made before or after the dumping code got to that item. There's no real way for me to know that for each item, at least none that I know of.
Sat, Aug 3
I've updated affected items from 2019-07-02T12:59:28 to 2019-07-03. The items between 2019-07-01T23:00:02Z and 2019-07-02T12:59:28 still may be missing updates, but since all streams from that time seem to be already purged, I can't update them, so please tell me if any items are still missing, and I'll update them. Or just edit them. Also please tell me if any items outside that timeframe are wrong - that would be some other issue.
Fri, Aug 2
@Igorkim78 I wonder if you have any idea about this.
Thu, Aug 1
Thanks! Indeed, there's the edit icon now.
Looks like the earliest dump started at 2019-07-01T23:00:02Z and the latest at 2019-07-03T16:20:54Z. Between those dates, there might be updates missed due to T229617. I'll try to update it but the problem is that RC stream seems to be preserved only for 30 days, so I only have data since 2019-07-02T22:04:34. I'll try to see if I can find which items have been updated between 2019-07-01T23:00:02Z and 2019-07-02T22:04:34 but that data may not be available anymore.
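The window that cannot be replayed from the RC stream follows directly from the timestamps above. A small sketch of the arithmetic, assuming the timestamp written without a trailing Z is also UTC; the helper names are made up for the example:

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse an ISO-like timestamp, treating it as UTC (assumption)."""
    naive = datetime.strptime(text.rstrip("Z"), "%Y-%m-%dT%H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def unrecoverable_window(window_start, window_end, earliest_available):
    """Return the (start, end) part of the window that predates the stream.

    Returns None if the stream still covers the whole window.
    """
    if earliest_available <= window_start:
        return None
    return (window_start, min(window_end, earliest_available))


dump_start = parse_utc("2019-07-01T23:00:02Z")
dump_end = parse_utc("2019-07-03T16:20:54Z")
stream_start = parse_utc("2019-07-02T22:04:34")

gap = unrecoverable_window(dump_start, dump_end, stream_start)
gap_length = gap[1] - gap[0]  # a little over 23 hours
```

Everything from stream_start up to dump_end can still be replayed; only edits inside the gap need to be found some other way.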
The user agent is not set; it just shows up as "-", so that's what WDQS sees.
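For reference, this is roughly how a client could set the header so that it no longer shows up as "-". It is a sketch using Python's standard library; the function name and the user agent string in the usage note are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_request(sparql, user_agent):
    """Build a GET request for WDQS with an explicit User-Agent header."""
    params = urlencode({"query": sparql, "format": "json"})
    return Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},
    )
```

Calling build_wdqs_request("SELECT * WHERE {} LIMIT 1", "ExampleTool/0.1 (mailto:someone@example.org)") and passing the result to urllib.request.urlopen would send the query with that identifying string attached.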