
WDQS cache returns invalid JSON
Open, Low, Public

Description

It seems as if WDQS can return invalid JSON. In my Python script I can get:

simplejson.scanner.JSONDecodeError: Invalid control character u'\n' at: line 192512 column 34 (char 4700746)

I suspect the cache can contain invalid JSON if the query times out in the middle of serving the response.
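On the client side, a truncated body can at least be told apart from a real result before it is used. The following is only a minimal sketch, assuming the body is meant to follow the standard SPARQL JSON results layout for a SELECT query (`results` / `bindings`); `TruncatedResponse` and `parse_sparql_json` are hypothetical names, not part of any WDQS client.

```python
import json


class TruncatedResponse(Exception):
    """Raised when a response body is not complete, valid JSON."""


def parse_sparql_json(body):
    """Return the result bindings of a SELECT response, or raise TruncatedResponse.

    Hypothetical helper: it only checks that the body parses as JSON,
    not that the server considered the query successful.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError as err:
        # Report where parsing failed relative to the body length,
        # which hints at whether the body was cut off near the end.
        raise TruncatedResponse(
            "invalid JSON at char %d of %d: %s" % (err.pos, len(body), err.msg)
        ) from err
    return data["results"]["bindings"]
```

A caller could catch `TruncatedResponse` and retry the query instead of crashing inside the JSON decoder.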

Event Timeline

Fnielsen created this task. Mar 12 2018, 5:25 PM
Restricted Application added projects: Wikidata, Discovery. Mar 12 2018, 5:25 PM
Restricted Application added a subscriber: Aklapper.

Hm, I guess that’s possible if the query times out while sending out results (in which case we already sent the HTTP status, so we can’t tell the cache that this is actually an error response). No idea how to fix this, though…
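The mechanism described in the comment above can be illustrated with a toy model. This is not WDQS or Blazegraph code, just a sketch under the assumption that the server streams rows after committing the status line; `stream_response` and its `deadline` parameter are made up for illustration.

```python
import json


def stream_response(rows, deadline):
    """Toy model: yield the status first, then body chunks until the deadline."""
    yield "HTTP/1.1 200 OK"  # status is committed before any row is sent
    yield '{"results": {"bindings": ['
    for i, row in enumerate(rows):
        if i >= deadline:
            # Simulated timeout while streaming: the body simply stops,
            # and the status sent earlier can no longer be changed.
            return
        yield ("," if i else "") + json.dumps(row)
    yield "]}}"


chunks = list(stream_response([{"n": 1}, {"n": 2}, {"n": 3}], deadline=2))
status, body = chunks[0], "".join(chunks[1:])
```

In this model `status` still says 200 while `body` ends after the second row and is not valid JSON, so anything that looks only at the status would treat the response as a success.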

The server can return partial JSON in case of a timeout. The HTTP status should be an error then, like 500, though. Not sure what the "cache" refers to here though. Maybe Vagrant caches 500 responses too?

Another option is that the query finishes successfully but then there's a timeout while delivering the response because the response is too big. In this case I guess it's possible to get invalid JSON too...

> Another option is that the query finishes successfully but then there's a timeout while delivering the response because the response is too big. In this case I guess it's possible to get invalid JSON too...

Yes, that’s what I meant. (A simple example is SELECT * { ?s ?p ?o } with a low maxQueryTimeMillis.) We also have a task for rendering such responses in the UI, by the way: T169666: Render partial results

> The server can return partial JSON in case of a timeout. The HTTP status should be an error then, like 500, though. Not sure what the "cache" refers to here though. Maybe Vagrant caches 500 responses too?

I am not sure where the partial JSON arises. I get it with a big response. I suppose my problem is like T169666.

I have recently experienced that the partial JSON comes back very quickly for a query that usually takes tens of seconds, as if the partial JSON is in the "cache" (Vagrant? Varnish?).
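One way to check whether a cache is involved would be to make each request unique, so that a stored response cannot match it. The sketch below assumes the cache key includes the full query text, which nobody in this thread has confirmed; `uncached_url` is a hypothetical helper, and the `format=json` parameter is likewise an assumption about the endpoint.

```python
import uuid
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def uncached_url(query):
    """Return a request URL whose query text differs on every call.

    A trailing SPARQL comment ('#' to end of line) carries a random nonce,
    so the query result is unchanged but the URL is new each time.
    """
    nonce = uuid.uuid4().hex
    busted = "%s\n# nonce: %s" % (query, nonce)
    return ENDPOINT + "?" + urlencode({"query": busted, "format": "json"})
```

If the partial JSON stops coming back quickly once the nonce is added, that would point at a cached truncated response rather than at the query itself.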

Addshore moved this task from incoming to monitoring on the Wikidata board. Sep 17 2018, 8:17 AM
Smalyshev triaged this task as Low priority. Feb 20 2019, 11:38 PM