As you can see here, some of the values of "Instance of counts" on "Wikidata Datamodel Statements" aren't updating anymore. Are the queries getting too large?
Description
Details
Related Objects
- Mentioned In
- rADESe42e607c6d42: instanceof.php short sleep to avoid rate limit
rADESe11d932a3c12: instanceof.php short sleep to avoid rate limit
rADES487af916046e: instanceof.php talk to wdqs1003:8888 directly for now
rADES71dded4a7c35: instanceof.php talk to wdqs1003:8888 directly for now
rADES5503c970a671: instanceof.php improved extra logging on sparql result
rADES0d2b35b095d7: instanceof.php improved extra logging on sparql result
rADESae341142fce0: instanceof.php extra logging on sparql result
rADES682b2fe60eca: instanceof.php extra logging on sparql result - Mentioned Here
- T119941: Enable longer WDQS queries from inside the cluster
Event Timeline
For reference this is generated by https://github.com/wikimedia/analytics-wmde-scripts/blob/9bac17d653bdfea85a7d0d60e2a41a43be4768fd/src/wikidata/sparql/instanceof.php
So these are generated using SPARQL queries.
As a guess (as the value are the largest on the graph) maybe the query is too big :/
I will add some extra debug logging!
Change 380490 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@master] instanceof.php extra logging on sparql result
Change 380491 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@production] instanceof.php extra logging on sparql result
Change 380491 merged by jenkins-bot:
[analytics/wmde/scripts@production] instanceof.php extra logging on sparql result
Change 380490 merged by jenkins-bot:
[analytics/wmde/scripts@master] instanceof.php extra logging on sparql result
2017-09-26 03:00:01 wikidata-sparql-instanceof Script Started! 2017-09-26 03:00:09 wikidata-sparql-instanceof Result for: Q11266439 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"898376"}}]}} 2017-09-26 03:00:40 wikidata-sparql-instanceof Result for: Q4167836 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"4049476"}}]}} 2017-09-26 03:00:40 wikidata-sparql-instanceof Result for: Q15184295 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"2488"}}]}} 2017-09-26 03:00:59 wikidata-sparql-instanceof Result for: Q16521 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"2357175"}}]}} 2017-09-26 03:01:59 wikidata-sparql-instanceof Result for: Q11173 null 2017-09-26 03:02:26 wikidata-sparql-instanceof Result for: Q5 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"3614631"}}]}} 2017-09-26 03:02:36 wikidata-sparql-instanceof Result for: Q56061 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"1695766"}}]}} 2017-09-26 03:03:36 wikidata-sparql-instanceof Result for: Q1190554 null 2017-09-26 03:03:43 wikidata-sparql-instanceof Result for: Q811979 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"1304179"}}]}} 2017-09-26 03:03:45 wikidata-sparql-instanceof Result for: Q13406463 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"306758"}}]}} 2017-09-26 03:03:52 wikidata-sparql-instanceof Result for: Q4167410 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"1226816"}}]}} 2017-09-26 03:03:54 wikidata-sparql-instanceof Result for: Q11424 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"252571"}}]}} 2017-09-26 03:03:57 wikidata-sparql-instanceof Result for: Q83620 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"408207"}}]}} 2017-09-26 03:03:59 wikidata-sparql-instanceof Result for: Q6999 {"head":{"vars":["scount"]},"results":{"bindings":[{"scount":{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#integer","type":"literal","value":"152799"}}]}} 2017-09-26 03:04:59 wikidata-sparql-instanceof Result for: Q16686448 null 2017-09-26 03:04:59 wikidata-sparql-instanceof Script Finished!
Change 380715 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@master] instanceof.php improved extra logging on sparql result
Change 380716 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@production] instanceof.php improved extra logging on sparql result
Change 380716 merged by jenkins-bot:
[analytics/wmde/scripts@production] instanceof.php improved extra logging on sparql result
Change 380715 merged by jenkins-bot:
[analytics/wmde/scripts@master] instanceof.php improved extra logging on sparql result
addshore@stat1005:~$ cat /srv/analytics-wmde/graphite/log/daily.03.log |grep wikidata-sparql-instanceof |grep Q11173 2017-09-26 03:01:59 wikidata-sparql-instanceof Result for: Q11173 null 2017-09-27 03:02:07 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q11173} [{"http_code":"HTTP\/1.1 500 Server Error","Date":"Wed, 27 Sep 2017 03:02:07 GMT","Content-Type":"text\/plain","Content-Length":"3093","Connection":"keep-alive","Server":"nginx\/1.11.13","X-Served-By":"wdqs1004","Access-Control-Allow-Origin":"*","Vary":"Accept-Encoding","X-Varnish":"21878624, 304358865","Via":"1.1 varnish-v4, 1.1 varnish-v4","Age":"0","X-Cache":"cp1051 miss, cp1058 miss","X-Cache-Status":"miss","Strict-Transport-Security":"max-age=106384710; includeSubDomains; preload","Set-Cookie":"WMF-Last-Access-Global=27-Sep-2017;Path=\/;Domain=.wikidata.org;HttpOnly;secure;Expires=Sun, 29 Oct 2017 00:00:00 GMT","X-Analytics":"https=1;nocookies=1","X-Client-IP":"2620:0:861:108:1a66:daff:fefa:bc4"},"SPARQL-QUERY: queryStr=PREFIX wd: <http:\/\/www.wikidata.org\/entity\/>PREFIX wdt: <http:\/\/www.wikidata.org\/prop\/direct\/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31\/wdt:P279* wd:Q11173}\njava.util.concurrent.TimeoutException\n\tat java.util.concurrent.FutureTask.get(FutureTask.java:205)\n\tat com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:293)\n\tat com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlQuery(QueryServlet.java:654)\n\tat com.bigdata.rdf.sail.webapp.QueryServlet.doGet(QueryServlet.java:288)\n\tat com.bigdata.rdf.sail.webapp.RESTServlet.doGet(RESTServlet.java:240)\n\tat com.bigdata.rdf.sail.webapp.MultiTenancyServlet.doGet(MultiTenancyServlet.java:271)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:687)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1667)\n\tat org.wikidata.query.rdf.blazegraph.throttling.ThrottlingFilter.doFilter(ThrottlingFilter.java:244)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)\n\tat ch.qos.logback.classic.helpers.MDCInsertingServletFilter.doFilter(MDCInsertingServletFilter.java:49)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)\n\tat org.wikidata.query.rdf.blazegraph.filters.ClientIPFilter.doFilter(ClientIPFilter.java:42)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:497)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)\n\tat org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)\n\tat java.lang.Thread.run(Thread.java:748)\n"]
So it does indeed look like these are timing out.
We have 2 options:
- Rewrite this so that the numbers come from the dumps instead of from the SPARQL endpoint. This will mean only having weekly data
- Use the open port (for the analytics cluster only) that allows for longer running SPARQL queries. This will mean we keep the daily data. I guess one day in the future we might run into the same problem though. (Relevant ticket T119941)
I have unassigned myself as the investigation of the cause is now done!
@Lydia_Pintscher what priority is this?
I'd say pretty important because it lets us see huge imports happening or changes to the ontology that have huge impacts.
I'd say let's try to go with the internal endpoint if that is ok. If/when that fails we can switch to using the dumps?
Change 380974 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@master] instanceof.php talk to wdqs1003:8888 directly for now
Change 380975 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@production] instanceof.php talk to wdqs1003:8888 directly for now
Change 380975 merged by jenkins-bot:
[analytics/wmde/scripts@production] instanceof.php talk to wdqs1003:8888 directly for now
Change 380974 merged by jenkins-bot:
[analytics/wmde/scripts@master] instanceof.php talk to wdqs1003:8888 directly for now
So, looking at the dashboard, the query for the count of events is working again, but some other queries failed in the past day.
Looking at the logs:
2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q811979} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q13406463} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q4167410} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q11424} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q83620} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q6999} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof WikidataInstanceOf::doSparqlQuery: PREFIX wd: <http://www.wikidata.org/entity/>PREFIX wdt: <http://www.wikidata.org/prop/direct/>SELECT (count(distinct(?s)) AS ?scount) WHERE {?s wdt:P31/wdt:P279* wd:Q16686448} [{"http_code":"HTTP\/1.1 429 ","Server":"nginx\/1.11.6","Date":"Thu, 28 Sep 2017 03:07:40 GMT","Content-Type":"application\/octet-stream","Transfer-Encoding":"chunked","Connection":"keep-alive","Cache-Control":"no-cache"},"Rate limit exceeded\n"] 2017-09-28 03:07:40 wikidata-sparql-instanceof Script Finished!
Looks like it is hitting the new rate limit, so we probably want to put some sleeps in there! :D
Change 381207 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@master] instanceof.php short sleep to avoid rate limit
Change 381208 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@production] instanceof.php short sleep to avoid rate limit
Change 381208 merged by jenkins-bot:
[analytics/wmde/scripts@production] instanceof.php short sleep to avoid rate limit
Change 381207 merged by jenkins-bot:
[analytics/wmde/scripts@master] instanceof.php short sleep to avoid rate limit
And the graph is indeed interesting: I saw that scientific articles were listed as a event for some reason, so fixed the family tree.