
Wikidata constraint check is getting throttled from wdqs-internal more than usual
Closed, Declined (Public)

Description

Checking the telemetry metrics for jobrunner -> wdqs-internal, we found unusual patterns in the error rates.

Looking more closely, it appears that wdqs-internal is serving more requests (type fallback ones) and is therefore throttling more of them:

image.png (622×1 px, 119 KB)

(cf. https://grafana-rw.wikimedia.org/d/000000344/wikidata-quality?orgId=1&refresh=30s)

The system is behaving as configured, but should we adapt the service to this new behavior if it persists?
Are there ways to measure the actual user impact of these errors?

AC:

  • determine if some actions need to be taken
  • if yes, configure the system to support this load; otherwise, decline the task

Event Timeline

dcausse renamed this task from Wikidata constraint check is getting throttled from wdsq-internal more than usual to Wikidata constraint check is getting throttled from wdqs-internal more than usual. Dec 21 2022, 11:15 AM

I think the user impact is that constraint checks using SPARQL (“subject type”, “value type”, “distinct values”) won’t always work, so users will sometimes not be shown violations of those constraints even when they should be. But if I’m reading DelegatingConstraintChecker::getCheckResultsFor() correctly, these errors won’t abort the whole constraint check, and other constraint violations should still be shown.
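To illustrate the behaviour described above, here is a minimal Python sketch of a delegating checker that isolates failures per constraint. It is not the actual WikibaseQualityConstraints code (which is PHP), and every name in it (`SparqlThrottledError`, `CheckResult`, `run_checks`, the "skipped" status) is a hypothetical stand-in chosen for illustration. It only shows the general idea: a throttled SPARQL-backed check is recorded as skipped while the remaining checks still run and report their results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class SparqlThrottledError(Exception):
    """Hypothetical error raised when the query service throttles a request."""


@dataclass
class CheckResult:
    constraint: str
    status: str  # "violation", "compliance", or "skipped" (illustrative values)


def run_checks(checks: Dict[str, Callable[[], str]]) -> List[CheckResult]:
    """Run every check; a throttled check is skipped instead of aborting the run."""
    results = []
    for name, check in checks.items():
        try:
            status = check()
        except SparqlThrottledError:
            # Assumption: the failure is contained to this one constraint,
            # so the user simply does not see a result for it.
            status = "skipped"
        results.append(CheckResult(name, status))
    return results


def throttled_check() -> str:
    """Stand-in for a SPARQL-backed check whose request was throttled."""
    raise SparqlThrottledError("query service returned a throttling response")


def local_check() -> str:
    """Stand-in for a check that needs no SPARQL query."""
    return "violation"


demo_results = run_checks({
    "value type": throttled_check,
    "single value": local_check,
})
```

In this sketch the "value type" check ends up skipped, while the "single value" violation is still reported, which matches the user impact described above: some violations go unshown, but the overall check does not fail.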

Lydia_Pintscher subscribed.

I'm wondering if more requests are coming because one or more impactful constraints have been added. Anyone got a hunch? Worth doing the detective work?

@Lydia_Pintscher we're waiting on you to tell us how important / urgent this is.

To me it looks like SPARQL type checks generally went back to normal around January 4th (Grafana permalink):

image.png (715×1 px, 302 KB)

@Lydia_Pintscher we're waiting on you to tell us how important / urgent this is.

Generally, having part of the constraint checks not work is bad for Wikidata, because editors don't get shown notifications about issues in the Item they are looking at.
But as Lucas said, things seem to be looking OK again.

@dcausse I think we can close this if the metrics look good from your side too? (I don’t know what I’m looking for in the Envoy Telemetry Grafana dashboard.)

Everything looks fine from my end! Closing :)