Disable fetching constraints from the wdqs updater
Open, Needs Triage · Public

Description

The way constraints are updated using the old updater is suboptimal and partial:

  • does not take the revision into account
  • only fetched on item edits
  • they all disappear after a data-reload
  • the new streaming updater does not support these
  • user impact seems small: 370 out of 230,492,864 queries in January 2021 used the wikibase:hasViolationForConstraint predicate
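
To put the usage figure in the last bullet into proportion, here is a minimal sketch of the arithmetic. It uses only the two counts quoted above; the variable names are illustrative.

```python
# Counts quoted in the task description (January 2021).
constraint_queries = 370          # queries using wikibase:hasViolationForConstraint
total_queries = 230_492_864       # all queries in the same month

# Fraction of all queries that touched the predicate.
share = constraint_queries / total_queries

# The same figure expressed as "one query in N".
one_in = round(total_queries / constraint_queries)

print(f"share of all queries: {share:.7%}")
print(f"roughly 1 query in {one_in:,}")
```

This only describes query volume. It says nothing about how important those queries are to the people running them.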

I suggest disabling them until a proper solution is found and put in place on the Wikibase side (T201150, T201147).

Event Timeline

Change 664782 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/puppet@production] [wdqs] disable fetching constraints

https://gerrit.wikimedia.org/r/664782

What would this effectively mean? Would existing constraint violations in the query service go stale? Would only new ones not be added?

If we agree to stop fetching constraints from the updater, this would effectively mean that we no longer fetch new violations for new edits; existing ones will stay until the next reload (T267927). After the reload, wdqs won't be usable for querying constraint violations. Proper solutions will have to be found to expose and query constraint violations (ideally in another graph).

If we decide not to stop fetching constraint violations and keep the system as-is, they will disappear after the reload and only new edits will repopulate violations. When enabling the new updater we will have to keep the old one running just for fetching violations on edits (status quo).

Ok that'd obviously not be great but I guess given the current low usage it could be acceptable if we announce it.
Progress on our side is blocked on T214362 which needs approval from WMF team and has been stuck for quite a while. Our efforts to revive it have not been very successful, unfortunately. Maybe you can help push this internally?

Getting caught up on this now. This is still a little abstract for me having just learned about constraints. Is it possible to get an example of what the (changed) user experience would be in each case?

@Lydia_Pintscher , the blocker you mentioned is before I started; is there anything specific I should know or can do to help unblock? (about to start reading it -- seems lengthy, and if there's a tl;dr, that would also be helpful)

Constraints are a way to find issues in the data in Wikidata. You can for example use them to define things like "date of death property should have values that are after the date of birth on the same item". Or "such and such property should only be used on Items of this class". When you're logged in, they then show up as little exclamation marks next to problematic statements. The idea is to alert editors to mistakes in the data so they can fix them. The rules are defined by the editors. Here are some examples for these rules: https://www.wikidata.org/wiki/Property:P50#constraints

We've been working on getting the constraint violations into the query service as well, so you can do queries like "give me all Dutch painters from the 1800s that have statements with constraint violations". So far not all constraint violations are loaded into the query service, only a more or less random (but not really random) subset. Because of this, uptake among editors has also been slow. To change that we have been pushing towards getting all or at least the vast majority of the constraint violations into the query service. The big ticket for that is T192565.
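
For readers new to the predicate, here is a minimal sketch of what such a query can look like, built as a string in Python. It assumes that wikibase:hasViolationForConstraint links a statement node to the statement node of the constraint definition, and that the wikibase: and wds: prefixes are predefined by the query service. The function name and the constraint statement ID are placeholders, not real identifiers.

```python
def violations_query(constraint_statement: str, limit: int = 100) -> str:
    """Build a SPARQL query listing statements that violate one constraint.

    `constraint_statement` is the prefixed node (wds:...) of the constraint
    definition on the property. The value used below is a placeholder.
    """
    return f"""\
SELECT ?item ?statement WHERE {{
  ?statement wikibase:hasViolationForConstraint {constraint_statement} .
  ?item ?claim ?statement .
}}
LIMIT {limit}"""


# Hypothetical constraint statement ID, for illustration only.
query = violations_query("wds:P50-EXAMPLE-GUID", limit=10)
print(query)
```

A query like the "Dutch painters" example would add further triple patterns on ?item (occupation, nationality, dates) to the same WHERE block.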

@Lydia_Pintscher , the blocker you mentioned is before I started; is there anything specific I should know or can do to help unblock? (about to start reading it -- seems lengthy, and if there's a tl;dr, that would also be helpful)

I think Adam, Leszek or Lucas can probably say more to that.

Here are some examples for these rules: https://www.wikidata.org/wiki/Property:P50#constraints

If you go to its talk page, the constraint templates have links to the lists of constraint violations. The links labeled SPARQL (new) use wikibase:hasViolationForConstraint, which will break. Most constraints also have non-new SPARQL queries, but some (it's only the allowed entity types constraint on P50, but AFAIR there are some others as well) don't. So how do we find these constraint violations? Even though the overall impact of the removal is small, its impact on this particular (rare but important) use case is very high. I'm open to all solutions, including a special page (not Special:ConstraintReport, as I want to search by predicate, not by subject), a Toolforge tool etc. The only important criteria are that it should provide (nearly) real-time results and that it should have a stable URL that can be linked from the constraint template.
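
As an illustration of the "stable URL" requirement, here is a minimal sketch of how a link that carries a whole SPARQL query can be assembled. It assumes the query service UI at https://query.wikidata.org/ reads a percent-encoded query from the URL fragment. The helper name and the sample query are made up for the example, and this is not necessarily how the on-wiki templates build their links.

```python
from urllib.parse import quote

# Assumed base URL of the query service UI.
WDQS_UI = "https://query.wikidata.org/"


def query_link(sparql: str) -> str:
    """Return a link that opens the UI with `sparql` pre-filled."""
    # Encode everything, including '/', '?' and '#', so the query
    # survives as a single URL fragment.
    return WDQS_UI + "#" + quote(sparql, safe="")


# Illustrative query only.
sample = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5"
link = query_link(sample)
print(link)
```

Because the query itself is part of the link, the URL stays stable only as long as every predicate it uses remains supported by the service.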

Thanks for bringing this here. This link is generated from https://www.wikidata.org/wiki/Module:Constraints/SPARQL and seems to be added to all properties except the few that define no constraints.
Digging more through the impact over the 370 queries using wikibase:hasViolationForConstraint for March (1st -> 28th):

cc: @matej_suchanek @Mike_Peel
Not sure if you saw the announcement for the new WDQS Streaming Updater a couple of weeks ago, but it might be relevant to you that it no longer supports fetching constraint violations (I was informed you are among people who use this importing a lot). You may want to update your modules/bots in the near term.

For the longer term, we are currently collecting feedback on the new Streaming Updater, including what specific use cases there are. Please let us know if you have thoughts on how we can improve this functionality in the future!

This is specifically for Wikidata, not related to the Commons query service, right? If so, this is a serious issue: several of my bot scripts rely on this. I can look into changing them to only use the classical constraint violations in a week or so.

Correct: this change affects Wikidata and Wikidata Query Service. It may also affect Commons Query Service when we move to production with it, as I think that will involve applying the new Flink-based Streaming Updater there as well.

Thank you for letting me know anyway, but this doesn't have any direct impact on my tools. This is still somehow relevant for me, though, as it probably was me who created the queries (or "had them generated") discussed above.

@matej_suchanek We thought of you because you are the creator of this module https://www.wikidata.org/wiki/Module:Constraints/SPARQL that uses the Query Service.

Right, this is where the wikibase:hasViolationForConstraint queries are synthesized. Although the module is actually used in https://www.wikidata.org/wiki/Module:Constraints where the boxes are built and the links are assembled. I think I can only take them down when the new updater is live.

OK, I've pulled them from my bot now, with this commit: https://bitbucket.org/mikepeel/wikicode/commits/889116f38a6240a5a8d2fede4017e94048aab985 . This affects removing bad P373 values (for which I am now reliant on the on-wiki database reports), and inverse values between P1753/P1754 and P301/P910. For P910/P301 I have a problem that the alternative query times out, but I'm also still checking through on-wiki database reports anyway. I estimate I was doing around 5x30=150 out of those 370 queries per month that the task description mentions.
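
For scale, here is a minimal sketch of the estimate above as arithmetic. The "5x30" figure is taken from the comment; reading it as 5 queries on each of 30 days is an assumption. The monthly total of 370 comes from the task description.

```python
# Assumed reading of "5x30": 5 queries per day over 30 days.
queries_per_day = 5
days_per_month = 30
bot_queries = queries_per_day * days_per_month

# Monthly total quoted in the task description.
all_constraint_queries = 370

# Estimated share of the predicate's usage coming from this one bot.
bot_share = bot_queries / all_constraint_queries

print(f"{bot_queries} of {all_constraint_queries} queries ({bot_share:.1%})")
```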