Page MenuHomePhabricator

Drop EntityId from statement GUID
Open, LowPublic

Description

Aleksey wrote:

Statement ID should be mandatory on domain model level. To achieve this we need to have method addStatement() on an Entity that will generate correct ID (we cannot do it in Statement constructor because we need an entity ID so we will adhere to the current contract <Q1$UUID>)

Current changes complicated by this:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 17 2018, 3:52 PM
Pablo-WMDE updated the task description. (Show Details)Sep 17 2018, 3:54 PM

@JeroenDeDauw wrote

The premise here is that there are costs to having the EntityId in statement GUIDs. Such as not being able to require that each statement has a GUID.
This raises the question: can we get away with dropping the EntityId from statement GUIDs?
I am not aware of any places where we rely on the EntityId bit in the GUID string. Since the GUIDs are part of several public interfaces, usage outside Wikibase should also be considered.

@Addshore wrote

This is at least needed right now by SetClaim.php API module in Wikibase.
I haven't looked around for any more but I imagine there are a few.
If we were to get rid of this we would have to provide another way to do the lookup somewhere.

At least one place where we need to get the entity ID from the statement ID is WikibaseQualityConstraints: in the wbcheckconstraints API, you can ask for all constraint violations of a statement, but for that we still need to load the full entity to which that statement belongs.

I imagine this isn’t the only such location (looking for usages of StatementGuidParser::parse is probably a good point to start searching for them), and to be honest I don’t see the benefit of removing the entity ID from the statement GUID.

Pablo-WMDE updated the task description. (Show Details)Sep 21 2018, 6:45 AM
I don’t see the benefit of removing the entity ID from the statement GUID.

Having it in there creates a bunch of problems:

  • Entities that do not have an ID yet can have statements. This forces us to have nullable GUIDs
  • Statements can exist loose from Entities. For instance the Statements in a Diff between two Entities do not belong to a single Entity. Hence there is no sensible Entity id to be use here

At least one place where we need to get the entity ID from the statement ID is WikibaseQualityConstraints: in the wbcheckconstraints API, you can ask for all constraint violations of a statement, but for that we still need to load the full entity to which that statement belongs.

Why does the whole entity need to be loaded? The requested violations are stored elsewhere no? Why would you need anything beyond the GUID of the Statement?

Why does the whole entity need to be loaded? The requested violations are stored elsewhere no? Why would you need anything beyond the GUID of the Statement?

Because this is the API to check constraints, not to load constraint check results from wherever. We cache constraint check results for whole entities, but cached results may not be available (or they may be stale), and for single-statement constraint checks we never cache results. And to check constraints, you need the data of the statement itself (“one of”, “range”), other statements of the same entity (“conflicts with”, “item requires statement”, “type”, “difference within range”), and even statements of other entities (“value requires claim”, “value type”).

Right ok, that makes sense.

Addshore triaged this task as Low priority.Oct 31 2018, 4:45 PM

Setting to low as there is no really consensus to do this or a need for it right now.

hoo added a subscriber: hoo.Nov 6 2018, 12:27 PM