Make Context serializable
Closed, ResolvedPublic

Description

For T185709: Cache CheckResult serializations per-entity in ObjectCache, we need to be able to serialize and deserialize Contexts, so that they can afterwards be used to store the deserialized CheckResults into an array again. I assume this will have to be specific for the three context types we support, and each of them will serialize the data it needs, and then there’ll be some code somewhere that delegates to the correct one of the three subclasses for deserialization.

I’m not sure how this would look – non-static serialize and static deserialize methods on Context and its subclasses? Or should it be a separate kind of ContextSerialization service? I think my colleagues know more about this than me, there’s a lot of serialization going on in Wikibase ;)


Lucas_Werkmeister_WMDE triaged this task as Normal priority.

I’m not sure how this would look – non-static serialize and static deserialize methods on Context and its subclasses? Or should it be a separate kind of ContextSerialization service? I think my colleagues know more about this than me, there’s a lot of serialization going on in Wikibase ;)

Separate ContextSerializer and ContextDeserializer is probably the way to go, in analogy to some other (de)serializers we already have in WBQC and also in Wikibase.

Different question: a Context currently holds an EntityDocument, which many checkers also assume is a StatementListProvider. Obviously, we don’t want to serialize the full entity… what do we do? Return some fake EntityDocument that only holds the ID and nothing else, and document that deserialized contexts cannot be used to actually check the constraint? Or return some kind of EntityDocument that lazily loads the real underlying object (using an underlying EntityLookup)? Or restructure Context into a more convenient form?

what do we do?

So here are the things that you need in order to check constraints on a Context:

  • type (e. g. “used as qualifier” constraints)
  • snak (e. g. “one of” constraints)
  • snak rank (many constraint types are disabled on deprecated statements – although that is the only use of getSnakRank(), and we might want to turn that into a declarative solution instead, similar to the supported context types)
  • entity ID (e. g. “symmetric”, “inverse” constraints)
  • entity statements (e. g. “item requires statement” constraints)
  • snak group (e. g. “single value” constraints)

And here are the things that you need in order to store a CheckResult into the right place according to its Context:

  • type
  • entity ID serialization
  • statement property ID serialization
  • statement GUID
  • snak hash
  • snak property ID serialization (qualifier, reference only)
  • reference hash (reference only)

The two sets are almost completely disjoint, though the second one can be mostly derived from the first one. So perhaps we should split them up? The checkers get the first set of data in something called, dunno, a ContextData container. A CheckResult contains the second set of data in, uh, a ContextCursor. And we only need to deserialize ContextCursors (which, you’ll observe, exclusively consist of strings), not ContextData. The separation between the two isn’t entirely clean because the checkers need some way to get a ContextCursor from the ContextData (so that they can instantiate a CheckResult – unless the cursor is only attached at the DelegatingConstraintChecker level? hmm), but since a ContextCursor can be mostly derived from ContextData (except that ContextData doesn’t explicitly contain the statement), that shouldn’t be a huge problem.

@Ladsgroup since IIRC you were unhappy with the old Context interface… any thoughts on this? (I’m afraid it’s still more than one method per interface, though.)


On the other hand – that separation is a lot more work than I assumed for this task, so I admit I’m tempted to go with the much simpler

Return some fake EntityDocument that only holds the ID and nothing else

at least temporarily… in fact such a FakeEntityDocument implementation already exists (though only in the tests of data-model-services).

I admit I’m tempted to go with the much simpler … fake EntityDocument …

Hm, though if we do go with a temporary solution, an alternative approach would be a new Context implementation, which only supports storeCheckResultInArray (because that’s all we need for the caching use-case), and throws LogicExceptions on all other getters. (I think it could be a single implementation supporting all three context types – statement, qualifier, reference – since they’re pretty similar anyways: see the shared code in ApiV2Context.)

I would suggest going with another class named "CompactContext" or "SerializerableContext" which has two (more) public functions. "newFromContext" which tuns the context to smallest size possible and returns a new self and other method that turns the compact version into a Context implementation instance with dummy EntityContent (or something similar). This is how we did it with EntityChange (see EntityDiffChangedAspects and its factory) to make it possible to transmit a wikibase repo edits to client wikis without causing too much hassle db/IO/etc. wise.

I don’t fully understand your proposal yet… what would be the use of newFromContext? Would you store the resulting Context object directly in the cache, instead of serializing it into some array form?

Okay, so how about this:

  • Extract Context::storeCheckResultInArray into a separate interface. (I’ll go with ContextCursor for now – I don’t think CompactContext or something like that really fits, because this isn’t just a different representation of a Context, it’s a separate thing.) Use three different implementations for the different context types we have.
  • Add Context::getCursor(), to return the associated ContextCursor.
  • Change CheckResult to hold a ContextCursor instead of a full Context. (We could even make the constructor accept both, and call getCursor if necessary, so that none of the callers would have to change. Might be worth it, at least as an interim measure.)
  • ContextCursor is easily serializable. (However, it’s not completely trivial – deserialization needs to instantiate one of three different classes – so I’d still add dedicated serializer and deserializer classes, instead of just having methods on ContextCursor itself.)
  • We don’t (yet?) offer any way to get from a ContextCursor back to a full Context.
  • Apart from this, Context stays as it is for now (no ContextData).

Okay, so how about this:

  • Add Context::getCursor(), to return the associated ContextCursor.

I really like to avoid more public methods in Context, my recommendation is to have a public static function named ContextCursor::newFromContext( Context $context ) there instead. Beside that, it looks great to me.

To build a cursor, you need the statement GUID, which StatementContext currently doesn’t expose. So the alternative would be another method, getSurroundingStatement or something like that, as far as I can see :/

:(((((((((((((((((((

I forgot one thing… CheckResult also has getters for the snak type and data value, which are used in the special page. We definitely don’t want to add the data value to the cache, but I’m not sure how we can keep the special page functionality in that case…

Change 413773 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add ContextCursor interface and implementations

https://gerrit.wikimedia.org/r/413773

Change 413774 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add Context::getCursor getter

https://gerrit.wikimedia.org/r/413774

Change 413775 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Remove CheckResult::getEntityId()

https://gerrit.wikimedia.org/r/413775

I forgot one thing… CheckResult also has getters for the snak type and data value, which are used in the special page. We definitely don’t want to add the data value to the cache, but I’m not sure how we can keep the special page functionality in that case…

I’m still not sure how to solve this… we have all the information, in theory, since the special page doesn’t use the cache, so the only time when we’re interested in the snak type and data value, we also have access to them. But the question is how to reconcile that inside the CheckResult class.

For now, since the CheckResult constructor accepts both Context and ContextCursor, we could extract the snak type and data value into separate fields, and populate them in the constructor only if the constructor argument was a full Context. That’s not very nice, but it works.

In the long term, perhaps we should split up these “full check results” and “deserialized check results” into two classes…

Change 415546 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add some getters to ContextCursor

https://gerrit.wikimedia.org/r/415546

Change 415547 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add and use CheckResult::getContextCursor()

https://gerrit.wikimedia.org/r/415547

Change 415548 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add item IDs to test items

https://gerrit.wikimedia.org/r/415548

Change 415549 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add snaks to test context mocks

https://gerrit.wikimedia.org/r/415549

Change 415550 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Replace CheckResult’s Context with ContextCursor

https://gerrit.wikimedia.org/r/415550

Change 413773 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add ContextCursor interface and implementations

https://gerrit.wikimedia.org/r/413773

Change 413774 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add Context::getCursor getter

https://gerrit.wikimedia.org/r/413774

Change 413775 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Remove CheckResult::getEntityId()

https://gerrit.wikimedia.org/r/413775

Change 415546 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add some getters to ContextCursor

https://gerrit.wikimedia.org/r/415546

Change 415547 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add and use CheckResult::getContextCursor()

https://gerrit.wikimedia.org/r/415547

Change 415548 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add item IDs to test items

https://gerrit.wikimedia.org/r/415548

Change 415549 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add snaks to test context mocks

https://gerrit.wikimedia.org/r/415549

Change 415550 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Replace CheckResult’s Context with ContextCursor

https://gerrit.wikimedia.org/r/415550

Change 416425 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Remove Context::storeCheckResultInArray and ApiV2Context

https://gerrit.wikimedia.org/r/416425

Change 416425 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Remove Context::storeCheckResultInArray and ApiV2Context

https://gerrit.wikimedia.org/r/416425

Change 416667 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Add more getters to ContextCursor

https://gerrit.wikimedia.org/r/416667

Change 416668 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseQualityConstraints@master] Make ContextCursor serializable

https://gerrit.wikimedia.org/r/416668

Jonas updated the task description. (Show Details)Mar 7 2018, 6:22 PM

Change 416667 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Add more getters to ContextCursor

https://gerrit.wikimedia.org/r/416667

Change 416668 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Make ContextCursor serializable

https://gerrit.wikimedia.org/r/416668

Lucas_Werkmeister_WMDE closed this task as Resolved.Mar 8 2018, 10:52 AM

I think we can close this, the possible refactorings will be done later after we’re done with the special page update.