Drop "type": "statement" from JSON serialization of Wikibase statements
Open, Needs Triage · Public

Description

As a developer on the Wikidata team, I want to reduce the size of the Wikibase JSON serialization in order to save bandwidth and storage space on the production systems.
As a developer working with Wikidata, I want the size of the Wikibase JSON serialization to be reduced in order to save bandwidth on my own systems.

Problem:
Wikibase used to distinguish between Claims, which had a main snak and qualifiers, and Statements, which were (or contained) a Claim and additionally had references. To tell them apart, the JSON serialization for claims/statements has a field "type", which could be [either "statement" or "claim"](https://github.com/wmde/WikibaseDataModelSerialization/blob/1.3.0/src/Serializers/ClaimSerializer.php#L87).

Since version 3.0.0 of the PHP datamodel, released four years ago, this distinction is no more, and [the "type" is hard-coded to "statement"](https://github.com/wmde/WikibaseDataModelSerialization/blob/1.4.0/src/Serializers/StatementSerializer.php#L86). I think it’s time to remove it – doing so saves at least 19 bytes per statement (before compression) in:

  • stored entity revisions
  • API responses for wbgetentities, wbeditentities and other modules
  • Special:EntityData JSON responses
  • regular page views, which include the full JSON serialization (cf. T85499)
  • JSON dumps
  • probably more places I can’t think of right now

For reference, Wikidata currently has some 732 million statements (see wikidata-datamodel-statements for current data), so this might save, for example, almost 13 GiB in the JSON dumps (before compression).
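The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes the field is serialized without optional whitespace (`"type":"statement",` including one separating comma) and uses the statement count quoted in this task; the function name is made up for illustration.

```python
# The field as it appears in compact JSON, plus the comma that separates it
# from a neighbouring field. Pretty-printed output would save slightly more.
FIELD = '"type":"statement",'
BYTES_PER_STATEMENT = len(FIELD.encode("utf-8"))

# Statement count quoted in the task description (not a live figure).
STATEMENT_COUNT = 732_000_000


def estimated_savings_gib(statements: int, bytes_each: int = BYTES_PER_STATEMENT) -> float:
    """Uncompressed savings in GiB if every statement loses the field."""
    return statements * bytes_each / 2**30


print(BYTES_PER_STATEMENT)
print(round(estimated_savings_gib(STATEMENT_COUNT), 2))
```

Under these assumptions the per-statement saving is 19 bytes and the total lands just under 13 GiB, in line with the figures given above.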

Obviously, this is a breaking change, and should be announced in accordance with our Stable Interface Policy; however, I don’t expect many people to still use this field, so the impact should be low.

Example:
https://www.wikidata.org/wiki/Q1#P828:

{
  "mainsnak": {
    "snaktype": "value",
    "property": "P828",
    "datavalue": {
      "value": {
        "entity-type": "item",
        "numeric-id": 323,
        "id": "Q323"
      },
      "type": "wikibase-entityid"
    },
    "datatype": "wikibase-item"
  },
  "type": "statement", // <--
  "id": "Q1$4e52017b-4fe2-3a7f-864b-51945d9c8104",
  "rank": "normal"
}
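To illustrate the effect on this example, the following sketch drops the top-level "type" key from the statement and compares the compact JSON sizes. It is not how Wikibase itself would implement the change (that would happen in the PHP serializer); `without_statement_type` and `compact_size` are hypothetical helper names.

```python
import json

# The example statement from above, without the inline marker comment.
statement = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P828",
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": 323, "id": "Q323"},
            "type": "wikibase-entityid",
        },
        "datatype": "wikibase-item",
    },
    "type": "statement",
    "id": "Q1$4e52017b-4fe2-3a7f-864b-51945d9c8104",
    "rank": "normal",
}


def without_statement_type(stmt: dict) -> dict:
    """Return a shallow copy with only the top-level "type" key dropped."""
    return {key: value for key, value in stmt.items() if key != "type"}


def compact_size(obj) -> int:
    """Size in bytes of the JSON encoding without optional whitespace."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


slimmed = without_statement_type(statement)
saved = compact_size(statement) - compact_size(slimmed)
print(saved)
```

Note that only the statement-level key is removed; the "type" inside "datavalue" carries real information and stays in place. For this statement the compact encoding shrinks by 19 bytes.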

Acceptance criteria:

  • There is no more "type" field in JSON serializations of statements.
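One possible way to check this criterion against an entity's JSON is sketched below. It assumes the usual entity layout, where "claims" maps property IDs to lists of statements; `statements_with_type` is a hypothetical helper, not an existing Wikibase function.

```python
def statements_with_type(entity: dict) -> list:
    """Collect the IDs of statements that still carry a top-level "type" key.

    Only the statement level is inspected: the entity's own "type" and the
    "type" inside each datavalue are different fields and must remain.
    """
    offenders = []
    for statements in entity.get("claims", {}).values():
        for stmt in statements:
            if "type" in stmt:
                offenders.append(stmt.get("id"))
    return offenders


STATEMENT_ID = "Q1$4e52017b-4fe2-3a7f-864b-51945d9c8104"

mainsnak = {
    "snaktype": "value",
    "property": "P828",
    "datavalue": {
        "value": {"entity-type": "item", "numeric-id": 323, "id": "Q323"},
        "type": "wikibase-entityid",
    },
    "datatype": "wikibase-item",
}

# Current serialization: the statement still has "type": "statement".
current_entity = {
    "type": "item",
    "id": "Q1",
    "claims": {
        "P828": [
            {"mainsnak": mainsnak, "type": "statement", "id": STATEMENT_ID, "rank": "normal"}
        ]
    },
}

# Serialization after the proposed change.
proposed_entity = {
    "type": "item",
    "id": "Q1",
    "claims": {"P828": [{"mainsnak": mainsnak, "id": STATEMENT_ID, "rank": "normal"}]},
}

print(statements_with_type(current_entity))
print(statements_with_type(proposed_entity))
```

The first call reports the one offending statement ID and the second returns an empty list, which is the state the acceptance criterion asks for.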

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jul 4 2019, 1:39 PM