As a Wikidata user, I want Wikidata to function with limited data in the case of Blazegraph failure due to reaching maximum graph size, rather than being completely non-functional.
This ticket is part of WDQS disaster planning and reflects research into mitigation strategies for a catastrophic failure of Blazegraph, specifically the case where the Wikidata graph becomes too big for Blazegraph to continue supporting. This is not a commitment to a long-term state of WDQS or Wikidata, but part of the disaster mitigation playbook for a worst-case scenario.
If Blazegraph reaches the maximum number of triples it can store, we may need to prioritize which data to keep and which to delete (temporarily, until we can reload it from dumps once our graph backend can support it). This task is to determine how much space we can save by deleting the candidates below. Note that these candidates are not ordered by priority, and this list does not take into account anything beyond the size of these data in Blazegraph.
For each candidate in the list, determine its size in Blazegraph (both the absolute size and its percentage of the total graph), and how much runway time deleting it would gain us at the current Wikidata growth rate.
 All non-English labels
 All labels that are covered by fallbacks (e.g. names duplicated across 200 languages)
 All labels
 All descriptions
 All aliases
 All labels, descriptions and aliases
 External identifiers
 Scholarly papers: http://wikicite.org/statistics.html
 Scholarly papers + authors + scientific journals + identifiers
 Astronomical objects: https://www.wikidata.org/wiki/Q6999
 Items with fewer than 3 backlinks
 Look at the distribution of the number of backlinks per item, and use it to determine whether a different threshold would make more sense
 All statements of a specific datatype: monolingual text (not important for querying, according to Lydia)
 Non-normalized values (units, dates, external ids)
 Non-top-ranked statements: https://grafana.wikimedia.org/d/000000175/wikidata-datamodel-statements?orgId=1&refresh=30m
 Every item with ORES quality score lower than X or no ORES score
 Items (without hot properties) that are being queried
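
One way to explore the backlink distribution mentioned in the list above is a cumulative count per threshold. The sketch below assumes per-item backlink counts have already been extracted by some other means; the function name `items_below_threshold` and the sample values are hypothetical and are not real Wikidata figures.

```python
from collections import Counter


def items_below_threshold(backlink_counts, max_threshold):
    """Return {k: number of items with fewer than k backlinks} for k = 1..max_threshold."""
    histogram = Counter(backlink_counts)  # backlink count -> number of items
    result = {}
    running = 0
    for k in range(1, max_threshold + 1):
        # Items with exactly k - 1 backlinks newly fall under threshold k.
        running += histogram.get(k - 1, 0)
        result[k] = running
    return result


# Hypothetical sample: one backlink count per item (placeholder data).
sample_counts = [0, 0, 1, 2, 2, 5, 9]
cumulative = items_below_threshold(sample_counts, 4)
for threshold, n_items in cumulative.items():
    print(f"fewer than {threshold} backlinks: {n_items} items")
```

Comparing the counts for neighbouring thresholds shows where raising or lowering the cutoff stops making a meaningful difference.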
Other pieces that would be good to know even if we can't drop them:
 All Properties
 All sitelinks
 All classifying statements (aka the ontology) - this would be all statements using the Property "subclass of", "instance of", "part of", "has part" or "parent taxon"
 All humans
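
The size and runway figures requested for each candidate reduce to simple arithmetic. A minimal sketch, assuming sizes are measured in triples and that growth is roughly linear; the function name `runway_days` and every number below are placeholders for illustration, not measurements.

```python
def runway_days(candidate_triples, total_triples, growth_per_day):
    """Return (percentage of the graph, days of runway gained) for one candidate.

    Assumes linear growth: deleting N triples buys N / growth_per_day days.
    """
    if total_triples <= 0 or growth_per_day <= 0:
        raise ValueError("total_triples and growth_per_day must be positive")
    percentage = 100 * candidate_triples / total_triples
    days = candidate_triples / growth_per_day
    return percentage, days


# Placeholder inputs only; substitute measured values.
pct, days = runway_days(
    candidate_triples=1_000_000,
    total_triples=10_000_000,
    growth_per_day=50_000,
)
print(f"{pct:.1f}% of the graph, about {days:.0f} days of runway")
```

Because candidates overlap (for example, "All labels" contains "All non-English labels"), their individual figures should not be summed without first accounting for the shared triples.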