**Problem:**
As Wikidata PMs, we need to better understand how much of Wikidata's graph consists of Labels, Descriptions, and Aliases, so that we can make a well-informed decision about how to split the Blazegraph database.
**Questions:**
* Number of triples that describe Labels
* Percentage of all triples that describe Labels
* Number of triples that describe Descriptions
* Percentage of all triples that describe Descriptions
* Number of triples that describe Aliases
* Percentage of all triples that describe Aliases
**How the data will be used:**
* see T337799
**What difference will these insights make:**
* see T337799
**Notes:**
* The most recent numbers that we can get will do.
== Assignee Planning ==
**Information below this point is filled out by WMDE Analytics and specifically the assignee of this task.**
=== Sub Tasks ===
Full breakdown of the steps to complete this task:
[ ] Aggregate total and percent for labels
[ ] Aggregate total and percent for descriptions
[ ] Aggregate total and percent for aliases
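
The three aggregation steps above could be sketched as follows. This is only an illustration of the counting logic, not the query that will actually be run against the data lake. The predicate mapping is an assumption based on the Wikidata RDF export (`rdfs:label` for labels, `schema:description` for descriptions, `skos:altLabel` for aliases), and the function name `term_triple_stats` and the `(subject, predicate, object)` tuple layout are hypothetical.

```python
from collections import Counter

# ASSUMPTION: predicate-to-term mapping, not specified in this task.
# Whether duplicate label predicates (e.g. schema:name, skos:prefLabel)
# should also be counted is a decision left to the assignee.
TERM_PREDICATES = {
    "http://www.w3.org/2000/01/rdf-schema#label": "label",
    "http://schema.org/description": "description",
    "http://www.w3.org/2004/02/skos/core#altLabel": "alias",
}


def term_triple_stats(triples):
    """Return {term: (count, percent_of_all_triples)} for an iterable of
    (subject, predicate, object) tuples (hypothetical input layout)."""
    counts = Counter()
    total = 0
    for _subject, predicate, _object in triples:
        total += 1
        term = TERM_PREDICATES.get(predicate)
        if term is not None:
            counts[term] += 1
    # Guard against an empty input so the percentage is always defined.
    return {
        term: (counts[term], 100.0 * counts[term] / total if total else 0.0)
        for term in ("label", "description", "alias")
    }
```

The same logic maps directly onto a group-by over the predicate column of whichever triples table is chosen, with the overall row count as the denominator.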
=== Data to be used ===
See [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake | Analytics/Data_Lake ]] for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- link_to_table
=== Notes and Questions ===
Things that came up during the completion of this task, questions to be answered, and follow-up tasks:
- Prior analysis on this has been done in the following places:
  - [[ https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis | Wikidata_Subgraph_Analysis ]]
  - [[ https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis | Wikidata_Vertical_Analysis ]]
- See in general:
  - [[ https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=User%3AAKhatun&namespace=0 | AKhatun ]]
  - [[ https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=User%3AAndreaWest&namespace=0 | AndreaWest ]]