**Problem:**
As Wikidata PMs, we need to better understand how much of Wikidata's graph consists of Labels, Descriptions, and Aliases so that we can make a good decision about how to split the Blazegraph database.
**Questions:**
* # of triples that describe Labels
* % of triples that describe Labels
* # of triples that describe Descriptions
* % of triples that describe Descriptions
* # of triples that describe Aliases
* % of triples that describe Aliases
**How the data will be used:**
* see T337799
**What difference will these insights make:**
* see T337799
**Notes:**
* The most recent numbers that we can get will do.
== Assignee Planning ==
**Information below this point is filled out by WMDE Analytics and specifically the assignee of this task.**
=== Sub Tasks ===
Full breakdown of the steps to complete this task:
[x] Look into prior research on this topic
[x] Define tables to be used below
[x] Aggregate total and percent for labels
  - 2023-07-10:
    - Total: 801,847,766
    - Percent: 5.334
  - 2023-07-19:
    - Total: 2,877,509,113
    - Percent: 19.14
[x] Aggregate total and percent for descriptions
  - 2023-07-10:
    - Total: 2,877,509,113
    - Percent: 19.14
  - 2023-07-19:
    - Total:
    - Percent:
[x] Aggregate total and percent for aliases
  - 2023-07-10:
    - Total: 178,352,219
    - Percent: 1.186
  - 2023-07-19:
    - Total:
    - Percent:
[ ] Put the results and process in a public place for future reference
  - Where would this ideally be?
=== Data to be used ===
See [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake | Analytics/Data_Lake ]] for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- `discovery.wikibase_rdf`
  - The schema for this table is not documented on Wikitech, but anyone with access to the analytics cluster can query it (as of 2023-07-17)
  - The table holds subject-predicate-object triples for Wikibase instances, including Wikidata
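
The aggregation itself can be sketched in a few lines of Python. This is a minimal illustration, not the query that produced the numbers above. It assumes that predicates are stored as full IRIs in angle brackets, and that labels are emitted under `rdfs:label`, `skos:prefLabel`, and `schema:name`, descriptions under `schema:description`, and aliases under `skos:altLabel`, as in the Wikibase RDF dump format. Whether all three label predicates should count toward the label total is a judgment call. The function name `summarize_terms` and the sample data are hypothetical.

```python
from collections import Counter

# Assumption: predicates appear as full IRIs in angle brackets.
TERM_TYPE_BY_PREDICATE = {
    "<http://www.w3.org/2000/01/rdf-schema#label>": "label",
    "<http://www.w3.org/2004/02/skos/core#prefLabel>": "label",
    "<http://schema.org/name>": "label",
    "<http://schema.org/description>": "description",
    "<http://www.w3.org/2004/02/skos/core#altLabel>": "alias",
}


def summarize_terms(predicates):
    """Return {term_type: (count, percent_of_all_triples)} for an iterable of predicates."""
    # Any predicate that is not a term predicate is counted as "other".
    counts = Counter(TERM_TYPE_BY_PREDICATE.get(p, "other") for p in predicates)
    total = sum(counts.values())
    summary = {}
    for term_type in ("label", "description", "alias"):
        count = counts[term_type]
        percent = round(100 * count / total, 3) if total else 0.0
        summary[term_type] = (count, percent)
    return summary


# Hypothetical sample: one predicate per triple.
sample_predicates = [
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    "<http://www.w3.org/2004/02/skos/core#prefLabel>",
    "<http://schema.org/name>",
    "<http://schema.org/description>",
    "<http://schema.org/description>",
    "<http://www.w3.org/2004/02/skos/core#altLabel>",
    "<http://www.wikidata.org/prop/direct/P31>",
    "<http://www.wikidata.org/prop/direct/P279>",
    "<http://schema.org/version>",
    "<http://schema.org/dateModified>",
]

summary = summarize_terms(sample_predicates)
```

On the real table the same grouping would be run as a distributed aggregation over a single snapshot rather than in memory; the percentages are always relative to the total number of triples in that snapshot.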
=== Notes and Questions ===
Things that came up while completing this task, questions to be answered, and follow-up tasks:
- Related tasks:
  - {T293628}
  - {T303831}
- Prior analysis of this topic:
  - [[ https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis | Wikidata_Subgraph_Analysis ]]
  - [[ https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis | Wikidata_Vertical_Analysis ]]
- See also other work by:
  - [[ https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=User%3AAKhatun&namespace=0 | AKhatun ]]
  - [[ https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=User%3AAndreaWest&namespace=0 | AndreaWest ]]