**Problem:**
As Wikidata PMs we need to better understand how much of Wikidata's graph consists of the scholarly articles subgraph, in order to make a good decision about how to split the Blazegraph database.
**Questions:**
What is the size of the subgraph formed by the direct instances of Q13442814 (scholarly article) in Wikidata?
* # of triples
* % of triples
* # of Items
* % of Items
**How the data will be used:**
* see T337799
**What difference will these insights make:**
* see T337799
**Notes:**
* The most recent numbers that we can get will do.
== Assignee Planning ==
**Information below this point is filled out by WMDE Analytics and specifically the assignee of this task.**
=== Sub Tasks ===
Full breakdown of the steps to complete this task:
[x] Define tables to be used below
[ ] Derive aggregate and percentage data
- Derive # of triples
- Derive % of triples
- Derive # of Items
- Derive % of Items
=== Data to be used ===
See [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake | Analytics/Data_Lake ]] for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- The `discovery.wikibase_rdf` table will be used for the aggregate and percentage values for triples
- [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Wikidata_entity | wikidata_entity ]] can then be used for the aggregate and percentage values for items
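The derivation of the four requested numbers can be sketched on a toy in-memory set of triples. This is only a minimal illustration of the counting logic, not the query that will be run against the data lake: the `Triple` fields, the `subgraph_share` helper, and the sample data are all hypothetical, and the sketch assumes that each triple can be attributed to exactly one Item.

```python
from typing import Iterable, NamedTuple

SCHOLARLY_ARTICLE = "Q13442814"  # from the task description
INSTANCE_OF = "P31"              # Wikidata "instance of" property


class Triple(NamedTuple):
    """Hypothetical, simplified triple; not the real table schema."""
    item: str       # assumption: the Item this triple belongs to
    predicate: str
    obj: str


def subgraph_share(triples: Iterable[Triple]) -> dict:
    """Count Items and triples for direct instances of scholarly article."""
    triples = list(triples)
    all_items = {t.item for t in triples}
    # Direct instances only: Items with a P31 -> Q13442814 triple.
    article_items = {
        t.item
        for t in triples
        if t.predicate == INSTANCE_OF and t.obj == SCHOLARLY_ARTICLE
    }
    # Every triple attached to one of those Items counts towards the subgraph.
    article_triples = sum(1 for t in triples if t.item in article_items)

    def pct(part: int, whole: int) -> float:
        return round(100.0 * part / whole, 2) if whole else 0.0

    return {
        "n_triples": article_triples,
        "pct_triples": pct(article_triples, len(triples)),
        "n_items": len(article_items),
        "pct_items": pct(len(article_items), len(all_items)),
    }


# Toy data: one scholarly article (Q1) and two other Items.
sample = [
    Triple("Q1", "P31", "Q13442814"),
    Triple("Q1", "P1476", "A title"),
    Triple("Q1", "P50", "Q3"),
    Triple("Q2", "P31", "Q5"),
    Triple("Q2", "P21", "Q6581097"),
    Triple("Q3", "P31", "Q5"),
]
result = subgraph_share(sample)
print(result)
```

On the real data the same two steps (select the Items with a direct `P31` statement pointing to Q13442814, then count the triples attached to them) would be expressed as queries against the tables listed above.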
=== Notes and Questions ===
Things that came up during the completion of this task, questions to be answered and follow up tasks:
- See: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis.
- See: [[ https://docs.google.com/document/d/1QsV96LtpK5lDD2N2jy-6vaF_0d_Yf_HLb8uFARFMxJ8/edit | Preparations for WDQS graph-splitting ]]