**Problem:**
As Wikidata PMs, we need to better understand how much of Wikidata's graph consists of the scholarly articles subgraph so that we can make a well-informed decision about splitting the Blazegraph database.
**Questions:**
1) Is the ontology clean enough to include all subclasses of Q13442814 (scholarly article) or does that lead to unexpected results?
- See https://w.wiki/77FU
2) What is the size of the subgraph made up of instances of Q13442814 (scholarly article) plus instances of only the subclasses that AKhatun used?
* list of subclasses that AKhatun used
* # of triples
* % of triples
* # of Items (optional)
* % of Items (optional)
3) What is the size of the subgraph made up of instances of Q13442814 (scholarly article) plus instances of all of its direct (`wdt:P279`) subclasses?
* # of triples
* % of triples
* # of Items (optional)
* % of Items (optional)
**How the data will be used:**
* see T337799
**What difference will these insights make:**
* see T337799
**Notes:**
* The most recent numbers that we can get will do.
== Assignee Planning ==
**Information below this point is filled out by WMDE Analytics and specifically the assignee of this task.**
=== Sub Tasks ===
Full breakdown of the steps to complete this task:
[x] Define tables to be used below
[x] Base investigation of the subclasses of Q13442814
[x] Find the subclasses that were used by AKhatun and add them to the task's description
- [[ https://www.wikidata.org/wiki/Q13442814 | scholarly article: Q13442814 ]]
- [[ https://www.wikidata.org/wiki/Q18918145 | academic journal article: Q18918145 ]]
- [[ https://www.wikidata.org/wiki/Q5633421 | scientific journal: Q5633421 ]]
- [[ https://www.wikidata.org/wiki/Q58632367 | scholarly conference abstract: Q58632367 ]]
- [[ https://www.wikidata.org/wiki/Q23927052 | conference paper: Q23927052 ]]
- [[ https://www.wikidata.org/wiki/Q10885494 | scientific conference paper: Q10885494 ]]
[x] List the subclasses of scholarly article so we can have an overview of what's being included
- Note that individual papers are at times mistakenly listed as subclasses of scholarly article, but these errors are fixed quickly
- Results from https://w.wiki/7DJX on 8 August 2023:
- [[ http://www.wikidata.org/entity/Q187685 | doctoral thesis: Q187685 ]]
- [[ http://www.wikidata.org/entity/Q1228945 | working paper: Q1228945 ]]
- [[ http://www.wikidata.org/entity/Q1347686 | eprint: Q1347686 ]]
- [[ http://www.wikidata.org/entity/Q1504425 | systematic review: Q1504425 ]]
- [[ http://www.wikidata.org/entity/Q2774197 | A-publication: Q2774197 ]]
- [[ http://www.wikidata.org/entity/Q7301211 | Realist Evaluation: Q7301211 ]]
- [[ http://www.wikidata.org/entity/Q7316896 | retraction notice: Q7316896 ]]
- [[ http://www.wikidata.org/entity/Q7318358 | review article: Q7318358 ]]
- [[ http://www.wikidata.org/entity/Q10885494 | scientific conference paper: Q10885494 ]]
- [[ http://www.wikidata.org/entity/Q15706459 | research article: Q15706459 ]]
- [[ http://www.wikidata.org/entity/Q18918145 | academic journal article: Q18918145 ]]
- [[ http://www.wikidata.org/entity/Q56478376 | expression of concern editorial notice: Q56478376 ]]
- [[ http://www.wikidata.org/entity/Q58898396 | classical article: Q58898396 ]]
- [[ http://www.wikidata.org/entity/Q58900805 | Corrected and Republished Article: Q58900805 ]]
- [[ http://www.wikidata.org/entity/Q58901470 | historical article: Q58901470 ]]
- [[ http://www.wikidata.org/entity/Q58902427 | introductory journal article: Q58902427 ]]
- [[ http://www.wikidata.org/entity/Q60535861 | survey article: Q60535861 ]]
- [[ http://www.wikidata.org/entity/Q82969330 | medical scholarly article: Q82969330 ]]
- [[ http://www.wikidata.org/entity/Q92998777 | opinion paper: Q92998777 ]]
- [[ http://www.wikidata.org/entity/Q93003322 | research commentary: Q93003322 ]]
- [[ http://www.wikidata.org/entity/Q99770806 | executable paper: Q99770806 ]]
- [[ http://www.wikidata.org/entity/Q101116078 | scoping review: Q101116078 ]]
- [[ http://www.wikidata.org/entity/Q108196115 | legal article: Q108196115 ]]
- [[ http://www.wikidata.org/entity/Q110716513 | scholarly letter/reply: Q110716513 ]]
- [[ http://www.wikidata.org/entity/Q114413783 | reply paper: Q114413783 ]]
- [[ http://www.wikidata.org/entity/Q115528532 | sleeping beauty: Q115528532 ]]
- [[ http://www.wikidata.org/entity/Q115546988 | prince: Q115546988 ]]
[] Derive aggregate and percentage data for all of the subclasses that AKhatun used
- Derive # of triples
- Derive % of triples
- Derive # of Items
- Derive % of Items
[] Derive aggregate and percentage data for all direct (`wdt:P279`) subclasses of Q13442814
- Derive # of triples
- Derive % of triples
- Derive # of Items
- Derive % of Items
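Once the counts are available, the percentage values in the two open sub tasks come down to the same arithmetic. A minimal Python sketch, where `share_of_total` is a hypothetical helper and the numbers in the usage line are placeholders for illustration, not measured Wikidata counts:

```python
def share_of_total(subgraph_count: int, total_count: int) -> float:
    """Return subgraph_count as a percentage of total_count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= subgraph_count <= total_count:
        raise ValueError("subgraph_count must be between 0 and total_count")
    return 100.0 * subgraph_count / total_count


# Placeholder numbers for illustration only, not real triple or Item counts.
print(share_of_total(25, 200))  # 12.5
```

The same helper would apply to both triples and Items, since only the inputs differ.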
=== Data to be used ===
See [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake | Analytics/Data_Lake ]] for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- The `discovery.wikibase_rdf` table will be used for the aggregate and percentage values for triples
- [[ https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Wikidata_entity | wmf.wikidata_entity ]] will be used for the aggregate and percentage values for Items
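As a rough illustration of how the triple count could be derived from `discovery.wikibase_rdf`, the Python sketch below only builds a SQL string. It rests on several assumptions that should be checked against the actual table: that the columns are named `subject`, `predicate`, `object` and `context`, that the partitions are `date` and `wiki`, that URIs are stored with angle brackets, and that all triples belonging to an entity share that entity's URI as `context`. The function name and the snapshot value are hypothetical.

```python
import re

# Assumption: direct "instance of" claims use this predicate URI, stored with angle brackets.
P31 = "<http://www.wikidata.org/prop/direct/P31>"


def entity_uri(qid: str) -> str:
    """Format a QID as an entity URI (assumed storage format)."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a valid QID: {qid!r}")
    return f"<http://www.wikidata.org/entity/{qid}>"


def triples_count_query(class_qids: list[str], snapshot: str) -> str:
    """Build a SQL query counting triples of all instances of the given classes."""
    if not re.fullmatch(r"\d{8}", snapshot):
        raise ValueError("snapshot is assumed to look like YYYYMMDD")
    classes = ", ".join(f"'{entity_uri(q)}'" for q in class_qids)
    return f"""
WITH members AS (
  -- Entities that are direct instances of any of the listed classes
  SELECT DISTINCT subject
  FROM discovery.wikibase_rdf
  WHERE date = '{snapshot}' AND wiki = 'wikidata'
    AND predicate = '{P31}'
    AND object IN ({classes})
)
-- Assumption: every triple of an entity carries that entity's URI as context
SELECT COUNT(*) AS n_triples
FROM discovery.wikibase_rdf t
JOIN members m ON t.context = m.subject
WHERE t.date = '{snapshot}' AND t.wiki = 'wikidata'
""".strip()


# Hypothetical snapshot value; use whichever partition is most recent.
query = triples_count_query(["Q13442814", "Q18918145"], "20230807")
```

The resulting string could then be passed to a Spark session on the analytics cluster; the total triple count for the percentage would come from a plain `COUNT(*)` over the same partition.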
=== Notes and Questions ===
Things that came up during the completion of this task, questions to be answered, and follow-up tasks:
- Related task: {https://phabricator.wikimedia.org/T337021}
- Related task: {https://phabricator.wikimedia.org/T342111}
- Prior related task: {https://phabricator.wikimedia.org/T281854}
- Prior related analysis: [[ https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis | Wikidata_Scholarly_Articles_Subgraph_Analysis ]]
- See: [[ https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis | Wikidata_Subgraph_Analysis ]]
- See: [[ https://docs.google.com/document/d/1QsV96LtpK5lDD2N2jy-6vaF_0d_Yf_HLb8uFARFMxJ8/edit | Preparations for WDQS graph-splitting ]]