For almost every Wikimedia wiki, we want the number of pages without a sitelink to Wikidata in the main namespace, both as an absolute number and as a fraction of the total number of pages in the main namespace.
Example SQL query to get the number of pages without a sitelink in the main namespace on a given wiki:
SELECT COUNT(*) FROM `page_props` WHERE pp_propname = "unexpectedUnconnectedPage" AND pp_value = 0;
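The query above gives only the absolute number. The following is a minimal sketch of how the fraction could be computed in the same pass, assuming that the denominator is the number of non-redirect pages in the main namespace (an assumption, not decided in this task). It joins `page_props` with `page` and filters on `page_namespace` instead of relying on `pp_value`. The function names `unconnected_stats` and `make_mock_wiki` are hypothetical, and the in-memory SQLite tables are a reduced mock containing only the columns the query uses, not the real wiki replicas.

```python
import sqlite3

# Absolute count of unconnected main-namespace pages plus the total number of
# main-namespace pages, in one query. Assumption: redirects are excluded from
# the denominator.
STATS_QUERY = """
SELECT
  (SELECT COUNT(*)
     FROM page_props
     JOIN page ON pp_page = page_id
    WHERE pp_propname = 'unexpectedUnconnectedPage'
      AND page_namespace = 0) AS unconnected,
  (SELECT COUNT(*)
     FROM page
    WHERE page_namespace = 0
      AND page_is_redirect = 0) AS total
"""


def unconnected_stats(conn):
    """Return (absolute count, fraction) for one wiki's main namespace."""
    unconnected, total = conn.execute(STATS_QUERY).fetchone()
    fraction = unconnected / total if total else 0.0
    return unconnected, fraction


def make_mock_wiki():
    """Hypothetical in-memory stand-in for a wiki's page/page_props tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_is_redirect INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE page_props (
            pp_page INTEGER NOT NULL,
            pp_propname TEXT NOT NULL,
            pp_value BLOB NOT NULL
        );
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 0, 0),   # connected article
        (2, 0, 0),   # unconnected article
        (3, 0, 0),   # connected article
        (4, 0, 0),   # unconnected article
        (5, 14, 0),  # unconnected category, outside the main namespace
        (6, 0, 1),   # redirect, not counted in the total
    ])
    # pp_value contents are placeholders; the query only uses pp_propname.
    conn.executemany("INSERT INTO page_props VALUES (?, ?, ?)", [
        (1, "wikibase_item", "Q1"),
        (2, "unexpectedUnconnectedPage", ""),
        (3, "wikibase_item", "Q2"),
        (4, "unexpectedUnconnectedPage", ""),
        (5, "unexpectedUnconnectedPage", ""),
    ])
    return conn


print(unconnected_stats(make_mock_wiki()))  # (2, 0.5)
```

Against the real replicas, only `STATS_QUERY` would be reused; the mock exists so the arithmetic can be checked locally.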
Acceptance criteria:
- An initial collection of the data has been made
- In the future, we may want the data to be collected continuously and displayed, for example on Grafana
Open questions:
- Which wikis do we want to exclude?
  - Wikidata (including it does not make sense)
  - What about functionswiki?
  - Test wikis too, if they are easy to exclude
- Commons seems to have more sitelinks in the Category namespace than in the main namespace?
  - https://en.wikipedia.org/wiki/17_Hippies links to the category on Commons: https://commons.wikimedia.org/wiki/Category:17_Hippies
  - The gallery https://commons.wikimedia.org/wiki/17_Hippies is not connected to an Item
- How do Wikisources work in terms of namespaces and sitelinks?
- Are there wikis where namespaces other than the main namespace are interesting?
- How much more work would it be to collect the data for all namespaces on all wikis (and drop the namespaces with 0 sitelinks)?
- Wiktionaries don't usually use sitelinks in the classic sense and rely on Cognate instead. Do we still want to include them here?
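To get a feel for the per-namespace and exclusion questions, here is a minimal sketch under several assumptions. It assumes connected pages carry a `wikibase_item` page prop and unconnected ones carry `unexpectedUnconnectedPage`, groups by `page_namespace`, and drops namespaces with no connected page at all. The exclusion rules in `is_excluded` (Wikidata, a "test" database-name prefix for test wikis, and an optional Wiktionary switch) are hypothetical placeholders for the open questions above, not decisions; functionswiki is deliberately left undecided. The SQLite tables are again a reduced mock, not the real schema.

```python
import sqlite3

# Hypothetical exclusion rules; which wikis to exclude is still open.
# functionswiki is intentionally not listed because that is undecided.
EXCLUDED_DBNAMES = {"wikidatawiki"}


def is_excluded(dbname: str, exclude_wiktionaries: bool = False) -> bool:
    """Return True if a wiki (identified by its database name) should be skipped."""
    if dbname in EXCLUDED_DBNAMES:
        return True
    # Heuristic (assumption): test wikis share a "test" prefix, e.g. testwiki.
    if dbname.startswith("test"):
        return True
    # Wiktionaries mostly rely on Cognate rather than sitelinks.
    return exclude_wiktionaries and dbname.endswith("wiktionary")


# Per-namespace counts of connected and unconnected pages; namespaces without
# any connected page are dropped by the HAVING clause.
PER_NAMESPACE_QUERY = """
SELECT page_namespace,
       SUM(pp_propname = 'wikibase_item') AS connected,
       SUM(pp_propname = 'unexpectedUnconnectedPage') AS unconnected
  FROM page_props
  JOIN page ON pp_page = page_id
 WHERE pp_propname IN ('wikibase_item', 'unexpectedUnconnectedPage')
 GROUP BY page_namespace
HAVING SUM(pp_propname = 'wikibase_item') > 0
 ORDER BY page_namespace
"""


def per_namespace_stats(conn):
    """Return [(namespace, connected, unconnected), ...] for one wiki."""
    return conn.execute(PER_NAMESPACE_QUERY).fetchall()


def make_mock_wiki():
    """Hypothetical in-memory stand-in for a wiki's page/page_props tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER NOT NULL);
        CREATE TABLE page_props (pp_page INTEGER NOT NULL,
                                 pp_propname TEXT NOT NULL,
                                 pp_value BLOB NOT NULL);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?)",
                     [(1, 0), (2, 0), (3, 14), (4, 14), (5, 2)])
    # pp_value contents are placeholders; the query only uses pp_propname.
    conn.executemany("INSERT INTO page_props VALUES (?, ?, ?)", [
        (1, "wikibase_item", "Q1"),
        (2, "unexpectedUnconnectedPage", ""),
        (3, "wikibase_item", "Q2"),
        (4, "unexpectedUnconnectedPage", ""),
        (5, "unexpectedUnconnectedPage", ""),  # namespace 2: nothing connected
    ])
    return conn


wikis = ["enwiki", "wikidatawiki", "testwiki", "commonswiki", "enwiktionary"]
print([w for w in wikis if not is_excluded(w, exclude_wiktionaries=True)])
# ['enwiki', 'commonswiki']
print(per_namespace_stats(make_mock_wiki()))
# [(0, 1, 1), (14, 1, 1)]
```

Grouping by `page_namespace` keeps the query identical for every wiki, so covering all namespaces would mostly add rows to the output rather than extra queries; whether that holds on the real replicas is untested here.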