We currently don't provide JSON dumps for Wikidata's Lexemes. We should provide them. They should be published as dumps separate from the regular Wikidata dumps containing Items and Properties, so as not to make those even bigger.
- The format of the JSON dump file resembles that of the JSON dump containing Item data: it is a collection of individual Lexeme entries in the regular JSON format.
- JSON dumps containing the Lexemes in Wikidata are published regularly, following the same publication cycle as the dumps containing Items (https://www.wikidata.org/wiki/Wikidata:Database_download#JSON_dumps_(recommended) )
- JSON dumps for the state before the resolution of this task are not expected to be generated retroactively
- Dumps are published at URLs like https://dumps.wikimedia.org/wikidatawiki/entities/YYYYMMDD/wikidata-YYYYMMDD-lexemes.json.bz2 and https://dumps.wikimedia.org/wikidatawiki/entities/YYYYMMDD/wikidata-YYYYMMDD-lexemes.json.gz
- Links to the latest versions are at: https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.json.gz and https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.json.bz2
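Assuming the Lexeme dump uses the same line-oriented layout as the Item dumps (a single JSON array with one entity per line), a consumer could stream it without loading the whole file into memory. This is a sketch; the file path is hypothetical:

```python
import bz2
import json


def iter_entities(path):
    """Yield one entity dict per line of a Wikidata-style JSON dump.

    The dump is one big JSON array, but each entity sits on its own
    line (ending with a comma), so it can be streamed line by line.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # skip the array brackets and blank lines
            yield json.loads(line)


# Hypothetical usage against a downloaded dump:
# for lexeme in iter_entities("wikidata-20190101-lexemes.json.bz2"):
#     print(lexeme["id"])
```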
Lexicographical data has been deployed for almost a year (since May 2018) and is now a significant part of Wikidata. Despite that, the Wikidata JSON dumps include only a subset of the lexicographical data in Wikidata (only the identifiers of Lexemes and Senses used as values in the main (Q) and Property (P) namespaces). At the moment we only have inconsistent dumps: L-entities are not included, even when they are linked by other entities within the dumps.
Lexemes have been removed from Wikidata JSON dumps for an unknown reason (see T195419).
Is it possible to include them again?
One possible application would be an easy way to compute statistics about the usage of all Wikidata properties across all namespaces, without having to gather data from several dumps in various formats.
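As a sketch of that use case, counting how often each property appears could look like the following, assuming entities follow the documented Wikibase JSON shape with statements grouped by property under `claims` (Lexeme Forms and Senses also carry their own statements, which this simplified version ignores):

```python
from collections import Counter


def count_property_usage(entities):
    """Count how often each property (P-id) is used in statements.

    `entities` is an iterable of entity dicts in the Wikibase JSON
    format; top-level statements live under "claims", keyed by
    property ID, with one list entry per statement.
    """
    usage = Counter()
    for entity in entities:
        for prop, statements in entity.get("claims", {}).items():
            usage[prop] += len(statements)
    return usage
```

With a single combined stream over the Item/Property dump and a Lexeme dump, one call to this function would cover all namespaces at once.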