- Develop a set of standardized, efficient PySpark/R procedures for processing the WD Dump copy (HDFS) in the WMD Data Lake.
Description
Related Objects
- Mentioned Here
- T221965: Wikidata Languages Landscape
Event Timeline
- These operations are meant to replace all R-orchestrated, massive, and time-consuming Wikidata API/WDQS SPARQL calls from the WDCM and related dashboard back-ends;
- The following datasets have been produced so far:
  - WD labels for the top 15 languages by number of speakers (essential for the WDCM system);
  - Q5 (Human): all items, plus the properties essential for our WD statistical systems:
    - Prop 21
    - Prop 106
    - Prop 170
    - Prop 50
    - Prop 101
    - Prop 27
    - Prop 39
    - Prop 103
    - Prop 1412
    - Prop 172
    - Prop 463
    - Prop 1344
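For illustration only, here is a minimal sketch of the kind of per-entity extraction that the datasets above imply: keep the items that are an instance of Q5, then emit their labels and the item-valued statements for the listed properties. It is not the code used for this task. It assumes the dump copy follows the standard Wikidata JSON dump layout (one entity per line inside a JSON array); the function names, the example language codes, and the HDFS path in the closing comment are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Properties listed above for Q5 (Human) items.
HUMAN_PROPERTIES = ["P21", "P106", "P170", "P50", "P101", "P27",
                    "P39", "P103", "P1412", "P172", "P463", "P1344"]


def parse_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one line of a JSON dump; return None for the array brackets."""
    text = line.strip().rstrip(",")
    if text in ("", "[", "]"):
        return None
    return json.loads(text)


def item_values(entity: Dict[str, Any], prop: str) -> List[str]:
    """Return the QIDs that `entity` states as item values of `prop`."""
    values = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "somevalue" / "novalue" snaks
        datavalue = snak.get("datavalue", {})
        if datavalue.get("type") == "wikibase-entityid":
            value = datavalue["value"]
            # Fall back to the numeric id when the "id" key is absent.
            values.append(value.get("id") or "Q%d" % value["numeric-id"])
    return values


def is_instance_of(entity: Dict[str, Any], class_id: str = "Q5") -> bool:
    """True if the entity has a direct P31 (instance of) statement for class_id."""
    return class_id in item_values(entity, "P31")


def claim_rows(entity: Dict[str, Any],
               properties: Iterable[str] = HUMAN_PROPERTIES
               ) -> List[Tuple[str, str, str]]:
    """Long-format rows (item, property, value) for the requested properties."""
    return [(entity["id"], prop, value)
            for prop in properties
            for value in item_values(entity, prop)]


def label_rows(entity: Dict[str, Any],
               languages: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Rows (item, language, label) for the requested language codes."""
    labels = entity.get("labels", {})
    return [(entity["id"], lang, labels[lang]["value"])
            for lang in languages if lang in labels]


# Hypothetical PySpark wiring (path and variable names are assumptions):
#   entities = (spark.sparkContext.textFile("hdfs:///path/to/wd_dump.json")
#               .map(parse_line)
#               .filter(lambda e: e is not None and is_instance_of(e, "Q5")))
#   claims = entities.flatMap(claim_rows)
#   labels = entities.flatMap(lambda e: label_rows(e, ["en", "de"]))
```

Keeping the extraction in plain Python functions means the same logic can be tested locally on a handful of entities before it is mapped over the full dump. Note that `is_instance_of` checks direct P31 statements only and does not follow subclass relations.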
Next steps:
- essential properties/classes for languages (for the Wikidata Languages Landscape T221965)
- essential properties/classes for taxa (WDCM)
- essential properties/classes for geographical objects (WDCM)
- essential properties/classes for organizations (WDCM)
Comment Actions
- essential properties/classes for organizations (WDCM) - DONE.
- essential properties/classes for geographical objects (WDCM) - DONE.
Comment Actions
- essential properties/classes for languages (for the Wikidata Languages Landscape, #T221965): this will be transferred as a sub-task to #T221965.
Resolved.