The backfill of unique devices data for T401666 changed the domains used to identify Wikidata, Wikifunctions, and MediaWiki.org in the Data Lake datasets (wmf_readership.unique_devices_per_*).
Previously, these datasets used the canonical main and mobile domains (e.g. www.wikidata.org and m.wikidata.org). Now, they use non-canonical versions of the main domains (e.g. wikidata.org, without the leading www).
This adds new friction for folks querying these datasets:
- They must learn, and then remember, that unique devices data now uses these non-canonical domains.
- If they wish to join the data with other datasets using the domain as the wiki identifier (which will become increasingly common, since the data modelling guidelines recommend the domain as the primary wiki identifier), they must manually handle these domains, which will otherwise fail to join. Adding a new unique_device_domain column to the canonical wiki dataset could make this a little easier, but it would still be cumbersome.
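To illustrate the manual handling described above, here is a minimal Python sketch of the normalization step a consumer would need before joining on domain. It is not how the Data Lake tables are actually queried (that would typically be SQL); it only shows the shape of the workaround. The names UNIQUE_DEVICES_TO_CANONICAL, canonical_domain, and join_on_domain are hypothetical. Only the wikidata.org example comes from this task; the wikifunctions.org and mediawiki.org entries are assumed to follow the same pattern. The row counts are placeholders, not real data.

```python
# Hypothetical mapping from the non-canonical domains now used in
# wmf_readership.unique_devices_per_* to canonical wiki domains.
# ASSUMPTION: only wikidata.org is confirmed by this task; the other two
# entries are inferred from the same "missing www" pattern.
UNIQUE_DEVICES_TO_CANONICAL = {
    "wikidata.org": "www.wikidata.org",
    "wikifunctions.org": "www.wikifunctions.org",
    "mediawiki.org": "www.mediawiki.org",
}


def canonical_domain(domain: str) -> str:
    """Return the canonical domain for a unique devices domain.

    Domains not in the mapping (e.g. en.wikipedia.org) are returned unchanged.
    """
    return UNIQUE_DEVICES_TO_CANONICAL.get(domain, domain)


def join_on_domain(unique_devices, canonical_wikis, normalize=True):
    """Inner-join unique devices rows to canonical wiki rows on domain.

    unique_devices: iterable of (domain, uniques) tuples
    canonical_wikis: dict mapping canonical domain -> database code
    normalize: if False, join on the raw domain (rows for the affected
        wikis are silently dropped)
    Returns a list of (database_code, uniques) pairs.
    """
    joined = []
    for domain, uniques in unique_devices:
        key = canonical_domain(domain) if normalize else domain
        db = canonical_wikis.get(key)
        if db is not None:
            joined.append((db, uniques))
    return joined


# Placeholder rows; the counts are illustrative only.
rows = [("wikidata.org", 100), ("en.wikipedia.org", 200)]
wikis = {"www.wikidata.org": "wikidatawiki", "en.wikipedia.org": "enwiki"}

print(join_on_domain(rows, wikis))                   # [('wikidatawiki', 100), ('enwiki', 200)]
print(join_on_domain(rows, wikis, normalize=False))  # [('enwiki', 200)]
```

The second call shows the failure mode: without the extra mapping step, the Wikidata row simply disappears from the join rather than raising an error, which is why this inconsistency is easy to miss.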