The table gdi.country_meta_data is the source of truth for geographical metadata. In particular, it maps each country to a "wmf_region", the regional grouping often used in reporting.
However, Research built geographical models before that table was created, and those models use a different set of regions (the base regions) defined here. The two datasets differ as follows:
Regions defined by Research without a match in gdi.country_meta_data:
Countries defined in gdi.country_meta_data without a match in the Research base regions:
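The two lists above are set differences in each direction. The sketch below shows one way to recompute them. It assumes both sources have already been loaded as plain lists of names; the loading step (e.g. querying gdi.country_meta_data) is omitted, and the helper names `normalize` and `unmatched` are hypothetical. Names are lightly normalized (case, accents, whitespace) so that trivial spelling differences are not reported as mismatches.

```python
import unicodedata


def normalize(name: str) -> str:
    """Fold case, strip accents and collapse whitespace in a region name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def unmatched(research_regions, gdi_countries):
    """Return (research-only, gdi-only) names, keeping the original spelling.

    Assumption: a region matches a country when their normalized names are
    equal; real alignment may need an explicit crosswalk instead.
    """
    research = {normalize(n): n for n in research_regions}
    gdi = {normalize(n): n for n in gdi_countries}
    research_only = sorted(research[k] for k in research.keys() - gdi.keys())
    gdi_only = sorted(gdi[k] for k in gdi.keys() - research.keys())
    return research_only, gdi_only


if __name__ == "__main__":
    # Hypothetical toy inputs, not the actual contents of either dataset.
    research_regions = ["Côte d'Ivoire", "Example Region A", "France"]
    gdi_countries = ["Cote d'Ivoire", "France", "Example Region B"]
    print(unmatched(research_regions, gdi_countries))
```

Name matching only catches spelling variants; regions that are genuinely defined differently (e.g. a territory counted separately in one dataset) would still need a manual decision.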
The purpose of this task is to track the consolidation and alignment of the base region definitions with the gdi country metadata.
Motivating use case: calculating intersections between, e.g., the gender and geography gaps (T336766).
The geography gap (available at the country and wmf_region level) uses a geospatial model, which reverse-geocodes the latitude/longitude coordinates from the Wikidata property P625. However, the overlap between articles with coordinates and articles about humans is almost zero, since people are not generally associated with coordinates, so the geospatial model is unsuitable for intersections with the gender gap.
Instead, there is a "cultural" geography model that uses properties associated with countries; these are currently mapped to named geographic entities using this mapping file. The problem is that the base regions (mostly countries) currently cannot be mapped to the WMF's "source of truth" for geographical data (gdi.country_meta_data), and in particular not to the "wmf_region" values commonly used in reports.
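Once the base regions are aligned, results from the cultural model could be rolled up to the wmf_region level. The sketch below assumes a hypothetical `region_to_wmf` dictionary (base region to wmf_region) derived from gdi.country_meta_data, and hypothetical per-region counts; the function name `rollup_to_wmf_region` is illustrative only. Unmapped regions are returned separately rather than silently dropped, which is exactly the gap this task aims to close.

```python
from collections import Counter


def rollup_to_wmf_region(counts_by_region, region_to_wmf):
    """Aggregate per-base-region counts into per-wmf_region totals.

    counts_by_region: base region name -> count (hypothetical output of the
        cultural geography model, e.g. articles about people per region).
    region_to_wmf: base region name -> wmf_region (assumed to come from
        gdi.country_meta_data after alignment).
    Returns (totals per wmf_region, base regions that had no mapping).
    """
    totals = Counter()
    unmapped = {}
    for region, count in counts_by_region.items():
        wmf_region = region_to_wmf.get(region)
        if wmf_region is None:
            # Keep unmatched regions visible instead of dropping their counts.
            unmapped[region] = count
        else:
            totals[wmf_region] += count
    return dict(totals), unmapped


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    counts = {"France": 10, "Germany": 5, "Example Region A": 3}
    mapping = {"France": "Example WMF Region", "Germany": "Example WMF Region"}
    print(rollup_to_wmf_region(counts, mapping))
```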