The new cirrus dumps are not properly formatted.
The redirect array looks like:
"redirect": [ [ 0, "Area code 256" ], [ 0, "Area code 938" ] ],
But should look like:
"redirect": [ { "namespace": 0, "title": "Area code 256" }, { "namespace": 0, "title": "Area code 938" } ],
This seems to affect other arrays of objects as well; for instance, the coordinates array looks like this:
"coordinates": [ [ { "lon": 8.816666666666666, "lat": 51.78333333333333 }, null, 1000, "earth", null, true, null, null ] ],
Based on the schema at https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/cirrussearch/update_pipeline/update/current.yaml, the potentially affected fields are:
- redirect
- coordinates
- lexeme_forms
AC:
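- new cirrus dumps are properly formatted: entries of the affected arrays (redirect, coordinates, lexeme_forms) are serialized as JSON objects with named keys, not as positional arrays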
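
A minimal sketch of how the AC could be checked on a single document from a dump, in Python. It assumes the document has already been parsed into a dict; the helper name `malformed_fields` and the sample documents are hypothetical and only illustrate the check. It flags any of the three fields listed above whose entries are not all JSON objects, and it does not validate the keys inside each object.

```python
import json

# Fields named in this task as potentially affected.
AFFECTED_FIELDS = ("redirect", "coordinates", "lexeme_forms")


def malformed_fields(doc):
    """Return the affected fields whose entries are not all JSON objects.

    `doc` is one already-parsed document from a dump (hypothetical helper).
    """
    bad = []
    for field in AFFECTED_FIELDS:
        entries = doc.get(field)
        if not isinstance(entries, list):
            continue  # field absent or not an array: nothing to check here
        if any(not isinstance(entry, dict) for entry in entries):
            bad.append(field)
    return bad


# Sample documents built from the snippets in this task.
broken = json.loads('{"redirect": [[0, "Area code 256"], [0, "Area code 938"]]}')
fixed = json.loads('{"redirect": [{"namespace": 0, "title": "Area code 256"}]}')

print(malformed_fields(broken))  # ['redirect']
print(malformed_fields(fixed))   # []
```

Running this over a sample of documents from a freshly generated dump and expecting an empty list for each would be one way to confirm the fix.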