I have been ingesting Wikidata JSON dump files into MongoDB using mongoimport. This worked for about a year, but the last two weekly dumps have failed with this error:
2021-03-05T15:35:17.320-0800 Failed: error reading separator after document #11554732: bad JSON array format - found '{' outside JSON object/array in input source
2021-03-05T15:35:17.320-0800 11553900 document(s) imported successfully. 0 document(s) failed to import.
The command I run is:
bunzip2 -dc ./wiki_job/latest-all.json.bz2 | mongoimport --host 127.0.0.1:27017 --db wikiData --collection wiki --type json --drop --numInsertionWorkers 4 --jsonArray
The affected dumps are those of February 24 and March 3 (as of March 5, 2021).
Feb 24th dump: https://dumps.wikimedia.org/wikidatawiki/entities/20210222/wikidata-20210222-all.json.bz2
March 3rd dump: https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2
I am not sure what has changed in the dump file. I have tried various mongoimport parameters, but all of them exhibit the same issue. The weekly dumps before Feb 24th import fine.
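To narrow down where the file breaks, a rough sketch like the one below could scan the dump line by line and report the first line that does not fit the expected layout. It assumes the dump is a single JSON array with `[` on the first line, `]` on the last, and one entity per line, each followed by a comma except the final one. The function names `first_bad_line` and `scan_dump` are made up for illustration and are not part of any tool.

```python
import bz2
import json


def first_bad_line(lines):
    """Return (line_number, reason) for the first malformed line, or None.

    Assumed layout: "[" on its own line, one JSON object per line with a
    trailing comma on every object except the last, then "]".
    """
    prev_had_comma = True  # nothing precedes the first entity
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if text in ("", "[", "]"):
            continue  # array brackets and blank lines carry no entity
        if not prev_had_comma:
            # Two objects in a row without a separator: the kind of input
            # that would make a JSON array reader see '{' unexpectedly.
            return number, "no ',' separator after the previous entity"
        prev_had_comma = text.endswith(",")
        body = text[:-1] if prev_had_comma else text
        try:
            json.loads(body)
        except json.JSONDecodeError as exc:
            return number, f"invalid JSON: {exc.msg}"
    return None


def scan_dump(path):
    """Stream a .bz2 dump without unpacking it to disk first."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return first_bad_line(dump)
```

Calling `scan_dump("./wiki_job/latest-all.json.bz2")` would return either `None` or a line number with a short reason. Under the assumed layout, document #11554732 from the error message should sit near line 11554733, since the first line holds only the opening bracket.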