- Download the enwiktionary HTML dump (April 1st, 2022)
- Untar the file
Stale data:
- Extract data for the page "apreciable":
$ jq -r 'select(.name == "apreciable")' enwiktionary_*ndjson | head
{
  "name": "apreciable",
  "identifier": 2713698,
  "date_modified": "2021-03-19T05:53:16Z",
  "version": {
    "identifier": 62182446,
    "comment": "convert {{es-adj-old}} to new {{es-adj}} format",
What happens?:
The data returned is from March 2021. ("date_modified": "2021-03-19T05:53:16Z")
What should have happened instead?:
The data returned is from March 2022. (last edit 2022-03-09, diff)
Missing page:
- Extract data for the page "paniaguarse":
$ jq -r 'select(.name == "paniaguarse")' enwiktionary_*ndjson
$
What happens?:
No output.
What should have happened instead?:
Data is returned for the page paniaguarse (created 2018-07-11)
There seem to be missing or outdated pages in all the recent enwiktionary HTML dumps I've tried. If it's useful, I can try to compile a list by diffing against the XML dump.
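A rough sketch of how such a list could be compiled, in case it helps scope the work. It assumes the XML dump has already been reduced to a mapping of page title to latest revision timestamp (that extraction step is not shown), and that both dumps use the same ISO 8601 UTC format seen in "date_modified" above, so plain string comparison orders timestamps correctly. The function name compare_dumps and the parameter xml_last_modified are hypothetical; only the "name" and "date_modified" fields come from the dump output shown earlier. Namespace filtering is not handled.

```python
import json
from typing import Dict, Iterable, List, Tuple


def compare_dumps(
    html_dump_lines: Iterable[str],
    xml_last_modified: Dict[str, str],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Return (missing, stale) pages of the HTML dump relative to the XML dump.

    html_dump_lines: lines of the NDJSON file, one JSON object per line.
    xml_last_modified: hypothetical mapping title -> latest timestamp,
        assumed to have been extracted from the XML dump beforehand.
    """
    seen: Dict[str, str] = {}
    for line in html_dump_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        name = record["name"]
        modified = record["date_modified"]
        # If a title appears more than once, keep its newest timestamp.
        if name not in seen or modified > seen[name]:
            seen[name] = modified

    # Titles present in the XML dump but absent from the HTML dump.
    missing = sorted(title for title in xml_last_modified if title not in seen)

    # Titles whose HTML dump entry is older than the XML dump revision,
    # reported as (title, html_timestamp, xml_timestamp).
    stale = sorted(
        (title, seen[title], xml_ts)
        for title, xml_ts in xml_last_modified.items()
        if title in seen and seen[title] < xml_ts
    )
    return missing, stale
```

Usage would look something like `missing, stale = compare_dumps(open(path, encoding="utf-8"), xml_last_modified)`, where `path` points at one of the untarred ndjson files. Only the title-to-timestamp map is held in memory, not the HTML bodies, which should keep this workable for a full dump.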
To do
- additional tickets to be created