Mystery of the missing de2en file
Closed, Resolved, Public

Description

As per T180264#3807842, one dump run was missing the file cx-corpora.de2en.html.json.gz.

We should monitor dump runs to see whether this file goes missing again, or whether some other file goes missing instead. If we see missing files, we should set a trap to catch the file-stealing bunny.
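One way to do the monitoring suggested above is to check that every language pair in a dump directory has the full set of formats. The following is only a minimal sketch: the function name `missing_formats`, the expected format set, and the file-name pattern are assumptions inferred from the file names mentioned in this task, not part of the actual dump scripts.

```python
import re
from collections import defaultdict

# Assumption: each language pair should have these three formats,
# inferred from the file names seen in the dump directory.
EXPECTED_FORMATS = {"html.json", "text.json", "text.tmx"}

# Assumption: names look like cx-corpora.<source>2<target>.<format>.gz
FILE_RE = re.compile(
    r"^cx-corpora\.([a-z-]+2[a-z-]+)\.(html\.json|text\.json|text\.tmx)\.gz$"
)


def missing_formats(filenames):
    """Return {pair: sorted missing formats} for pairs with an incomplete set."""
    seen = defaultdict(set)
    for name in filenames:
        match = FILE_RE.match(name)
        if match:
            pair, fmt = match.groups()
            seen[pair].add(fmt)
    # Only report pairs where at least one expected format is absent.
    return {
        pair: sorted(EXPECTED_FORMATS - formats)
        for pair, formats in seen.items()
        if formats != EXPECTED_FORMATS
    }


# Hypothetical listing in which the de2en html file is absent.
listing = [
    "cx-corpora.de2en.text.json.gz",
    "cx-corpora.de2en.text.tmx.gz",
    "cx-corpora.de2fr.html.json.gz",
    "cx-corpora.de2fr.text.json.gz",
    "cx-corpora.de2fr.text.tmx.gz",
]
print(missing_formats(listing))  # {'de2en': ['html.json']}
```

Note that this only catches pairs that have at least one file present. A pair whose files are all missing would need a comparison against the previous run's listing.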

Event Timeline

Nikerabbit claimed this task.

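The file is not missing in the latest dump. Looking at the timestamps, it is clear that the html format is created first. Thus my theory is that de2en just crossed a threshold between the runs of the different formats: it was still below the threshold when the html run happened and above it by the time the text formats were generated.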

cx-corpora.de2en.text.json.gz                      01-Dec-2017 11:46             1134980
cx-corpora.de2en.text.tmx.gz                       01-Dec-2017 16:49             1096219
cx-corpora.de2fr.html.json.gz                      01-Dec-2017 09:56             4882249
cx-corpora.de2fr.text.json.gz                      01-Dec-2017 12:47             2332741
cx-corpora.de2fr.text.tmx.gz                       01-Dec-2017 17:53             2283268