The CX Corpora dumps we provide at https://dumps.wikimedia.org/other/contenttranslation/ are regenerated every time from the entire set of published CX data.
This assumes that the cx_corpora table will keep the data for all published translations forever. Due to T183890: Remove very old translation drafts from CX database, we are planning to change that assumption and remove old published translations from the table.
This means the CX Corpora dumps will be generated using a start time and an end time to fetch the data, and users of the dumps will need to collect the dumps for all of these time intervals. For example, if we generate them on a monthly basis, the en-es dump will be cx-corpora.en2es.201706.text.json.gz + cx-corpora.en2es.201707.text.json.gz + cx-corpora.en2es.201708.text.json.gz, and so on.
This also implies that the dumps, once generated, should never be deleted, because the data needed to regenerate them will no longer be in the cx_corpora table.
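
To illustrate what collecting the monthly dumps could look like for a user, here is a minimal sketch in Python. It assumes the monthly file naming shown in the en-es example above; the function name monthly_dump_names and its parameters are hypothetical, and the final naming scheme and interval may differ once this is implemented.

```python
def monthly_dump_names(source, target, start, end):
    """List the dump file names for every month from start to end, inclusive.

    start and end are (year, month) tuples. The file name pattern is an
    assumption copied from the en-es example in this task, not a
    confirmed naming scheme.
    """
    names = []
    year, month = start
    while (year, month) <= end:
        names.append(
            f"cx-corpora.{source}2{target}.{year:04d}{month:02d}.text.json.gz"
        )
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

Calling monthly_dump_names("en", "es", (2017, 6), (2017, 8)) yields the three file names from the example above; a user would then download each file and combine the contents to get the full en-es corpus.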