Currently, [[ https://phabricator.wikimedia.org/diffusion/OSWB/browse/master/wmfbackups/cli/recover_dump.py | recover-dump ]] has a regex for validating the filename of the backup dump.
This is the regex as defined in that file:
```
# FIXME: backups will stop working on Jan 1st 2100
DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(20\d\d-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'
```
It should accept a filename made of `dump.`, an identifier such as `s3`, a dot, a timestamp in the format `YYYY-MM-DD--HH-MM-SS` and, optionally, the `.tar.gz` extension.
See this [[ https://regexr.com/5oseb | example here ]] for a breakdown of the regex above.
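For a quick local check alongside regexr, the pattern can also be exercised with Python's `re` module. This sketch assumes the pattern is applied with `re.match`; how `recover-dump` actually applies it is not shown here. The three capture groups hold the identifier, the timestamp and the optional extension.

```
import re

# Copied verbatim from recover_dump.py
DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(20\d\d-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'

match = re.match(DUMPNAME_REGEX, 'dump.s3.2022-11-12--19-05-35.tar.gz')
print(match.groups())  # ('s3', '2022-11-12--19-05-35', '.tar.gz')
```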
In addition to the FIXME, there are other issues with this regex.
As it stands, the regex correctly accepts valid inputs such as:
```
dump.s3.2022-11-12--19-05-35.tar.gz
dump.s3.2021-03-19--12-05-32.tar.gz
dump.s3.2011-09-05--00-35-26.tar.gz
```
It also accepts the following as valid. <del>This could be to compensate for backup files with a missing tar.gz extension (needs clarification: is it a bug or a feature?)</del>
This is intended: all backups and recoveries can work with either a directory or a tarball of that directory, depending on the option set.
```
dump.s3.2022-11-12--19-05-35
dump.s3.2021-03-19--12-05-32
```
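The optional `(\.tar\.gz)?` group is what lets both forms through: for a directory name the third group is simply `None`. A minimal sketch of how a caller could tell the two apart, assuming the pattern is applied with `re.match`; `is_tarball` is a hypothetical helper, not a function in `recover_dump.py`.

```
import re

# Copied verbatim from recover_dump.py
DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(20\d\d-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'


def is_tarball(name):
    """Hypothetical helper: True for a dump tarball, False for a dump directory."""
    match = re.match(DUMPNAME_REGEX, name)
    if match is None:
        raise ValueError(f'{name!r} does not look like a dump name')
    # Group 3 is None when the optional '.tar.gz' suffix is absent
    return match.group(3) is not None


print(is_tarball('dump.s3.2022-11-12--19-05-35.tar.gz'))  # True
print(is_tarball('dump.s3.2022-11-12--19-05-35'))         # False
```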
Fixing the FIXME would require the regex to accept the following, which it currently rejects:
```
dump.s3.2105-03-19--12-05-32.tar.gz
```
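The FIXME comes from the hard-coded `20` prefix in the year. One possible fix, sketched here, is to accept any four-digit year with `\d{4}`. `OLD_DUMPNAME_REGEX` and `NEW_DUMPNAME_REGEX` are hypothetical names for the comparison, and this change alone does nothing about invalid months, days or times.

```
import re

OLD_DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(20\d\d-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'
# Hypothetical variant: only the year part changes, from 20\d\d to \d{4}
NEW_DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(\d{4}-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'

name = 'dump.s3.2105-03-19--12-05-32.tar.gz'
print(re.fullmatch(OLD_DUMPNAME_REGEX, name) is not None)  # False
print(re.fullmatch(NEW_DUMPNAME_REGEX, name) is not None)  # True
```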
The following are definitely supposed to be invalid (their timestamps are impossible), yet the regex accepts them:
```
dump.s3.2021-19-39--99-05-32.tar.gz
dump.s3.2021-19-19--12-99-32.tar.gz
dump.s3.2021-19-19--12-99-99.tar.gz
dump.s3.2021-19-19--12-99-99
```
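To reproduce the problem, the current pattern can be run against these names. This sketch uses `re.fullmatch`, so the whole name must match; every name still matches, because each timestamp field is only checked for the right number of digits (plus a loose first digit for month and day).

```
import re

# Copied verbatim from recover_dump.py
DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(20\d\d-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'

invalid_names = [
    'dump.s3.2021-19-39--99-05-32.tar.gz',
    'dump.s3.2021-19-19--12-99-32.tar.gz',
    'dump.s3.2021-19-19--12-99-99.tar.gz',
    'dump.s3.2021-19-19--12-99-99',
]

for name in invalid_names:
    # [01]\d allows months 00-19, [0123]\d allows days 00-39, \d\d allows 99
    print(name, re.fullmatch(DUMPNAME_REGEX, name) is not None)  # True for every name
```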
Tasks
* Fix the existing regex so that it accepts input with a valid timestamp and rejects input with an invalid one, and also resolve the FIXME for the Jan 1st 2100 issue.
* If the regex check fails, consider printing an error message such as "Invalid input - regex check failed".
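The two tasks above could be combined along these lines: a regex that checks the overall shape (with any four-digit year), followed by `datetime.strptime` to reject impossible dates and times, and a `ValueError` carrying the suggested message. This is a sketch, not the agreed fix: `parse_dump_name` and `TIMESTAMP_FORMAT` are hypothetical names, moving the range checks from the regex into `strptime` is a design choice rather than a pure-regex fix, and `re.fullmatch` may be stricter than how `recover_dump.py` currently applies the pattern.

```
import re
from datetime import datetime

# Hypothetical replacement: same three groups as today, but the year is any
# four digits (\d{4}) instead of 20\d\d, which removes the Jan 1st 2100 limit
DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(\d{4}-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'
TIMESTAMP_FORMAT = '%Y-%m-%d--%H-%M-%S'


def parse_dump_name(name):
    """Hypothetical helper: return (section, timestamp, is_tarball) or raise ValueError."""
    # fullmatch rejects names with extra characters before or after the pattern
    match = re.fullmatch(DUMPNAME_REGEX, name)
    if match is None:
        raise ValueError(f'Invalid input - regex check failed: {name}')
    # First group is the identifier, e.g. 's3'
    section, raw_timestamp, extension = match.groups()
    try:
        # strptime rejects out-of-range fields such as month 19 or minute 99
        timestamp = datetime.strptime(raw_timestamp, TIMESTAMP_FORMAT)
    except ValueError:
        raise ValueError(f'Invalid input - invalid timestamp: {name}') from None
    return section, timestamp, extension is not None


print(parse_dump_name('dump.s3.2105-03-19--12-05-32.tar.gz'))
# ('s3', datetime.datetime(2105, 3, 19, 12, 5, 32), True)
```

Delegating to `strptime` also catches dates such as 2021-02-30, which a regex that only checks field ranges would still accept.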
Tests
* The regex is not wrapped in a function, which might make it difficult to unit test. Figure out a way to make it unit-testable without breaking existing functionality (check usages of the `DUMPNAME_REGEX` global variable in the code).
* Ideally, write unit tests that check the regex against a variety of valid and invalid inputs.
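One way to make the check testable while keeping existing callers working is to leave `DUMPNAME_REGEX` as a module-level constant with the same three groups and add a small function around it that tests can call directly. The sketch below assumes a hypothetical `is_valid_dump_name` function (four-digit-year regex plus `datetime.strptime`) and a hypothetical `TestDumpName` class; the cases reuse the examples from this task, plus a 30 February date added for illustration.

```
import re
import unittest
from datetime import datetime

# Kept as a module-level constant so existing users of DUMPNAME_REGEX keep working;
# only the year part differs from the current pattern
DUMPNAME_REGEX = r'dump\.([a-z0-9\-]+)\.(\d{4}-[01]\d-[0123]\d\--\d\d-\d\d-\d\d)(\.tar\.gz)?'


def is_valid_dump_name(name):
    """Hypothetical wrapper: shape check by regex, then a real date/time check."""
    match = re.fullmatch(DUMPNAME_REGEX, name)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(2), '%Y-%m-%d--%H-%M-%S')
    except ValueError:
        return False
    return True


class TestDumpName(unittest.TestCase):
    VALID = [
        'dump.s3.2022-11-12--19-05-35.tar.gz',
        'dump.s3.2011-09-05--00-35-26.tar.gz',
        'dump.s3.2021-03-19--12-05-32',         # directory, no extension
        'dump.s3.2105-03-19--12-05-32.tar.gz',  # after Jan 1st 2100
    ]
    INVALID = [
        'dump.s3.2021-19-39--99-05-32.tar.gz',
        'dump.s3.2021-19-19--12-99-32.tar.gz',
        'dump.s3.2021-19-19--12-99-99.tar.gz',
        'dump.s3.2021-19-19--12-99-99',
        'dump.s3.2021-02-30--12-05-32.tar.gz',    # 30 February
        'backup.s3.2021-03-19--12-05-32.tar.gz',  # wrong prefix
    ]

    def test_valid_names(self):
        for name in self.VALID:
            with self.subTest(name=name):
                self.assertTrue(is_valid_dump_name(name))

    def test_invalid_names(self):
        for name in self.INVALID:
            with self.subTest(name=name):
                self.assertFalse(is_valid_dump_name(name))
```

Saved in a test module, this runs with `python -m unittest`; `subTest` reports each failing name separately instead of stopping at the first one.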
NOTE: Please remember to keep the scope small; backups can get quite complicated :-)