The cleanUpTitle() function of flickrripper can be simplified. It currently contains:
    title = title.strip()
    title = re.sub(r'[<{\[]', '(', title)
    title = re.sub(r'[>}\]]', ')', title)
    title = re.sub(r'[ _]?\(!\)', '', title)
    title = re.sub(',:[ _]', ', ', title)
    title = re.sub('[;:][ _]', ', ', title)
    title = re.sub(r'[\t\n ]+', ' ', title)
    title = re.sub(r'[\r\n ]+', ' ', title)
    title = re.sub('[\n]+', '', title)
    title = re.sub('[?!]([.\"]|$)', r'\1', title)
    title = re.sub('[&#%?!]', '^', title)
    title = re.sub('[;]', ',', title)
    title = re.sub(r'[/+\\:]', '-', title)
    title = re.sub('--+', '-', title)
    title = re.sub(',,+', ',', title)
    title = re.sub('[-,^]([.]|$)', r'\1', title)
    title = title.replace(' ', '_')
    title = title.strip('_')
Obviously, the regex --+ can be replaced with -+, ,,+ with ,+, and so on. Any replacement with ' ' can be shortened to '_' directly, since the final statements convert every space to an underscore anyway. [;] is the same as a plain ;. [\t\n ]+ and [\r\n ]+ can be combined, and [\n]+ is redundant because no newline is left by the time it runs. [/+\\:] can also be combined with --+, and several other replacements can be combined, simplified or removed.
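A minimal sketch of what the simplifications listed above might look like when applied together. The name clean_up_title_simplified is hypothetical and not part of flickrripper; the patterns not mentioned above (including ',:[ _]') are carried over unchanged, and the result is meant to match the current output, not to change behaviour.

```python
import re


def clean_up_title_simplified(title):
    """Hypothetical simplified variant of cleanUpTitle() (illustration only)."""
    title = title.strip()
    title = re.sub(r'[<{\[]', '(', title)
    title = re.sub(r'[>}\]]', ')', title)
    title = re.sub(r'[ _]?\(!\)', '', title)
    title = re.sub(',:[ _]', ', ', title)
    title = re.sub('[;:][ _]', ', ', title)
    # One pass replaces [\t\n ]+ and [\r\n ]+; it writes '_' directly,
    # so the separate [\n]+ step and the final replace(' ', '_') are dropped.
    title = re.sub(r'[\t\r\n ]+', '_', title)
    title = re.sub(r'[?!]([."]|$)', r'\1', title)
    title = re.sub('[&#%?!]', '^', title)
    # '[;]' is a single literal character, so no regex is needed.
    title = title.replace(';', ',')
    # Merges the [/+\\:] replacement with the '--+' collapse.
    title = re.sub(r'[-/+\\:]+', '-', title)
    title = re.sub(',+', ',', title)
    title = re.sub(r'[-,^](\.|$)', r'\1', title)
    return title.strip('_')


print(clean_up_title_simplified('Sunset; beach: 2010/05'))
# prints Sunset,_beach,_2010-05
```

Before a change like this is merged, it would be worth running the old and the new function side by side over a set of real Flickr titles to confirm that they agree.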
Source code is available to download from Gerrit: https://gerrit.wikimedia.org/r/#/admin/projects/pywikibot/core (flickrripper.py file is in scripts folder)