As a follow-up to T190254: Remove irrelevant sections from source article for translation, we need to improve the way non-translatable meta content is removed in cxserver. Currently, a YAML file is used to blacklist templates, classes, and RDFa identifiers. Regular expression support and case-insensitive matching were added later.
@Nikerabbit observed that:
I suspect we also need to normalize spaces vs. underscores (or better yet use the canonicalized name from the href?) Bunch of such examples in https://quarry.wmflabs.org/query/27460
Also, the current YAML configuration could become very large if extended to all language pairs. We need to find smarter ways to do this.
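
To make the normalization suggestion quoted above concrete, here is a minimal sketch in Python (not cxserver code) of matching template names after treating underscores and spaces as equivalent. The names `normalize_title`, `is_blacklisted`, `EXACT`, and `PATTERNS`, as well as the sample entries, are hypothetical and only for illustration. Lowercasing the whole name is a simplifying assumption that mirrors the case-insensitive matching mentioned above; it is not MediaWiki's actual title canonicalization.

```python
import re

def normalize_title(name):
    """Treat underscores and runs of whitespace as a single space, then lowercase.

    Assumption: full case folding is acceptable for matching purposes.
    """
    collapsed = re.sub(r"[_\s]+", " ", name).strip()
    return collapsed.casefold()

# Hypothetical blacklist entries, normalized once when the config is loaded.
EXACT = {normalize_title(n) for n in ["Citation needed", "Infobox_person"]}

# Hypothetical regular expression entries, written against normalized names.
PATTERNS = [re.compile(r".* stub")]

def is_blacklisted(name):
    """Return True if the normalized name matches an exact or regex entry."""
    normalized = normalize_title(name)
    if normalized in EXACT:
        return True
    return any(p.fullmatch(normalized) for p in PATTERNS)
```

With this approach, `Citation_needed`, `citation  needed`, and `Citation needed` all resolve to the same entry, so the configuration needs only one spelling per template. Using the canonicalized name from the href, as suggested, would avoid guessing at normalization rules altogether.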