
Improve non-translatable content skiplist mechanism in cxserver
Open, LowestPublic

Description

As a follow-up to T190254: Remove irrelevant sections from source article for translation, we need to improve how non-translatable meta content is removed in cxserver. Currently a YAML file lists the templates, classes, and RDFa identifiers to skip. Support for regular expressions and case-insensitive matching was added later.
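
To make the current mechanism concrete, here is a minimal sketch of a skip-rule matcher of the kind described above. It is an illustration under stated assumptions, not cxserver's actual code or schema: the `SkipRules` class, its field names, and the sample values are all hypothetical, and the rules are built in Python rather than loaded from YAML.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SkipRules:
    """Hypothetical rule set; field names are illustrative, not cxserver's schema."""

    templates: list[str] = field(default_factory=list)  # regex patterns for template names
    classes: list[str] = field(default_factory=list)    # CSS class names to skip
    rdfa: list[str] = field(default_factory=list)       # RDFa typeof values to skip

    def __post_init__(self) -> None:
        # Compile once; IGNORECASE gives the case-insensitive matching mentioned above.
        self._template_res = [re.compile(p, re.IGNORECASE) for p in self.templates]

    def should_skip(self, template=None, classes=(), typeof=None) -> bool:
        """Return True if any one of the three rule kinds matches."""
        if template is not None and any(r.fullmatch(template) for r in self._template_res):
            return True
        if set(classes) & set(self.classes):
            return True
        return typeof in self.rdfa


# Sample values are made up for illustration.
rules = SkipRules(
    templates=[r"infobox.*", "Citation needed"],
    classes=["navbox"],
    rdfa=["mw:Example/skip"],
)
```

With these sample rules, `rules.should_skip(template="Infobox person")` matches through the regular expression, and `rules.should_skip(template="citation needed")` matches only because of the case-insensitive flag.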

@Nikerabbit observed that

I suspect we also need to normalize spaces vs. underscores (or better yet, use the canonicalized name from the href?). There are a bunch of such examples in https://quarry.wmflabs.org/query/27460
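
The normalization suggested in this comment could look roughly like the sketch below. Both helper names are hypothetical. The code assumes hrefs of the form `./Namespace:Title` and assumes the wiki treats the first letter of a title as case-insensitive, which does not hold on every wiki.

```python
from urllib.parse import unquote


def normalize_name(name: str) -> str:
    """Treat underscores as spaces, collapse whitespace, capitalize the first letter."""
    name = " ".join(name.replace("_", " ").split())
    # Assumption: first-letter case is not significant on the target wiki.
    return name[:1].upper() + name[1:]


def name_from_href(href: str) -> str:
    """Derive a template name from an href such as './Template:Citation_needed'."""
    title = unquote(href).removeprefix("./")
    # Assumption: everything before the first colon is the namespace prefix.
    _, _, rest = title.partition(":")
    return normalize_name(rest or title)
```

Comparing skip-list entries and incoming template names only after passing both through `normalize_name` would make `Citation_needed` and `citation needed` resolve to the same key.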

Also, the current YAML configuration could grow very large if extended to all language pairs. We need to find smarter ways to do this.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 21 2018, 11:57 AM
Vvjjkkii renamed this task from Improve non-translatable content blacklisting mechanism in cxserver to 8iaaaaaaaa. · Jul 1 2018, 1:02 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 8iaaaaaaaa to Improve non-translatable content blacklisting mechanism in cxserver. · Jul 2 2018, 7:50 AM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Pginer-WMF updated the task description. · Jul 20 2018, 8:25 AM
Pginer-WMF triaged this task as Medium priority. · Jul 20 2018, 8:47 AM
Pginer-WMF moved this task from Backlog to Page contents issues on the CX-cxserver board.
Nikerabbit lowered the priority of this task from Medium to Lowest. · Mon, Jul 13, 7:37 AM

Triage meeting update: This is probably one of those tickets that we will not look at until there is a need to improve the skiplist. If and when that time comes, we will probably have forgotten this task already.

Nikerabbit renamed this task from Improve non-translatable content blacklisting mechanism in cxserver to Improve non-translatable content skiplist mechanism in cxserver. · Mon, Jul 13, 7:38 AM