
Better reliability of domain matching
Closed, Resolved (Public)

Description

Currently, we check for domains by doing a simple substring match against e.g. news.com. The problem is that a citation referencing bestnews.com would still match news.com, even though it is a different domain.

We should improve our domain matching by ensuring we are checking against the full URL. This likely means prepending a delimiter to each domain and doing two checks per domain, one for . and one for // [as in http(s)://], so that news.com is only matched as .news.com or //news.com. This will likely also require separating out the string matching for non-domains (e.g. /opinions/).
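
A minimal sketch of the idea above, in Python for illustration only. The function names (matches_domain, matches_path_fragment, matches_host) are hypothetical and not taken from the Cite Unseen code. matches_host is an alternative that is not proposed in this task: it parses the hostname instead of prefixing the domain.

```python
from urllib.parse import urlparse


def matches_domain(url: str, domain: str) -> bool:
    """Prefix approach from the task: look for '//domain' or '.domain'."""
    url = url.lower()
    domain = domain.lower()
    # '//news.com' matches the bare domain right after the scheme;
    # '.news.com' matches subdomains such as www.news.com.
    return ("//" + domain) in url or ("." + domain) in url


def matches_path_fragment(url: str, fragment: str) -> bool:
    """Non-domain patterns (e.g. '/opinions/') stay plain substring checks."""
    return fragment.lower() in url.lower()


def matches_host(url: str, domain: str) -> bool:
    """Alternative (an assumption, not from the task): compare hostnames."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    # Exact host, or a subdomain ending in '.domain'.
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    print(matches_domain("https://bestnews.com/story", "news.com"))
    print(matches_domain("https://www.news.com/story", "news.com"))
```

The prefix check only anchors the start of the domain, so a URL such as https://news.community.org/ would still match news.com. The hostname comparison avoids that case, at the cost of needing a parseable URL with a scheme.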

Event Timeline

SuperHamster created this task.
SuperHamster moved this task from Backlog to In Progress on the Cite-Unseen board.
SuperHamster moved this task from In Progress to Done on the Cite-Unseen board.