Better reliability of domain matching
Closed, ResolvedPublic

Authored By

	SuperHamster
	May 19 2019, 11:04 AM

Description

Currently, we check for domains by doing a simple string match against e.g. news.com. However, a citation referencing bestnews.com would still match news.com, despite being a different domain.

We should improve our domain matching by ensuring we check each domain against the full URL. This likely means prepending a delimiter to each domain and doing two checks, one with . and one with // [as in http(s)://]. This will likely also require separating out the string matching for non-domains (e.g. /opinions/).
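A minimal sketch of the two-check idea, written in Python purely for illustration. The function names matches_domain and matches_path_fragment are hypothetical, and this is not the code used to resolve the task. It assumes each listed domain is a bare host such as news.com and that non-domain patterns such as /opinions/ are kept in a separate list.

```python
def matches_domain(url: str, domain: str) -> bool:
    """Return True if `domain` appears in `url` right after "//" or ".".

    Hypothetical helper: requiring one of those two prefixes keeps
    bestnews.com from matching news.com, while news.com itself and
    subdomains such as www.news.com still match.
    """
    url = url.lower()
    domain = domain.lower()
    # Check 1: the domain directly follows the scheme separator, e.g. https://news.com
    # Check 2: the domain follows a dot, i.e. it is reached via a subdomain
    return ("//" + domain) in url or ("." + domain) in url


def matches_path_fragment(url: str, fragment: str) -> bool:
    """Plain substring match, kept separate for non-domain patterns."""
    return fragment.lower() in url.lower()


if __name__ == "__main__":
    for candidate in ("https://news.com/story",
                      "http://www.news.com/story",
                      "https://bestnews.com/story"):
        print(candidate, matches_domain(candidate, "news.com"))
```

This is still substring matching, so a URL that carries .news.com in its query string, or a host such as news.com.example.org, would also match. Parsing the hostname out of the URL first would be stricter, at the cost of more code.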

Event Timeline

SuperHamster triaged this task as High priority. May 19 2019, 11:04 AM

SuperHamster created this task.

SuperHamster moved this task from Backlog to In Progress on the Cite-Unseen board.

SuperHamster added a project: Wikimedia-Hackathon-2019.

SuperHamster closed this task as Resolved. May 19 2019, 12:10 PM

SuperHamster moved this task from In Progress to Done on the Cite-Unseen board.
