Package dictionaries better for ORES models
Open, Low, Public

Description

Currently we use enchant. It is tied to the OS. This is a bug. We should be able to package the specific dictionaries for our deployments. Can we actually do that? How?

Event Timeline

Halfak created this task. Feb 28 2019, 5:00 PM
Restricted Application added a project: Analytics. Feb 28 2019, 5:00 PM
Restricted Application added a subscriber: Aklapper.

Yes. And the pyenchant Python library.

So, the difficulty here is that this isn't just dictionaries feeding into Python deps; a decent bit of non-Python code comes from enchant. We can look into how to separate the dictionaries from the executables so that we can guarantee which dictionaries are used for a particular build, but it's not clear that feeding the same dictionaries into different versions of the underlying libraries will produce the same results. Of course, for ML purposes, if we can't guarantee the same results, the whole thing is a non-starter.
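One way to make "which dictionaries are used for a particular build" checkable is to record a digest of each dictionary file at build time and compare it at deploy time. The following is only a sketch under assumptions that are not confirmed in this task: it assumes the dictionaries are Hunspell-style `.dic`/`.aff` files sitting in a single directory, and the function names `dictionary_manifest` and `changed_files` are hypothetical. It says nothing about the enchant binaries themselves.

```python
import hashlib
from pathlib import Path


def dictionary_manifest(dict_dir):
    """Map each dictionary file name in dict_dir to its SHA-256 digest."""
    manifest = {}
    for path in sorted(Path(dict_dir).iterdir()):
        # Assumption: Hunspell-style layout, where .dic holds the word list
        # and .aff holds the affix rules. Both influence spell-check results.
        if path.suffix in {".dic", ".aff"}:
            manifest[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(expected, dict_dir):
    """Return the sorted file names that were added, removed, or modified
    relative to the expected manifest."""
    actual = dictionary_manifest(dict_dir)
    return sorted(
        name
        for name in set(expected) | set(actual)
        if expected.get(name) != actual.get(name)
    )
```

A build could store the manifest next to the model file and refuse to load the model when `changed_files` returns a non-empty list.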

My sense is that we can have a pretty good guarantee that the dictionaries are the dictionaries, but you're right that the underlying enchant code might be relevant. After all, if there is a bug that results in part of a dictionary not being applied correctly, that could be disastrous (and hard to debug).
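A bug of that kind could be surfaced by replaying a fixed word list through the spell checker and comparing against results recorded when the model was trained. This is a minimal sketch, not something ORES is known to do: `diff_against_baseline` is a hypothetical helper, and `check` stands for any callable that takes a word and returns whether it is accepted.

```python
def diff_against_baseline(check, baseline):
    """Return {word: (expected, actual)} for each word whose result changed.

    check    -- callable taking a word and returning a truthy/falsy result
    baseline -- dict mapping word -> bool recorded at training time
    """
    mismatches = {}
    for word, expected in baseline.items():
        actual = bool(check(word))
        if actual != expected:
            mismatches[word] = (expected, actual)
    return mismatches
```

With pyenchant, `check` could be `enchant.Dict("en_US").check`. An empty result only shows agreement on the sampled words, so the baseline would need to cover the affix rules and edge cases the features depend on.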

Harej triaged this task as Low priority. Mar 19 2019, 9:04 PM
Harej moved this task from Untriaged to Maintenance/cleanup on the Scoring-platform-team board.
Harej raised the priority of this task from Low to High.
Harej lowered the priority of this task from High to Low.