
Duplicate clustering with old kmeans strategy
Closed, Resolved, Public

Description

See @aetilley's repo

Event Timeline

Halfak assigned this task to Ladsgroup.
Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Backlog on the Machine-Learning-Team (Active Tasks) board.
Halfak added subscribers: Halfak, aetilley.

The file data2.tsv has 19863 samples, but your clusters sum to only 802 samples. Let me look at the code you sent and get back to you.

That's because we only test on reverted edits, and the last column is the reverted status (not a feature). I made this mistake initially too :)
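For anyone following along, here is a minimal sketch of keeping the last column out of the feature vectors and selecting only the reverted rows. The layout of data2.tsv is an assumption here (numeric feature columns followed by a True/False reverted column), and `SAMPLE_TSV` and `split_features_and_label` are hypothetical names used only for illustration, not code from the repo.

```python
import csv
import io

# Hypothetical sample in the ASSUMED layout: numeric features, then a
# final reverted-status column. The real data2.tsv may differ.
SAMPLE_TSV = "0.1\t3\tFalse\n0.9\t7\tTrue\n0.2\t4\tFalse\n0.8\t6\tTrue\n"


def split_features_and_label(tsv_text):
    """Return (features, reverted), keeping the last column out of the features."""
    features, reverted = [], []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        features.append([float(value) for value in row[:-1]])
        reverted.append(row[-1].strip().lower() in ("true", "1"))
    return features, reverted


features, reverted = split_features_and_label(SAMPLE_TSV)

# Only the rows flagged as reverted; this subset is much smaller than the file.
reverted_features = [f for f, r in zip(features, reverted) if r]

print(len(features), "samples in total,", len(reverted_features), "reverted")
```

Filtering like this is one way the clustered sample count can end up far below the number of rows in the file.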

I had understood that we were interested in clustering edits generally, so I just dropped the last column. Aaron, which did you have in mind?

Responded in IRC. Do both! Cluster the entire set and also cluster just the damaging set and compare the difference.
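A rough sketch of the "do both" comparison, assuming plain Lloyd's k-means in pure Python. This is not the code from the repo: the toy data, `K = 2`, and the helper names `kmeans` and `cluster_sizes` are all hypothetical, and a real run would use the features parsed from data2.tsv.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct sample points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


def cluster_sizes(assignments, k):
    """Number of samples assigned to each cluster."""
    return [assignments.count(c) for c in range(k)]


# Hypothetical toy data: feature vectors plus a separate reverted flag.
features = [[0.1, 3.0], [0.2, 4.0], [0.15, 3.5], [0.9, 7.0], [0.8, 6.0], [0.85, 6.5]]
reverted = [False, False, False, True, True, False]

K = 2  # assumed number of clusters

# 1) Cluster the entire set.
_, all_assignments = kmeans(features, K)
all_sizes = cluster_sizes(all_assignments, K)

# 2) Cluster only the damaging (reverted) subset.
damaging = [f for f, r in zip(features, reverted) if r]
_, damaging_assignments = kmeans(damaging, K)
damaging_sizes = cluster_sizes(damaging_assignments, K)

print("all edits:     ", all_sizes, "sum =", sum(all_sizes))
print("damaging edits:", damaging_sizes, "sum =", sum(damaging_sizes))
```

In each run the cluster sizes sum to the number of samples that were actually clustered, which gives a quick check on which set a given result came from.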

Halfak set Security to None.