
Jun 30 2016

aetilley committed rTESTREVSCORINGAGAINdbac5d07c0a4: Added docopt.py module to revscoring/revscoring..
Jun 30 2016, 1:22 AM
aetilley committed rTESTREVSCORINGAGAIN5733347aa99c: Merge dbac5d07c0a4783a74d50c1fb3e2f071e5edf5e9 into….
Jun 30 2016, 1:22 AM

Jan 17 2016

aetilley moved T123759: Create Rule and Symbol objects in pcfg.py. Generalize types of rules that can be read into PCFG object. from Backlog to Completed on the Machine-Learning-Team (Active Tasks) board.
Jan 17 2016, 12:17 AM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T123759: Create Rule and Symbol objects in pcfg.py. Generalize types of rules that can be read into PCFG object..

Implemented.

Jan 17 2016, 12:16 AM · Machine-Learning-Team (Active Tasks)

Jan 15 2016

aetilley added a comment to T122728: Determine how to build WP phrase-structure tree-bank..

Redirecting into Project

Jan 15 2016, 6:10 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T123759: Create Rule and Symbol objects in pcfg.py. Generalize types of rules that can be read into PCFG object. from Parked to Backlog on the Machine-Learning-Team (Active Tasks) board.
Jan 15 2016, 6:04 PM · Machine-Learning-Team (Active Tasks)
aetilley created T123759: Create Rule and Symbol objects in pcfg.py. Generalize types of rules that can be read into PCFG object..
Jan 15 2016, 6:03 PM · Machine-Learning-Team (Active Tasks)

Jan 8 2016

aetilley moved T122728: Determine how to build WP phrase-structure tree-bank. from Review to Backlog on the Machine-Learning-Team (Active Tasks) board.
Jan 8 2016, 5:47 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T122728: Determine how to build WP phrase-structure tree-bank. from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Jan 8 2016, 5:47 PM · Machine-Learning-Team (Active Tasks)

Jan 1 2016

aetilley created T122728: Determine how to build WP phrase-structure tree-bank..
Jan 1 2016, 6:43 PM · Machine-Learning-Team (Active Tasks)

Dec 18 2015

aetilley moved T121258: Complete beta version of pcfg_scorer and approximate overhead from Backlog to Completed on the Machine-Learning-Team (Active Tasks) board.
Dec 18 2015, 5:39 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T121258: Complete beta version of pcfg_scorer and approximate overhead.

PCFG object beta complete

Dec 18 2015, 5:39 PM · Machine-Learning-Team (Active Tasks)

Dec 11 2015

aetilley moved T121258: Complete beta version of pcfg_scorer and approximate overhead from Parked to Backlog on the Machine-Learning-Team (Active Tasks) board.
Dec 11 2015, 7:11 PM · Machine-Learning-Team (Active Tasks)
aetilley created T121258: Complete beta version of pcfg_scorer and approximate overhead.
Dec 11 2015, 7:06 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies..

Looked at two more papers.

Dec 11 2015, 7:00 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies. from Backlog to Completed on the Machine-Learning-Team (Active Tasks) board.
Dec 11 2015, 6:39 PM · Machine-Learning-Team (Active Tasks)

Dec 3 2015

aetilley moved T118730: Flake8 of aetilley/sigclust from Review to Completed on the Machine-Learning-Team (Active Tasks) board.
Dec 3 2015, 6:49 PM · User-Ladsgroup, Machine-Learning-Team (Active Tasks)
aetilley added a comment to T118730: Flake8 of aetilley/sigclust.

Sorry, I just saw this. All done.

Dec 3 2015, 6:49 PM · User-Ladsgroup, Machine-Learning-Team (Active Tasks)

Nov 28 2015

aetilley added a comment to T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies..

Using large feature sets requires very large datasets to be effective, and the more subtle the content that you're trying to extract (e.g. "sneaky vandalism") the more difficult it is to extract this content from an editor's word choice.

Nov 28 2015, 11:46 PM · Machine-Learning-Team (Active Tasks)

Nov 20 2015

aetilley moved T118593: Spike -- methods for identifying overfitting/bias/whatever problems in prediction models. from Parked to Paused on the Machine-Learning-Team (Active Tasks) board.
Nov 20 2015, 6:23 PM · Spike, Machine-Learning-Team
aetilley moved T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies. from Paused to Backlog on the Machine-Learning-Team (Active Tasks) board.
Nov 20 2015, 6:23 PM · Machine-Learning-Team (Active Tasks)
aetilley renamed T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies. from [Spike] Experiment with using bag-of-words badwords features to [Spike] Experiment with using bag-of-words badwords features and general NLP strategies..
Nov 20 2015, 6:22 PM · Machine-Learning-Team (Active Tasks)

Nov 13 2015

aetilley added a comment to T118004: Compare R sigclust to python sigclust implementation.

The Python and R sigclust implementations are giving similar results on the enwiki data. See R_read.R in /tests.

Nov 13 2015, 6:11 PM · Machine-Learning-Team (Active Tasks)

Nov 11 2015

aetilley added a comment to T116403: Testing python sigclust (relationship between full cluster & damaging clusters).

Introducing soft thresholding in python sigclust:

Nov 11 2015, 7:59 AM · Machine-Learning-Team (Active Tasks)

Nov 6 2015

aetilley added a comment to T118003: [Spike] Figure out why clustering is behaving weird. .

An important realization was that default pre-scaling of the input data (mean centering and normalizing variance to 1) did away with the strange behavior of the simulated CIs being so much lower than the input data CI. The scaling has taken us from always getting a p-value of 1 for the main dataset to always getting a p-value of 0.
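
As a minimal sketch of the pre-scaling described above (not the code from aetilley/sigclust), each feature column can be mean-centered and divided by its standard deviation. The function name `standardize_columns` and the sample data are hypothetical, and the use of population rather than sample variance is an assumption.

```python
import statistics


def standardize_columns(rows):
    """Mean-center each column and scale it to unit variance.

    Hypothetical helper for illustration; assumes population variance
    and a rectangular list of numeric rows.
    """
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        if std == 0:
            # A constant column carries no information; map it to zeros.
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(x - mean) / std for x in col])
    # Transpose back to row-major layout.
    return [list(row) for row in zip(*scaled_columns)]


# Made-up data: the second feature has a much larger scale than the first.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize_columns(data)
```

After scaling, both columns have mean 0 and variance 1, so no single feature dominates the distance computations that clustering relies on.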

Nov 6 2015, 6:24 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T118004: Compare R sigclust to python sigclust implementation.

Python Sigclust and R sigclust gave similar results on enwiki_data.

Nov 6 2015, 6:23 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T116403: Testing python sigclust (relationship between full cluster & damaging clusters).
Nov 6 2015, 5:06 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T116403: Testing python sigclust (relationship between full cluster & damaging clusters).

An important realization this week was that default pre-scaling of the input data (mean centering and normalizing variance to 1) did away with the strange behavior of the simulated CIs being so much lower than the input data CI. The scaling has taken us from always getting a p-value of 1 for the main dataset to always getting a p-value of 0. Thus, we begin clustering.

Nov 6 2015, 5:03 PM · Machine-Learning-Team (Active Tasks)

Nov 3 2015

aetilley added a comment to T117253: Duplicate clustering with old kmeans strategy.

I had understood that we were interested in clustering edits generally. Thus I just dropped the last column. Aaron, which did you have in mind?

Nov 3 2015, 12:07 AM · User-Ladsgroup, Machine-Learning-Team (Active Tasks)

Nov 2 2015

aetilley added a comment to T117253: Duplicate clustering with old kmeans strategy.

The file data2.tsv has 19863 samples, but your clusters sum to only 802 samples. Let me look at the code you sent and get back to you.

Nov 2 2015, 10:36 PM · User-Ladsgroup, Machine-Learning-Team (Active Tasks)

Oct 30 2015

aetilley renamed T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies. from Experiment with using bag-of-words badwords features to [Spike] Experiment with using bag-of-words badwords features.
Oct 30 2015, 5:51 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies. from Paused to Backlog on the Machine-Learning-Team (Active Tasks) board.
Oct 30 2015, 5:51 PM · Machine-Learning-Team (Active Tasks)
aetilley claimed T102343: [Spike] Experiment with using bag-of-words badwords features and general NLP strategies..
Oct 30 2015, 5:51 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T116403: Testing python sigclust (relationship between full cluster & damaging clusters) from Completed to Backlog on the Machine-Learning-Team (Active Tasks) board.
Oct 30 2015, 5:26 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T116403: Testing python sigclust (relationship between full cluster & damaging clusters) from Backlog to Completed on the Machine-Learning-Team (Active Tasks) board.
Oct 30 2015, 5:26 PM · Machine-Learning-Team (Active Tasks)

Oct 28 2015

aetilley added a comment to T116403: Testing python sigclust (relationship between full cluster & damaging clusters).
  1. See the recently added script R_read.R in sigclust/enwiki_data (see https://github.com/aetilley/sigclust). Call source("R_read.R") in R (inside the "enwiki_data" directory) to apply sigclust to the enwiki data (now titled "data2.tsv") as well as some other artificial data.
Oct 28 2015, 10:15 PM · Machine-Learning-Team (Active Tasks)

Oct 23 2015

aetilley moved T116403: Testing python sigclust (relationship between full cluster & damaging clusters) from Parked to Backlog on the Machine-Learning-Team (Active Tasks) board.
Oct 23 2015, 5:50 PM · Machine-Learning-Team (Active Tasks)
aetilley added a project to T116403: Testing python sigclust (relationship between full cluster & damaging clusters): Machine-Learning-Team (Active Tasks).
Oct 23 2015, 5:50 PM · Machine-Learning-Team (Active Tasks)
aetilley created T116403: Testing python sigclust (relationship between full cluster & damaging clusters).
Oct 23 2015, 5:46 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T113761: Draft implementation SigClust in python from Backlog to Completed on the Machine-Learning-Team (Active Tasks) board.
Oct 23 2015, 5:45 PM · Machine-Learning-Team (Active Tasks)

Oct 16 2015

aetilley added a comment to T113761: Draft implementation SigClust in python.

"Hard Thresholding" variant implemented.
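
For context, a rough sketch of hard thresholding as it is usually described for SigClust, which may differ from what aetilley/sigclust actually does: every sample covariance eigenvalue that falls below the estimated background noise variance is raised to that variance. The function name, argument names, and numbers below are hypothetical.

```python
def hard_threshold(eigenvalues, noise_variance):
    """Floor each eigenvalue at the background noise variance.

    Hypothetical helper: `eigenvalues` are sample covariance eigenvalues,
    `noise_variance` is an already-estimated background noise level.
    """
    return [max(ev, noise_variance) for ev in eigenvalues]


# Made-up eigenvalues; the two smallest fall below the noise level of 0.5.
thresholded = hard_threshold([5.0, 1.2, 0.3, 0.05], noise_variance=0.5)
```

The thresholded eigenvalues then define the covariance of the null Gaussian from which the simulated datasets are drawn.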

Oct 16 2015, 4:55 PM · Machine-Learning-Team (Active Tasks)
aetilley set Security to default on T113761: Draft implementation SigClust in python.
Oct 16 2015, 4:54 PM · Machine-Learning-Team (Active Tasks)

Oct 9 2015

aetilley added a comment to T113761: Draft implementation SigClust in python.

Converting algorithm summary into pseudo-code.

Oct 9 2015, 5:23 PM · Machine-Learning-Team (Active Tasks)

Sep 25 2015

aetilley reopened T113057: Prepare summary of SigClust and other methods for choosing number of clusters. as "Open".
Sep 25 2015, 4:36 PM · Machine-Learning-Team (Active Tasks)
aetilley closed T113057: Prepare summary of SigClust and other methods for choosing number of clusters. as Resolved.
Sep 25 2015, 4:35 PM · Machine-Learning-Team (Active Tasks)

Sep 18 2015

aetilley created T113057: Prepare summary of SigClust and other methods for choosing number of clusters..
Sep 18 2015, 5:04 PM · Machine-Learning-Team (Active Tasks)

Sep 11 2015

aetilley renamed T112303: Review of papers by Tufekci and Sandvig et. al. from Review of papers by Tufekci and Saldvig et. al. to Review of papers by Tufekci and Sandvig et. al..
Sep 11 2015, 10:20 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T112303: Review of papers by Tufekci and Sandvig et. al. from Review to Completed on the Machine-Learning-Team (Active Tasks) board.
Sep 11 2015, 7:49 PM · Machine-Learning-Team (Active Tasks)
aetilley moved T112303: Review of papers by Tufekci and Sandvig et. al. from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Sep 11 2015, 7:49 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T112303: Review of papers by Tufekci and Sandvig et. al..

The Sandvig paper did make brief mention of feedback mechanisms which seem to be pertinent to our considerations.

Sep 11 2015, 7:39 PM · Machine-Learning-Team (Active Tasks)
aetilley added a comment to T112303: Review of papers by Tufekci and Sandvig et. al..

Tufekci's paper is mostly expository of other studies, but the studies that she mentions are truly fascinating. aetilley has never had a Facebook account, but was intrigued by the possibilities that Tufekci mentions.
Sandvig et al. seem to be a diverse group of experts taking many pages to say something which is more or less obvious, but perhaps it bears repeating. There is a distinction between a function, an algorithm for computing a function, and a specific implementation of an algorithm. Racism, and bias in general, can creep in at more than one level.
A mantra that kept coming to mind while reading these was "strive for open algorithms and open training sets." The principal barrier here is in determining the level of detail at which to describe an algorithm/dataset to a most likely non-technical user or in which to let said user specify their own personal algorithm.

Sep 11 2015, 7:12 PM · Machine-Learning-Team (Active Tasks)
aetilley created T112303: Review of papers by Tufekci and Sandvig et. al..
Sep 11 2015, 5:25 PM · Machine-Learning-Team (Active Tasks)

Sep 1 2015

aetilley updated the task description for T107599: Arthur's init for revscoring.
Sep 1 2015, 7:49 AM · Machine-Learning-Team (Active Tasks)
aetilley closed T107599: Arthur's init for revscoring as Resolved.
Sep 1 2015, 7:48 AM · Machine-Learning-Team (Active Tasks)