Sumit (Sumit)
User

User Details

User Since
Dec 16 2014, 4:23 PM (144 w, 3 d)
Availability
Available
IRC Nick
codezee
LDAP User
Sumit
MediaWiki User
Sumit.iitp

Recent Activity

Wed, Sep 20

Qgil awarded T164525: Weekly reports of GSoC17/Outreachy14 projects (tracking) a Yellow Medal token.
Wed, Sep 20, 9:56 AM · Outreachy (Round-14), Google-Summer-of-Code (2017)

Wed, Sep 6

Sumit updated the task description for T172326: Create machine-readable version of the WikiProject Directory.
Wed, Sep 6, 8:05 PM · Research Ideas, Scoring-platform-team
Sumit added a comment to T172326: Create machine-readable version of the WikiProject Directory.

PR for tests - https://github.com/wiki-ai/drafttopic/pull/1

Wed, Sep 6, 8:05 PM · Research Ideas, Scoring-platform-team
Sumit removed a project from T172326: Create machine-readable version of the WikiProject Directory: Scoring-platform-team (Current).
Wed, Sep 6, 5:51 PM · Research Ideas, Scoring-platform-team
Sumit edited projects for T173107: New Page Patrol - Number of users, added: Scoring-platform-team; removed Scoring-platform-team (Current).
Wed, Sep 6, 5:21 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol
Sumit edited projects for T173210: New Pages patrol - Number of re-reviews, added: Scoring-platform-team; removed Scoring-platform-team (Current).
Wed, Sep 6, 5:21 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol

Tue, Sep 5

Halfak awarded T172326: Create machine-readable version of the WikiProject Directory a Like token.
Tue, Sep 5, 4:48 PM · Research Ideas, Scoring-platform-team
Sumit created T175037: Publish Machine-Readable WikiProjects Dataset.
Tue, Sep 5, 4:06 PM · Scoring-platform-team (Current), Research Ideas, Scoring-platform-team
Sumit added a comment to T172326: Create machine-readable version of the WikiProject Directory.
Tue, Sep 5, 3:15 PM · Research Ideas, Scoring-platform-team

Mon, Aug 28

Sumit added a project to T172326: Create machine-readable version of the WikiProject Directory: Scoring-platform-team (Current).
Mon, Aug 28, 3:42 PM · Research Ideas, Scoring-platform-team
Sumit merged T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews into T172719: Get Sumit access to deleted page data for quality modeling.
Mon, Aug 28, 3:39 PM · draftquality-modeling, articlequality-modeling, artificial-intelligence, Scoring-platform-team (Current)
Sumit merged task T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews into T172719: Get Sumit access to deleted page data for quality modeling.
Mon, Aug 28, 3:39 PM · WMF-NDA-Requests, Scoring-platform-team (Current)

Sat, Aug 26

Sumit claimed T172326: Create machine-readable version of the WikiProject Directory.
Sat, Aug 26, 6:02 AM · Research Ideas, Scoring-platform-team

Aug 21 2017

Sumit moved T172726: Project around page reviewing and drafts from Active to Review on the Scoring-platform-team (Current) board.
Aug 21 2017, 3:09 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)

Aug 20 2017

Sumit added a comment to T172726: Project around page reviewing and drafts.

https://meta.wikimedia.org/wiki/Research:Automatic_new_article_topics_suggestion

Aug 20 2017, 4:40 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)

Aug 18 2017

Sumit added a comment to T172326: Create machine-readable version of the WikiProject Directory.

It might be useful to sync the machine-readable format from here, probably using a cron script - https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Directory/All (updated by the reports bot)

Aug 18 2017, 8:09 PM · Research Ideas, Scoring-platform-team
Sumit added a comment to T123327: Train/test draft topic model (new article routing AI).

Also from eranroz, a bot tagging new articles with wikiprojects or lists using a rule-based system - https://en.wikipedia.org/wiki/User:AlexNewArtBot

Aug 18 2017, 7:50 PM · Research Ideas, artificial-intelligence, Scoring-platform-team

Aug 12 2017

Sumit moved T173107: New Page Patrol - Number of users from Done to Review on the Scoring-platform-team (Current) board.
Aug 12 2017, 6:42 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol
Sumit moved T173210: New Pages patrol - Number of re-reviews from Done to Review on the Scoring-platform-team (Current) board.
Aug 12 2017, 6:42 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol
Sumit updated the task description for T173107: New Page Patrol - Number of users.
Aug 12 2017, 6:34 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol
Sumit updated the task description for T173210: New Pages patrol - Number of re-reviews.
Aug 12 2017, 6:34 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol
Sumit created T173210: New Pages patrol - Number of re-reviews.
Aug 12 2017, 6:31 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol

Aug 11 2017

Sumit created T173107: New Page Patrol - Number of users.
Aug 11 2017, 4:39 PM · Scoring-platform-team, English-Wikipedia-New-Pages-Patrol

Aug 7 2017

Sumit updated the task description for T172726: Project around page reviewing and drafts.
Aug 7 2017, 7:56 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
Sumit updated the task description for T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews.
Aug 7 2017, 7:14 PM · WMF-NDA-Requests, Scoring-platform-team (Current)
Sumit updated the task description for T172726: Project around page reviewing and drafts.
Aug 7 2017, 7:11 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
Sumit added a project to T172726: Project around page reviewing and drafts: draftquality-modeling.
Aug 7 2017, 7:10 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
Sumit created T172726: Project around page reviewing and drafts.
Aug 7 2017, 7:10 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
Sumit renamed T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews from Ops-Access-Requests - shell access related to Scoring Platform project to Data and Shell access related to Scoring Platform project on drafts and page reviews.
Aug 7 2017, 6:36 PM · WMF-NDA-Requests, Scoring-platform-team (Current)
Sumit added a subtask for T172719: Get Sumit access to deleted page data for quality modeling: T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews.
Aug 7 2017, 6:35 PM · draftquality-modeling, articlequality-modeling, artificial-intelligence, Scoring-platform-team (Current)
Sumit added a parent task for T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews: T172719: Get Sumit access to deleted page data for quality modeling.
Aug 7 2017, 6:35 PM · WMF-NDA-Requests, Scoring-platform-team (Current)
Sumit created T172720: Data and Shell access related to Scoring Platform project on drafts and page reviews.
Aug 7 2017, 6:35 PM · WMF-NDA-Requests, Scoring-platform-team (Current)
Sumit moved T170177: Test draftquality sentiment feature on Editquality from Active to Done on the Scoring-platform-team (Current) board.
Aug 7 2017, 3:45 AM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
Sumit added a comment to T170177: Test draftquality sentiment feature on Editquality.

After taking Adam's suggestions into account and scoring the sentiment of edits, I still wasn't able to get a decent signal. The 1% rise in accuracy is due to something else, and I think sentiment on edits is not a very good feature. To validate my conclusion further, I took 1000 samples from enwiki, half damaging and half not damaging and here's the result:

Aug 7 2017, 3:45 AM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
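
The validation sample described in the comment above (1000 revisions, half damaging and half not) could be drawn roughly as follows. This is only a sketch: the `balanced_sample` helper, the row layout, and the `damaging` key are hypothetical and are not taken from the task or from the editquality code.

```python
import random


def balanced_sample(rows, label_of, n_total=1000, seed=0):
    """Draw n_total rows, half with a truthy label and half without.

    `rows` and `label_of` are hypothetical stand-ins for labelled revisions
    and for a function that reads the "damaging" label from one of them.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    positives = [r for r in rows if label_of(r)]
    negatives = [r for r in rows if not label_of(r)]
    half = n_total // 2
    # random.sample raises ValueError if a class has fewer than `half` rows.
    sample = rng.sample(positives, half) + rng.sample(negatives, half)
    rng.shuffle(sample)  # avoid all positives coming first
    return sample


# Toy data: every tenth revision is marked damaging.
rows = [{"rev_id": i, "damaging": i % 10 == 0} for i in range(10000)]
sample = balanced_sample(rows, lambda r: r["damaging"])
```

Balancing the classes this way makes it easier to see whether a feature separates them, at the cost of no longer reflecting the real rate of damaging edits.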

Aug 5 2017

Sumit merged T171486: Blog about ORES regex-pocalypse into T172200: Blog post about the ORES regex-pocalypse.
Aug 5 2017, 2:47 PM · Wikimedia-Incident, ORES, Scoring-platform-team (Current)
Sumit merged task T171486: Blog about ORES regex-pocalypse into T172200: Blog post about the ORES regex-pocalypse.
Aug 5 2017, 2:47 PM · ORES, Scoring-platform-team

Aug 4 2017

GitHub <noreply@github.com> committed rODQ96b6d904b932: Merge 13ebd377fdf9e1c16be9ead4f8497b89ff509674 into… (authored by Sumit).
Aug 4 2017, 6:16 PM
Sumit committed rODQ13ebd377fdf9: Address review comments in https://github.com/wiki-ai/draftquality/pull/9 (authored by Sumit).
Aug 4 2017, 6:16 PM

Jul 24 2017

Sumit added a comment to T170177: Test draftquality sentiment feature on Editquality.

Goodfaith model shows a slight fall in accuracy:

Jul 24 2017, 1:48 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)
Sumit added a comment to T170177: Test draftquality sentiment feature on Editquality.

Enwiki damaging gives a slight rise in accuracy:

Jul 24 2017, 1:45 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)

Jul 20 2017

GitHub <noreply@github.com> committed rOEQ957951e01838: Merge 3c11d1a779feff020a6142b9545ce816de6d91f1 into… (authored by Sumit).
Jul 20 2017, 3:45 PM
Sumit committed rOEQ3c11d1a779fe: Add label param for enwiki goodfaith in Makefile (authored by Sumit).
Jul 20 2017, 3:45 PM

Jul 17 2017

Sumit closed T170069: Add ORES technical documentation as Resolved.
Jul 17 2017, 3:20 PM · Documentation, Scoring-platform-team (Current)
Sumit closed T170069: Add ORES technical documentation, a subtask of T148974: [Epic] Clean up ORES service documentation, as Resolved.
Jul 17 2017, 3:20 PM · Scoring-platform-team (Current), Epic, Documentation, ORES
Sumit moved T170069: Add ORES technical documentation from Active to Epics on the Scoring-platform-team (Current) board.
Jul 17 2017, 3:20 PM · Documentation, Scoring-platform-team (Current)
Sumit added a comment to T170069: Add ORES technical documentation.

The above page fully documents the technical details and is linked from https://www.mediawiki.org/wiki/ORES, hence closing.

Jul 17 2017, 3:20 PM · Documentation, Scoring-platform-team (Current)

Jul 16 2017

Sumit closed T163009: Train/test damaging & goodfaith models for Albanian Wikipedia as Resolved.
Jul 16 2017, 5:10 AM · Scoring-platform-team (Current), artificial-intelligence, editquality-modeling
Sumit closed T163009: Train/test damaging & goodfaith models for Albanian Wikipedia, a subtask of T130213: [Epic] Edit quality models (damaging/goodfaith), as Resolved.
Jul 16 2017, 5:10 AM · artificial-intelligence, Epic, editquality-modeling, Scoring-platform-team (Current)
Sumit moved T163009: Train/test damaging & goodfaith models for Albanian Wikipedia from Active to Done on the Scoring-platform-team (Current) board.
Jul 16 2017, 5:10 AM · Scoring-platform-team (Current), artificial-intelligence, editquality-modeling

Jul 14 2017

GitHub <noreply@github.com> committed rOEQ0e41603a5bdd: Merge 0247b751f0e47ba3bfb62ed65999783fd4bb2f86 into… (authored by Sumit).
Jul 14 2017, 7:53 PM

Jul 13 2017

Sumit added a project to T170177: Test draftquality sentiment feature on Editquality: draftquality-modeling.
Jul 13 2017, 3:38 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)

Jul 12 2017

Sumit moved T170205: Add test to ensure timeout of functions taking too long from Review to Done on the Scoring-platform-team (Current) board.
Jul 12 2017, 5:18 PM · Scoring-platform-team (Current)

Jul 10 2017

GitHub <noreply@github.com> committed rORESc482c430b979: Merge 6f79568e9f2cf50df4bc73c35f29e2b658bc00fb into… (authored by Sumit).
Jul 10 2017, 9:23 PM
Sumit committed rORES6f79568e9f2c: Remove flake8 errors (authored by Sumit).
Jul 10 2017, 9:23 PM
GitHub <noreply@github.com> committed rORESbe70a99d44eb: Merge a7550e723751c2319a0c5d4a3b9ae78a01b5afbb into… (authored by Sumit).
Jul 10 2017, 9:11 PM
Sumit committed rORESa7550e723751: Add Timeout test for a function taking a long time (authored by Sumit).
Jul 10 2017, 9:11 PM
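
A test of the kind named in this commit (checking that a function taking too long is timed out) might look roughly like the sketch below. It is an illustration only and is not the ORES implementation or the test from the linked pull request: the `run_with_timeout` helper is hypothetical and uses the standard-library `concurrent.futures` module.

```python
import concurrent.futures
import time


def run_with_timeout(func, seconds, *args):
    """Return func(*args), or raise TimeoutError after `seconds`.

    Hypothetical helper for illustration. The worker thread is not
    interrupted on timeout; only the caller stops waiting for it.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError("call exceeded %.2f seconds" % seconds) from None
    finally:
        pool.shutdown(wait=False)  # do not block on the still-running worker


def test_slow_function_times_out():
    try:
        run_with_timeout(time.sleep, 0.05, 0.3)  # sleeps longer than allowed
    except TimeoutError:
        return True
    return False
```

Because the slow call keeps running in its thread after the caller gives up, this style of timeout limits waiting rather than work, which is one reason a test like this is useful for confirming what a timeout mechanism actually guarantees.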
Sumit added a subtask for T168965: Why don't timeouts work during long regular expression matching?: T170205: Add test to ensure timeout of functions taking too long.
Jul 10 2017, 9:11 PM · revscoring, ORES, artificial-intelligence, Scoring-platform-team (Current)
Sumit added a parent task for T170205: Add test to ensure timeout of functions taking too long: T168965: Why don't timeouts work during long regular expression matching?.
Jul 10 2017, 9:11 PM · Scoring-platform-team (Current)
Sumit added a comment to T170205: Add test to ensure timeout of functions taking too long.

https://github.com/wiki-ai/ores/pull/219

Jul 10 2017, 9:10 PM · Scoring-platform-team (Current)
Sumit created T170205: Add test to ensure timeout of functions taking too long.
Jul 10 2017, 9:10 PM · Scoring-platform-team (Current)
Sumit added a comment to T168369: Add language support for Albanian.

https://github.com/wiki-ai/revscoring/pull/335

Jul 10 2017, 7:18 PM · Scoring-platform-team (Current), Bad-Words-Detection-System, revscoring, artificial-intelligence
Sumit added a comment to T168369: Add language support for Albanian.

Hi, we're done, please see here https://www.mediawiki.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq

Let us know if you see any problems.

Thanks!

Jul 10 2017, 7:18 PM · Scoring-platform-team (Current), Bad-Words-Detection-System, revscoring, artificial-intelligence
Sumit created T170177: Test draftquality sentiment feature on Editquality.
Jul 10 2017, 5:38 PM · artificial-intelligence, draftquality-modeling, Scoring-platform-team (Current)

Jul 9 2017

Sumit created T170069: Add ORES technical documentation.
Jul 9 2017, 6:48 AM · Documentation, Scoring-platform-team (Current)

Jul 8 2017

GitHub <noreply@github.com> committed rODQf9dfb7665ca5: Merge 2e24b1e39f51207ba9c930131460dceb31b5e592 into… (authored by Sumit).
Jul 8 2017, 8:50 AM
Sumit committed rODQ3c47adb653e8: Take most common word sense for polarity score (authored by Sumit).
Jul 8 2017, 8:50 AM
Sumit committed rODQd568fe6b85bb: (WIP) Add feature for polarity using SentiWordnet (authored by Sumit).
Jul 8 2017, 8:50 AM
Sumit committed rODQ2e24b1e39f51: ADD SentiWordnet requirement to README (authored by Sumit).
Jul 8 2017, 8:50 AM
Sumit added a comment to T167305: Experiment with Sentiment score feature for draftquality.

New PR - https://github.com/wiki-ai/draftquality/pull/9

Jul 8 2017, 8:34 AM · draftquality-modeling, artificial-intelligence, Scoring-platform-team (Current)

Jul 7 2017

GitHub <noreply@github.com> committed rOEQb0d94fc6c5ec: Merge 0247b751f0e47ba3bfb62ed65999783fd4bb2f86 into… (authored by Sumit).
Jul 7 2017, 7:35 PM
Sumit moved T156503: Build damaging/goodfaith models for Romanian Wikipedia from Review to Done on the Scoring-platform-team (Current) board.
Jul 7 2017, 7:27 PM · Scoring-platform-team (Current), artificial-intelligence, revscoring, editquality-modeling

Jul 3 2017

GitHub <noreply@github.com> committed rODQ3086fdda8d50: Merge 5f8b47e72814e1deb54710c124b1e4c913dc1b46 into… (authored by Sumit).
Jul 3 2017, 6:04 PM
Sumit moved T167305: Experiment with Sentiment score feature for draftquality from Review to Active on the Scoring-platform-team (Current) board.
Jul 3 2017, 3:52 PM · draftquality-modeling, artificial-intelligence, Scoring-platform-team (Current)

Jul 1 2017

Sumit added a comment to T168369: Add language support for Albanian.

Left the following note on their talk pages:

Hi
Can you go to https://phabricator.wikimedia.org/T168369 and see if you can help in segregating a list of about 250 words in Albanian into badwords and informal words? We need these lists to help build damaging and goodfaith models for Albanian Wikipedia. A good way to do that would be to edit https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq, copy the generated list to badwords and informal words, and remove the words that do not fall in the respective category. Your help is much appreciated! Let me know or leave a comment on the task itself if you run into any issues. Thanks!

Jul 1 2017, 2:05 PM · Scoring-platform-team (Current), Bad-Words-Detection-System, revscoring, artificial-intelligence
Sumit added a comment to T168369: Add language support for Albanian.

Hi @Margott @Liridon @Arianit, can you please go to https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq and segregate the words in the generated list into badwords and informal words? Refer to the task description for the definitions of badwords and informal words.

Jul 1 2017, 1:52 PM · Scoring-platform-team (Current), Bad-Words-Detection-System, revscoring, artificial-intelligence

Jun 29 2017

GitHub <noreply@github.com> committed rODQ94090c738a39: Merge 5f8b47e72814e1deb54710c124b1e4c913dc1b46 into… (authored by Sumit).
Jun 29 2017, 12:26 AM

Jun 28 2017

GitHub <noreply@github.com> committed rODQe1414c7f1d22: Merge 5f8b47e72814e1deb54710c124b1e4c913dc1b46 into… (authored by Sumit).
Jun 28 2017, 5:53 PM

Jun 27 2017

Sumit moved T156503: Build damaging/goodfaith models for Romanian Wikipedia from Active to Review on the Scoring-platform-team (Current) board.
Jun 27 2017, 7:01 PM · Scoring-platform-team (Current), artificial-intelligence, revscoring, editquality-modeling
GitHub <noreply@github.com> committed rOEQfb90656e903c: Merge 1cb2cf67d841a242358c5fc4ddc948b86e21f960 into… (authored by Sumit).
Jun 27 2017, 6:42 PM
Sumit committed rOEQ1cb2cf67d841: Add models and tuning reports (authored by Sumit).
Jun 27 2017, 6:42 PM
Sumit committed rOEQf8fd6bc9b6b7: Add rowiki damaging, goodfaith models to Makefile (authored by Sumit).
Jun 27 2017, 6:42 PM
Sumit committed rOEQ8b048f106562: Retain reverted autolabelled (authored by Sumit).
Jun 27 2017, 6:42 PM
Sumit committed rOEQ94cfa18df50c: Fetch human labels (authored by Sumit).
Jun 27 2017, 6:42 PM
Sumit added a comment to T156503: Build damaging/goodfaith models for Romanian Wikipedia.

https://github.com/wiki-ai/editquality/pull/78

Jun 27 2017, 6:39 PM · Scoring-platform-team (Current), artificial-intelligence, revscoring, editquality-modeling
Sumit added a comment to T156503: Build damaging/goodfaith models for Romanian Wikipedia.

Need to retrain the models after the regex update; PR soon.

Jun 27 2017, 2:54 PM · Scoring-platform-team (Current), artificial-intelligence, revscoring, editquality-modeling
Sumit added a comment to T156503: Build damaging/goodfaith models for Romanian Wikipedia.
make models/rowiki.goodfaith.gradient_boosting.model
cat datasets/rowiki.labeled_revisions.w_cache.20k_2016.json | \
        revscoring cv_train \
                revscoring.scorer_models.GradientBoosting \
                editquality.feature_lists.rowiki.goodfaith \
                goodfaith \
                --version=0.3.0 \
                -p 'max_depth=3' \
                -p 'learning_rate=0.1' \
                -p 'max_features="log2"' \
                -p 'n_estimators=300' \
                -s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.9)' -s 'filter_rate_at_recall(min_recall=0.75)' -s 'recall_at_precision(min_precision=0.995)' -s 'recall_at_precision(min_precision=0.99)' -s 'recall_at_precision(min_precision=0.98)' -s 'recall_at_precision(min_precision=0.90)' -s 'recall_at_precision(min_precision=0.75)' -s 'recall_at_precision(min_precision=0.60)' -s 'recall_at_precision(min_precision=0.45)' -s 'recall_at_precision(min_precision=0.15)' \
                --balance-sample-weight \
                --center --scale > models/rowiki.goodfaith.gradient_boosting.model
2017-06-27 13:11:03,053 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2017-06-27 13:11:03,907 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2017-06-27 13:13:54,482 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2017-06-27 13:17:12,485 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2017-06-27 13:19:46,401 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2017-06-27 13:22:17,370 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2017-06-27 13:25:08,119 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2017-06-27 13:27:31,615 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2017-06-27 13:29:51,620 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2017-06-27 13:32:09,126 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2017-06-27 13:34:20,776 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2017-06-27 13:36:25,349 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: max_features="log2", min_samples_leaf=1, min_weight_fraction_leaf=0.0, warm_start=false, balanced_sample=false, balanced_sample_weight=true, center=true, loss="deviance", min_samples_split=2, max_leaf_nodes=null, verbose=0, max_depth=3, random_state=null, n_estimators=300, learning_rate=0.1, scale=true, subsample=1.0, init=null, presort="auto"
 - version: 0.3.0
 - trained: 2017-06-27T13:36:32.777290
Jun 27 2017, 1:42 PM · Scoring-platform-team (Current), artificial-intelligence, revscoring, editquality-modeling
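
The log above reports cross-validating over 10 folds before training on all data. As a rough illustration of what a 10-fold split means, the sketch below partitions item indices into folds and yields one train/test pair per fold. The `k_fold_indices` function is hypothetical; how revscoring actually assigns observations to folds is not shown in the log and is not assumed here.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: item i goes to fold i % k, so fold sizes
    differ by at most one.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test


splits = list(k_fold_indices(25, k=10))
```

Each observation is used for testing exactly once and for training in the other folds, so the reported statistics come from predictions on data the model did not see during that fold's training.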
Sumit added a comment to T156503: Build damaging/goodfaith models for Romanian Wikipedia.
make models/rowiki.damaging.gradient_boosting.model
cat datasets/rowiki.labeled_revisions.w_cache.20k_2016.json | \
        revscoring cv_train \
                revscoring.scorer_models.GradientBoosting \
                editquality.feature_lists.rowiki.damaging \
                damaging \
                --version=0.3.0 \
                -p 'max_depth=5' \
                -p 'learning_rate=0.01' \
                -p 'max_features="log2"' \
                -p 'n_estimators=700' \
                -s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.9)' -s 'filter_rate_at_recall(min_recall=0.75)' -s 'recall_at_precision(min_precision=0.995)' -s 'recall_at_precision(min_precision=0.99)' -s 'recall_at_precision(min_precision=0.98)' -s 'recall_at_precision(min_precision=0.90)' -s 'recall_at_precision(min_precision=0.75)' -s 'recall_at_precision(min_precision=0.60)' -s 'recall_at_precision(min_precision=0.45)' -s 'recall_at_precision(min_precision=0.15)' \
                --balance-sample-weight \
                --center --scale > models/rowiki.damaging.gradient_boosting.model
2017-06-27 08:00:43,699 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2017-06-27 08:00:44,352 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2017-06-27 08:03:13,756 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2017-06-27 08:05:56,730 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2017-06-27 08:08:40,903 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2017-06-27 08:11:27,209 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2017-06-27 08:14:17,733 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2017-06-27 08:17:35,238 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2017-06-27 08:20:17,584 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2017-06-27 08:23:33,992 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2017-06-27 08:26:46,826 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2017-06-27 08:29:07,741 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: max_leaf_nodes=null, learning_rate=0.01, min_samples_split=2, verbose=0, center=true, warm_start=false, n_estimators=700, presort="auto", balanced_sample_weight=true, loss="deviance", min_samples_leaf=1, balanced_sample=false, init=null, random_state=null, subsample=1.0, max_features="log2", scale=true, min_weight_fraction_leaf=0.0, max_depth=5
 - version: 0.3.0
 - trained: 2017-06-27T08:29:32.376824
Jun 27 2017, 8:48 AM · Scoring-platform-team (Current), artificial-intelligence, revscoring, editquality-modeling
Sumit added a comment to T165668: Weekly Reports for Outreachy Round-14 project: Allow Programs & Events Dashboard to make automatic edits on connected wikis.

Hi @Medhabansal a gentle reminder to keep your weekly reports updated!

Jun 27 2017, 6:25 AM · Education-Program-Dashboard, Outreachy (Round-14)
Sumit added a comment to T164645: Weekly report of GSoC 2017 Project : Adding Custom features while upgrading and updating Quiz extension.

Hi @Harjotsingh please keep your weekly reports updated.

Jun 27 2017, 6:24 AM · MediaWiki-extensions-Quiz
Sumit added a comment to T164627: Weekly report for Automatic editing suggestions and feedbacks for articles in Wiki Ed Dashboard.

Hi @Keer25, please keep your weekly reports updated.

Jun 27 2017, 6:24 AM · Education-Program-Dashboard
Sumit added a comment to T164623: Weekly Reports : Add a "hierarchy" type to the Cargo extension [GSoC-2017].

Hi, a gentle reminder to update your weekly report.

Jun 27 2017, 6:23 AM · MediaWiki-extensions-Cargo
Sumit added a comment to T164612: Weekly reports of Wiki Ed Foundation Project-"To provide enhanced usability for Wikimedia Programs & Events Dashboard".

Hi, a gentle reminder to keep weekly reports updated.

Jun 27 2017, 6:23 AM · Education-Program-Dashboard
Sumit added a comment to T164531: Weekly reports for Implement Thanks support in Pywikibot.

Hi, a gentle reminder to update weekly report.

Jun 27 2017, 6:22 AM · Pywikibot-core, Pywikibot-Thanks

Jun 26 2017

GitHub <noreply@github.com> committed rOEQ72119239607a: Merge 0247b751f0e47ba3bfb62ed65999783fd4bb2f86 into… (authored by Sumit).
Jun 26 2017, 10:21 PM
GitHub <noreply@github.com> committed rODQ134c5c32b528: Merge 5f8b47e72814e1deb54710c124b1e4c913dc1b46 into… (authored by Sumit).
Jun 26 2017, 7:31 PM
GitHub <noreply@github.com> committed rOEQ88439263938f: Merge 0247b751f0e47ba3bfb62ed65999783fd4bb2f86 into… (authored by Sumit).
Jun 26 2017, 7:24 PM
Sumit added a comment to T167305: Experiment with Sentiment score feature for draftquality.

So I could set up a test with the library - https://github.com/kevincobain2000/sentiment_classifier/ that generates raw polarity scores for each document as an aggregate of positive and negative terms in the document. I used the https://github.com/wiki-ai/draftquality/blob/master/datasets/enwiki.draft_quality.75_not_OK_sample.censored.tsv and made the following observations:

Apologies if you've already covered this, but it might be helpful to also do a sentiment analysis of non-damaging edits, to determine our baseline?

Jun 26 2017, 6:13 PM · draftquality-modeling, artificial-intelligence, Scoring-platform-team (Current)
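
The idea of a raw polarity score as an aggregate of positive and negative terms can be sketched as below. This is not the API of the sentiment_classifier library linked above: the `LEXICON` table, its scores, and the `polarity` function are made up for illustration, and tokenization is a plain whitespace split.

```python
# Hypothetical lexicon: word -> (positive score, negative score).
LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.875, 0.0),
    "bad": (0.0, 0.625),
}


def polarity(text, lexicon=LEXICON):
    """Return (positive, negative, net) totals for the words in `text`.

    Words missing from the lexicon contribute nothing to either total.
    """
    positive = 0.0
    negative = 0.0
    for word in text.lower().split():
        pos, neg = lexicon.get(word, (0.0, 0.0))
        positive += pos
        negative += neg
    return positive, negative, positive - negative


scores = polarity("Good article but bad sources and great images")
```

Scoring both a sample of problematic drafts and a sample of acceptable ones with the same function gives the baseline comparison that the quoted reply asks about.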

Jun 22 2017

GitHub <noreply@github.com> committed rOEQ74afc8d07e3e: Merge 0247b751f0e47ba3bfb62ed65999783fd4bb2f86 into… (authored by Sumit).
Jun 22 2017, 10:15 AM
GitHub <noreply@github.com> committed rOEQfaa88074c14d: Merge 0247b751f0e47ba3bfb62ed65999783fd4bb2f86 into… (authored by Sumit).
Jun 22 2017, 6:24 AM
Sumit committed rOEQ0247b751f0e4: Take top 20000 labelled instances then shuffle (authored by Sumit).
Jun 22 2017, 6:24 AM
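
The commit title above describes taking the top 20000 labelled instances and then shuffling them. A minimal sketch of that order of operations follows, assuming "top" means the first rows of the input as given; the `take_then_shuffle` function and its parameters are hypothetical and are not the editquality code.

```python
import itertools
import random


def take_then_shuffle(rows, limit=20000, seed=0):
    """Keep the first `limit` rows, then shuffle only those rows.

    Truncating before shuffling means the same rows are always kept;
    only their order changes. The seed makes the order reproducible.
    """
    head = list(itertools.islice(rows, limit))
    random.Random(seed).shuffle(head)
    return head


shuffled = take_then_shuffle(range(100), limit=20, seed=1)
```

Shuffling before truncating would instead keep a random subset of all rows, so the two orders are not interchangeable.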