
Aug 20 2018

Sumit committed rODQa1ba5f4fd755: (WIP) Add feature for polarity using SentiWordnet.
(WIP) Add feature for polarity using SentiWordnet
Aug 20 2018, 6:20 PM
Sumit committed rODQb68cf738d930: Add json2tsv in requirements.txt.
Add json2tsv in requirements.txt
Aug 20 2018, 6:20 PM

Jul 9 2018

Sumit added a comment to T193834: Rebuild drafttopic with corrected data.

ROC_AUC:

roc_auc (micro=0.943, macro=0.948):                                                                                                                              
        -------------------------------------------  -----                                                                                                       
        Geography.Maps                               0.971                                                                                                       
        Geography.Europe                             0.929                                                                                                       
        Culture.Media                                0.951                                                                                                       
        STEM.Physics                                 0.975                                                                                                       
        Geography.Oceania                            0.966                                                                                                       
        STEM.Meteorology                             0.987                                                                                                       
        Culture.Internet culture                     0.969                                                                                                       
        History_And_Society.Military and warfare     0.968                                                                                                       
        Culture.Performing arts                      0.982                                                                                                       
        STEM.Engineering                             0.954
        Culture.Language and literature              0.949
        STEM.Space                                   0.987
        STEM.Geosciences                             0.972
        STEM.Technology                              0.942
        Geography.Landforms                          0.987
        STEM.Biology                                 0.956
        Culture.Broadcasting                         0.973
        Culture.Sports                               0.977
        STEM.Chemistry                               0.98 
        Assistance.Maintenance                       0.838
        Culture.Visual arts                          0.969
        Culture.Plastic arts                         0.966
        History_And_Society.Transportation           0.977
        STEM.Mathematics                             0.98 
        Culture.Entertainment                        0.971 
        STEM.Medicine                                0.974
        STEM.Information science                     0.969 
        STEM.Time                                    0.973
        History_And_Society.Education                0.969
        History_And_Society.Politics and government  0.941
        Culture.Food and drink                       0.975
        Assistance.Contents systems                  0.95 
        History_And_Society.Business and economics   0.948
        Assistance.Article improvement and grading   0.684
        Geography.Countries                          0.893
        History_And_Society.History and society      0.868
        Culture.Philosophy and religion              0.936
        Assistance.Files                             0.773
        STEM.Science                                 0.935
        Geography.Cities                             0.969
        Culture.Crafts and hobbies                   0.965
        Culture.Arts                                 0.985
        Geography.Bodies of water                    0.987
Jul 9 2018, 4:27 AM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)

Jul 1 2018

Sumit renamed T193789: [Discuss] Storage of model training/testing datasets from aodaaaaaaa to [Discuss] Random sampling by PAWS vs API requests.
Jul 1 2018, 10:23 AM · Machine-Learning-Team (Active Tasks), ORES
Sumit lowered the priority of T193789: [Discuss] Storage of model training/testing datasets from High to Low.
Jul 1 2018, 10:23 AM · Machine-Learning-Team (Active Tasks), ORES

Jun 11 2018

Gerrit Code Review <gerrit@wikimedia.org> committed rWDQG4eb1ffcbad9f: Update patch set 1 (authored by Sumit).
Update patch set 1
Jun 11 2018, 5:04 AM

May 5 2018

Sumit added a comment to T193834: Rebuild drafttopic with corrected data.
counts (n=84480):
                        label                                              n          TP    FN    FP     TN
                        ---------------------------------------------  -----  ---  -----  ----  ----  -----                                                              
                        'STEM.Mathematics'                              1454  -->    938   516    98  82928                                                              
                        'Assistance.Files'                               350  -->     28   322   111  84019                                                              
                        'Culture.Food and drink'                        2264  -->   1559   705   156  82060                                                              
                        'STEM.Biology'                                  3134  -->   1772  1362   266  81080                                                              
                        'History_And_Society.Business and economics'    6075  -->   2993  3082   834  77571                                                              
                        'Assistance.Contents systems'                   1953  -->    686  1267   142  82385                                                              
                        'Culture.Language and literature'              19588  -->  14199  5389  2390  62502                                                              
                        'Culture.Media'                                 2039  -->    596  1443   261  82180
                        'Culture.Philosophy and religion'               3840  -->   1693  2147   451  80189
                        'STEM.Physics'                                  2376  -->   1259  1117   360  81744
                        'STEM.Chemistry'                                2083  -->   1287   796   265  82132
                        'History_And_Society.Military and warfare'      3921  -->   2453  1468   392  80167
                        'Geography.Europe'                             15349  -->   8930  6419  2580  66551
                        'History_And_Society.Education'                 2633  -->   1603  1030   252  81595
                        'Geography.Landforms'                           2148  -->   1710   438   139  82193
                        'Assistance.Article improvement and grading'      67  -->     16    51  3082  81331
                        'Culture.Plastic arts'                          3717  -->   2116  1601   404  80359
                        'STEM.Space'                                    2117  -->   1731   386   102  82261
                        'Geography.Maps'                                2421  -->   1370  1051    69  81990
                        'Culture.Performing arts'                       4180  -->   3313   867   389  79911
                        'Geography.Cities'                               791  -->    493   298   111  83578
                        'Culture.Broadcasting'                          2807  -->   1586  1221   434  81239
                        'STEM.Engineering'                              2133  -->    768  1365   267  82080
                        'Assistance.Maintenance'                        5028  -->   1112  3916   244  79208
                        'History_And_Society.History and society'       7010  -->   1371  5639   520  76950
                        'STEM.Time'                                     2216  -->   1520   696   102  82162
                        'Culture.Sports'                                4844  -->   3970   874   369  79267
                        'Culture.Crafts and hobbies'                    1988  -->   1138   850    64  82428
                        'STEM.Information science'                      2037  -->   1148   889   117  82326
                        'History_And_Society.Politics and government'   4047  -->   1572  2475   508  79925
                        'History_And_Society.Transportation'            3680  -->   2508  1172   341  80459
                        'Culture.Arts'                                  1999  -->   1488   511   101  82380
                        'Geography.Countries'                          24068  -->  14352  9716  4136  56276
                        'Geography.Bodies of water'                     2232  -->   1732   500   154  82094
                        'STEM.Meteorology'                              1753  -->   1360   393    72  82655
                        'Geography.Oceania'                             4025  -->   2479  1546   213  80242
                        'STEM.Medicine'                                 1951  -->   1116   835   266  82263
                        'Culture.Visual arts'                           4563  -->   2594  1969   544  79373
                        'STEM.Science'                                  2133  -->    545  1588   160  82187
                        'Culture.Internet culture'                      1839  -->    922   917   222  82419
                        'STEM.Technology'                               3825  -->   1330  2495   597  80058
                        'Culture.Entertainment'                         5529  -->   3597  1932   577  78374
                        'STEM.Geosciences'                              1987  -->   1183   804   125  82368
pr_auc (micro=0.761, macro=0.724):
        -------------------------------------------  -----                                                                                
        Culture.Arts                                 0.911                                                                                            
        Culture.Internet culture                     0.685                                                                                                       
        Culture.Language and literature              0.871                                                                                                       
        Culture.Performing arts                      0.912                                                                                                       
        History_And_Society.Transportation           0.858                                                                                                       
        Assistance.Files                             0.042                                                                                                       
        STEM.Science                                 0.498                                                                                                       
        STEM.Medicine                                0.743                                                                                                       
        Culture.Crafts and hobbies                   0.813                                         
        History_And_Society.Military and warfare     0.812                                                                                                       
        STEM.Technology                              0.56                                                                                                        
        STEM.Meteorology                             0.919                                                                                                       
        Assistance.Maintenance                       0.458                                                                                                       
        Culture.Philosophy and religion              0.633                                                                                                       
        STEM.Engineering                             0.578                                                                                                       
        Culture.Entertainment                        0.84                                                                                                        
        History_And_Society.Business and economics   0.7                                           
        Geography.Landforms                          0.927                                                                                              
        STEM.Biology                                 0.748                                                                                
        Assistance.Contents systems                  0.611                                                                                            
        Geography.Maps                               0.835                                                                                                       
        STEM.Geosciences                             0.8                                                                                                         
        History_And_Society.Education                0.777                                                                                                       
        Geography.Bodies of water                    0.914                                                                                                       
        STEM.Mathematics                             0.845                                                                                                       
        History_And_Society.Politics and government  0.615
        Geography.Europe                             0.763
        STEM.Physics                                 0.717
        Assistance.Article improvement and grading   0.004
        STEM.Space                                   0.938
        History_And_Society.History and society      0.486
        Geography.Oceania                            0.838
        Geography.Countries                          0.779
        STEM.Time                                    0.86
        STEM.Chemistry                               0.779
        Geography.Cities                             0.73
        Culture.Food and drink                       0.856
        Culture.Broadcasting                         0.735
        STEM.Information science                     0.79
        Culture.Sports                               0.914
        Culture.Media                                0.497
        Culture.Visual arts                          0.776
        Culture.Plastic arts                         0.774
        -------------------------------------------  -----
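
As a rough cross-check of the counts table above, per-label precision, recall and F1 can be derived from a single row. This is only a sketch: it assumes the four numbers after the arrow are TP, FN, FP, TN in that order (the reading under which TP + FN equals the label's n and all four sum to 84480), and the helper name `prf` is made up for illustration.

```python
def prf(tp, fn, fp):
    """Return (precision, recall, f1) from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Row for 'STEM.Mathematics': n=1454 --> 938 516 98 82928
# (assumed column order: TP, FN, FP, TN)
tp, fn, fp, tn = 938, 516, 98, 82928

# Sanity checks on the assumed column order.
assert tp + fn == 1454                # positives for this label
assert tp + fn + fp + tn == 84480     # all observations

precision, recall, f1 = prf(tp, fn, fp)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The same helper applied to 'Assistance.Article improvement and grading' (16, 51, 3082) gives a very low precision, which is consistent with that label's pr_auc of 0.004.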
May 5 2018, 5:31 AM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)

May 4 2018

Sumit added a comment to T193834: Rebuild drafttopic with corrected data.

https://github.com/wiki-ai/revscoring/pull/398

May 4 2018, 8:27 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a comment to T193834: Rebuild drafttopic with corrected data.

This looks like an issue with wordvectors returning [[0]] for an empty string '' instead of the usual null vector of shape (300,).
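
A minimal sketch of the behaviour one would expect here, assuming the vectorizer averages per-word vectors: an empty input should still produce a zero vector of the full length. This is not revscoring's code; `mean_vector`, `vectorize` and the `lookup` dict are hypothetical, and plain lists stand in for the arrays a real implementation would likely use.

```python
def mean_vector(vectors, dim=300):
    """Average equal-length vectors; return a zero vector of length
    `dim` when there is nothing to average."""
    vectors = list(vectors)
    if not vectors:
        # The empty-input case: keep the output length stable.
        return [0.0] * dim
    for vec in vectors:
        if len(vec) != dim:
            raise ValueError(f"expected length {dim}, got {len(vec)}")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def vectorize(text, lookup, dim=300):
    """Average the vectors of the known words in `text`.
    `lookup` is a hypothetical word -> vector mapping."""
    words = text.split()
    return mean_vector((lookup[w] for w in words if w in lookup), dim)


empty = vectorize("", {})
print(len(empty), set(empty))
```

Keeping the output length fixed for every input matters because downstream feature matrices need every row to have the same width.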

May 4 2018, 8:17 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)

May 3 2018

Sumit created T193789: [Discuss] Storage of model training/testing datasets.
May 3 2018, 7:26 PM · Machine-Learning-Team (Active Tasks), ORES

Apr 17 2018

Sumit closed T183392: Drafttopic: Add utility to extract dependents as Declined.
Apr 17 2018, 5:21 PM · Machine-Learning-Team
Sumit closed T183355: Drafttopic: add article text fetching utility as Declined.
Apr 17 2018, 5:20 PM · Machine-Learning-Team

Apr 2 2018

Sumit moved T189797: Checklist for drafttopic repo from Review to Pending deployment on the Machine-Learning-Team (Active Tasks) board.
Apr 2 2018, 4:47 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit updated the task description for T189797: Checklist for drafttopic repo.
Apr 2 2018, 4:47 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit raised the priority of T191214: Edittypes repo setup from Lowest to Medium.
Apr 2 2018, 4:46 PM · Machine-Learning-Team, artificial-intelligence, edittypes-modeling
Sumit triaged T191214: Edittypes repo setup as Lowest priority.
Apr 2 2018, 4:46 PM · Machine-Learning-Team, artificial-intelligence, edittypes-modeling

Mar 22 2018

Sumit updated the task description for T190288: Investigate runtime of tune with high number of estimators.
Mar 22 2018, 3:55 PM · Machine-Learning-Team (Research), revscoring, artificial-intelligence, drafttopic-modeling

Mar 21 2018

Sumit renamed T190288: Investigate runtime of tune with high number of estimators from Drafttopic estimators take very less time to train but tune hangs up forever to Investigate runtime of tune with high number of estimators.
Mar 21 2018, 2:50 PM · Machine-Learning-Team (Research), revscoring, artificial-intelligence, drafttopic-modeling
Sumit added a comment to T190288: Investigate runtime of tune with high number of estimators.

I wonder if you could figure out where the hangup is happening by adding "--debug" to the tune utility call.

Mar 21 2018, 2:36 PM · Machine-Learning-Team (Research), revscoring, artificial-intelligence, drafttopic-modeling
Sumit updated the task description for T190288: Investigate runtime of tune with high number of estimators.
Mar 21 2018, 2:36 PM · Machine-Learning-Team (Research), revscoring, artificial-intelligence, drafttopic-modeling
Sumit renamed T190288: Investigate runtime of tune with high number of estimators from Drafttopic estimators take very less time but tune hangs up forever to Drafttopic estimators take very less time to train but tune hangs up forever.
Mar 21 2018, 2:33 PM · Machine-Learning-Team (Research), revscoring, artificial-intelligence, drafttopic-modeling
Sumit created T190288: Investigate runtime of tune with high number of estimators.
Mar 21 2018, 2:33 PM · Machine-Learning-Team (Research), revscoring, artificial-intelligence, drafttopic-modeling

Mar 20 2018

Sumit added a comment to T188447: Update ORES wheels for new revscoring requirements.

Yeah, we'll need scipy >= 0.18.1, but I see that for revscoring scipy is already set as: scipy >= 0.13.3, < 1.0.999
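
To make the overlap of the two constraints concrete, here is a small sketch that compares dotted versions as integer tuples. It only handles plain numeric versions (no pre-release or local tags) and the helper names are invented for illustration; a real check would more likely rely on a dedicated version-parsing library.

```python
def parse(version):
    """Turn '0.18.1' into (0, 18, 1) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper):
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# revscoring's existing range is >= 0.13.3, < 1.0.999; the new
# requirement raises the effective lower bound to 0.18.1.
lower = max("0.13.3", "0.18.1", key=parse)
upper = "1.0.999"

for candidate in ("0.17.0", "0.18.1", "1.0.0", "1.1.0"):
    print(candidate, satisfies(candidate, lower, upper))
```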

Mar 20 2018, 8:02 PM · Patch-For-Review, ORES, Machine-Learning-Team (Active Tasks)

Mar 16 2018

Sumit updated the task description for T189797: Checklist for drafttopic repo.
Mar 16 2018, 5:14 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)

Mar 15 2018

Sumit added a comment to T189797: Checklist for drafttopic repo.

The recommended order for review is: 18, 20, 19

Mar 15 2018, 6:26 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit moved T189797: Checklist for drafttopic repo from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Mar 15 2018, 6:25 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit updated the task description for T189797: Checklist for drafttopic repo.
Mar 15 2018, 6:24 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit moved T189364: Investigate word2vec memory issues with multiprocessing from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.
Mar 15 2018, 6:02 PM · Machine-Learning-Team (Active Tasks)
Sumit added a comment to T189364: Investigate word2vec memory issues with multiprocessing.

Resolved by using a wrapper function: https://github.com/wiki-ai/revscoring/pull/394
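
The wrapper from the pull request is not reproduced in this feed. As one possible shape for such a wrapper, the sketch below defers an expensive load until first use and then reuses the result, so importing a module never loads the model and each process loads it at most once. `load_once` and `fake_loader` are hypothetical names, and the fake loader stands in for reading a large word2vec file.

```python
import functools


def load_once(loader):
    """Wrap `loader` so the expensive load happens on first call only
    and the same object is returned on later calls."""
    @functools.lru_cache(maxsize=None)
    def cached(path):
        return loader(path)
    return cached


calls = []


def fake_loader(path):
    # Stand-in for loading a large keyed-vectors file from disk.
    calls.append(path)
    return {"hello": [0.1, 0.2]}


get_vectors = load_once(fake_loader)

first = get_vectors("vectors.bin")
second = get_vectors("vectors.bin")
print(len(calls), first is second)
```

If the vectors are loaded in the parent before a fork-based worker pool is created, the workers can typically read the same memory pages rather than each loading a private copy.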

Mar 15 2018, 6:02 PM · Machine-Learning-Team (Active Tasks)
Sumit updated the task description for T189797: Checklist for drafttopic repo.
Mar 15 2018, 5:51 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit updated the task description for T189797: Checklist for drafttopic repo.
Mar 15 2018, 5:15 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit updated the task description for T189797: Checklist for drafttopic repo.
Mar 15 2018, 5:08 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit created T189797: Checklist for drafttopic repo.
Mar 15 2018, 5:07 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)

Mar 13 2018

Sumit added a comment to T189364: Investigate word2vec memory issues with multiprocessing.

I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_shared_memory

TL;DR: it didn't work. I get the exact same output for both strategies!

Mar 13 2018, 7:32 PM · Machine-Learning-Team (Active Tasks)
Sumit added a comment to T111416: [Education Dashboard] Build an Article Finder tool for program leaders and participants to find good topics to work on.

@Ragesoss there's ongoing work on topic modeling for English Wikipedia that uses WikiProject topics as its basis. If the Education Program Dashboard has a similar categorization of articles around pre-defined topics, a similar model could be built to predict topics as well as recommend them. Let me know if you want to talk more about it.

Mar 13 2018, 6:03 PM · Outreach-Programs-Projects, Outreachy (Round-16), Google-Summer-of-Code (2018), Article-Recommendation, Education-Program-Dashboard
Sumit added a comment to T188892: Improve HTMLForm documentation to cover more classes.

@Jayprakash12345 Could I take this up?

Mar 13 2018, 5:26 PM · User-Jayprakash12345, MediaWiki-Documentation, Documentation
Sumit added a comment to T189364: Investigate word2vec memory issues with multiprocessing.

@Sumit please link to the code changes you're making that seem to improve memory sharing.

Refer to the gist in the first comment for the code changes that make it multiprocessing friendly.

Mar 13 2018, 3:53 PM · Machine-Learning-Team (Active Tasks)

Mar 10 2018

Sumit updated subscribers of T189364: Investigate word2vec memory issues with multiprocessing.
Mar 10 2018, 6:52 AM · Machine-Learning-Team (Active Tasks)
Sumit added a comment to T189364: Investigate word2vec memory issues with multiprocessing.

Test code for benchmarking using word2vec as an external module contained in english_vectors:

from multiprocessing import Pool, cpu_count
import functools
from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented
Mar 10 2018, 6:47 AM · Machine-Learning-Team (Active Tasks)
Sumit added a comment to T189364: Investigate word2vec memory issues with multiprocessing.

Test code for benchmarking vectorizers with a global keyed_vector in the vectorizers file (https://gist.github.com/codez266/bde0d2384ef1cda0e105b8f59d25524a#file-vectors_only_once-py-L21):

Mar 10 2018, 6:39 AM · Machine-Learning-Team (Active Tasks)
Sumit created T189364: Investigate word2vec memory issues with multiprocessing.
Mar 10 2018, 6:33 AM · Machine-Learning-Team (Active Tasks)

Mar 8 2018

Sumit added a comment to T188775: Re-train models with revscoring 2.2.0.

With the wordvectors blockers now cleared, I'm building the drafttopic model on ores-stat-01.

Mar 8 2018, 9:03 PM · articlequality-modeling, draftquality-modeling, artificial-intelligence, editquality-modeling, Machine-Learning-Team (Active Tasks)

Feb 27 2018

Sumit added a comment to T187217: [Epic] Support word2vec for production ORES models.

Working on the Debian packaging here: https://phabricator.wikimedia.org/source/word2vec/

@Sumit Is the gensim package able to read the gzipped file, or should we decompress during installation?

Feb 27 2018, 1:20 PM · Epic, Packaging, Machine-Learning-Team (Active Tasks)

Feb 13 2018

Sumit created T187217: [Epic] Support word2vec for production ORES models.
Feb 13 2018, 5:09 PM · Epic, Packaging, Machine-Learning-Team (Active Tasks)

Feb 5 2018

Sumit moved T185896: OneVsRest Classification for revscoring from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Feb 5 2018, 2:56 PM · artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks)

Jan 29 2018

Sumit edited projects for T181074: Refactor scripts fetching text and other metadata, added: Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.
Jan 29 2018, 3:07 PM · drafttopic-modeling, Machine-Learning-Team
Sumit moved T185896: OneVsRest Classification for revscoring from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Jan 29 2018, 2:47 PM · artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks)
Sumit added a comment to T185896: OneVsRest Classification for revscoring.

https://github.com/wiki-ai/revscoring/pull/389

Jan 29 2018, 2:46 PM · artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks)
Sumit created T185896: OneVsRest Classification for revscoring.
Jan 29 2018, 2:46 PM · artificial-intelligence, revscoring, Machine-Learning-Team (Active Tasks)

Jan 22 2018

Sumit added a comment to T181074: Refactor scripts fetching text and other metadata.

A common use case of fetch_text is augmenting the dataset with X info from Y API. This will address:

  • fetching edits - currently supported by revscoring
  • fetching text - currently required by wikiclass, drafttopic and draftquality for getting article text
  • fetch_item_info - currently required by wikiclass for fetching item info from Wikidata.
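
The pattern shared by the three cases above could be sketched as one generic helper that takes observations and a fetch function. Everything here is hypothetical (`augment`, `fetch`, `key`, the `rev_id` field), and a dict lookup stands in for the API call so that nothing is actually requested.

```python
import json


def augment(observations, fetch, key):
    """Yield a copy of each observation with `key` set to whatever
    `fetch` returns for that observation."""
    for ob in observations:
        enriched = dict(ob)          # leave the input untouched
        enriched[key] = fetch(ob)
        yield enriched


# One JSON document per line, as in a typical labeled dataset file.
lines = ['{"rev_id": 1}', '{"rev_id": 2}']
fake_text = {1: "first", 2: "second"}   # stand-in for "Y API"

observations = (json.loads(line) for line in lines)
out = list(augment(observations, lambda ob: fake_text[ob["rev_id"]], "text"))

for ob in out:
    print(json.dumps(ob))
```

Swapping the lambda for a different fetch function is all that changes between fetching text, edits, or item info in this sketch.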
Jan 22 2018, 5:53 PM · drafttopic-modeling, Machine-Learning-Team
Sumit claimed T181074: Refactor scripts fetching text and other metadata.
Jan 22 2018, 5:46 PM · drafttopic-modeling, Machine-Learning-Team

Jan 17 2018

Sumit added a comment to T185147: Host Google-News-word2vec.bin publicly.

The binary *was* on ores-misc-01, which has since been nuked. I'll upload it again from my system to ores-staging-01, from where it can be put somewhere public.

Jan 17 2018, 9:47 PM · Machine-Learning-Team (Active Tasks)
Sumit created T185147: Host Google-News-word2vec.bin publicly.
Jan 17 2018, 9:46 PM · Machine-Learning-Team (Active Tasks)

Jan 16 2018

Sumit added a comment to T184765: Back up ores-misc-01 to ores-staging-01.

I've backed up the tuning reports and the GradientBoosting and RandomForest models.

Jan 16 2018, 4:49 PM · ORES, Machine-Learning-Team (Active Tasks)

Dec 22 2017

Sumit edited projects for T183392: Drafttopic: Add utility to extract dependents, added: Machine-Learning-Team; removed Machine-Learning-Team (Active Tasks).
Dec 22 2017, 6:24 PM · Machine-Learning-Team
Sumit edited projects for T183355: Drafttopic: add article text fetching utility, added: Machine-Learning-Team; removed Machine-Learning-Team (Active Tasks).
Dec 22 2017, 6:24 PM · Machine-Learning-Team
Sumit moved T183580: class weights support for multilabel classification from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.
Dec 22 2017, 6:23 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, revscoring, drafttopic-modeling
Sumit edited projects for T183580: class weights support for multilabel classification, added: Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.
Dec 22 2017, 6:23 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, revscoring, drafttopic-modeling
Sumit added projects to T183580: class weights support for multilabel classification: drafttopic-modeling, revscoring.

https://github.com/wiki-ai/revscoring/pull/385

Dec 22 2017, 6:22 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, revscoring, drafttopic-modeling
Sumit created T183580: class weights support for multilabel classification.
Dec 22 2017, 6:22 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, revscoring, drafttopic-modeling

Dec 20 2017

Sumit moved T183392: Drafttopic: Add utility to extract dependents from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Dec 20 2017, 5:55 PM · Machine-Learning-Team
Sumit added a comment to T183392: Drafttopic: Add utility to extract dependents.

https://github.com/wiki-ai/drafttopic/pull/15

Dec 20 2017, 5:55 PM · Machine-Learning-Team
Sumit created T183392: Drafttopic: Add utility to extract dependents.
Dec 20 2017, 5:55 PM · Machine-Learning-Team
Sumit moved T183355: Drafttopic: add article text fetching utility from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Dec 20 2017, 12:53 PM · Machine-Learning-Team
Sumit added a comment to T183355: Drafttopic: add article text fetching utility.

https://github.com/wiki-ai/drafttopic/pull/14

Dec 20 2017, 12:52 PM · Machine-Learning-Team
Sumit created T183355: Drafttopic: add article text fetching utility.
Dec 20 2017, 12:52 PM · Machine-Learning-Team

Dec 11 2017

Sumit added a comment to T181163: Revscoring tune does not recognize a set of labels as target.

https://github.com/wiki-ai/revscoring/pull/376

Dec 11 2017, 3:52 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling, research-ideas, artificial-intelligence
Sumit added a comment to T181166: Revscoring: Statistic for multilabel classification.

https://github.com/wiki-ai/revscoring/pull/376

Dec 11 2017, 3:51 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), research-ideas, artificial-intelligence

Nov 29 2017

Sumit committed rODQ9097be964f74: Take most common word sense for polarity score.
Nov 29 2017, 11:22 PM
Sumit committed rODQ4434ab188ecf: ADD SentiWordnet requirement to README.
Nov 29 2017, 11:22 PM
Sumit committed rODQa7f323398241: Address review comments in https://github.com/wiki-ai/draftquality/pull/9.
Nov 29 2017, 11:22 PM
Sumit committed rODQfd0a6e184361: (WIP) Add feature for polarity using SentiWordnet Adds a library….
Nov 29 2017, 11:22 PM
Sumit committed rODQ91cde1284bc4: Add json2tsv in requirements.txt.
Nov 29 2017, 11:22 PM
Sumit committed rOEQ026c38534f69: Add label param for enwiki goodfaith in Makefile.
Nov 29 2017, 10:50 PM
Sumit committed rOEQ95a9a24c17fa: Take top 20000 labelled instances then shuffle.
Nov 29 2017, 10:50 PM
Sumit committed rOEQf885552f2091: Add sqwiki features and rules to fetch labeled revision to Makefile.
Nov 29 2017, 10:50 PM
Sumit committed rOEQ6619bdbf3d3c: Retain reverted autolabelled.
Nov 29 2017, 10:50 PM
Sumit committed rOEQ6ba178bf235c: Add models and tuning reports.
Nov 29 2017, 10:50 PM
Sumit committed rOEQd5f9c69677eb: Add rowiki damaging, goodfaith models to Makefile.
Nov 29 2017, 10:50 PM
Sumit committed rOEQee7839d5d36a: Fetch human labels.
Nov 29 2017, 10:50 PM

Nov 28 2017

Sumit moved T172321: Build mid-level WikiProject category training set from Parked to Review on the Machine-Learning-Team (Active Tasks) board.
Nov 28 2017, 5:58 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a comment to T172321: Build mid-level WikiProject category training set.

We now have a dataset at figshare - https://doi.org/10.6084/m9.figshare.5640526.v1 \o/

Nov 28 2017, 5:57 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a comment to T181522: Fix response processing logic in drafttopic.fetch_page_wikiprojects.

https://github.com/wiki-ai/drafttopic/pull/13

Nov 28 2017, 5:14 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a comment to T179311: Generate mid-level WikiProject categories.

@Sumit, please move to the "done" column before closing tasks. We need this in order to consistently report what has been "done".

Nov 28 2017, 4:56 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a project to T172321: Build mid-level WikiProject category training set: drafttopic-modeling.
Nov 28 2017, 4:27 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a project to T172325: Efficient method for mapping a WikiProject template to the WikiProject Directory: drafttopic-modeling.
Nov 28 2017, 4:26 PM · drafttopic-modeling, Machine-Learning-Team
Sumit added a project to T172326: Create machine-readable version of the WikiProject Directory: drafttopic-modeling.
Nov 28 2017, 4:26 PM · drafttopic-modeling, research-ideas, Machine-Learning-Team
Sumit edited projects for T175037: Publish Machine-Readable WikiProjects Dataset, added: drafttopic-modeling; removed Machine-Learning-Team (Active Tasks).
Nov 28 2017, 4:26 PM · drafttopic-modeling, research-ideas, Machine-Learning-Team
Sumit added a project to T179311: Generate mid-level WikiProject categories: drafttopic-modeling.
Nov 28 2017, 4:25 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit added a project to T181166: Revscoring: Statistic for multilabel classification: drafttopic-modeling.
Nov 28 2017, 4:25 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), research-ideas, artificial-intelligence
Sumit added a project to T181163: Revscoring tune does not recognize a set of labels as target: drafttopic-modeling.
Nov 28 2017, 4:25 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling, research-ideas, artificial-intelligence
Sumit added a project to T181522: Fix response processing logic in drafttopic.fetch_page_wikiprojects: drafttopic-modeling.
Nov 28 2017, 4:23 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit created T181522: Fix response processing logic in drafttopic.fetch_page_wikiprojects.
Nov 28 2017, 4:22 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit closed T179311: Generate mid-level WikiProject categories as Resolved.

Looks like we don't include the top level category names yet. @Sumit said he'd like to do that in a separate PR.

Nov 28 2017, 3:06 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)
Sumit closed T179311: Generate mid-level WikiProject categories, a subtask of T172321: Build mid-level WikiProject category training set, as Resolved.
Nov 28 2017, 3:06 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks)

Nov 22 2017

Sumit edited projects for T181166: Revscoring: Statistic for multilabel classification, added: Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.
Nov 22 2017, 4:23 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), research-ideas, artificial-intelligence
Sumit edited projects for T181163: Revscoring tune does not recognize a set of labels as target, added: Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.
Nov 22 2017, 4:23 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling, research-ideas, artificial-intelligence
Sumit edited parent tasks for T181163: Revscoring tune does not recognize a set of labels as target, added: T181166: Revscoring: Statistic for multilabel classification; removed: T123327: Train/test draft topic model (new article routing AI).
Nov 22 2017, 4:21 PM · Machine-Learning-Team (Active Tasks), drafttopic-modeling, research-ideas, artificial-intelligence
Sumit removed a subtask for T123327: Train/test draft topic model (new article routing AI): T181163: Revscoring tune does not recognize a set of labels as target.
Nov 22 2017, 4:20 PM · Machine-Learning-Team (Active Tasks), research-ideas, artificial-intelligence
Sumit added a subtask for T181166: Revscoring: Statistic for multilabel classification: T181163: Revscoring tune does not recognize a set of labels as target.
Nov 22 2017, 4:20 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), research-ideas, artificial-intelligence