Aug 20 2018
Jul 9 2018
ROC_AUC:
roc_auc (micro=0.943, macro=0.948):
-------------------------------------------  -----
Geography.Maps                               0.971
Geography.Europe                             0.929
Culture.Media                                0.951
STEM.Physics                                 0.975
Geography.Oceania                            0.966
STEM.Meteorology                             0.987
Culture.Internet culture                     0.969
History_And_Society.Military and warfare     0.968
Culture.Performing arts                      0.982
STEM.Engineering                             0.954
Culture.Language and literature              0.949
STEM.Space                                   0.987
STEM.Geosciences                             0.972
STEM.Technology                              0.942
Geography.Landforms                          0.987
STEM.Biology                                 0.956
Culture.Broadcasting                         0.973
Culture.Sports                               0.977
STEM.Chemistry                               0.98
Assistance.Maintenance                       0.838
Culture.Visual arts                          0.969
Culture.Plastic arts                         0.966
History_And_Society.Transportation           0.977
STEM.Mathematics                             0.98
Culture.Entertainment                        0.971
STEM.Medicine                                0.974
STEM.Information science                     0.969
STEM.Time                                    0.973
History_And_Society.Education                0.969
History_And_Society.Politics and government  0.941
Culture.Food and drink                       0.975
Assistance.Contents systems                  0.95
History_And_Society.Business and economics   0.948
Assistance.Article improvement and grading   0.684
Geography.Countries                          0.893
History_And_Society.History and society      0.868
Culture.Philosophy and religion              0.936
Assistance.Files                             0.773
STEM.Science                                 0.935
Geography.Cities                             0.969
Culture.Crafts and hobbies                   0.965
Culture.Arts                                 0.985
Geography.Bodies of water                    0.987
Jul 1 2018
Jun 11 2018
May 5 2018
counts (n=84480):
label                                              n          TP    FP    FN     TN
---------------------------------------------  -----  ---  -----  ----  ----  -----
'STEM.Mathematics'                              1454  -->    938   516    98  82928
'Assistance.Files'                               350  -->     28   322   111  84019
'Culture.Food and drink'                        2264  -->   1559   705   156  82060
'STEM.Biology'                                  3134  -->   1772  1362   266  81080
'History_And_Society.Business and economics'    6075  -->   2993  3082   834  77571
'Assistance.Contents systems'                   1953  -->    686  1267   142  82385
'Culture.Language and literature'              19588  -->  14199  5389  2390  62502
'Culture.Media'                                 2039  -->    596  1443   261  82180
'Culture.Philosophy and religion'               3840  -->   1693  2147   451  80189
'STEM.Physics'                                  2376  -->   1259  1117   360  81744
'STEM.Chemistry'                                2083  -->   1287   796   265  82132
'History_And_Society.Military and warfare'      3921  -->   2453  1468   392  80167
'Geography.Europe'                             15349  -->   8930  6419  2580  66551
'History_And_Society.Education'                 2633  -->   1603  1030   252  81595
'Geography.Landforms'                           2148  -->   1710   438   139  82193
'Assistance.Article improvement and grading'      67  -->     16    51  3082  81331
'Culture.Plastic arts'                          3717  -->   2116  1601   404  80359
'STEM.Space'                                    2117  -->   1731   386   102  82261
'Geography.Maps'                                2421  -->   1370  1051    69  81990
'Culture.Performing arts'                       4180  -->   3313   867   389  79911
'Geography.Cities'                               791  -->    493   298   111  83578
'Culture.Broadcasting'                          2807  -->   1586  1221   434  81239
'STEM.Engineering'                              2133  -->    768  1365   267  82080
'Assistance.Maintenance'                        5028  -->   1112  3916   244  79208
'History_And_Society.History and society'       7010  -->   1371  5639   520  76950
'STEM.Time'                                     2216  -->   1520   696   102  82162
'Culture.Sports'                                4844  -->   3970   874   369  79267
'Culture.Crafts and hobbies'                    1988  -->   1138   850    64  82428
'STEM.Information science'                      2037  -->   1148   889   117  82326
'History_And_Society.Politics and government'   4047  -->   1572  2475   508  79925
'History_And_Society.Transportation'            3680  -->   2508  1172   341  80459
'Culture.Arts'                                  1999  -->   1488   511   101  82380
'Geography.Countries'                          24068  -->  14352  9716  4136  56276
'Geography.Bodies of water'                     2232  -->   1732   500   154  82094
'STEM.Meteorology'                              1753  -->   1360   393    72  82655
'Geography.Oceania'                             4025  -->   2479  1546   213  80242
'STEM.Medicine'                                 1951  -->   1116   835   266  82263
'Culture.Visual arts'                           4563  -->   2594  1969   544  79373
'STEM.Science'                                  2133  -->    545  1588   160  82187
'Culture.Internet culture'                      1839  -->    922   917   222  82419
'STEM.Technology'                               3825  -->   1330  2495   597  80058
'Culture.Entertainment'                         5529  -->   3597  1932   577  78374
'STEM.Geosciences'                              1987  -->   1183   804   125  82368
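A quick consistency check on the counts table, as a minimal sketch: in the rows copied below, the four counts add up to the total n=84480 and the first two counts add up to the label's own n. The `rows` dictionary and the `check_row` helper are illustrative names, not part of revscoring.

```python
# Rows copied from the counts table above: label -> (n, c1, c2, c3, c4),
# where c1..c4 are the four counts in the order they are printed.
TOTAL = 84480

rows = {
    "STEM.Mathematics": (1454, 938, 516, 98, 82928),
    "Assistance.Files": (350, 28, 322, 111, 84019),
    "Culture.Sports": (4844, 3970, 874, 369, 79267),
    "STEM.Meteorology": (1753, 1360, 393, 72, 82655),
}


def check_row(n, c1, c2, c3, c4, total=TOTAL):
    """Return (counts sum to total, first two counts sum to the label's n)."""
    return (c1 + c2 + c3 + c4 == total, c1 + c2 == n)


for label, row in rows.items():
    print(label, check_row(*row))
```

Because the first two counts add up to n, the second printed column looks like the label's positives that the model missed. That reading is an inference from the sums, not something the output states.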
pr_auc (micro=0.761, macro=0.724):
-------------------------------------------  -----
Culture.Arts                                 0.911
Culture.Internet culture                     0.685
Culture.Language and literature              0.871
Culture.Performing arts                      0.912
History_And_Society.Transportation           0.858
Assistance.Files                             0.042
STEM.Science                                 0.498
STEM.Medicine                                0.743
Culture.Crafts and hobbies                   0.813
History_And_Society.Military and warfare     0.812
STEM.Technology                              0.56
STEM.Meteorology                             0.919
Assistance.Maintenance                       0.458
Culture.Philosophy and religion              0.633
STEM.Engineering                             0.578
Culture.Entertainment                        0.84
History_And_Society.Business and economics   0.7
Geography.Landforms                          0.927
STEM.Biology                                 0.748
Assistance.Contents systems                  0.611
Geography.Maps                               0.835
STEM.Geosciences                             0.8
History_And_Society.Education                0.777
Geography.Bodies of water                    0.914
STEM.Mathematics                             0.845
History_And_Society.Politics and government  0.615
Geography.Europe                             0.763
STEM.Physics                                 0.717
Assistance.Article improvement and grading   0.004
STEM.Space                                   0.938
History_And_Society.History and society      0.486
Geography.Oceania                            0.838
Geography.Countries                          0.779
STEM.Time                                    0.86
STEM.Chemistry                               0.779
Geography.Cities                             0.73
Culture.Food and drink                       0.856
Culture.Broadcasting                         0.735
STEM.Information science                     0.79
Culture.Sports                               0.914
Culture.Media                                0.497
Culture.Visual arts                          0.776
Culture.Plastic arts                         0.774
-------------------------------------------  -----
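The macro figure in the pr_auc header matches the unweighted mean of the per-label values, which can be checked directly from the numbers above. A minimal sketch; `pr_auc` is just a local dictionary holding the reported values.

```python
# Per-label pr_auc values copied from the statistics above.
pr_auc = {
    "Culture.Arts": 0.911,
    "Culture.Internet culture": 0.685,
    "Culture.Language and literature": 0.871,
    "Culture.Performing arts": 0.912,
    "History_And_Society.Transportation": 0.858,
    "Assistance.Files": 0.042,
    "STEM.Science": 0.498,
    "STEM.Medicine": 0.743,
    "Culture.Crafts and hobbies": 0.813,
    "History_And_Society.Military and warfare": 0.812,
    "STEM.Technology": 0.56,
    "STEM.Meteorology": 0.919,
    "Assistance.Maintenance": 0.458,
    "Culture.Philosophy and religion": 0.633,
    "STEM.Engineering": 0.578,
    "Culture.Entertainment": 0.84,
    "History_And_Society.Business and economics": 0.7,
    "Geography.Landforms": 0.927,
    "STEM.Biology": 0.748,
    "Assistance.Contents systems": 0.611,
    "Geography.Maps": 0.835,
    "STEM.Geosciences": 0.8,
    "History_And_Society.Education": 0.777,
    "Geography.Bodies of water": 0.914,
    "STEM.Mathematics": 0.845,
    "History_And_Society.Politics and government": 0.615,
    "Geography.Europe": 0.763,
    "STEM.Physics": 0.717,
    "Assistance.Article improvement and grading": 0.004,
    "STEM.Space": 0.938,
    "History_And_Society.History and society": 0.486,
    "Geography.Oceania": 0.838,
    "Geography.Countries": 0.779,
    "STEM.Time": 0.86,
    "STEM.Chemistry": 0.779,
    "Geography.Cities": 0.73,
    "Culture.Food and drink": 0.856,
    "Culture.Broadcasting": 0.735,
    "STEM.Information science": 0.79,
    "Culture.Sports": 0.914,
    "Culture.Media": 0.497,
    "Culture.Visual arts": 0.776,
    "Culture.Plastic arts": 0.774,
}

# Macro average: every label counts equally, regardless of how common it is.
macro = sum(pr_auc.values()) / len(pr_auc)
print(len(pr_auc), round(macro, 3))
```

Since every label counts equally here, the two near-zero Assistance labels pull the macro value down noticeably.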
May 4 2018
Looks like the issue is that wordvectors returns [[0]] for an empty string '' instead of the usual null vector of shape (300,).
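One way to guard against this kind of shape mismatch, as a minimal sketch only: fall back to an explicit zero vector of the expected length whenever no word vectors are found. The `mean_vector` helper and `toy_lookup` table are hypothetical, plain lists stand in for NumPy arrays, and the averaging step is an assumption about how the vectors get combined, not revscoring's actual vectorizer code.

```python
DIM = 300  # dimensionality mentioned in the comment above


def mean_vector(words, lookup, dim=DIM):
    """Average the vectors of known words; fall back to a zero vector."""
    vectors = [lookup[w] for w in words if w in lookup]
    if not vectors:
        # Empty string or no known words: return the null vector, not [[0]].
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Toy lookup table standing in for the real word vectors.
toy_lookup = {"wiki": [1.0] * DIM, "topic": [3.0] * DIM}

print(len(mean_vector("".split(), toy_lookup)))
print(len(mean_vector("wiki topic".split(), toy_lookup)))
```

Both calls return a flat list of length 300, so downstream code sees the same shape for empty and non-empty input.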
May 3 2018
Apr 17 2018
Apr 2 2018
Mar 22 2018
Mar 21 2018
In T190288#4068904, @Halfak wrote: I wonder if you could figure out where the hangup is happening by adding "--debug" to the tune utility call.
Mar 20 2018
Yeah, we'll need scipy >= 0.18.1, but I see that revscoring already pins scipy as: scipy >= 0.13.3, < 1.0.999
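The two constraints combine into scipy >= 0.18.1, < 1.0.999. A minimal sketch of that check for plain numeric version strings; `parse` and `satisfies` are illustrative helpers, and real requirement handling would normally go through a packaging library instead.

```python
def parse(version):
    """Turn '0.18.1' into (0, 18, 1) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper):
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# Existing revscoring bounds and the new lower bound from the comment above.
OLD_LOWER, NEW_LOWER, UPPER = "0.13.3", "0.18.1", "1.0.999"
combined_lower = max(OLD_LOWER, NEW_LOWER, key=parse)

for candidate in ["0.13.3", "0.18.1", "0.19.1", "1.1.0"]:
    print(candidate, satisfies(candidate, combined_lower, UPPER))
```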
Mar 16 2018
Mar 15 2018
The recommended review order is 18, 20, 19.
Finally resolved by using a wrapper function: https://github.com/wiki-ai/revscoring/pull/394
Mar 13 2018
In T189364#4047619, @Halfak wrote: I made a demo of this problem to try to see if I could reproduce it in isolation. See https://github.com/halfak/demo_shared_memory
TL;DR: it didn't work. I get the exact same output for both strategies!
@Ragesoss there's ongoing work around topic modeling for English Wikipedia using WikiProject topics as bases. If Education Program Dashboard has some similar categorization of articles around pre-defined topics, a similar model can be built to predict topics as well as recommend them. Let me know if you wanna talk more about it.
In T188892#4023246, @Paarmita wrote: @Jayprakash12345 Could I take up this?
In T189364#4046883, @awight wrote: @Sumit please link to the code changes you're making that seem to improve memory sharing.
Refer to the gist in the first comment for the code changes that make it multiprocessing friendly.
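For readers without the gist at hand, the general load-once pattern can be sketched as follows. This is not the gist's code: `load_vectors`, `get_vectors`, `vectorize` and `vectorize_all` are hypothetical names, and a tiny dictionary stands in for the real keyed vectors.

```python
from multiprocessing import Pool, cpu_count

_VECTORS = None  # module-level cache, filled at most once per process


def load_vectors():
    """Stand-in for an expensive load of keyed vectors (hypothetical)."""
    return {"wiki": [1.0, 0.0], "topic": [0.0, 1.0]}


def get_vectors():
    """Load the vectors on first use and reuse them afterwards."""
    global _VECTORS
    if _VECTORS is None:
        _VECTORS = load_vectors()
    return _VECTORS


def vectorize(text):
    """Look up each known word of `text` in the cached vectors."""
    vectors = get_vectors()
    return [vectors[word] for word in text.split() if word in vectors]


def vectorize_all(texts, processes=None):
    """Vectorize texts in a worker pool."""
    # Load in the parent before the pool starts: with the fork start method
    # the workers inherit the already-loaded object instead of reloading it.
    get_vectors()
    with Pool(processes or cpu_count()) as pool:
        return pool.map(vectorize, texts)
```

`vectorize_all` should be called from under an `if __name__ == "__main__":` guard. With a start method other than fork, each worker falls back to loading its own copy on first use.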
Mar 10 2018
Test code for benchmarking using word2vec as an external module contained in english_vectors:
from multiprocessing import Pool, cpu_count
import functools
from revscoring.dependencies import solve
from revscoring.datasources.meta import vectorizers
from revscoring.features.meta import aggregators
from revscoring.languages import english
from revscoring.languages.english_vectors import google_news_kvs
from revscoring.datasources import revision_oriented
Test code for benchmarking vectorizers with a global keyed_vector in the vectorizers file (https://gist.github.com/codez266/bde0d2384ef1cda0e105b8f59d25524a#file-vectors_only_once-py-L21):
Mar 8 2018
With the wordvectors blockers now cleared, building the drafttopic model on ores-stat-01.
Feb 27 2018
In T187217#4001947, @awight wrote: Working on the Debian packaging here: https://phabricator.wikimedia.org/source/word2vec/
@Sumit Is the gensim package able to read the gzipped file, or should we decompress during installation?
Feb 13 2018
Feb 5 2018
Jan 29 2018
Jan 22 2018
A common use case of fetch_text is augmenting the dataset with X info from Y API. This will address:
- fetching edits - currently supported by revscoring
- fetching text - currently required by wikiclass, drafttopic and draftquality for getting article text
- fetch_item_info - currently required by wikiclass for fetching item info from Wikidata.
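The shared shape of these three cases, "augment each observation with X from Y", could be sketched roughly like this. The `augment` function and `fake_fetch_text` are hypothetical illustrations, not existing revscoring utilities; a real fetcher would call the relevant API.

```python
import json


def augment(observations, fetch, key):
    """Yield each observation with `fetch(observation)` stored under `key`."""
    for observation in observations:
        enriched = dict(observation)  # copy, leave the input untouched
        enriched[key] = fetch(observation)
        yield enriched


def fake_fetch_text(observation):
    """Stand-in for an API call; a real fetcher would query the wiki."""
    return "text of revision {}".format(observation["rev_id"])


observations = [{"rev_id": 1}, {"rev_id": 2}]
for row in augment(observations, fake_fetch_text, key="text"):
    print(json.dumps(row))
```

Passing the fetcher in as a callable keeps the loop identical whether the added field is edit data, article text, or Wikidata item info.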
Jan 17 2018
The binary *was* on ores-misc-01, which is now nuked. I'll upload it again from my system to ores-staging-01, from where it can be put somewhere public.
Jan 16 2018
I've taken a backup of the tuning reports and of the GradientBoosting and RandomForest models.
Dec 22 2017
Dec 20 2017
Dec 11 2017
Nov 29 2017
Nov 28 2017
We now have a dataset at figshare - https://doi.org/10.6084/m9.figshare.5640526.v1 \o/
In T179311#3793141, @Halfak wrote: @Sumit, please move to the "done" column before closing tasks. We need this in order to consistently report what has been "done".
In T179311#3723831, @Halfak wrote: Looks like we don't include the top level category names yet. @Sumit said he'd like to do that in a separate PR.