
Dibyaaaaax (Dibya)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 10 2020, 3:22 PM (214 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Dibyaaaaax

Recent Activity

Aug 6 2020

Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.

I tested some of the above-mentioned models on a test dataset (~150k items) that is completely separate from the data the models were trained on. All of those models performed more or less the same on the new dataset. Since none of them performed significantly better than the others, and all other models in drafttopic are trained on a balanced dataset using GBC, we have decided to stick with the same setup for Wikidata too.

Aug 6 2020, 5:16 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research

Aug 4 2020

Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.
Classifier: GradientBoosting
Parameters: n_estimators=150 max_depth=5 max_features="log2" learning_rate=0.1
Number of Samples: 63961, imbalanced samples
Vocab Size: 50000
Embeddings dimension: 50
Aug 4 2020, 2:01 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research
Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.
Classifier: Fasttext
Parameters: loss=ova epoch=25 dim=50 lr=0.1 pretrainedVectors=word2vec/wikidata-20200501-learned_vectors.50_cell.50k.vec minCount=1000000
Number of Samples: 63961, imbalanced samples
Pretrained vectors - vocab size: 50000
Pretrained vectors - Embeddings dimension: 50
Aug 4 2020, 10:47 AM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research
Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.
Classifier: Fasttext
Parameters: loss=ova epoch=25 dim=50 lr=0.1 pretrainedVectors=word2vec/wikidata-20200501-learned_vectors.50_cell.vec minCount=1000000
Number of Samples: 63961, imbalanced samples
Pretrained vectors - vocab size: 10000
Pretrained vectors - Embeddings dimension: 50
Aug 4 2020, 6:18 AM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research

Jul 30 2020

Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.
Classifier: Fasttext
Parameters: loss=ova epoch=25 dim=50 lr=0.1 pretrainedVectors=word2vec/wikidata-20200501-learned_vectors.50_cell.50k.vec minCount=1000000
Number of Samples: 255785, balanced samples (at least 4000 per label)
Pretrained vectors - vocab size: 50000
Pretrained vectors - Embeddings dimension: 50
Jul 30 2020, 9:01 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research
Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.
Classifier: Fasttext
Parameters: loss=ova epoch=25 dim=50 lr=0.1 pretrainedVectors=word2vec/wikidata-20200501-learned_vectors.50_cell.vec
Number of Samples: 63944, balanced samples
Pretrained vectors - vocab size: 10000
Pretrained vectors - Embeddings dimension: 50
Jul 30 2020, 5:11 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research

Jul 29 2020

Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.
Classifier: GradientBoosting
Parameters: n_estimators=150 max_depth=5 max_features="log2" learning_rate=0.1
Number of Samples: 63944, balanced samples
Vocab Size: 50000
Embeddings dimension: 50
Jul 29 2020, 4:44 AM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research

Jul 28 2020

Dibyaaaaax added a comment to T254289: Add wikidata to articletopic pipeline.

Subsequent comments will contain the performance reports for different Wikidata models. The goal is to see how performance changes as we vary factors such as the classifier (Fasttext vs GradientBoosting), vocab size, training data size, etc.

Jul 28 2020, 5:57 PM · drafttopic-modeling, Machine-Learning-Team (Active Tasks), Research
Dibyaaaaax closed T252775: Write Python util for converting Wikidata claims to features for ML models, a subtask of T245848: Productionize Wikidata-based Topic Model on ORES, as Resolved.
Jul 28 2020, 4:21 PM · Outreachy (Round 20), Outreach-Programs-Projects
Dibyaaaaax closed T252775: Write Python util for converting Wikidata claims to features for ML models as Resolved.
Jul 28 2020, 4:21 PM · Research, Machine-Learning-Team
Dibyaaaaax added a comment to T252775: Write Python util for converting Wikidata claims to features for ML models.

Sort order of Wikidata statements
We decided to order the Wikidata statements instead of randomizing them, for the reasons @Isaac mentioned above. To do that, the order of properties is fetched from the SortedProperties list using the API. We then sort the statements by their properties, using the SortedProperties list as the reference order.
Some statements have properties that do not appear in the SortedProperties list. Those statements are simply sent to the end of the list, so they appear after all the statements that are in their correct positions.
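
As a rough illustration of the ordering described above, here is a minimal sketch. It is not the actual utility code: the function name `sort_statements`, the `(property_id, value)` tuple representation of a statement, and the example property list are all assumptions made for this example. Fetching SortedProperties from the API is left out, and the list is passed in as a plain Python list.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a statement is a (property_id, value) pair.
Statement = Tuple[str, str]


def sort_statements(statements: Sequence[Statement],
                    sorted_properties: Sequence[str]) -> List[Statement]:
    """Order statements by the position of their property in sorted_properties.

    Statements whose property is missing from sorted_properties are placed
    after all known ones. Python's sort is stable, so statements that share
    a rank keep their original relative order.
    """
    rank = {pid: i for i, pid in enumerate(sorted_properties)}
    unknown_rank = len(rank)  # larger than every known rank
    return sorted(statements, key=lambda s: rank.get(s[0], unknown_rank))


# Illustrative input only; the real property order comes from SortedProperties.
example_order = ["P31", "P279", "P17"]
example_statements = [
    ("P999", "Q1"),   # property not in the reference list
    ("P17", "Q30"),
    ("P31", "Q5"),
    ("P31", "Q6"),
    ("P888", "Q2"),   # property not in the reference list
]
ordered = sort_statements(example_statements, example_order)
```

With this input, the two `P31` statements come first, then `P17`, and the two statements with unlisted properties end up last in their original relative order.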

Jul 28 2020, 4:19 PM · Research, Machine-Learning-Team

Jul 13 2020

Dibyaaaaax added a comment to T252775: Write Python util for converting Wikidata claims to features for ML models.

Some Wikidata items represent Wikipedia pages that aren't articles. E.g., Q8207058 sitelinks to Portal:Earth Sciences in English Wikipedia (and the same holds for other languages).

Jul 13 2020, 4:54 PM · Research, Machine-Learning-Team

Mar 12 2020

Dibyaaaaax added a comment to T246013: Outreachy Application Task: Simple example of topic classification.

Could someone help me with this error?

Screenshot from 2020-03-12 15-55-01.png (1×2 px, 442 KB)

Mar 12 2020, 10:32 AM · Outreachy (Round 20)