
Experimental API for unsupervised topic modeling based on reading sessions
Closed, ResolvedPublic

Description

Build a user interface to explore lists of related articles derived from reading sessions. Details:

  • improve the pipeline that generates article embeddings from reading sessions (generate reading sessions; train, evaluate, and fine-tune the model)
  • host the model on cloud-vps so that larger models with more extensive coverage (in terms of number of articles) can be served
  • add parameters to refine the list of related articles (e.g. filter by keywords)
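
As an illustration of the last point, here is a minimal sketch of a related-articles lookup with an optional keyword filter. It assumes that article embeddings are available as a plain mapping from title to vector and that relatedness is measured by cosine similarity; the function names, the `keyword` parameter, and the toy vectors are hypothetical and do not describe the actual tool.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_articles(title, embeddings, n=10, keyword=None):
    """Return up to n (title, similarity) pairs, most similar first.

    If keyword is given, only titles containing it (case-insensitive)
    are considered. This title filter is an assumed, simplified reading
    of "filter by keywords".
    """
    query = embeddings[title]
    candidates = (
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != title
        and (keyword is None or keyword.lower() in other.lower())
    )
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented for illustration only.
    toy = {
        "Berlin": [0.9, 0.1],
        "Hamburg": [0.8, 0.2],
        "Berlin Wall": [0.7, 0.3],
        "Photosynthesis": [0.0, 1.0],
    }
    print(related_articles("Berlin", toy, n=2))
    print(related_articles("Berlin", toy, n=2, keyword="wall"))
```

A brute-force scan like this is only practical for small vocabularies; a hosted model covering many articles would likely need an approximate nearest-neighbour index instead.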

Event Timeline

Update week 2020-07-27:

Update week 2020-08-03:

  • build pipeline to tune hyperparameters (train-test split, prediction, and evaluation on the test set)
  • need to experiment with smaller sample datasets (e.g. smaller wikis) to perform an extensive grid search
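
The tuning loop described above could look roughly like the following sketch. It is not the actual pipeline: the model is a simple co-occurrence counter standing in for the embedding model, the metric (recall@k for predicting the next article in a session) is an assumption, and the hyperparameter names `window` and `min_count` are hypothetical.

```python
import itertools
import random
from collections import Counter, defaultdict


def split_sessions(sessions, test_fraction=0.2, seed=0):
    """Shuffle the sessions and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(sessions)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(sessions, window, min_count):
    """Stand-in model: co-occurrence counts within a context window."""
    freq = Counter(a for s in sessions for a in s)
    cooc = defaultdict(Counter)
    for s in sessions:
        for i, a in enumerate(s):
            for b in s[max(0, i - window): i + window + 1]:
                if b != a and freq[a] >= min_count and freq[b] >= min_count:
                    cooc[a][b] += 1
    return cooc


def recall_at_k(model, sessions, k):
    """Fraction of next-article transitions found in the top-k predictions."""
    hits = total = 0
    for s in sessions:
        for current, nxt in zip(s, s[1:]):
            total += 1
            top = [t for t, _ in model.get(current, Counter()).most_common(k)]
            hits += nxt in top
    return hits / total if total else 0.0


def grid_search(sessions, grid, k=2):
    """Evaluate every hyperparameter combination on a held-out test set."""
    train_set, test_set = split_sessions(sessions)
    results = []
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        model = train(train_set, **params)
        results.append((recall_at_k(model, test_set, k), params))
    return max(results, key=lambda r: r[0]), results
```

Because the number of combinations grows multiplicatively with each hyperparameter, running the full grid on a small sample (such as a smaller wiki) keeps the search tractable.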

Update week 2020-08-10:

Update week 2020-08-17:

  • most time spent on building supplementary datasets to use as additional features for modeling reading sessions, specifically the relatedness of articles based on the link network and on text, as well as the position of links in articles

Update week 2020-09-01:

Update week 2020-09-07:

  • None (worked on link-recommendation)

Update week 2020-09-14:

  • performed a thorough analysis of hyperparameters to fine-tune model performance. The proper choice of hyperparameters is crucial because performance depends strongly on the particular choice. In addition, we have to take into account constraints on the resulting model size, which limit, e.g., the number of dimensions when hosting the model on a cloud-vps instance
  • started to write up documentation, to be moved to meta later: https://docs.google.com/document/d/1LsqdacaUnsZhoBKj0mOSEL5Bsuf_HThASXN3O9bpIl0/edit#
    • pre-processing and training model
    • API-hosting
    • in-depth analysis on hyperparameters/convergence/etc.
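
To make the model-size constraint concrete, here is a back-of-the-envelope sketch. It assumes the model is dominated by a dense embedding matrix with one float32 vector per article; the real model format may store additional data, so this should be read as a lower bound. The function names and the numbers are hypothetical.

```python
def embedding_size_bytes(n_articles, n_dims, bytes_per_value=4):
    """Size of a dense embedding matrix (float32 by default)."""
    return n_articles * n_dims * bytes_per_value


def max_dims(n_articles, memory_budget_bytes, bytes_per_value=4):
    """Largest number of dimensions that fits into the memory budget."""
    return memory_budget_bytes // (n_articles * bytes_per_value)


if __name__ == "__main__":
    # Hypothetical vocabulary of one million articles.
    for n_dims in (50, 100, 300):
        size_gb = embedding_size_bytes(1_000_000, n_dims) / 1e9
        print(f"{n_dims} dims: {size_gb:.1f} GB")
```

Under these assumptions the size grows linearly in both the number of articles and the number of dimensions, which is why broader coverage and higher dimensionality compete for the same memory on a fixed-size instance.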

Update week 2020-09-21:

  • updated model to do additional filtering
  • planning next steps for analysis in coming quarter
    • evaluation (e.g. coverage for different wikis)
    • capture basic properties (e.g. how it relates to similar models such as the links-model, how dynamic it is over time)
    • how to serve different use-cases such as list-building

Update week 2020-10-05:

  • added interactive user interface to API to generate custom lists of articles

https://reader.toolforge.org/

Resolving this task (future work will be captured in follow-up tasks).