
Create a python package to compute wikitext embeddings in the WMF data infra
Closed, Resolved (Public)

Description

This notebook can be used to generate section-level embeddings using Hugging Face's sentence-transformers library. This task is to move that code into a Python package. The goal is to enable faster iteration when working with embeddings, including:

  • using different granularities of wikitext
  • using different models to generate embeddings
  • scaling the computation of embeddings in WMF infrastructure
  • sharing code between projects that make use of embeddings

This is one well-defined task within a larger effort, currently in planning, to create tooling and infrastructure for working with embeddings.
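
To make the goals above concrete, here is a minimal sketch of how such a package could separate granularity from model choice. It is not the code from the notebook: the names `split_sections`, `hashing_embedder`, and `embed_sections` are hypothetical, only level-2 headings (`== Title ==`) are treated as section boundaries, and the hashing embedder is a dependency-free stand-in for a real model.

```python
import hashlib
import math
import re
from typing import Callable, List, Sequence, Tuple

# Level-2 wikitext heading on its own line, e.g. "== History ==".
# Deeper headings ("=== Sub ===") are deliberately left inside their parent section.
HEADING_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)

# Any callable mapping a batch of texts to one vector per text can act as the model.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def split_sections(wikitext: str) -> List[Tuple[str, str]]:
    """Split wikitext into (title, text) pairs; text before the first heading is 'Lead'."""
    sections = []
    title, start = "Lead", 0
    for match in HEADING_RE.finditer(wikitext):
        sections.append((title, wikitext[start:match.start()].strip()))
        title, start = match.group(1), match.end()
    sections.append((title, wikitext[start:].strip()))
    # Drop sections without any text, since there is nothing to embed.
    return [(t, body) for t, body in sections if body]


def hashing_embedder(texts: Sequence[str], dim: int = 64) -> List[List[float]]:
    """Stand-in model (assumption): hashed bag-of-words vectors, L2-normalized."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in re.findall(r"\w+", text.lower()):
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def embed_sections(
    wikitext: str, embedder: Embedder = hashing_embedder
) -> List[Tuple[str, List[float]]]:
    """Return one (section title, embedding) pair per non-empty section."""
    sections = split_sections(wikitext)
    vectors = embedder([body for _, body in sections])
    return [(title, vec) for (title, _), vec in zip(sections, vectors)]
```

Because the embedder is just a callable, swapping in a sentence-transformers model should only require wrapping its `encode` method, for example `lambda texts: model.encode(list(texts)).tolist()`, and a different granularity (paragraphs, whole pages) would replace `split_sections` without touching the rest. Scaling this on WMF infrastructure is out of scope for the sketch.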

Details

Due Date
Dec 15 2023, 5:00 AM

Event Timeline

fkaelin set Due Date to Dec 15 2023, 5:00 AM.
fkaelin moved this task from Backlog to Staged on the Research board.

@MunizaA / @fkaelin: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!