
Cross-team Blog Content: Running a desktop-based LLM with an Enterprise RAG index
Open, Needs Triage · Public · 5 Estimated Story Points

Description

User Story:
As a user, I would like a written example of using the Enterprise APIs to generate embeddings for a RAG Index.
As a user, I would like a written example of using a Wikipedia RAG index in a desktop-based LLM.

Objective (O2.KR1):
Documentation and content for Enterprise products is expanded to reduce the barrier to use and to enable further outreach toward a broader range of organizational reusers.

Acceptance criteria

  1. An EN Wikipedia based RAG index of N (est. <1000) embeddings has been created using the structured contents endpoint.
  2. A desktop-based foundational language model (e.g. Ollama) has used a Wikipedia-based RAG index for N (est. <50) test queries.
  3. Results of generating a Wikipedia-based RAG index and using the index in a desktop LLM experiment have been written up and summarized for content use by the product and growth marketing teams.

ToDo

  • Select a page set of N articles and use it to generate results from the structured content endpoint (~500 articles to start experimenting)
  • Use the results to generate embeddings and store them in a queryable vector database
  • Select and configure a desktop-based LLM/runner that queries the vector database as part of its response mechanism
  • Select and run N queries to test RAG-based Q&A and log the results
  • [50%] Summarize the steps to reproduce the testing framework and review with product and product marketing for handoff
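The pipeline in the list above can be sketched end to end. The helper names, chunk size, and the stub embedder below are illustrative assumptions, not the project's actual code; a real run would replace the stub with a call to a local embedding model (e.g. served by Ollama).

```python
# Sketch of the ingestion pipeline: chunk article text, embed each chunk,
# and collect (id, embedding, text) rows ready for a vector database.
# chunk_text, embed, and build_index are hypothetical helpers.

def chunk_text(text, max_words=200):
    """Split text into word-bounded chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(chunk):
    """Stub embedder; a real pipeline would call a local model instead."""
    # Deterministic toy vector so the sketch is runnable without a model server.
    return [float(len(chunk)), float(chunk.count(" "))]

def build_index(articles):
    """articles maps title -> body text; returns rows for a vector store."""
    rows = []
    for title, body in articles.items():
        for n, chunk in enumerate(chunk_text(body)):
            rows.append((f"{title}#{n}", embed(chunk), chunk))
    return rows
```

With ~500 articles this stays small enough to run on a laptop, which is the point of the zero-cost constraint noted below.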
Test Strategy

Notes from engineering discussion [To be refined]:

  • Run the ingestion and embedding on Apple M2 laptops to have zero costs
  • Potentially use an Ollama blog post and model as a framework to follow
  • Use either Simple Wiki or Wikipedia as a data source and keep the page list small for ease of reuse and lower LoE
  • Secondary objective (P2): publish the dataset on Hugging Face as an initial PoC for other datasets in the future and to set up the WME posting process

Checklist for testing
We need good example chat prompts that show different responses when RAG is enabled and disabled
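One way to make the RAG-enabled vs. RAG-disabled comparison concrete is to build both prompt variants from the same question. The template wording below is a hypothetical sketch, not the project's actual prompt.

```python
# Build the same question as a plain prompt (RAG disabled) and as a
# context-grounded prompt (RAG enabled), so responses can be compared
# side by side. The template text is an illustrative assumption.

def make_prompt(question, context_chunks=None):
    if not context_chunks:
        # RAG disabled: the model answers from its own weights.
        return question
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Running both variants of each test query through the same model makes the effect of the index directly visible in the logged results.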

Things to consider:
  • Scope of work for the post and size of dataset
  • Do we want to document this elsewhere as well?

Event Timeline

I have a work POC that I shared with Chuck.

I'll work on the Python code to save the CSV dataset; the first version is in Go. A blog post in Python would have broader appeal.

I'll do a second draft of the blog post steps, including:

  1. The dataset steps
  2. The dependency installation steps
  3. Importing the dataset into ChromaDB
  4. CLI query testing
  5. Bonus: building a web UI with Streamlit
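Steps 3 and 4 can be sketched without ChromaDB installed: the tiny in-memory store below stands in for it, ranking stored chunks by cosine similarity for a CLI-style query. The class and its method names are hypothetical and are not ChromaDB's API; they only illustrate the flow.

```python
import math

# Minimal in-memory stand-in for the vector-store steps: add embedded
# chunks, then answer a query embedding by cosine similarity. In the
# blog post this role would be played by ChromaDB.

class TinyVectorStore:
    def __init__(self):
        self.rows = []  # list of (id, embedding, document) tuples

    def add(self, id_, embedding, document):
        self.rows.append((id_, embedding, document))

    def query(self, embedding, n_results=3):
        """Return the n_results documents most similar to the query embedding."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        ranked = sorted(self.rows, key=lambda r: cosine(embedding, r[1]), reverse=True)
        return [doc for _, _, doc in ranked[:n_results]]
```

The retrieved documents would then be passed as context chunks into the chat prompt before it goes to the local model.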

There is a new repo in the Experiments group: for-blog-LLM-RAG

@creynolds Can you please let us know if the information presented is enough for you to do your part? Do you need anything else? FYI, the sprint ends next Thursday and we'd like to know if engineering work is done on this one. Thanks so much!

@JArguello-WMF The code/readme is great. I only asked ROd for some intro talk as a precursor to help write the copy; then I'm good and can take it from here.

Moved to done, because engineering work is done.

I think we covered the intro text in our chat last night. @creynolds do you have what you need?