User Story:
As a user, I would like a written example of using the Wikimedia Enterprise APIs to generate embeddings for a RAG index.
As a user, I would like a written example of using a Wikipedia-based RAG index with a desktop-based LLM.
Objective (O2.KR1):
Documentation and content for Enterprise products are expanded to lower the barrier to use for, and to support outreach to, a broader range of organizational reusers.
**Acceptance criteria**
# An English Wikipedia-based RAG index of N (est. <1000) embeddings has been created using the structured contents endpoint.
# A foundation language model running on a desktop (e.g. via Ollama) has used the Wikipedia-based RAG index to answer N (est. <50) test queries.
# The results of generating the Wikipedia-based RAG index and using it with a desktop LLM have been written up and summarized for use in content by the product and growth marketing teams.
**ToDo**
- [ ] Select a set of N pages and fetch their content from the structured contents endpoint
- [ ] Generate embeddings from the results and store them in a queryable vector database
- [ ] Select and configure a desktop LLM/runner that queries the vector database and uses the retrieved passages when generating responses
- [ ] Select and run N test queries against the RAG-based Q&A setup and log the results
- [ ] Summarize the steps to reproduce the testing framework and review them with product and product marketing for handoff
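The retrieval half of the steps above can be sketched in plain Python. This is a minimal offline sketch under stated assumptions, not the planned implementation: `flatten_paragraphs` assumes each article from the structured contents endpoint nests its text under `sections` and `has_parts` items carrying `type` and `value` fields (verify against a real response), `toy_embed` is a hypothetical hashed bag-of-words placeholder for a real embedding model, and `InMemoryIndex` is a brute-force stand-in for the queryable vector database. All names are illustrative.

```python
import math
import re
import zlib
from typing import Callable, Iterable, List, Tuple

Vector = List[float]


def flatten_paragraphs(article: dict) -> List[str]:
    """Collect paragraph text from one structured article, in document order.

    ASSUMPTION: the article has a "sections" list; every part has a "type",
    paragraphs carry their text in "value", and nested parts sit under "has_parts".
    Check this against a real structured contents response before relying on it.
    """
    passages: List[str] = []

    def walk(parts: Iterable[dict]) -> None:
        for part in parts:
            if part.get("type") == "paragraph" and part.get("value"):
                passages.append(part["value"].strip())
            walk(part.get("has_parts", []))

    walk(article.get("sections", []))
    return passages


def toy_embed(text: str, dims: int = 512) -> Vector:
    """HYPOTHETICAL stand-in for a real embedding model (hashed bag of words).

    It only exists so the sketch runs offline; replace it with a real model.
    """
    vec = [0.0] * dims
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Brute-force stand-in for a vector database (fine for < 1000 rows)."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self.embed = embed
        self.rows: List[Tuple[str, str, Vector]] = []  # (passage id, text, vector)

    def add_article(self, title: str, article: dict) -> None:
        # One row per paragraph; the id records the source page and position.
        for i, text in enumerate(flatten_paragraphs(article)):
            self.rows.append((f"{title}#{i}", text, self.embed(text)))

    def search(self, query: str, k: int = 3) -> List[Tuple[str, str, float]]:
        # Score every stored passage against the query and keep the best k.
        q = self.embed(query)
        scored = [(pid, text, cosine(q, vec)) for pid, text, vec in self.rows]
        scored.sort(key=lambda row: row[2], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = InMemoryIndex(toy_embed)
    index.add_article("Moon", {"sections": [{"type": "section", "has_parts": [
        {"type": "paragraph", "value": "The Moon is Earth's only natural satellite."}]}]})
    index.add_article("Paris", {"sections": [{"type": "section", "has_parts": [
        {"type": "paragraph", "value": "Paris is the capital of France."}]}]})
    for passage_id, text, score in index.search("natural satellite of Earth", k=2):
        print(f"{score:.3f}  {passage_id}  {text}")
```

A brute-force scan is linear in the number of stored passages per query, which is acceptable at the estimated <1000 embeddings. Swapping `toy_embed` for a real model leaves the index unchanged, provided the same model embeds both passages and queries.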
===== Test Strategy =====
Notes from engineering discussion [To be refined]:
- Run the ingestion and embedding on Apple M2 laptops to avoid compute costs
- Potentially use the Ollama post and model as a framework to follow
- Use either Simple English Wikipedia or English Wikipedia as the data source, and keep the page list small for ease of reuse and a lower LoE
- Secondary objective (P2): publish the dataset on Hugging Face as an initial PoC for future datasets and to set up the WME posting process
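If Ollama is chosen as the runner, both embedding and answer generation can go through its local REST API. The sketch below assumes a recent Ollama server on its default address `http://localhost:11434` exposing the `/api/embed` and `/api/generate` endpoints, and that the example models `nomic-embed-text` and `llama3.2` have already been pulled. The model choices and the prompt template in `build_rag_prompt` are assumptions for illustration, not decisions from the engineering discussion.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_rag_prompt(question: str, passages: List[str]) -> str:
    """Put retrieved Wikipedia passages ahead of the question.

    The wording of this template is an illustrative assumption, not a tuned prompt.
    """
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the numbered Wikipedia passages below. "
        "Cite passage numbers, and say so if the passages do not contain the answer.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def _post(path: str, payload: dict) -> dict:
    # Supplying a request body makes urllib send a POST request.
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ollama_embed(text: str, model: str = "nomic-embed-text") -> List[float]:
    # /api/embed returns {"embeddings": [[...]]}, one vector per input.
    return _post("/api/embed", {"model": model, "input": text})["embeddings"][0]


def ollama_answer(question: str, passages: List[str], model: str = "llama3.2") -> str:
    # With "stream": False the server replies with one JSON object; "response" holds the text.
    body = {"model": model, "prompt": build_rag_prompt(question, passages), "stream": False}
    return _post("/api/generate", body)["response"]
```

`ollama_embed` can serve as the embedding function for whichever vector store is chosen, as long as passages and queries use the same model. Keeping the prompt builder separate from the HTTP calls makes it easy to log the exact prompt sent for each test query.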
**Checklist for testing**
- [ ] Wikimedia content recall?
- [ ] Answer precision?
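The two checklist items could be scored from the logged test queries. The sketch below uses one possible pair of definitions, which are assumptions rather than agreed metrics: content recall as mean per-query recall@k over the Wikipedia pages a reviewer expects the answer to draw on, and answer precision as the share of given answers judged correct, excluding queries the model declined. `QueryLog` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class QueryLog:
    """One logged test query (HYPOTHETICAL field names)."""
    question: str
    expected_pages: Set[str]        # pages a reviewer expects the answer to draw on
    retrieved_pages: List[str]      # source pages of the retrieved passages, in rank order
    answer_correct: Optional[bool]  # reviewer judgement; None if the model declined


def content_recall(logs: List[QueryLog], k: int) -> float:
    """Mean per-query recall@k: share of expected pages found in the top k results."""
    scores = []
    for log in logs:
        if not log.expected_pages:
            continue  # nothing to recall for this query
        hits = log.expected_pages & set(log.retrieved_pages[:k])
        scores.append(len(hits) / len(log.expected_pages))
    return sum(scores) / len(scores) if scores else 0.0


def answer_precision(logs: List[QueryLog]) -> float:
    """Correct answers divided by answers given (declined queries are excluded)."""
    judged = [log.answer_correct for log in logs if log.answer_correct is not None]
    return sum(judged) / len(judged) if judged else 0.0
```

Because declined answers are excluded from precision, a model that refuses more often can score higher, so the number of declines is worth reporting alongside both metrics.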
===== Things to consider =====
* Scope of work for the post and size of the dataset
* Do we want to document this elsewhere as well?
===== Description (optional) =====