Background/Goal
As a follow-up to T356608, which was undertaken for Goal T354965, and as an outcome of a hackathon project during a recent DPE offsite, we'd like to incorporate an LLM to return task summaries in the baseline metrics tool.
This epic captures the collection of tasks related to integrating LLM capabilities for generating task summaries in the baseline metrics tool.
KR/Hypothesis (Initiative)
SDS 2.5
Four feature teams use shared tools to evaluate and improve user experiences based on empirical data from user interactions.
The data provided by the MVP tool will help inform baseline metrics.
Success metrics
- How will we measure success?
Task summaries included in the results returned from the baseline metrics tool will enable quicker understanding of tickets and their relevance to the search queries performed.
Dependencies
- T354870 Setup Lift Wing to enable easy deployment of HuggingFace models including LLMs - once this is complete, the ML team will move to a 70B-parameter model.
- This should allow us to build or access an API endpoint that uses this model to generate summaries of Phabricator tickets.
- The timeline is roughly two months and depends on the GPU order: one GPU will be installed imminently, followed by testing, then more will be ordered.
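Once the Lift Wing endpoint exists, the baseline metrics tool could request a summary per task with a single HTTP call. The sketch below is an assumption only: the endpoint URL, request payload, and response shape are all hypothetical placeholders, since the real interface depends on how the model is deployed under T354870.

```python
import json
from urllib import request

# Hypothetical endpoint URL -- the real one will be defined by the Lift Wing
# deployment (T354870); nothing below reflects an actual deployed API.
LIFTWING_URL = "https://inference.example.org/v1/models/llama2-70b:predict"

def build_payload(task_id: str) -> dict:
    """Build a (hypothetical) summarization request for one Phabricator task."""
    return {
        "task_url": f"https://phabricator.wikimedia.org/{task_id}",
        "instruction": "Summarize this Phabricator task in 2-3 sentences.",
    }

def summarize(task_id: str) -> str:
    """POST the payload and return the model's summary (assumed response key)."""
    req = request.Request(
        LIFTWING_URL,
        data=json.dumps(build_payload(task_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["summary"]
```

The tool would call `summarize()` once per task ID in the result set, ideally with caching so repeated queries don't re-summarize the same ticket.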
Technical notes
For the hackathon experiment, we used LangChain with llama2 70B on an M2 Apple silicon Mac, as well as OpenAI (see https://gitlab.wikimedia.org/ahoelzl/phab-summarizer). We exposed a port on the summarizer as an API endpoint and added another async/await call to the Vue project to include a task summary for each Phabricator ticket ID returned. The code updates to the baseline metrics tool are trivial.
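The endpoint exposed during the hackathon can be sketched as a minimal HTTP wrapper around the summarize chain. This is an illustrative assumption, not the actual phab-summarizer code: the route shape (`/summary/<task-id>`), port, and JSON response keys are all hypothetical, and `summarize()` here is a stand-in for the LangChain chain shown in the sample code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(task_id: str) -> str:
    """Stand-in for the LangChain summarize chain; returns a placeholder."""
    return f"Summary for {task_id} would be generated by the LLM here."

class SummaryHandler(BaseHTTPRequestHandler):
    """Serve GET /summary/<task-id> as a small JSON API."""

    def do_GET(self):
        task_id = self.path.rsplit("/", 1)[-1]  # e.g. /summary/T120242
        body = json.dumps({"task": task_id, "summary": summarize(task_id)})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

def run(port: int = 8000) -> None:
    """Start the server (blocking); port 8000 is an arbitrary choice."""
    HTTPServer(("", port), SummaryHandler).serve_forever()
```

The Vue side then only needs one extra awaited fetch per ticket ID to attach the summary to each result row.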
Sample code from the hackathon, running Ollama on an Intel Mac:
```python
from langchain_community.llms import Ollama
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import ChatOpenAI  # used for the OpenAI variant of the experiment

# Load the Phabricator task page and summarize it with a local llama2 model
loader = WebBaseLoader("https://phabricator.wikimedia.org/T120242")
docs = loader.load()
llm = Ollama(model="llama2")
chain = load_summarize_chain(llm)
chain.run(docs)
```
Example screenshot of baseline metrics tool with LLM-generated task summaries: