Background/Goal
As a follow-up to T356608, which was undertaken for Goal T354965, and as an outcome of a hackathon project during a recent DPE offsite, we'd like to incorporate an LLM to return task summaries in the baseline metrics tool.
This epic captures the collection of tasks related to integrating LLM capabilities for generating task summaries in the baseline metrics tool.
KR/Hypothesis (Initiative)
SDS 2.5
Four feature teams use shared tools to evaluate and improve user experiences based on empirical data from user interactions.
The data provided by the MVP tool will help inform baseline metrics.
Success metrics
- How will we measure success?
Task summaries included in the results returned from the baseline metrics tool will enable quicker understanding of tickets and their relevance to the search queries performed.
Dependencies
- T354870 Setup Lift Wing to enable easy deployment of HuggingFace models including LLMs - once this is complete, the ML team will move to a 70B-parameter model.
- This should allow us to build or access an API endpoint that uses this model to generate summaries of Phabricator tickets.
- The timeline is roughly two months and depends on the GPU order: one GPU will be installed imminently, followed by testing, then more will be ordered.
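Once the Lift Wing endpoint exists, the baseline metrics tool could request a summary per task with a single HTTP call. The sketch below is an assumption only: the endpoint URL, request payload, and response shape are all hypothetical placeholders, since the real interface depends on how the model is deployed under T354870.

```python
import json
from urllib import request

# Hypothetical endpoint URL -- the real one will be defined by the Lift Wing
# deployment (T354870); nothing below reflects an actual deployed API.
LIFTWING_URL = "https://inference.example.org/v1/models/llama2-70b:predict"

def build_payload(task_id: str) -> dict:
    """Build a (hypothetical) summarization request for one Phabricator task."""
    return {
        "task_url": f"https://phabricator.wikimedia.org/{task_id}",
        "instruction": "Summarize this Phabricator task in 2-3 sentences.",
    }

def summarize(task_id: str) -> str:
    """POST the payload and return the model's summary (assumed response key)."""
    req = request.Request(
        LIFTWING_URL,
        data=json.dumps(build_payload(task_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["summary"]
```

The tool would call `summarize()` once per task ID in the result set, ideally with caching so repeated queries don't re-summarize the same ticket.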
Technical notes
For the hackathon experiment, we used LangChain with llama2 70B on an M2 Apple silicon Mac, as well as OpenAI (see https://gitlab.wikimedia.org/ahoelzl/phab-summarizer). We exposed a port on the summarizer as an API endpoint and added another async/await call to the Vue project to include a task summary for each Phabricator ticket ID returned. The code updates to the baseline metrics tool are trivial.
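The endpoint exposed during the hackathon can be sketched as a minimal HTTP wrapper around the summarize chain. This is an illustrative assumption, not the actual phab-summarizer code: the route shape (`/summary/<task-id>`), port, and JSON response keys are all hypothetical, and `summarize()` here is a stand-in for the LangChain chain shown in the sample code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(task_id: str) -> str:
    """Stand-in for the LangChain summarize chain; returns a placeholder."""
    return f"Summary for {task_id} would be generated by the LLM here."

class SummaryHandler(BaseHTTPRequestHandler):
    """Serve GET /summary/<task-id> as a small JSON API."""

    def do_GET(self):
        task_id = self.path.rsplit("/", 1)[-1]  # e.g. /summary/T120242
        body = json.dumps({"task": task_id, "summary": summarize(task_id)})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

def run(port: int = 8000) -> None:
    """Start the server (blocking); port 8000 is an arbitrary choice."""
    HTTPServer(("", port), SummaryHandler).serve_forever()
```

The Vue side then only needs one extra awaited fetch per ticket ID to attach the summary to each result row.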
Sample code from the hackathon, running Ollama on an Intel Mac:
```python
from langchain_community.llms import Ollama
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import ChatOpenAI  # used for the OpenAI variant of the experiment

# Load the Phabricator task page and summarize it with a local llama2 model
loader = WebBaseLoader("https://phabricator.wikimedia.org/T120242")
docs = loader.load()
llm = Ollama(model="llama2")
chain = load_summarize_chain(llm)
chain.run(docs)
```
Example screenshot of baseline metrics tool with LLM-generated task summaries: