Goal: Learn about our options for fine-tuning/adapting out-of-the-box (OOB) models vs. training our own vs. prompt engineering
Past & in-progress Research team work:
- Web team's summarization model approach: https://docs.google.com/document/d/1-qem2rPT43QKo2ZVUF7Xik5IqB2xfHWwCzGxpOSQn_k/edit?tab=t.0#heading=h.smgo78rlng06
- Aya-based approach: https://public-paws.wmcloud.org/User:MGerlach%20(WMF)/text-simplification/section-gists_v01.ipynb
- Initial in-house model: https://gitlab.wikimedia.org/repos/research/text-simplification/-/blob/main/tutorial-simplification-inference_v01.ipynb
- Recommendations: https://docs.google.com/document/d/1tQQaPcn5_Ph7qA_xMNGwjO3pablNfNzaibzSsFeJdko/edit?tab=t.0#heading=h.smgo78rlng06
- Other: OOB vector DBs / precomputed Wikipedia embeddings? (e.g. Cohere's embedding archives: https://cohere.com/blog/embedding-archives-wikipedia; see the sketch after this list)
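
If we explore the Cohere option, below is a minimal sketch of what querying the precomputed embeddings could look like. It assumes the Hugging Face dataset id `Cohere/wikipedia-22-12-en-embeddings`, its `text`/`emb` column names, and the older `cohere` SDK's `Client.embed` signature with the `multilingual-22-12` model; these come from the blog post linked above and should be verified before use:

```python
# Sketch: nearest-neighbor search over a slice of Cohere's precomputed
# Wikipedia embeddings, without standing up a vector DB.
import numpy as np
import cohere
from datasets import load_dataset

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Stream a small slice of the archive rather than downloading all rows.
docs = load_dataset(
    "Cohere/wikipedia-22-12-en-embeddings",  # assumed dataset id
    split="train",
    streaming=True,
)
passages, embeddings = [], []
for row in docs.take(10_000):  # small slice, for illustration only
    passages.append(row["text"])
    embeddings.append(row["emb"])
emb_matrix = np.asarray(embeddings, dtype=np.float32)

# Embed the query with the same model family the archive was built with,
# then rank passages by dot product.
query = "photosynthesis in simple terms"
q_emb = np.asarray(
    co.embed(texts=[query], model="multilingual-22-12").embeddings[0],
    dtype=np.float32,
)
scores = emb_matrix @ q_emb
for idx in np.argsort(-scores)[:3]:
    print(f"{scores[idx]:.3f}  {passages[idx][:80]}")
```

For a real experiment the full archive would go into an ANN index (FAISS or similar) instead of an in-memory matrix, but the retrieval logic stays the same.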
Questions to answer:
- Would any of the above be suitable for a quick experiment (~a couple of weeks to build and launch)? How much benefit would they add over prompt engineering alone?
- What is a sensible way to test these out? Does the Research team have a test query set and/or a method for scoring output quality that we could reuse? (Talk to Xiao about this.) A minimal scoring sketch follows below.
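
If Research doesn't already have a scoring method, one common baseline for text simplification is SARI, available through Hugging Face's `evaluate` library. A minimal sketch, where the example sentences are invented placeholders rather than a real test set:

```python
# Sketch: scoring simplification output with SARI (0-100, higher is better).
# SARI compares the model output against the source sentence and one or
# more reference simplifications.
import evaluate

sari = evaluate.load("sari")

sources = ["The cat perched itself upon the mat."]  # original text
predictions = ["The cat sat on the mat."]           # model output
references = [[                                     # gold simplifications
    "The cat sat on the mat.",
    "A cat sat on a mat.",
]]

result = sari.compute(
    sources=sources, predictions=predictions, references=references
)
print(result)  # e.g. {'sari': ...}
```

The same harness could score prompt-engineered, OOB, and fine-tuned outputs side by side, which would help answer the "how much added benefit" question above.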