As a relevance engineer, I want a canonical dataset that I can use as a basis for specific use cases. I don't want to redo the same cleanup, merging, and similar preparation work for each use case.
AC:
- Query completion, Glent and MjoLniR all use the same intermediate dataset