Set up a pipeline/framework for human annotators to label the relevance of search results.
Several tools are available from similar previous experiments; we could hopefully reuse and adapt them to our use case:
* Discernatron
  * https://www.mediawiki.org/wiki/Discernatron
  * https://github.com/wikimedia/wikimedia-discovery-discernatron
* Media search
  * https://media-search-signal-test.toolforge.org/
  * https://media-search-signal-test.toolforge.org/synonyms_bak.html
  * https://toolsadmin.wikimedia.org/tools/id/media-search-signal-test
  * https://github.com/cormacparle/media-search-signal-test
* Article level image suggestion
  * https://gitlab.wikimedia.org/toolforge-repos/alis-evaluation
  * https://alis-evaluation.toolforge.org/
  * https://toolsadmin.wikimedia.org/tools/id/alis-evaluation
* Annotool
  * https://annotool.toolforge.org/projects/13
  * https://toolsadmin.wikimedia.org/tools/id/annotool
  * https://gitlab.wikimedia.org/mnz/annotool