The https://github.com/cormacparle/media-search-signal-test repo was created specifically to test Elasticsearch scores, and using it has given us a set of rated image search results.
The way the data is created and stored makes it awkward to update and analyse those rated results outside the original context. It would be best to reorganise things so that adding to the data, and analysing it, is easier.
Step 1
---
Move the data into its own dedicated store, in a structure similar to the one below. Tagging will allow us to analyse subsets of the rated results based on their tags.
```
create table ratedSearchResult (
    id int not null auto_increment,
    searchTerm varchar(255) not null, # the search term
    language varchar(5) not null default 'en', # the ISO code for the language of the search term
    result varchar(255) not null, # file page title string for the result
    rating tinyint not null, # -1 if result is a bad match, 0 if it's indifferent, +1 if it's good
    primary key (id) # an auto_increment column must be a key
) engine=innodb;
create table tag (
    id int not null auto_increment,
    text varchar(255) not null, # the tag name
    primary key (id)
) engine=innodb;
create table ratedSearchResult_tag (
    ratedSearchResultId int not null, # references ratedSearchResult.id
    tagId int not null, # references tag.id
    unique key (ratedSearchResultId, tagId)
) engine=innodb;
```
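
As an illustration of the tag-based analysis this structure is meant to allow, here is a minimal sketch in Python. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for the real store, the column types are simplified accordingly, and the sample rows, the tag names `batch-1` and `batch-2`, and the helper `rating_counts_for_tag` are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real store (assumption); types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ratedSearchResult (
        id integer primary key,
        searchTerm text not null,
        language text not null default 'en',
        result text not null,
        rating integer not null
    );
    create table tag (id integer primary key, text text not null);
    create table ratedSearchResult_tag (
        ratedSearchResultId integer not null,
        tagId integer not null,
        unique (ratedSearchResultId, tagId)
    );
""")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "insert into ratedSearchResult (id, searchTerm, result, rating) values (?, ?, ?, ?)",
    [(1, "dog", "File:A.jpg", 1), (2, "dog", "File:B.jpg", -1), (3, "cat", "File:C.jpg", 1)],
)
conn.executemany("insert into tag (id, text) values (?, ?)", [(1, "batch-1"), (2, "batch-2")])
conn.executemany(
    "insert into ratedSearchResult_tag (ratedSearchResultId, tagId) values (?, ?)",
    [(1, 1), (2, 1), (3, 2)],
)


def rating_counts_for_tag(conn, tag_text):
    """Return {rating: count} for the rated results carrying the given tag."""
    rows = conn.execute(
        """
        select r.rating, count(*)
        from ratedSearchResult r
        join ratedSearchResult_tag rt on rt.ratedSearchResultId = r.id
        join tag t on t.id = rt.tagId
        where t.text = ?
        group by r.rating
        """,
        (tag_text,),
    ).fetchall()
    return dict(rows)


counts = rating_counts_for_tag(conn, "batch-1")
```

The same join should carry over to the MySQL schema above largely unchanged, since it relies only on the three tables and their id columns.
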
Step 2
---
Create a simple HTTP API to allow more data to be added. The endpoint should accept either a single `ratedSearchResult` entity or an array of entities.
```
POST /ratedSearchResults/
{
"searchTerm": "dog",
"language": "en",
"result": "File:Heterochromia_dog,_Struga.jpg",
"rating": 1,
"tags": [
"some tag",
"some other tag"
]
}
```
The fields `language` and `tags` are optional. `language` defaults to 'en'. If a tag is sent that doesn't exist in the `tag` table, a new entry for that tag will be created in the table.
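
The request handling described above could be sketched roughly as follows. This is a hedged illustration rather than a proposed implementation: the function names `normalise_payload` and `get_or_create_tag_id` are hypothetical, the validation rules (required fields, rating limited to -1, 0 or 1) are assumptions drawn from the schema comments, and an in-memory SQLite table stands in for the real `tag` table.

```python
import sqlite3

ALLOWED_RATINGS = {-1, 0, 1}  # assumption: taken from the schema comment on `rating`


def normalise_payload(payload):
    """Accept one entity (dict) or a list of entities; return a list with defaults applied."""
    entities = payload if isinstance(payload, list) else [payload]
    normalised = []
    for entity in entities:
        for field in ("searchTerm", "result", "rating"):
            if field not in entity:
                raise ValueError(f"missing required field: {field}")
        if entity["rating"] not in ALLOWED_RATINGS:
            raise ValueError("rating must be -1, 0 or 1")
        normalised.append({
            "searchTerm": entity["searchTerm"],
            "language": entity.get("language", "en"),  # optional, defaults to 'en'
            "result": entity["result"],
            "rating": entity["rating"],
            "tags": list(entity.get("tags", [])),  # optional, defaults to no tags
        })
    return normalised


def get_or_create_tag_id(conn, text):
    """Return the id of the tag with this text, inserting a new row if none exists."""
    row = conn.execute("select id from tag where text = ?", (text,)).fetchone()
    if row is not None:
        return row[0]
    return conn.execute("insert into tag (text) values (?)", (text,)).lastrowid


# In-memory SQLite stands in for the real `tag` table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table tag (id integer primary key, text text not null)")

entities = normalise_payload(
    {"searchTerm": "dog", "result": "File:Heterochromia_dog,_Struga.jpg", "rating": 1}
)
first_id = get_or_create_tag_id(conn, "some tag")
second_id = get_or_create_tag_id(conn, "some tag")  # reuses the existing row
```

Normalising a single entity into a one-element list up front means the rest of the handler only ever deals with lists, whichever form the client sent.
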
Step 3
---
Deploy on Toolforge.
Possible other steps
---
- A simple user interface where a user can enter a search term in a text box, and lists of `File:x` strings in 3 textareas: one for good matches, one for indifferent, one for bad. The interface writes the data via the API.
- A gadget where you can click on a search result on the MediaSearch page, say whether it's a good or bad match, and send the data to the API.
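
For the user interface idea in the first bullet, the conversion from the three textareas into entities for the API might look like the sketch below. It assumes one `File:x` title per line in each textarea; the function name `entities_from_textareas` and the sample titles are hypothetical.

```python
def entities_from_textareas(search_term, good, indifferent, bad, language="en"):
    """Turn three newline-separated textarea values into a list of entities."""
    entities = []
    for rating, text in ((1, good), (0, indifferent), (-1, bad)):
        for line in text.splitlines():
            title = line.strip()
            if title:  # skip blank lines
                entities.append({
                    "searchTerm": search_term,
                    "language": language,
                    "result": title,
                    "rating": rating,
                })
    return entities


# Hypothetical input: two good matches, no indifferent ones, one bad match.
payload = entities_from_textareas("dog", "File:A.jpg\nFile:B.jpg\n", "", "  File:C.jpg  ")
```

The resulting list can be sent in a single request, since the endpoint accepts an array of entities.
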