
[L] Re-organise media-search-signal-test, and create new interface to add new labeled data
Closed, Resolved (Public)

Description

The https://github.com/cormacparle/media-search-signal-test repo was created specifically to test Elasticsearch scores.

The way we tested the scores is now obsolete, but the rated image search results that came from the testing are very useful, so let's re-organise things to center on those.

Step 1

Rename the table results_by_component to ratedSearchResult and drop the columns that are no longer useful, so that the structure is something like this:

(Note that languages for the existing search terms can be found in input/searchTerms.csv.)

N.B. We will need to make sure the other scripts (in jobs/) are updated to handle the db changes.

create table ratedSearchResult (
  id int not null auto_increment,
  searchTerm varchar(255) not null,           # the search term
  language varchar(5) not null default 'en',  # the ISO code for the language of the search term
  result varchar(255) not null,               # file page title string for the result
  rating tinyint not null,                    # -1 if result is a bad match, 0 if it's indifferent, +1 if it's good
  primary key (id)                            # an auto_increment column must be defined as a key
) engine=innodb;

Also create some tables to allow tagging:

create table tag (
  id int not null auto_increment,
  text varchar(255) not null,                 # the tag name
  primary key (id)                            # an auto_increment column must be defined as a key
) engine=innodb;

create table ratedSearchResult_tag (
  ratedSearchResultId int not null,
  tagId int not null, 
  unique key (ratedSearchResultId, tagId)
) engine=innodb;
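
To illustrate how the tagging tables might be used, here is a minimal sketch that looks up all rated results carrying a given tag. It is not code from the repo: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (so the column types differ slightly from the definitions above), and the helper names open_db, add_tagged_result and results_for_tag are hypothetical.

```python
import sqlite3

# SQLite approximation of the three tables above (assumption: the real
# tables live in MySQL, where the types are int/varchar/tinyint).
SCHEMA = """
create table ratedSearchResult (
  id integer primary key autoincrement,
  searchTerm text not null,
  language text not null default 'en',
  result text not null,
  rating integer not null
);
create table tag (
  id integer primary key autoincrement,
  "text" text not null
);
create table ratedSearchResult_tag (
  ratedSearchResultId integer not null,
  tagId integer not null,
  unique (ratedSearchResultId, tagId)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_tagged_result(conn, search_term, result, rating, tags=()):
    """Insert one rated result and link it to each tag, creating tags as needed."""
    result_id = conn.execute(
        "insert into ratedSearchResult (searchTerm, result, rating) values (?, ?, ?)",
        (search_term, result, rating),
    ).lastrowid
    for tag_text in tags:
        row = conn.execute('select id from tag where "text" = ?', (tag_text,)).fetchone()
        if row:
            tag_id = row[0]
        else:
            tag_id = conn.execute('insert into tag ("text") values (?)', (tag_text,)).lastrowid
        conn.execute(
            "insert into ratedSearchResult_tag (ratedSearchResultId, tagId) values (?, ?)",
            (result_id, tag_id),
        )
    return result_id


def results_for_tag(conn, tag_text):
    """Return (searchTerm, result, rating) rows for every result with the given tag."""
    return conn.execute(
        """
        select r.searchTerm, r.result, r.rating
        from ratedSearchResult r
        join ratedSearchResult_tag rt on rt.ratedSearchResultId = r.id
        join tag t on t.id = rt.tagId
        where t."text" = ?
        order by r.id
        """,
        (tag_text,),
    ).fetchall()
```

The join table keeps the relationship many-to-many, and the unique key on (ratedSearchResultId, tagId) prevents the same tag from being attached to a result twice.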

Step 2

Add a simple new user interface where a user can enter a search term in a text box and a list of file page URLs in three textareas: one for good matches, one for indifferent, and one for bad (optionally with one or more tags). Data entered this way will be added to the labeled data.
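
As a rough sketch of what the form handler would need to do, the snippet below turns one search term and the three textarea values into rows shaped like the ratedSearchResult columns (searchTerm, language, result, rating). The function names parse_entries and build_rows are hypothetical, the one-entry-per-line textarea format is an assumption, and converting a file page URL into a title string is deliberately left out.

```python
# Rating values follow the comment on ratedSearchResult.rating in Step 1.
RATINGS = {"good": 1, "indifferent": 0, "bad": -1}


def parse_entries(textarea):
    """Split a textarea value into one trimmed entry per non-empty line."""
    return [line.strip() for line in textarea.splitlines() if line.strip()]


def build_rows(search_term, language="en", good="", indifferent="", bad=""):
    """Build (searchTerm, language, result, rating) tuples from the three textareas."""
    rows = []
    for label, text in (("good", good), ("indifferent", indifferent), ("bad", bad)):
        for entry in parse_entries(text):
            rows.append((search_term, language, entry, RATINGS[label]))
    return rows
```

Each returned tuple could then be passed to a parameterised insert into ratedSearchResult, with any tags linked through ratedSearchResult_tag.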

We can also probably archive the script GetImagesForClassification.php.

Event Timeline

Cparle renamed this task from "Create dedicated application for storing labeled image search results" to "Re-organise media-search-signal-test, and create new interface to add new labeled data". (Apr 21 2021, 8:40 AM)
Cparle updated the task description.
CBogen renamed this task from "Re-organise media-search-signal-test, and create new interface to add new labeled data" to "[L] Re-organise media-search-signal-test, and create new interface to add new labeled data". (Apr 21 2021, 4:55 PM)