Weekly Report for Reinvent Translation Search
Closed, ResolvedPublic

Description

May 25 - May 29

  • Had a meeting with Nikerabbit and Nemo_bis, mainly to discuss T98665.
  • Work in progress on T100175 (reuse TUX elements) and case-sensitive search (T100013).
  • Investigating a better approach to displaying translated and untranslated messages:
    1. Fetch the whole result set at once, without the size parameter: easy for displaying translated messages, but may prove inefficient as the number of documents grows. If the required results are already found within the first 100 or so, fetching the rest is wasted work.
    2. Fetch 100 results (apply the size parameter); if a minimum number of translations (e.g. 25) is not found, search for the next 100, and so on: avoids unnecessary fetches, but may require a large number of search iterations in the worst case.
    3. Data denormalization, which looks better than the two approaches above: add a new field to each document listing the languages for which translations exist. The same field can be used in a missing query to find untranslated messages. This would also require code changes where translations are updated.
  • As per discussion, review is in progress for the patch submitted for T97961.
  • Sorted the group list by search result count instead of alphabetically (T100393).
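
Approach 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: `fetch_until_enough`, `fake_search`, `CORPUS` and the `translated` key are hypothetical names, and the in-memory list stands in for an Elasticsearch query issued with `from`/`size` paging.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]


def fetch_until_enough(
    search_page: Callable[[int, int], List[Doc]],
    is_wanted: Callable[[Doc], bool],
    min_hits: int = 25,
    page_size: int = 100,
) -> Tuple[List[Doc], int]:
    """Fetch pages of results until at least min_hits wanted documents are found.

    Returns the wanted documents and the number of search requests issued.
    """
    wanted: List[Doc] = []
    offset = 0
    requests = 0
    while len(wanted) < min_hits:
        page = search_page(offset, page_size)
        requests += 1
        if not page:
            break  # the result set is exhausted
        wanted.extend(doc for doc in page if is_wanted(doc))
        if len(page) < page_size:
            break  # short page: this was the last one
        offset += page_size
    return wanted, requests


# Hypothetical in-memory stand-in for the search backend:
# every tenth document counts as "translated".
CORPUS: List[Doc] = [
    {"localid": f"msg-{i}", "translated": "yes" if i % 10 == 0 else "no"}
    for i in range(1000)
]


def fake_search(offset: int, size: int) -> List[Doc]:
    return CORPUS[offset:offset + size]


hits, requests = fetch_until_enough(fake_search, lambda d: d["translated"] == "yes")
print(len(hits), requests)
```

With this toy data each page of 100 contributes 10 translated documents, so the loop stops after the third request. When nothing matches, every page is fetched before giving up, which is the worst case mentioned in approach 2.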

June 1 - June 5

We should try to use MessageCollection if we cannot come up with a better approach.

June 8 - June 12

  • The procedure mentioned above would require updating more than 300 documents for each translation, compared with a single document update.
  • https://gerrit.wikimedia.org/r/#/c/217239/
  • Discussion with the mentors about a better solution: if we are unable to retrieve the required results from the ES-indexed documents, we may as a last resort use MessageCollection (which requires a DB hit) to collect the data.
  • Algorithm currently being worked on to separate translated and untranslated messages (no extra fields are required):
    • Use a "Filtered query" to search for a string in the source language and collect the 'localid' and score of each hit (the scores are kept so that the documents do not lose their relevance).
    • Find the translated messages among the collected 'localid' values, and use a Function score script to replace the scores of the returned documents with the scores from step 1.
    • The ids from step 1 that are not found in the list from step 2 are the untranslated messages.
    • Fuzzy messages are counted as untranslated here, because we currently do not keep indexed documents for fuzzy messages. To keep track of fuzzy messages, we have to filter them out of the untranslated messages retrieved in step 3, which can be done with MessageCollection->filterFuzzy(). The main problem is efficiency: retrieving fuzzy messages involves steps 1->2->3.
      • I think what might work is to keep fuzzy messages in the index instead of deleting them and, to help filter fuzzy messages out from translated ones, add an extra field fuzzy with no data. (If we are totally against adding an extra field due to space constraints, we may try a different format for localid, e.g. fuzzy-MediaWiki:Config, though I am not sure about this as it may break some functionality.)
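
The bookkeeping behind steps 1 to 3 can be illustrated with plain dictionaries and sets. This is a minimal sketch and not the Elasticsearch implementation: `split_by_translation` is a hypothetical helper, the scores are made up, and the two inputs stand in for the results of the filtered query (step 1) and the lookup of translated documents (step 2).

```python
from typing import Dict, List, Set, Tuple


def split_by_translation(
    source_hits: Dict[str, float],
    translated_ids: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split step-1 hits into translated and untranslated localids.

    source_hits maps localid -> relevance score from the source-language search.
    translated_ids holds localids that have a translation in the target language.
    Both output lists are ordered by the step-1 score, highest first.
    """
    # Step 2: translated messages keep the score assigned in step 1.
    translated = {lid: s for lid, s in source_hits.items() if lid in translated_ids}
    # Step 3: everything found in step 1 but not in step 2 is untranslated.
    untranslated = {lid: s for lid, s in source_hits.items() if lid not in translated_ids}

    def rank(scored: Dict[str, float]) -> List[str]:
        return sorted(scored, key=lambda lid: scored[lid], reverse=True)

    return rank(translated), rank(untranslated)


# Made-up example data; only the localid format follows the report.
source_hits = {
    "MediaWiki:Config": 2.4,
    "MediaWiki:Search": 1.7,
    "MediaWiki:Save": 0.9,
}
translated_ids = {"MediaWiki:Search", "MediaWiki:Save", "MediaWiki:Other"}

translated, untranslated = split_by_translation(source_hits, translated_ids)
print(translated)    # ['MediaWiki:Search', 'MediaWiki:Save']
print(untranslated)  # ['MediaWiki:Config']
```

In this sketch a fuzzy message would end up in the untranslated list, because it has no entry in `translated_ids`. That is the limitation described in the last bullet.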

June 15 - June 21

June 22 - June 28

June 29 - July 3

  • Midterm evaluation T103155
  • Integrate search features here
  • Deploy code on the Labs instance.
  • User Interface enhancements.
  • Read Elasticsearch documents.

July 4 - July 10

July 13 - July 19

July 20 - July 26

July 27 - Aug 2

  • Review in progress for T100345
  • Developed an API module for translated, untranslated and outdated messages (T106931)
  • Worked on UI improvements (T106319)

Aug 3 - Aug 9

Aug 10 - Aug 16

Aug 17 - Aug 23

Event Timeline

Phoenix303 claimed this task.
Phoenix303 raised the priority of this task to Medium.
Phoenix303 updated the task description.
Phoenix303 added a project: Translation-search.
Phoenix303 set Security to None.
Phoenix303 updated the task description.

Hi @Phoenix303! I hope you haven't forgotten to keep this task updated. :)

Hi @NiharikaKohli, I didn't realize that this task was not up-to-date. Thanks for letting me know :)

Thanks. That was the last one, according to the parent task!