
Remove duplicate Wikidata items from article recommendations
Closed, Resolved · Public

Description

When you visit article recommendations for Kitob ("book" in Uzbek), you'll see that Q159964 is recommended twice, with different normalized ranks (the same happens with Q125576). This is probably because we retrieve recommendations from multiple source languages and do not filter out duplicates. Recommended Wikidata IDs should be unique.

[{"wikidata_id":"Q125576","normalized_rank":0.930232},{"wikidata_id":"Q125576","normalized_rank":0.927625},{"wikidata_id":"Q82622","normalized_rank":0.924255},{"wikidata_id":"Q226697","normalized_rank":0.919053},{"wikidata_id":"Q133036","normalized_rank":0.919053},{"wikidata_id":"Q159964","normalized_rank":0.919053},{"wikidata_id":"Q49848","normalized_rank":0.917277},{"wikidata_id":"Q170124","normalized_rank":0.917277},{"wikidata_id":"Q29334","normalized_rank":0.913319},{"wikidata_id":"Q159964","normalized_rank":0.912596}]

Mentors

Skills Required

  • JavaScript

Acceptance Criteria

  • Each Wikidata ID is recommended only once. When the same ID appears in several competing suggestions, the one with the highest normalized rank is kept. The change needs to be made in the article.creation.morelike.js file. Clone the repository from here.

Event Timeline

bmansurov added a subscriber: Usmanmuhd.

@Usmanmuhd would you be interested in working on this task too?

bmansurov renamed this task from "morelike article recommendation: same wikidata item is being recommended twice" to "Remove duplicate Wikidata items from article recommendations". Mar 4 2019, 2:56 PM
bmansurov removed bmansurov as the assignee of this task.
bmansurov updated the task description.

Yeah, I could merge the different tasks into a single project and work on it, since none of them is very big. Can we sync up over email about the proposal and a more detailed discussion?

@Usmanmuhd sure, email is fine. You can also join us at #wikimedia-research on freenode. It's probably best to keep these tasks separate and create a parent task.

Could you please provide me your email?

It's probably best to keep these tasks separate and create a parent task.

Yeah, that's what I meant. Maybe I worded it badly.
It would be a project like "Improvements to morelike", with all of these as subtasks.

It's bmansurov at wikimedia dot org.

Hi, I am Ayush Pratap Singh, a first-year Computer Science student, and I would like to work on this project. Can you assign me some microtasks? Also, can I work on a microtask that is already assigned to someone?

@Dantraztrev thanks for the interest, but we already have two students for this project. If a task is assigned to someone, you cannot work on that task.

@bmansurov When I reproduced this with source_language included in the output, it turned out that the duplicates come from different source languages.
This is the output I got:

[{"wikidata_id":"Q125576","normalized_rank":0.930232,"source_language":"en"},{"wikidata_id":"Q125576","normalized_rank":0.927625,"source_language":"ru"},{"wikidata_id":"Q159964","normalized_rank":0.919053,"source_language":"en"},{"wikidata_id":"Q226697","normalized_rank":0.919053,"source_language":"en"},{"wikidata_id":"Q170124","normalized_rank":0.917277,"source_language":"en"},{"wikidata_id":"Q49848","normalized_rank":0.917277,"source_language":"en"},{"wikidata_id":"Q83357","normalized_rank":0.913319,"source_language":"en"},{"wikidata_id":"Q29334","normalized_rank":0.913319,"source_language":"en"},{"wikidata_id":"Q159964","normalized_rank":0.912596,"source_language":"ru"},{"wikidata_id":"Q178659","normalized_rank":0.906879,"source_language":"en"}]

How do I handle this case?

We should probably output only the top-ranking entry for each item across all languages. So the second Q125576 (Russian) would not be output.
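One way this could be done, as a minimal sketch and not the code from the actual patch: keep a map from Wikidata ID to the best entry seen so far, then sort what is left by rank. The function name dedupeByWikidataId is made up for illustration; the field names are the ones in the output above.

```javascript
// Hypothetical helper (not taken from the repository): keeps, for each
// wikidata_id, only the entry with the highest normalized_rank.
function dedupeByWikidataId(recommendations) {
    const best = new Map();
    for (const rec of recommendations) {
        const current = best.get(rec.wikidata_id);
        if (current === undefined || rec.normalized_rank > current.normalized_rank) {
            best.set(rec.wikidata_id, rec);
        }
    }
    // Return the surviving entries, highest rank first.
    return Array.from(best.values())
        .sort((a, b) => b.normalized_rank - a.normalized_rank);
}

// Excerpt of the output posted above, with both duplicated items.
const sample = [
    { wikidata_id: 'Q125576', normalized_rank: 0.930232, source_language: 'en' },
    { wikidata_id: 'Q125576', normalized_rank: 0.927625, source_language: 'ru' },
    { wikidata_id: 'Q159964', normalized_rank: 0.919053, source_language: 'en' },
    { wikidata_id: 'Q159964', normalized_rank: 0.912596, source_language: 'ru' }
];

const deduped = dedupeByWikidataId(sample);
console.log(deduped);
```

On this excerpt the en entries for Q125576 and Q159964 are kept and the ru ones are dropped. If the list is cut to a fixed length somewhere, it probably makes sense to deduplicate before that step so the response is not left short.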

Change 512913 had a related patch set uploaded (by Usmanmuhd; owner: Usmanmuhd):
[mediawiki/services/recommendation-api@master] Remove duplicate Wikidata items from article recommendations

https://gerrit.wikimedia.org/r/512913

Output for http://localhost:6927/uz.wikipedia.org/v1/article/creation/morelike/Kitob

[{"wikidata_id":"Q125576","normalized_rank":0.930232,"source_language":"en"},{"wikidata_id":"Q159964","normalized_rank":0.919053,"source_language":"en"},{"wikidata_id":"Q226697","normalized_rank":0.919053,"source_language":"en"},{"wikidata_id":"Q49848","normalized_rank":0.917277,"source_language":"en"},{"wikidata_id":"Q170124","normalized_rank":0.917277,"source_language":"en"},{"wikidata_id":"Q29334","normalized_rank":0.913319,"source_language":"en"},{"wikidata_id":"Q178659","normalized_rank":0.906879,"source_language":"en"},{"wikidata_id":"Q191529","normalized_rank":0.898231,"source_language":"ru"},{"wikidata_id":"Q179254","normalized_rank":0.893395,"source_language":"ru"},{"wikidata_id":"Q173242","normalized_rank":0.883799,"source_language":"en"}]

@Usmanmuhd let's also add tests. I think you can use the test/lib/article.creation.morelike.js file for that.
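A rough idea of what such a test could check, using only Node's built-in assert module. findDuplicateIds is a hypothetical helper, and how it would be wired into test/lib/article.creation.morelike.js depends on the repository's test setup, which is not shown here. The data is an excerpt of the outputs posted above.

```javascript
const { deepStrictEqual } = require('assert');

// Hypothetical check helper: returns every wikidata_id that occurs
// more than once, in the order the repeats are first seen.
function findDuplicateIds(recommendations) {
    const seen = new Set();
    const duplicates = new Set();
    for (const rec of recommendations) {
        if (seen.has(rec.wikidata_id)) {
            duplicates.add(rec.wikidata_id);
        }
        seen.add(rec.wikidata_id);
    }
    return Array.from(duplicates);
}

// Excerpt of the response before the fix.
const before = [
    { wikidata_id: 'Q125576', normalized_rank: 0.930232, source_language: 'en' },
    { wikidata_id: 'Q125576', normalized_rank: 0.927625, source_language: 'ru' },
    { wikidata_id: 'Q159964', normalized_rank: 0.919053, source_language: 'en' },
    { wikidata_id: 'Q159964', normalized_rank: 0.912596, source_language: 'ru' }
];

// Excerpt of the response after the fix.
const after = [
    { wikidata_id: 'Q125576', normalized_rank: 0.930232, source_language: 'en' },
    { wikidata_id: 'Q159964', normalized_rank: 0.919053, source_language: 'en' },
    { wikidata_id: 'Q226697', normalized_rank: 0.919053, source_language: 'en' }
];

deepStrictEqual(findDuplicateIds(before), ['Q125576', 'Q159964']);
deepStrictEqual(findDuplicateIds(after), []);
```

A real test would presumably call the morelike code with a stubbed response rather than hard-coded arrays, but the uniqueness check itself would look much the same.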

Change 512913 merged by Bmansurov:
[mediawiki/services/recommendation-api@master] Remove duplicate Wikidata items from article recommendations

https://gerrit.wikimedia.org/r/512913

Fixed in master. We'll close the task once the change is in production and the fix is verified.

@Usmanmuhd your fix is live. Good job.