
'morelike' recommendation API: Bulk import data to MySQL in chunks
Open, Needs Triage, Public

Description

We have an article recommendation API that suggests articles for creation based on a seed article. For example, given the seed article 'Book', the API suggests similar articles, identified by their Wikidata item IDs, that are missing from enwiki and thus candidates for creation.

The API pulls data from various places, one of which is MySQL. Data is imported into MySQL by the article-recommender/deploy repository. Since we run the import script on a shared host and import data into a shared database, we'd like to avoid blocking other processes while importing large quantities of data. For this reason, we'd like to import the data in chunks.
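A minimal sketch of what a chunked import could look like in Python. The table name, columns, and chunk size below are assumptions for illustration, not the real schema; the connection is expected to be any DB-API connection (e.g. pymysql) with autocommit off.

```python
# Sketch of chunked inserts. `article_recommendation` and its columns
# are hypothetical placeholders, not the actual deploy-repo schema.
import itertools


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from an iterable."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def import_in_chunks(connection, rows, chunk_size=10000):
    """Insert rows in batches, committing after each batch so locks on
    the shared database are released between chunks instead of being
    held for the whole import."""
    sql = ("INSERT INTO article_recommendation (wikidata_id, score) "
           "VALUES (%s, %s)")  # assumed table and columns
    with connection.cursor() as cursor:
        for batch in chunked(rows, chunk_size):
            cursor.executemany(sql, batch)
            connection.commit()  # release locks before the next chunk
```

Committing per batch (rather than once at the end) keeps each transaction short, which is the point of chunking on a shared database; a small sleep between batches could further reduce pressure if needed.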

Mentors

Skills required

  • MySQL, Python

Acceptance Criteria

Event Timeline

bmansurov updated the task description. Jan 18 2019, 7:47 PM
bmansurov updated the task description. Feb 27 2019, 2:57 PM

@bmansurov Thank you for listing this project! Could you maybe add more details to the project description, also add the skills required, and then add it to our ideas page https://www.mediawiki.org/wiki/Google_Summer_of_Code/2019?

As a side note, I'm not sure what "A/C" in the task description means :)

bmansurov updated the task description. Feb 28 2019, 3:24 PM

@srishakatux thanks for the feedback. Please let me know if anything else is needed for this task to be considered ready for work.

Hey, I'm Muhammad Usman. I am interested in this project. I have good experience working with Python and MySQL, and I have quite a few contributions to open source projects.

  • Are there any micro-tasks for this project?
  • Could you please fix the link of the repository as it's currently broken?

Thanks.

bmansurov updated the task description. Mar 1 2019, 1:20 PM

@Usmanmuhd, welcome! The link has been fixed. There are no micro-tasks for this project, unless you want to split the work into meaningful parts and tackle them separately. But I think the project is self-contained.

In the README file it says "Data is in stat1007:/home/bmansurov/tp9/article-recommender-deploy/". Where can I get this file?
Is there some kind of project board for this and also any recommended issues to work on?

> In the README file it says "Data is in stat1007:/home/bmansurov/tp9/article-recommender-deploy/". Where can I get this file?

I've updated the readme. Here's the file: https://analytics.wikimedia.org/datasets/one-off/article-recommender/20181130.tar.gz

> Is there some kind of project board for this and also any recommended issues to work on?

Yes, we have https://phabricator.wikimedia.org/project/view/1351/, but this is the only task ready for work. I'll be creating more tasks before the program starts.

This comment was removed by Usmanmuhd.

@Usmanmuhd on IRC you mentioned that you'd submit a patch for this task. If you started working on the task, feel free to assign it to yourself. Also feel free to ask questions here, on IRC, or via email.

Usmanmuhd added a comment. Edited Mar 5 2019, 2:17 PM

I would prefer to work on https://phabricator.wikimedia.org/T216721 first, as it looks simpler at first glance.