Software Engineer, Cisco. Google Summer of Code 2019 with Wikimedia Foundation.
User Details
- User Since
- Feb 25 2019, 1:41 PM
- Availability
- Available
- LDAP User
- Usmanmuhd
- MediaWiki User
- Muhdusman
Jun 20 2021
Hi Team,
Dec 4 2020
Please delete the last 2 accounts from LDAP. I'll use the first one.
Aug 26 2019
The fix for T216750: Article recommendation API: replace WDQS with MW API has been merged. This will no longer be an issue.
Aug 24 2019
Final Evaluation Summary:
Aug 23 2019
@Aklapper Wanted to get the patches merged before the GSoC deadline, nothing else.
Aug 22 2019
I also have to get another patch (https://gerrit.wikimedia.org/r/#/c/research/article-recommender/deploy/+/527571/) merged. The deadline is about 3 days from today. It would be great if both patches could be merged before then.
Aug 18 2019
@bmansurov Updated the patch. Instead of committing the change after each chunk, it now commits all the chunks at the end. So if it fails at any point, re-running the same command will import the tsv file without any problems.
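A minimal sketch of that change, assuming mysql-connector-python; the connection details and the table name language are illustrative, not the actual deploy.py code:

import glob
import mysql.connector

conn = mysql.connector.connect(
    host='localhost', user='usman', database='recommendationapi',
    allow_local_infile=True)
cursor = conn.cursor()
for path in sorted(glob.glob('temp/20181130/chunk-*.tsv')):
    cursor.execute(f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE language")
# Single commit at the end, instead of one per chunk: a failed run
# leaves nothing half-imported, so the same command can simply be re-run.
conn.commit()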
Aug 17 2019
@bmansurov How would I move from the temp table to the actual table?
Aug 13 2019
@bmansurov Updated the patch. Please take a look.
Aug 11 2019
@bmansurov Made the changes as required. Please take a look.
Aug 9 2019
@bmansurov I'm deleting each chunk once it's imported, to keep track of which chunks have already been imported and which are yet to be imported. I did not really understand the advantage of placing them in /tmp if I'm going to delete the files anyway.
Aug 4 2019
@bmansurov Added the ability to continue when it stops at a given chunk.
Example:
Aug 2 2019
@bmansurov The current workflow is as follows (a rough sketch follows the list):
- Chunks the file and stores the chunks in temp/<dir>/chunk-<i>.tsv.
- Imports each chunk, executing the SQL command and committing the transaction at the end of each chunk.
- Deletes the directory.
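A rough sketch of this workflow, assuming mysql-connector-python; the file, directory, and table names are illustrative, not the actual deploy.py code:

import os
import shutil
import mysql.connector

CHUNK_SIZE = 100000  # lines per chunk; an illustrative value

def write_chunk(out_dir, i, lines):
    path = os.path.join(out_dir, f'chunk-{i}.tsv')
    with open(path, 'w') as f:
        f.writelines(lines)
    return path

def chunk_file(tsv_path, out_dir):
    # Split tsv_path into out_dir/chunk-<i>.tsv files of CHUNK_SIZE lines.
    os.makedirs(out_dir, exist_ok=True)
    paths, chunk, i = [], [], 0
    with open(tsv_path) as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == CHUNK_SIZE:
                paths.append(write_chunk(out_dir, i, chunk))
                chunk, i = [], i + 1
    if chunk:
        paths.append(write_chunk(out_dir, i, chunk))
    return paths

conn = mysql.connector.connect(
    host='localhost', user='usman', database='recommendationapi',
    allow_local_infile=True)
cursor = conn.cursor()
for path in chunk_file('20181130/language.tsv', 'temp/20181130'):
    # One transaction per chunk, committed as soon as the chunk loads.
    cursor.execute(f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE language")
    conn.commit()
shutil.rmtree('temp/20181130')  # delete the chunk directory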
@bmansurov Thanks, figured out the error. I was using python3 along with python-mysql.connector, which is for python2.
Aug 1 2019
@bmansurov I shifted to a different system, and now when I run python3 deploy.py import_languages 20181130 localhost 3306 recommendationapi usman db_password.txt --language_file 20181130/language.tsv I get the error below. I tried various settings after searching online; none seem to work. Basically the error is in the LOAD DATA LOCAL INFILE commands.
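(For reference, besides the python2-only connector issue resolved above, LOAD DATA LOCAL INFILE also has to be enabled on both ends; a sketch assuming mysql-connector-python:)

import mysql.connector

# The server must allow local infile (local_infile=1 in the server
# config, or SET GLOBAL local_infile = 1), and the client must opt in:
conn = mysql.connector.connect(
    host='localhost',
    port=3306,
    user='usman',
    database='recommendationapi',
    password=open('db_password.txt').read().strip(),
    allow_local_infile=True,  # client-side opt-in for LOCAL INFILE
)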
Jul 29 2019
Yeah, I think the UNIQUE constraint would be (wikidata_id, normalized_rank, source_id, target_id).
Yeah, that's a great idea. In case of a failure while importing a chunk (which can arise due to invalid data in the chunk), how do we make sure that the rows already inserted from that chunk are not inserted again?
One approach would be to validate the chunk each time before importing it.
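For illustration, a sketch of that idea combined with the UNIQUE key suggested above: LOAD DATA ... IGNORE skips input rows that duplicate an existing row on a unique key, so re-importing a partially loaded chunk is safe. The table name recommendations, the column types, and the chunk path are assumptions, not the project's actual schema (conn and cursor are a mysql-connector-python connection as before):

cursor.execute("""
    CREATE TABLE IF NOT EXISTS recommendations (
        wikidata_id     VARCHAR(32) NOT NULL,
        normalized_rank FLOAT       NOT NULL,
        source_id       INT         NOT NULL,
        target_id       INT         NOT NULL,
        UNIQUE KEY uniq_row
            (wikidata_id, normalized_rank, source_id, target_id)
    )
""")
# IGNORE makes duplicate rows a no-op instead of an error, so a
# failed chunk can be re-imported without inserting values twice.
cursor.execute(
    "LOAD DATA LOCAL INFILE 'temp/20181130/chunk-0.tsv' "
    "IGNORE INTO TABLE recommendations")
conn.commit()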
Jul 27 2019
@bmansurov As we are using LOAD DATA, there is no way apart from splitting the file into chunks and then importing each chunk separately.
@bmansurov How do I get the chunks that need to be imported? If I need to generate the chunks myself, what's the procedure to generate them?
Jul 26 2019
@bmansurov Pushed the fix. Please take a look now.
@bmansurov We had reduced the gpvimlimit in this case to 50 because we were having problems with ruwiki. I tried increasing the gpvimlimit, and it looks like the issue has been resolved.
@bmansurov There is a problem with handling it this way: we don't know the value that has to be passed until the previous request has finished. Trying to figure out the right implementation for this.
Jul 25 2019
@bmansurov Should I handle the llcontinue in getArticles itself? I.e., if llcontinue is set, call mwApiGet again from the same place.
@bmansurov Made the changes and also enabled the tests.
Jul 24 2019
@bmansurov I think that is required when we have more than 10 languages being returned. In our case only one is being returned. Do we still need it?
Jul 22 2019
@bmansurov The Q406 item is coming from the following query: https://ru.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops|langlinks|langlinkscount&ppprop=wikibase_item|disambiguation&generator=mostviewed&gpvimlimit=50&gpvimoffset=450&lllang=uz .
Jul 16 2019
@bmansurov Pushed the patch for getArticlesBySeed.
Edit1: Figured it out.
Jul 15 2019
@bmansurov Is this query right for getting items by pageviews? https://uz.wikipedia.org/w/api.php?format=json&action=query&prop=pageprops|langlinks&ppprop=wikibase_item|disambiguation&lllang=en&generator=mostviewed
Jul 12 2019
@bmansurov Oh, I confused sitelink_count with index. In our case we don't have the sitelink count. Should we fetch it from the API, or should we just phase it out?
@bmansurov Given a seed, I am retrieving the wikibase_item, title, and index. Should I sort by index in ascending or descending order?
How do I pass multiple props and ppprops? When I do this it always returns 404. The same happens when I swap out | for a ,.
const parameters = {
    format: 'json',
    action: 'query',
    prop: 'pageprops|langlinks',
    ppprop: 'wikibase_item|disambiguation',
    lllang: target,
    generator: 'search',
    gsrlimit: 500,
    gsrsearch: `morelike:${seed}`
};
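For comparison, the same multi-valued parameters sent from Python with requests; pipe-joined values are passed as single strings and get percent-encoded as %7C, which the MediaWiki API accepts, so the joining itself should not cause a 404. Endpoint, seed, and target are illustrative:

import requests

seed, target = 'Kitob', 'en'
params = {
    'format': 'json',
    'action': 'query',
    'prop': 'pageprops|langlinks',             # multiple props, pipe-joined
    'ppprop': 'wikibase_item|disambiguation',  # multiple ppprops likewise
    'lllang': target,
    'generator': 'search',
    'gsrlimit': 500,
    'gsrsearch': f'morelike:{seed}',
}
resp = requests.get('https://uz.wikipedia.org/w/api.php', params=params)
print(resp.json())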
Yeah, it works now. Thanks.
Even after deleting node_modules and running npm install, the error still persists.
@bmansurov When I run npm run test | bunyan I am getting the following error:
Jun 29 2019
Thanks for the great feedback!
Jun 28 2019
I can't use ppprop or lllang along with action='wbgetentities'. It gives "*": "Unrecognized parameters: ppprop, lllang."
Oh okay.
So the flow would be (a rough sketch follows the list):
- Get all the entities and corresponding data.
- Remove all the items containing enwiki in sitelinks.
- Remove all the items having 'disambiguation' in the labels.
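A rough sketch of that flow in Python with requests, purely illustrative since the service itself is Node.js; note that wbgetentities accepts at most 50 ids per request:

import requests

def filter_candidates(item_ids, source_wiki='enwiki'):
    # Fetch sitelinks and labels for the candidate items in one call.
    resp = requests.get('https://www.wikidata.org/w/api.php', params={
        'action': 'wbgetentities',
        'ids': '|'.join(item_ids),
        'props': 'sitelinks|labels',
        'format': 'json',
    }).json()
    kept = []
    for qid, entity in resp.get('entities', {}).items():
        if source_wiki in entity.get('sitelinks', {}):
            continue  # already has an article on the source wiki
        if any('disambiguation' in label.get('value', '').lower()
               for label in entity.get('labels', {}).values()):
            continue  # labelled as a disambiguation page
        kept.append(qid)
    return kept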
Jun 26 2019
I meant to say that SPARQL is excluding the entity.
When I removed all entities which have 'enwiki', I got ['Q4077077', 'Q4427926', 'Q24287657', 'Q52686724']. The extra entity I am getting is 'Q4077077'. It seems to be getting excluded in SPARQL due to ?article schema:about ?item ..
Which attribute is being checked in this part?
Jun 24 2019
I don't think there is a way to get only the counts.
For this: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q3986754|Q4224|Q4429859|Q306403|Q2498038|Q271534|Q274306|Q205707|Q229651|Q222|Q4077077|Q4427926|Q2983218|Q166502|Q3023357|Q1924847|Q34436|Q19865538|Q24287657|Q42296351|Q52686724|Q47300912|Q64768584&props=sitelinks&format=json
we get the counts correctly, but there are also extra items beyond those in the query (https://query.wikidata.org/#SELECT%20%3Fitem%20%28COUNT%28%3Fsitelink%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20VALUES%20%3Fitem%20%7B%20wd%3AQ3986754%20wd%3AQ4224%20wd%3AQ4429859%20wd%3AQ306403%20wd%3AQ2498038%20wd%3AQ271534%20wd%3AQ274306%20wd%3AQ205707%20wd%3AQ229651%20wd%3AQ222%20wd%3AQ4077077%20wd%3AQ4427926%20wd%3AQ2983218%20wd%3AQ166502%20wd%3AQ3023357%20wd%3AQ1924847%20wd%3AQ34436%20wd%3AQ19865538%20wd%3AQ24287657%20wd%3AQ42296351%20wd%3AQ52686724%20wd%3AQ47300912%20wd%3AQ64768584%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20FILTER%20NOT%20EXISTS%20%7B%20%3Fitem%20wdt%3AP31%20wd%3AQ4167410%20.%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20OPTIONAL%20%7B%20%3Fsitelink%20schema%3Aabout%20%3Fitem%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20FILTER%20NOT%20EXISTS%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3Farticle%20schema%3Aabout%20%3Fitem%20.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3Farticle%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E%20.%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%20GROUP%20BY%20%3Fitem). Searching for 'ruwiki' or 'enwiki' in ['sitelinks'] does not help either.
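(Deriving the counts client-side from a wbgetentities response like the one above is simple; here resp stands for the parsed JSON of that request:)

# props=sitelinks is enough: count the sitelinks per entity locally,
# since the API has no counts-only mode.
counts = {qid: len(entity.get('sitelinks', {}))
          for qid, entity in resp['entities'].items()}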
@bmansurov Is this API query a right replacement for the SPARQL query?
Jun 22 2019
Hopefully will get it done by Monday or Tuesday.
@bmansurov I think T216750: Article recommendation API: replace WDQS with MW API should solve the issue. If the issue still persists, I'll come back to it.
Jun 20 2019
Evaluation 1 summary:
- Issues completed:
- Issue currently being worked upon: T216750: Article recommendation API: replace WDQS with MW API
- Biweekly reports published at: usmanmuhd.com:gsoc2019
Jun 18 2019
Thanks!
Minor point:
- Why is the count 24 even after passing count=5? It works as expected on my local env.
Thanks! Shall I move on to the next task, or is there something else to do before moving on?
Jun 17 2019
A few observations:
- https://en.wikipedia.org/api/rest_v1/data/recommendation/article/creation/translation/ru?count=5 returns 24 items. It works as expected on my local machine.
- Should we explore the tests for this API?
- Using 50 for SPARQL currently suffices, but we risk running into 429 errors due to this. Should we increase the limit?
Jun 15 2019
Pushed the changes after making the filter() use batches as well.
@bmansurov http://localhost:6927/en.wikipedia.org/v1/article/creation/translation/ru works perfectly.
I checked http://localhost:6927/uz.wikipedia.org/v1/article/creation/translation/ru with both fix-T215222 branch and the master branch. Both give an error.
Further investigation reveals that the error is being caused by https://query.wikidata.org/sparql.
Jun 14 2019
Updated the code as requested. Please take a look.
npm run test | bunyan is giving this error:
Pushed the quick fix for the error. Will report the bug in a while.
The ru.wikipedia.org API behaves quite differently from the others.
Example:
- https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops&generator=mostviewed&ppprop=wikibase_item&gpvimlimit=50&gpvimoffset=950 : 50 items are not present here, yet it does not throw a server error.
- https://ru.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops&generator=mostviewed&ppprop=wikibase_item&gpvimlimit=50&gpvimoffset=200 : here also 50 items are not present, but it throws a server error. (https://ru.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops&generator=mostviewed&ppprop=wikibase_item&gpvimlimit=12&gpvimoffset=200 works fine and has the *continue* attribute, yet the limit-50 request throws an error.)
There is a "continue" attribute in the body of the request when we have more items than the limit. Example:
{"batchcomplete":"","continue":{"gpvimoffset":250,"continue":"gpvimoffset||"},"query":{"pages":{"-4":{"ns":2,"title":"User:Geilamir","missing":"","known":""},"-5":{"ns":2,"title":"User:Logan","missing":"","known":""},"-6":{"ns":2,"title":"User:Courcelles","missing":"","known":""},"-7":{"ns":6,"title":"File:Tanzania in its region.svg","missing":"","known":""},"-1":{"ns":-1,"title":"Special:Contributions/84.198.31.211","special":""},"-2":{"ns":-1,"title":"Special:EmailUser/Troubled asset","special":""},"-3":{"ns":-1,"title":"Special:NewPages","special":""},..........................................."4474":{"pageid":4474,"ns":828,"title":"Module:Citation/CS1/Utilities","pageprops":{"wikibase_item":"Q21993353"}},"4445":{"pageid":4445,"ns":828,"title":"Module:No globals","pageprops":{"wikibase_item":"Q16748603"}}}}}
Should I make use of this attribute to fetch all the items, or should we limit it to 500 or the number of items, whichever is lower?
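For reference, a sketch of following the continue block until the results are exhausted (Python with requests, illustrative only; the real service is Node.js):

import requests

def fetch_all(api_url, params):
    # Merge each response's "continue" parameters (e.g. gpvimoffset)
    # back into the request until no "continue" block is returned.
    params = dict(params)
    while True:
        batch = requests.get(api_url, params=params).json()
        yield from batch.get('query', {}).get('pages', {}).values()
        if 'continue' not in batch:
            break
        params.update(batch['continue'])

pages = list(fetch_all('https://ru.wikipedia.org/w/api.php', {
    'action': 'query', 'format': 'json', 'prop': 'pageprops',
    'generator': 'mostviewed', 'ppprop': 'wikibase_item',
    'gpvimlimit': 50,
}))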
Jun 13 2019
Yeah, a higher sitelink_count is important. It is being sorted on here: https://github.com/wikimedia/mediawiki-services-recommendation-api/blob/master/lib/article.creation.translation.js#L182.
I tested with different numbers of items, and it gives a different output each time. Basically the elements returned by the API are retrieved from the db along with other data. Example:
Yeah, it works. How do we handle this case?
I sent the same request as the one being sent through the sandbox:
Jun 12 2019
Yeah, I just tested with a valid API call and an invalid API call. response.body.error will not be null in case of an error and will be null otherwise. Will send a patch in some time.
Jun 10 2019
How do I reproduce this error?
- http://localhost:6927/en.wikipedia.org/v1/article/creation/translation/uz/Kolloid
- http://localhost:6927/en.wikipedia.org/v1/article/creation/translation/uz
Both above work fine.
May 28 2019
Output for http://localhost:6927/uz.wikipedia.org/v1/article/creation/morelike/Kitob
May 6 2019
Thanks a lot for selecting me! Looking forward to working on it.
Apr 9 2019
@Tgr Added 2 coding tasks. Will go ahead and submit the proposal.