Recommendation API translation endpoint stopped working
Closed, ResolvedPublic

Description

Translation endpoint fails intermittently with:

/{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - normal source and target) is CRITICAL: Test article.creation.translation - normal source and target returned the unexpected status 404 (expecting: 200)

Further investigation revealed that it's caused by an error in MW API we started hitting

{"error":{"code":"internal_api_error_BadMethodCallException","info":"[XFjJ5wpAIDYAAFo57poAAACF] Caught exception of type BadMethodCallException","errorclass":"BadMethodCallException"},"servedby":"mw1342"}

Example log on MW side can be found here.

This appears to be a bug on multiple levels: the mostviewed API in MW should not crumble like this; Recommendation-API should probably report situations like this as an internal server error (500) instead of a 404; and even though the alert was seen in Icinga, it didn't go to any mailing lists or IRC channels, AFAIK.

Mentors

Skills Required

  • Javascript

Acceptance Criteria

  • Find out what's causing the problem and fix it. The codebase is here. The bug is most likely in lib/article.creation.translation.js or routes/article.creation.translation.js.

Details

Related Gerrit Patches:

Event Timeline

Pchelolo created this task.Feb 4 2019, 11:38 PM
bmansurov triaged this task as Medium priority.Feb 5 2019, 7:02 PM

This end-point is not being used in production, thus normal priority. Hopefully, I can debug the issue and fix it once I have time.

bmansurov updated the task description.
bmansurov added a subscriber: Stabgan.

@Stabgan would you be interested in working on this task too?

bmansurov removed bmansurov as the assignee of this task.Mar 5 2019, 5:50 PM
bmansurov edited subscribers, added: bmansurov; removed: Stabgan.

Hey @bmansurov, I am interested in this task. Can you assign it to me?

@Ashutoshmishra255941 sorry, someone else is already working on this project. Same applies to other micro tasks that you commented on.

@Usmanmuhd, any progress on this? Do you have any questions?

The error is happening in production. When I visit this page, I'm seeing the following error:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found","title":"no results found","method":"GET","detail":"{\"error\":{\"code\":\"internal_api_error_BadMethodCallException\",\"info\":\"[XP47pApAMEYAAIU@zy0AAABD] Caught exception of type BadMethodCallException\",\"errorclass\":\"BadMethodCallException\"},\"servedby\":\"mw1235\"}","uri":"/en.wikipedia.org/v1/article/creation/translation/ru/"}

Here are the details:

{
  "_index": "logstash-2019.06.10",
  "_type": "mediawiki",
  "_id": "AWtBFo8l-pVO1Pag5uNn",
  "_version": 1,
  "_score": null,
  "_source": {
    "exception": {
      "trace": "#0 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php(798): ApiPageSet->processTitlesArray(array)\n#1 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php(720): ApiPageSet->initFromTitles(array)\n#2 /srv/mediawiki/php-1.34.0-wmf.8/extensions/PageViewInfo/includes/ApiQueryMostViewed.php(52): ApiPageSet->populateFromTitles(array)\n#3 /srv/mediawiki/php-1.34.0-wmf.8/extensions/PageViewInfo/includes/ApiQueryMostViewed.php(24): MediaWiki\\Extensions\\PageViewInfo\\ApiQueryMostViewed->run(ApiPageSet)\n#4 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php(176): MediaWiki\\Extensions\\PageViewInfo\\ApiQueryMostViewed->executeGenerator(ApiPageSet)\n#5 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php(140): ApiPageSet->executeInternal(boolean)\n#6 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiQuery.php(234): ApiPageSet->execute()\n#7 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiMain.php(1595): ApiQuery->execute()\n#8 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiMain.php(531): ApiMain->executeAction()\n#9 /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()\n#10 /srv/mediawiki/php-1.34.0-wmf.8/api.php(87): ApiMain->execute()\n#11 /srv/mediawiki/w/api.php(3): include(string)\n#12 {main}",
      "code": 0,
      "file": "/srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php:1186",
      "message": "Call to a member function getPrefixedText() on a non-object (null)",
      "class": "BadMethodCallException"
    },
    "server": "ru.wikipedia.org",
    "phpversion": "5.6.99-hhvm",
    "wiki": "ruwiki",
    "channel": "exception",
    "exception_id": "XP47BQpAEMMAAFVt618AAAAQ",
    "program": "mediawiki",
    "type": "mediawiki",
    "message_checksum": "43b3c8e55a90cb1993db083c5704ee11",
    "caught_by": "mwe_handler",
    "exception_url": "/w/api.php",
    "http_method": "POST",
    "host": "mw1314",
    "@version": 1,
    "shard": "s6",
    "timestamp": "2019-06-10T11:12:05+00:00",
    "severity": "err",
    "unique_id": "XP47BQpAEMMAAFVt618AAAAQ",
    "level": "ERROR",
    "ip": "10.64.16.195",
    "mwversion": "1.34.0-wmf.8",
    "logsource": "mw1314",
    "message": "[XP47BQpAEMMAAFVt618AAAAQ] /w/api.php   BadMethodCallException from line 1186 of /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php: Call to a member function getPrefixedText() on a non-object (null)",
    "normalized_message": "[{exception_id}] {exception_url}   BadMethodCallException from line 1186 of /srv/mediawiki/php-1.34.0-wmf.8/includes/api/ApiPageSet.php: Call to a member function getPrefixedText() on a non-object (null)",
    "url": "/w/api.php",
    "reqId": "XP47BQpAEMMAAFVt618AAAAQ",
    "tags": [
      "input-kafka-rsyslog-udp-localhost",
      "rsyslog-udp-localhost",
      "kafka",
      "truncated_by_filter_truncate",
      "es"
    ],
    "referrer": null,
    "@timestamp": "2019-06-10T11:12:05.000Z",
    "facility": "user"
  },
  "fields": {
    "@timestamp": [
      1560165125000
    ]
  },
  "sort": [
    1560165125000
  ]
}

See if you can pinpoint the code that's causing the above error. Let me know if you have questions.

@Usmanmuhd I saw your messages on IRC. Let's continue the conversation here.

Looks like you can inspect the MW response and figure out that it's returning an error even with the status code 200 by inspecting this:

"body": {
        "error": {
          "code": "internal_api_error_BadMethodCallException",
          "info": "[XQDagwpAAK4AALo@JI4AAABO] Caught exception of type BadMethodCallException",
          "errorclass": "BadMethodCallException"
        },

Do you think this is enough for you to return 500?

Usmanmuhd added a comment.EditedJun 12 2019, 1:00 PM

Yeah, I just tested with a valid API call and an invalid one. response.body.error is not null when there is an error and is null when there is none. I will send in a patch shortly.
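The check described above could look roughly like the following. This is a minimal sketch, not the merged patch: the UpstreamError class and the assertNoApiError helper are hypothetical names, and the only assumption taken from this thread is that a failing MediaWiki call can arrive with status 200 and an "error" object in the body.

```javascript
// Hypothetical error type; the real service may use a different one.
class UpstreamError extends Error {
    constructor(status, detail) {
        super(`MediaWiki API returned an error: ${detail.code}`);
        this.status = status;
        this.detail = detail;
    }
}

// MediaWiki can answer HTTP 200 while the JSON body carries an "error"
// object, so inspect the body rather than the status code.
function assertNoApiError(response) {
    const error = response && response.body && response.body.error;
    if (error) {
        // Report an upstream failure as 500 instead of a misleading 404.
        throw new UpstreamError(500, error);
    }
    return response;
}
```

A caller would pass each MediaWiki response through assertNoApiError before reading response.body.query, and let the route's error handling turn the thrown error into a 500.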

Is there some valid API example to actually test the 404 case as well?

Change 516732 had a related patch set uploaded (by Usmanmuhd; owner: Usmanmuhd):
[mediawiki/services/recommendation-api@master] Throw appropriate error when wmAPI returns internal server error.

https://gerrit.wikimedia.org/r/516732

It looks like the Russian Wikipedia has a lower limit. Try lowering 500 to 50.

Usmanmuhd added a comment.EditedJun 13 2019, 12:28 PM

Yeah, it works. How do we handle this case?

Just checked the limit. It's 230.

We need to understand why we need 500 items. If the algorithm works with 50 items, then we should use 50. If not, let's discuss it further.

Usmanmuhd added a comment.EditedJun 13 2019, 1:18 PM

I tested with different numbers of items, and the output differs. Basically, the elements returned by the API are retrieved from the db along with other data. Example:

For 500 items:

{"count":24,"items":[{"wikidata_id":"Q1485500","title":"Tovar","sitelink_count":26},{"wikidata_id":"Q2993278","title":"Kino sanʼati","sitelink_count":9},{"wikidata_id":"Q4056778","title":"Ogahiy","sitelink_count":7},{"wikidata_id":"Q30895343","title":"Diniy ekstremizm","sitelink_count":3},{"wikidata_id":"Q4374398","title":"Oʻzbekistondagi shaharchalar","sitelink_count":2},{"wikidata_id":"Q16656759","title":"Oʻzbekiston Respublikasi Vazirlar Mahkamasi","sitelink_count":2},{"wikidata_id":"Q18405999","title":"Oʻzbekiston Respublikasining orden va medallari","sitelink_count":2},{"wikidata_id":"Q21645204","title":"Said Ahmad","sitelink_count":2},{"wikidata_id":"Q12823382","title":"Innovatsiya","sitelink_count":1},{"wikidata_id":"Q12823527","title":"Iqtisodiy sikl","sitelink_count":1},{"wikidata_id":"Q12823585","title":"Ishlab chiqarish harajatlari","sitelink_count":1},{"wikidata_id":"Q12824019","title":"Jinsiy gigiyena","sitelink_count":1},{"wikidata_id":"Q12824578","title":"Kategoriyalar","sitelink_count":1},{"wikidata_id":"Q12826477","title":"Mahalliy davlat hokimiyati organlari","sitelink_count":1},{"wikidata_id":"Q12826909","title":"Mavluda Asalxoʻjayeva","sitelink_count":1},{"wikidata_id":"Q12827041","title":"Mehnat muhofazasi","sitelink_count":1},{"wikidata_id":"Q12827182","title":"Metallarni ishlash","sitelink_count":1},{"wikidata_id":"Q12827344","title":"Milliy iqtisodiyot","sitelink_count":1},{"wikidata_id":"Q12827576","title":"Moliya siyosati","sitelink_count":1},{"wikidata_id":"Q12827577","title":"Moliya tizimi","sitelink_count":1},{"wikidata_id":"Q12828009","title":"Mustaqillik deklaratsiyasi (Oʻzbekiston SSR)","sitelink_count":1},{"wikidata_id":"Q12828285","title":"Narx shakllanishi","sitelink_count":1},{"wikidata_id":"Q12828316","title":"Nasos stansiyasi","sitelink_count":1},{"wikidata_id":"Q12828845","title":"Nutq","sitelink_count":1}]}

For 230 items:

{"count":24,"items":[{"wikidata_id":"Q1485500","title":"Tovar","sitelink_count":26},{"wikidata_id":"Q4056778","title":"Ogahiy","sitelink_count":7},{"wikidata_id":"Q4374398","title":"Oʻzbekistondagi shaharchalar","sitelink_count":2},{"wikidata_id":"Q16656759","title":"Oʻzbekiston Respublikasi Vazirlar Mahkamasi","sitelink_count":2},{"wikidata_id":"Q18405999","title":"Oʻzbekiston Respublikasining orden va medallari","sitelink_count":2},{"wikidata_id":"Q21645204","title":"Said Ahmad","sitelink_count":2},{"wikidata_id":"Q12822498","title":"Global muammolar","sitelink_count":1},{"wikidata_id":"Q25529405","title":"Oʻzbekiston taʼlim tizimi","sitelink_count":1},{"wikidata_id":"Q25531103","title":"Makroiqtisodiyot","sitelink_count":1},{"wikidata_id":"Q25531108","title":"Iqtisodiyot nazariyasi","sitelink_count":1},{"wikidata_id":"Q25533577","title":"Yorugʻlik difraksiyasi","sitelink_count":1},{"wikidata_id":"Q25533678","title":"Jahon xoʻjaligi","sitelink_count":1},{"wikidata_id":"Q25534163","title":"Ishlab chiqarish","sitelink_count":1},{"wikidata_id":"Q25847115","title":"Yulduzli tunlar","sitelink_count":1},{"wikidata_id":"Q25527855","title":"Oʻzgaruvchan tok","sitelink_count":1},{"wikidata_id":"Q25528285","title":"Qobiliyat","sitelink_count":1},{"wikidata_id":"Q25528577","title":"Qutadgʻu bilig","sitelink_count":1},{"wikidata_id":"Q12822357","title":"Gidrotexnika inshootlari","sitelink_count":1},{"wikidata_id":"Q25527886","title":"Oʻlchash asboblari","sitelink_count":1},{"wikidata_id":"Q12823585","title":"Ishlab chiqarish harajatlari","sitelink_count":1},{"wikidata_id":"Q12826909","title":"Mavluda Asalxoʻjayeva","sitelink_count":1},{"wikidata_id":"Q12828009","title":"Mustaqillik deklaratsiyasi (Oʻzbekiston SSR)","sitelink_count":1},{"wikidata_id":"Q12830270","title":"Poraxoʻrlik","sitelink_count":1},{"wikidata_id":"Q12832057","title":"Xamsa","sitelink_count":1}]}

I believe 500 was selected because it is the API limit. The more data the API returns, the better the results.

Can you look into how these items are being used? If a higher sitelink_count is important, then can we request the API to return the items sorted by sitelink count? I'm worried that if we reduce the number to 50, then we may be returning lower quality results.

Usmanmuhd added a comment.EditedJun 13 2019, 2:09 PM

Yeah, higher sitelink_count is important. It is being sorted here https://github.com/wikimedia/mediawiki-services-recommendation-api/blob/master/lib/article.creation.translation.js#L182.
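For readers who don't want to open the link, the ranking step amounts to something like this. It is an illustrative sketch rather than the code at that line: topBySitelinkCount is a made-up name, and the field names are taken from the sample output earlier in this task.

```javascript
// Illustrative only: rank candidates by how many wikis already have the
// article, then keep the first `count` of them.
function topBySitelinkCount(items, count) {
    // Copy before sorting so the caller's array is not mutated.
    return items
        .slice()
        .sort((a, b) => b.sitelink_count - a.sitelink_count)
        .slice(0, count);
}
```

Because the ranking happens only after the sitelink counts are known, shrinking the candidate list before that point can change which items survive, which matches the differing 500-item and 230-item outputs above.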

There's a problem with actually doing it.

In the first request (where we have to limit the number) we are only getting:

candidates: {
      "Q5296": "Bosh Sahifa",
      "Q265": "Oʻzbekiston",
      "Q269": "Toshkent",
...................
      "Q7272797": "Qunut duosi"
    }

Then we pass it to a second API request where we get the sitelink_count.

First request being sent here: https://github.com/wikimedia/mediawiki-services-recommendation-api/blob/master/lib/article.creation.translation.js#L176

Second request being sent here: https://github.com/wikimedia/mediawiki-services-recommendation-api/blob/master/lib/article.creation.translation.js#L180

OK, then, let's split up the big request into multiple small requests. Here's a similar example. Each time you'd request 50 items until you reach the total number of candidates.

There is a "continue" attribute in the body of the response when there are more items than the limit. Example:

{"batchcomplete":"","continue":{"gpvimoffset":250,"continue":"gpvimoffset||"},"query":{"pages":{"-4":{"ns":2,"title":"User:Geilamir","missing":"","known":""},"-5":{"ns":2,"title":"User:Logan","missing":"","known":""},"-6":{"ns":2,"title":"User:Courcelles","missing":"","known":""},"-7":{"ns":6,"title":"File:Tanzania in its region.svg","missing":"","known":""},"-1":{"ns":-1,"title":"Special:Contributions/84.198.31.211","special":""},"-2":{"ns":-1,"title":"Special:EmailUser/Troubled asset","special":""},"-3":{"ns":-1,"title":"Special:NewPages","special":""},..........................................."4474":{"pageid":4474,"ns":828,"title":"Module:Citation/CS1/Utilities","pageprops":{"wikibase_item":"Q21993353"}},"4445":{"pageid":4445,"ns":828,"title":"Module:No globals","pageprops":{"wikibase_item":"Q16748603"}}}}}

Should I make use of this attribute to fetch all the items, or should we limit it to 500 or the number of items, whichever is smaller?

Yes, at most 500.
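The batching agreed on above could be sketched as follows. This is not the patch itself: fetchPage is a hypothetical function that performs one MediaWiki API request and resolves with the parsed JSON body, and the query parameters are the ones from the example URL and the "continue" sample in this thread.

```javascript
// Sketch only: request `batchSize` items at a time and follow the
// "continue" attribute until it disappears or `maxItems` is reached.
async function fetchMostViewed(fetchPage, batchSize = 50, maxItems = 500) {
    const pages = [];
    let continuation = {};
    while (pages.length < maxItems) {
        const body = await fetchPage(Object.assign({
            action: 'query',
            format: 'json',
            prop: 'pageprops',
            ppprop: 'wikibase_item',
            generator: 'mostviewed',
            gpvimlimit: Math.min(batchSize, maxItems - pages.length)
        }, continuation));
        const batch = Object.values((body.query && body.query.pages) || {});
        pages.push(...batch);
        if (!body.continue) {
            // No "continue" attribute means there is nothing left to fetch.
            break;
        }
        // Send the continuation values (e.g. gpvimoffset) back as-is.
        continuation = body.continue;
    }
    return pages.slice(0, maxItems);
}
```

Passing the request function in keeps the loop independent of the HTTP client, so it can be exercised with a stub instead of a live wiki.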

Usmanmuhd added a comment.EditedJun 14 2019, 12:42 PM

ru.wikipedia.org API behaves much differently from the others.
Example:

How do I handle all these cases?

Currently I have split the requests into 50 items at a time, but I think the problem is different, and I guess my solution is not even needed. It's a separate problem with the Russian wiki. Even if the limit is lower and we request more items, the API should just return up to the lower limit (example: https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops&generator=mostviewed&ppprop=wikibase_item&gpvimlimit=1000).

Change 516732 merged by Bmansurov:
[mediawiki/services/recommendation-api@master] Throw appropriate error when wmAPI returns internal server error.

https://gerrit.wikimedia.org/r/516732

What you describe in T215222#5258956 looks like a bug in ruwiki. Can you create a task and describe the issue and tag it with MediaWiki-API? Hopefully the core team can help us here.

In the meantime, we should handle the error. As I understand it, only the last request is resulting in an error if the number of results is fewer than 50. So we'd make requests up to 500 items, then filter out errors. Take a look at the second answer here.
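One way to "make the requests, then filter out errors" is to convert each rejection into a marker object before waiting on the whole set. The sketch below uses native promises and a hypothetical collectSuccessfulBatches helper; the linked answer and the actual patch may do this differently.

```javascript
// Sketch: `requests` is an array of functions, each returning a promise
// for one batch. A failed batch is dropped instead of failing the whole set.
async function collectSuccessfulBatches(requests) {
    const settled = await Promise.all(
        requests.map((run) => run().then(
            (value) => ({ ok: true, value }),
            (error) => ({ ok: false, error })
        ))
    );
    // Keep the results of the batches that succeeded, in request order.
    return settled.filter((result) => result.ok).map((result) => result.value);
}
```

Dropping failed batches silently hides real outages, so a production version would probably log the discarded errors as well.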

Change 517078 had a related patch set uploaded (by Usmanmuhd; owner: Usmanmuhd):
[mediawiki/services/recommendation-api@master] Splits the request to WMAPI in batches

https://gerrit.wikimedia.org/r/517078

Pushed the quick fix for the error. Will report the bug in a while.

npm run test | bunyan is giving this error:

......./recommendation-api/lib/article.creation.translation.js
  119:7  error  Parsing error: Unexpected token function

Line 119 is async function getArticlesByPageviews(app, source, projectDomain) {. How do I resolve this?

Replied in Gerrit.

Updated the code as requested. Please take a look.

Usmanmuhd added a comment.EditedJun 15 2019, 5:58 AM

@bmansurov http://localhost:6927/en.wikipedia.org/v1/article/creation/translation/ru works perfectly.
I checked http://localhost:6927/uz.wikipedia.org/v1/article/creation/translation/ru with both fix-T215222 branch and the master branch. Both give an error.
Further investigation reveals:
This error is being caused in https://query.wikidata.org/sparql.

Request being sent is:

request: {
	"method": "post",
	"uri": "https://query.wikidata.org/sparql",
	"headers": {
		"user-agent": "recommendation-api"
	},
	"body": {
		"format": "json",
		"query": "SELECT ?item (COUNT(?sitelink) as ?count) WHERE {\n                     VALUES ?item { wd:Q159 wd:Q1432329 wd:Q2614 wd:Q5296 wd:Q486 wd:Q129677 wd:Q170456 wd:Q43416 wd:Q980503 wd:Q840847 wd:Q132971 wd:Q377461 wd:Q207735 wd:Q192218 wd:Q4343990 wd:Q488020 wd:Q392108 wd:Q4062251 wd:Q1377734 wd:Q76421 wd:Q715587 wd:Q4528921 wd:Q639444 wd:Q373501 wd:Q4197404 wd:Q1795024 wd:Q4284234 wd:Q2006869 wd:Q171178 wd:Q4503118 wd:Q3182559 wd:Q15070762 wd:Q215419 wd:Q16147074 wd:Q13580495 wd:Q23884186 wd:Q29564107 wd:Q2060146 wd:Q48741246 wd:Q64486414 wd:Q64485147 wd:Q656 wd:Q649 wd:Q1899 wd:Q8646 wd:Q30487 wd:Q7747 wd:Q361 wd:Q485207 wd:Q36450 wd:Q8682 wd:Q38022 wd:Q319449 wd:Q191779 wd:Q37079 wd:Q15869 wd:Q55172 wd:Q236132 wd:Q200881 wd:Q21652028 wd:Q2808 wd:Q866 wd:Q124179 wd:Q1050926 wd:Q116933 wd:Q4189571 wd:Q4089615 wd:Q275459 wd:Q214204 wd:Q3874799 wd:Q23572 wd:Q4097472 wd:Q3497205 wd:Q54314 wd:Q642878 wd:Q259434 wd:Q4096389 wd:Q362500 wd:Q558112 wd:Q16162037 wd:Q16148930 wd:Q30346628 wd:Q13582368 wd:Q43377320 wd:Q45740018 wd:Q23688132 wd:Q59784750 wd:Q677855 wd:Q15180 wd:Q362 wd:Q1394 wd:Q30 wd:Q212 wd:Q52 wd:Q883 wd:Q232 wd:Q8479 wd:Q855 wd:Q1377159 wd:Q9682 wd:Q8409 wd:Q40787 wd:Q14974 wd:Q211872 wd:Q170963 wd:Q380267 wd:Q1783598 wd:Q79948 wd:Q11571 wd:Q44496 wd:Q842813 wd:Q13182 wd:Q208667 wd:Q1349539 wd:Q209926 wd:Q178190 wd:Q4216344 wd:Q22686 wd:Q3786540 wd:Q201989 wd:Q3378469 wd:Q262613 wd:Q3526023 wd:Q455462 wd:Q590252 wd:Q4470718 wd:Q861211 wd:Q16979983 wd:Q1996546 wd:Q28666047 wd:Q25207350 wd:Q30887543 wd:Q23758108 wd:Q4630358 wd:Q61050091 wd:Q64481622 wd:Q145 wd:Q7200 wd:Q7996 wd:Q148 wd:Q1520 wd:Q183 wd:Q458 wd:Q462 wd:Q1246 wd:Q487125 wd:Q134447 wd:Q112707 wd:Q352 wd:Q236 wd:Q186161 wd:Q36704 wd:Q80877 wd:Q153348 wd:Q590183 wd:Q13909 wd:Q5167679 wd:Q3279825 wd:Q488 wd:Q381884 wd:Q176846 wd:Q46333 wd:Q4030360 wd:Q229535 wd:Q4399463 wd:Q11239 wd:Q317521 wd:Q183134 wd:Q4104525 wd:Q208026 wd:Q184 wd:Q235132 wd:Q209330 wd:Q155979 wd:Q4410005 wd:Q983419 
wd:Q271500 wd:Q18406980 wd:Q18340732 wd:Q20483301 wd:Q20968495 wd:Q19873404 wd:Q29413154 wd:Q62079006 wd:Q34266 wd:Q189266 wd:Q1829 wd:Q891 wd:Q16 wd:Q801 wd:Q120180 wd:Q151789 wd:Q553302 wd:Q34453 wd:Q35314 wd:Q2685 wd:Q41112 wd:Q38111 wd:Q39864 wd:Q15189 wd:Q83171 wd:Q154852 wd:Q38404 wd:Q190029 wd:Q472664 wd:Q898983 wd:Q615 wd:Q4482111 wd:Q210603 wd:Q135134 wd:Q4302852 wd:Q192156 wd:Q186304 wd:Q142794 wd:Q4054802 wd:Q1861079 wd:Q170645 wd:Q4160262 wd:Q25177 wd:Q69573 wd:Q1287048 wd:Q2632900 wd:Q1000592 wd:Q240573 wd:Q15616276 wd:Q15732802 wd:Q48948659 wd:Q47037314 wd:Q52951815 wd:Q27188178 wd:Q56355429 wd:Q61740435 wd:Q63985408 wd:Q5187 wd:Q887 wd:Q189 wd:Q17 wd:Q230 wd:Q668 wd:Q20 wd:Q265 wd:Q39 wd:Q5118 wd:Q79793 wd:Q765165 wd:Q3772 wd:Q194154 wd:Q130734 wd:Q131755 wd:Q47657 wd:Q188538 wd:Q15193 wd:Q847687 wd:Q79031 wd:Q886518 wd:Q190845 wd:Q194121 wd:Q23633 wd:Q39233 wd:Q4391645 wd:Q2498491 wd:Q1047836 wd:Q4244908 wd:Q45875 wd:Q7835 wd:Q3657739 wd:Q4360641 wd:Q4234753 wd:Q15640304 wd:Q14944179 wd:Q5162259 wd:Q21546165 wd:Q27950674 wd:Q28561969 wd:Q49883590 wd:Q7822327 wd:Q52722431 wd:Q60851027 wd:Q56355467 wd:Q62399533 wd:Q142 wd:Q46 wd:Q187846 wd:Q408 wd:Q2359642 wd:Q146 wd:Q150 wd:Q403 wd:Q7252 wd:Q192457 wd:Q7156 wd:Q41083 wd:Q4616 wd:Q178750 wd:Q3884 wd:Q186341 wd:Q44727 wd:Q37175 wd:Q2831 wd:Q199943 wd:Q219937 wd:Q1372387 wd:Q35791 wd:Q1275963 wd:Q36107 wd:Q1123288 wd:Q754673 wd:Q4234472 wd:Q169963 wd:Q713439 wd:Q289212 wd:Q126599 wd:Q355 wd:Q193659 wd:Q706497 wd:Q4310876 wd:Q4224 wd:Q1191065 wd:Q4158245 wd:Q49740 wd:Q32045 wd:Q222 wd:Q4173083 wd:Q4098374 wd:Q18161349 wd:Q21621995 wd:Q20735644 wd:Q9095390 wd:Q20071491 wd:Q79911 wd:Q399 wd:Q41591 wd:Q794 wd:Q839920 wd:Q544 wd:Q363371 wd:Q12192 wd:Q31628 wd:Q8683 wd:Q7525 wd:Q459216 wd:Q179250 wd:Q4340209 wd:Q8043 wd:Q51993 wd:Q1744 wd:Q93312 wd:Q15862 wd:Q80510 wd:Q13874224 wd:Q311374 wd:Q178194 wd:Q178598 wd:Q83085 wd:Q4390259 wd:Q133087 wd:Q81819 wd:Q185657 wd:Q35073 wd:Q165219 wd:Q11835640 wd:Q209499 
wd:Q10738 wd:Q7243 wd:Q208415 wd:Q103946 wd:Q279119 wd:Q1079 wd:Q234458 wd:Q262495 wd:Q15218282 wd:Q130585 wd:Q18395957 wd:Q53164830 wd:Q6276385 wd:Q7779 wd:Q38 wd:Q33 wd:Q227 wd:Q7184 wd:Q208460 wd:Q192962 wd:Q1286 wd:Q11085 wd:Q181915 wd:Q23530 wd:Q4516623 wd:Q4132614 wd:Q6216 wd:Q32522 wd:Q726080 wd:Q8398 wd:Q213115 wd:Q70231 wd:Q214601 wd:Q4275889 wd:Q205764 wd:Q29269 wd:Q131191 wd:Q264783 wd:Q206901 wd:Q25589 wd:Q4393561 wd:Q295542 wd:Q1123836 wd:Q3986754 wd:Q4163746 wd:Q816695 wd:Q8539 wd:Q1172164 wd:Q981074 wd:Q242352 wd:Q1052459 wd:Q271534 wd:Q298276 wd:Q1369019 wd:Q573463 wd:Q2896171 wd:Q16335075 wd:Q4630361 wd:Q28500396 wd:Q42728914 wd:Q55621609 }\n                     FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167410 . }\n                     OPTIONAL { ?sitelink schema:about ?item }\n                     FILTER NOT EXISTS {\n                       ?article schema:about ?item .\n                       ?article schema:isPartOf <https://uz.wikipedia.org/> .\n                     }\n                   } GROUP BY ?item"
	}
}

It returns an error because the query times out.

Edit 1: Tried the above query with 1 item and it still takes a very long time. Seems like an error in query.wikidata.org.
Edit 2: Splitting it into smaller batches seems to work.

Pushed the changes after making the filter() use batches as well.
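Splitting the SPARQL request into batches can be sketched like this. The query text follows the request shown above; chunk, buildSitelinkQuery and buildBatchedQueries are hypothetical helper names, and the batch size of 50 is the value discussed in this task rather than a documented WDQS limit.

```javascript
// Split an array into consecutive slices of at most `size` elements.
function chunk(items, size) {
    const result = [];
    for (let i = 0; i < items.length; i += size) {
        result.push(items.slice(i, i + size));
    }
    return result;
}

// Build the sitelink-count query for one batch of Wikidata IDs,
// excluding items that already have an article on `targetDomain`.
function buildSitelinkQuery(ids, targetDomain) {
    const values = ids.map((id) => `wd:${id}`).join(' ');
    return `SELECT ?item (COUNT(?sitelink) as ?count) WHERE {
  VALUES ?item { ${values} }
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167410 . }
  OPTIONAL { ?sitelink schema:about ?item }
  FILTER NOT EXISTS {
    ?article schema:about ?item .
    ?article schema:isPartOf <https://${targetDomain}/> .
  }
} GROUP BY ?item`;
}

// One query per batch, so that no single request has to cover every ID.
function buildBatchedQueries(ids, targetDomain, batchSize = 50) {
    return chunk(ids, batchSize).map((batch) => buildSitelinkQuery(batch, targetDomain));
}
```

Each returned string would be posted to https://query.wikidata.org/sparql as a separate request. More, smaller requests trade timeouts for a higher request count, which is where rate limiting can start to matter.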

Change 517078 merged by Bmansurov:
[mediawiki/services/recommendation-api@master] Splits the request to MediaWiki API and Wikidata query service in batches.

https://gerrit.wikimedia.org/r/517078

Usmanmuhd added a comment.EditedJun 17 2019, 1:46 PM

A few observations:

  1. https://en.wikipedia.org/api/rest_v1/data/recommendation/article/creation/translation/ru?count=5 returns 24 items itself. Works as expected on local machine.
  2. Should we explore the tests for this API?
  3. Using 50 for sparql currently suffices, but we risk running into 429 error due to this. Should we increase the limit?

Edit 1: As we are going to complete T216750: Article recommendation API: replace WDQS with MW API, should we even explore the 3rd point?

  1. We'll need to deploy the changes and see if that fixes the problem in production.
  2. I left some comments about tests in the gerrit patch. Basically, we want tests, but this end point may go away, so let's not spend time on tests now. We can revisit this point later.
  3. Yes, let's not worry about Sparql for now. If we cannot replace Sparql with the MediaWiki API, then we can come back to it.

Gerrit seems down for me. I'll try to deploy your changes later today/tomorrow.

@Usmanmuhd you're offline on IRC, so I'm replying here:

<muhdusman> [06-17 09:22:03] bmansurov: Made the changes as requested. I
edited the commit message in the gerrit web page as I could
not figure out a way to edit it in my local machine. How do
I actually do it locally and how do I pull the changes on my
local machine now?

Locally, you'd do git commit --amend and edit the commit message (if your patch has already been merged you cannot push your edits). If you're on master it's best to reset the head to an earlier commit. Something like this: git reset --hard HEAD~5 then do git pull origin master. If you're in a different branch, switch to master and then pull the changes (no need to reset if you didn't work on master directly).

Thanks, shall I move on to the next task or do I have something else to do before moving on to the next one?

Yes, please move on to the next task.

bmansurov closed this task as Resolved.EditedJun 18 2019, 4:04 PM
bmansurov assigned this task to Usmanmuhd.
bmansurov moved this task from Staged to Done (current quarter) on the Research board.

@Usmanmuhd good job! The end point is working after deployment:

{"count":24,"items":[{"wikidata_id":"Q15640304","title":"Корж, Макс","sitelink_count":10},{"wikidata_id":"Q4517998","title":"Чумаков, Алексей Георгиевич","sitelink_count":9},{"wikidata_id":"Q47037314","title":"Элджей","sitelink_count":8},{"wikidata_id":"Q3179330","title":"Guf","sitelink_count":7},{"wikidata_id":"Q16162037","title":"Black Star Inc.","sitelink_count":7},{"wikidata_id":"Q348481","title":"Троицын день","sitelink_count":6},{"wikidata_id":"Q4160262","title":"Джиган","sitelink_count":6},{"wikidata_id":"Q4234472","title":"Корчевников, Борис Вячеславович","sitelink_count":5},{"wikidata_id":"Q4097472","title":"Брюханов, Виктор Петрович","sitelink_count":5},{"wikidata_id":"Q48948659","title":"Ивлеева, Настя","sitelink_count":5},{"wikidata_id":"Q4030360","title":"250 лучших фильмов по версии IMDb","sitelink_count":5},{"wikidata_id":"Q4522923","title":"Шепелев, Дмитрий Андреевич (телеведущий)","sitelink_count":4},{"wikidata_id":"Q4069842","title":"Аронова, Мария Валерьевна","sitelink_count":4},{"wikidata_id":"Q4347544","title":"Пашинин, Анатолий Анатольевич","sitelink_count":4},{"wikidata_id":"Q4150122","title":"Грозовые ворота","sitelink_count":4},{"wikidata_id":"Q3786540","title":"RuTracker.org","sitelink_count":4},{"wikidata_id":"Q4187597","title":"Зарубина, Ольга Владимировна","sitelink_count":4},{"wikidata_id":"Q12111927","title":"Комаров, Дмитрий Константинович","sitelink_count":3},{"wikidata_id":"Q64486414","title":"Дело Ивана Голунова","sitelink_count":3},{"wikidata_id":"Q61050091","title":"Белорусских, Тима","sitelink_count":3},{"wikidata_id":"Q18699323","title":"Федермессер, Анна Константиновна","sitelink_count":3},{"wikidata_id":"Q4089615","title":"Богомолов, Константин Юрьевич","sitelink_count":3},{"wikidata_id":"Q28665437","title":"Старшенбаум, Ирина Владимировна","sitelink_count":3},{"wikidata_id":"Q4427091","title":"Соколов, Андрей Алексеевич","sitelink_count":3}]}
bmansurov moved this task from Backlog to Done on the Recommendation-API board.Jun 18 2019, 4:04 PM
bmansurov moved this task from Backlog to Done on the Article-Recommendation board.

Thanks!
Minor point:

  1. Why is the count:24 even after passing count=5? It works as expected on my local env.

The production server has a different configuration file in this repository. Maybe that's overriding the query parameter.

I think the batching introduced here is indeed causing 429s in CI: T226264: Recommendation-API CI testing is flaky due to frequent 429s from Wikidata Query Service

I like the idea of dropping the WDQS queries in favor of using the MW API. In the meantime, since this endpoint isn't used in production, any objection to temporarily marking the test as skipped?

Sorry about that, @Mholloway. We'll take care of the failing tests.

@Usmanmuhd could you take a look at the above task and prioritize it? Thanks.

@bmansurov I think T216750: Article recommendation API: replace WDQS with MW API should solve the issue. If the issue still persists, I'll come back to it.

Marking tests as skipped doesn't take a lot of time and allows us not to get false alerts. How long before you can get T216750 done?

Hopefully will get it done by Monday or Tuesday.