Recommendation API in beta labs doesn't work
Open, High, Public

Description

While the recommendation API is working in production, a similar URL in beta labs isn't working. It would be nice to get it working in beta labs for testing before deployment.

bmansurov triaged this task as High priority.

It is set up on deployment-sca, but I could not make it work. More investigation is needed to find out why.

bmansurov renamed this task from Setup recommendation API in beta labs to Recommendation API in beta labs doesn't work. Fri, Dec 7, 12:11 AM
bmansurov updated the task description.
mobrovac added a subscriber: mobrovac.

Please describe what exactly is not working.

bmansurov updated the task description. Fri, Dec 7, 1:25 PM
mobrovac closed this task as Invalid. Fri, Dec 7, 2:03 PM

The service in beta can work only for domains available in beta. In the task description, however, you are trying to use a production project's domain.

@mobrovac where can I see the list of domains available in beta?

The list of domains that RESTBase is enabled for in Beta is available here.

bmansurov added a comment. Edited Fri, Dec 7, 2:15 PM

I'm still getting an error for an existing domain: https://recommendation-api-beta.wmflabs.org/es.wikipedia.beta.wmflabs.org/v1/article/morelike/translation/Libro

{"status":504,"type":"internal_http_error","detail":"504: internal_http_error","method":"post","uri":"http://deployment-mediawiki04.deployment-prep.eqiad.wmflabs/w/api.php"}
bmansurov updated the task description. Fri, Dec 7, 2:17 PM
mobrovac reopened this task as Open. Fri, Dec 7, 3:18 PM

Indeed. The mediawiki host has changed. Will fix it.

Ok, the mediawiki host issue has been fixed, but now we are running into WDQS not being available:

{"status":504,"type":"internal_http_error","detail":"504: internal_http_error","method":"post","uri":"http://wdqs-test.wmflabs.org/sparql"}

Mmmm.

First things first: the article 'Libro' does not exist in the beta Spanish wiki.

The more interesting issue is that there's no WDQS in deployment-prep, and wdqs-test is an old domain that's not even resolvable anymore. @Smalyshev suggested going to production via query.wikidata.org. Changing it helps a bit: the service is returning 404 instead of 503 now, but I guess that since the production query service has no idea about beta, it will never work.

@bmansurov could you please give a little more context on how the recommendation service uses WDQS?

We use WDQS when we need to get article titles in a set of languages given a Wikidata item ID.

I'll update the task description to point to an existing article.

Other pages, for example this one, are returning a 404 and I cannot create Libro because the wiki is locked: T109157: Put beta eswiki to read-only mode.

Could you add some more details about how the API uses WDQS so I could see how this could be fixed/changed/improved?

Pchelolo added a comment. Edited Fri, Dec 7, 8:20 PM

We use WDQS when we need to get article titles in a set of languages given a Wikidata item ID.

Wouldn't using something like https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q1&props=sitelinks&sitefilter=enwiki|frwiki|ruwiki be easier, or do I still not understand what's going on?

If that's a reasonable replacement, it would probably make the service much faster and remove the dependency on WDQS.
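
For illustration, here is a minimal Python sketch of that call: it builds the wbgetentities URL and pulls the article titles out of a decoded JSON response. The helper names (build_sitelinks_url, titles_by_site) are made up for this example, it is not how the service is written, and it only assumes the usual response shape of entities, then item ID, then sitelinks.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_sitelinks_url(item_id, site_ids):
    """Build a wbgetentities URL that asks only for the given sitelinks.

    Hypothetical helper for illustration, not part of the service.
    """
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "sitelinks",
        "sitefilter": "|".join(site_ids),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def titles_by_site(response, item_id):
    """Map site ID to article title from a decoded wbgetentities response."""
    entity = response.get("entities", {}).get(item_id, {})
    sitelinks = entity.get("sitelinks", {})
    return {site: link["title"] for site, link in sitelinks.items()}


# Hand-written sample in the shape of a wbgetentities response.
sample = {
    "entities": {
        "Q1": {
            "type": "item",
            "id": "Q1",
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Universe", "badges": []},
            },
        }
    },
    "success": 1,
}

print(build_sitelinks_url("Q1", ["enwiki", "frwiki", "ruwiki"]))
print(titles_by_site(sample, "Q1"))
```

An item with no sitelink on a requested wiki simply has no key for that site, so callers should treat a missing key as "no article in that language".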

Yes, that works too, as long as both APIs return the same thing. I'm not sure if WDQS data is newer than the MW API.

No, WDQS data can't be newer than Wikidata data because WDQS is updated from Wikidata.

I think we have a consensus here. @Smalyshev would you agree with my proposal?

The recommendation API should use the MW API to grab article names in selected wikis instead of WDQS. This will be much more efficient, remove the dependency on WDQS, probably reduce the latency of the recommendation API, and allow us to make the beta instance of the service work.

The exact query that's being used:

`SELECT ?item (COUNT(?sitelink) as ?count) WHERE {
  VALUES ?item { ${items} }
  FILTER NOT EXISTS { ?item wdt:P31 wd:Q4167410 . }
  OPTIONAL { ?sitelink schema:about ?item }
  FILTER NOT EXISTS {
    ?article schema:about ?item .
    ?article schema:isPartOf <https://${target}.${projectDomain}/> .
  }
} GROUP BY ?item`;

So it's not exactly the MW API call, but I bet it's replaceable.
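
As a rough idea of what "a bit of code" could look like, the Python sketch below applies the same three rules as the query to entities already fetched with wbgetentities (props=sitelinks|claims, with no sitefilter so that every sitelink is counted). Everything here is an assumption made for illustration: the function names, the target being passed as a site ID such as "eswiki" rather than as a domain, and the simplification that any P31 statement counts, whereas wdt: in the query matches only truthy statements.

```python
DISAMBIGUATION = "Q4167410"  # the class excluded by the query above


def is_instance_of(entity, class_id):
    """True if any P31 statement on the entity points at class_id.

    Simplification: statement rank is ignored, unlike wdt: in SPARQL.
    """
    for statement in entity.get("claims", {}).get("P31", []):
        datavalue = statement.get("mainsnak", {}).get("datavalue", {})
        if datavalue.get("value", {}).get("id") == class_id:
            return True
    return False


def sitelink_counts(entities, target_site):
    """Count sitelinks per item, skipping the items the query filters out.

    entities: the "entities" object of a wbgetentities response.
    target_site: hypothetical site ID of the target wiki, e.g. "eswiki".
    """
    counts = {}
    for item_id, entity in entities.items():
        sitelinks = entity.get("sitelinks", {})
        if is_instance_of(entity, DISAMBIGUATION):
            continue  # first FILTER NOT EXISTS: disambiguation pages
        if target_site in sitelinks:
            continue  # second FILTER NOT EXISTS: article already in target
        counts[item_id] = len(sitelinks)
    return counts
```

Like the query, this counts sitelinks from every project, not only Wikipedias.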

This is the SPARQL query I was talking about.

@Pchelolo Yes, in this case MWAPI is probably better because WDQS has no idea about secondary domains like beta, test, etc. We could in theory set it up, but using MWAPI is probably much easier.

The query that @bmansurov quotes seems to be replaceable by an MWAPI call. My one comment on this query is that it doesn't seem to distinguish between projects: it returns links for Wikipedia, Wikisource, Wikiquote, etc. and makes no distinction between them. I'm not sure whether that's the intended result or not. But if you switch to MWAPI, that's irrelevant, I presume.

Oh, I only wanted Wikipedias ;) I'll submit a patch to use MW API.
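
One way to keep only Wikipedias on the MW API side, sketched in Python under assumptions: request props=sitelinks/urls so that each sitelink carries a url field, then keep the links whose host ends in .wikipedia.org. The function name is hypothetical. Matching on the site ID suffix "wiki" alone would also let through sites such as commonswiki, which is why this sketch looks at the host instead.

```python
from urllib.parse import urlparse


def wikipedia_sitelinks(sitelinks):
    """Keep only sitelinks whose URL host is a *.wikipedia.org domain.

    Expects the "sitelinks" object of one entity, fetched with
    props=sitelinks/urls so that every link has a "url" field.
    """
    kept = {}
    for site, link in sitelinks.items():
        host = urlparse(link.get("url", "")).hostname or ""
        if host.endswith(".wikipedia.org"):
            kept[site] = link
    return kept
```

Links without a url field are dropped, so the result is empty rather than wrong if the request used plain props=sitelinks.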

The one that I've posted I think is not easily replaceable with MW API; however, I think it's possible with a bit of code and might be faster than going to WDQS.

We got quite side-tracked from the original goal here, so maybe let's file a separate ticket to remove the dependency on WDQS. It will be really nice for production too; simplifying the Beta setup is not the only goal of that work.

OK, I'll create a subtask. I'll remove the dependency from the new API endpoint for now. We can come back to the one you posted.

The one that I've posted I think is not easily replaceable with MW API, however

I am not completely sure what it's supposed to do, but I guess it can probably be replaced, if you'd like.

I think it's possible with a bit of code and might be faster than going to WDQS.

Depends on how many items you have. For one item, it's probably faster, especially if you already have the item data loaded. For multiple ones, not so sure.
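
On the multiple-items case: wbgetentities takes several IDs in one request, separated by "|", with a cap of 50 IDs per request for regular clients, so the number of round trips grows with the item count. A small Python sketch of the batching, with a made-up helper name and the cap exposed as a parameter in case the client has a different limit:

```python
def batch_ids(item_ids, batch_size=50):
    """Yield '|'-joined ID strings with at most batch_size IDs in each.

    batch_size defaults to 50, the assumed per-request cap for regular clients.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(item_ids), batch_size):
        yield "|".join(item_ids[start:start + batch_size])
```

Each yielded string can be used as the ids parameter of one request, so 120 items would take three requests.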