Update the section suggestion database from latest CX corpus dump
Closed, ResolvedPublic

Description

The section suggestion feature of cxserver is based on a database of section title pairs prepared from all the translations in Content Translation. That database was prepared more than a year ago, and larger, more recent CX Corpus dumps are now available.

Updating the database will yield better section title suggestions.

Event Timeline

santhosh triaged this task as Medium priority. Apr 1 2021, 9:53 AM

Change 674753 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Update section title mapping database based on latest CX Corpus

https://gerrit.wikimedia.org/r/674753

Change 674753 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Update section title mapping database based on latest CX Corpus

https://gerrit.wikimedia.org/r/674753

Change 682032 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2021-04-21-044024-production

https://gerrit.wikimedia.org/r/682032

Change 682032 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2021-04-21-044024-production

https://gerrit.wikimedia.org/r/682032

Mentioned in SAL (#wikimedia-operations) [2021-04-26T03:43:04Z] <kart_> Updated cxserver to 2021-04-21-044024-production (T279045)

https://cxserver.wikimedia.org/v2/suggest/sections/Salt/en/es is the API that this database serves (you can try changing the parameters). As part of this ticket we expanded the database with a few more languages and section titles. The suggestions API should work as it did before; that is the only user-facing thing to check.

The only issue I could find is this, although I'm not sure whether it is a real issue:
when I call the API with the article name copied from the URL, I get this error:

image.png (1×1 px, 142 KB)

It seems to have trouble decoding the URL, which should be this one:
image.png (978×1 px, 129 KB)

Is this a real issue, or am I just not using it properly?
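
For anyone reproducing this, here is a minimal sketch of building the request URL so that the title is percent-encoded exactly once. The path layout (title, source language, target language) is taken from the example URL above; the helper names are hypothetical, and the sketch assumes the title contains no literal `%` character and that the response is JSON. Whether double encoding is what causes the error in the screenshots is not confirmed.

```python
import json
from urllib.parse import quote, unquote
from urllib.request import urlopen

# Base path taken from the example URL in this task.
BASE_URL = "https://cxserver.wikimedia.org/v2/suggest/sections"


def build_section_suggestion_url(title, source_lang, target_lang):
    """Hypothetical helper: build the suggestion URL for one article title.

    A title copied from a browser address bar may already be
    percent-encoded, so decode it first and then encode it once.
    Assumption: the title contains no literal '%' character.
    """
    decoded = unquote(title)
    encoded = quote(decoded, safe="")  # also encodes '/' inside titles
    return f"{BASE_URL}/{encoded}/{source_lang}/{target_lang}"


def fetch_section_suggestions(title, source_lang, target_lang):
    """Hypothetical helper: fetch and parse the response (assumed to be JSON)."""
    url = build_section_suggestion_url(title, source_lang, target_lang)
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With this, a raw title and its already-encoded form produce the same request URL, which makes it easier to tell a client-side encoding mistake from a server-side decoding problem.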