Update the section suggestion database from latest CX corpus dump
Closed, Resolved, Public

Assigned To

Authored By

	santhosh
	Apr 1 2021, 9:53 AM

Description

The section suggestion feature of cxserver is based on a database of section title pairs prepared from all the translations in Content Translation. This database was prepared more than a year ago, and larger, more recent CX Corpus dumps are now available.

Updating the database will help provide better section title suggestions.

Details

	Subject	Repo	Branch	Lines +/-
	Update cxserver to 2021-04-21-044024-production	operations/deployment-charts	master	+1 -1
	Update section title mapping database based on latest CX Corpus	mediawiki/services/cxserver	master	+3 -3


Related Objects

Status	Assigned	Task
Open	None	T279064 Observations from first research study for Section Translation on Bengali Wikipedia
Open	None	T276212 Improve section mapping for Section Translation
Resolved	santhosh	T279045 Update the section suggestion database from latest CX corpus dump

Event Timeline

santhosh created this task. Apr 1 2021, 9:53 AM

Restricted Application added a subscriber: Aklapper. Apr 1 2021, 9:53 AM

santhosh triaged this task as Medium priority. Apr 1 2021, 9:53 AM

Change 674753 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Update section title mapping database based on latest CX Corpus

https://gerrit.wikimedia.org/r/674753

gerritbot added a project: Patch-For-Review. Apr 1 2021, 9:56 AM

santhosh added a parent task: T276212: Improve section mapping for Section Translation. Apr 1 2021, 9:57 AM

Pginer-WMF moved this task from Backlog to General infrastructure on the SectionTranslation board. Apr 12 2021, 1:10 PM

Change 674753 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Update section title mapping database based on latest CX Corpus

https://gerrit.wikimedia.org/r/674753

Maintenance_bot removed a project: Patch-For-Review. Apr 21 2021, 5:10 AM

Change 682032 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2021-04-21-044024-production

https://gerrit.wikimedia.org/r/682032

gerritbot added a project: Patch-For-Review. Apr 23 2021, 4:59 AM

Change 682032 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2021-04-21-044024-production

https://gerrit.wikimedia.org/r/682032

KartikMistry added a project: Language-Team (Language-2021-April-June). Apr 26 2021, 3:38 AM

KartikMistry moved this task from Quarter Backlog to Needs QA on the Language-Team (Language-2021-April-June) board.

Mentioned in SAL (#wikimedia-operations) [2021-04-26T03:43:04Z] <kart_> Updated cxserver to 2021-04-21-044024-production (T279045)

Maintenance_bot removed a project: Patch-For-Review. Apr 26 2021, 4:10 AM

How can I check this?

https://cxserver.wikimedia.org/v2/suggest/sections/Salt/en/es (you can try changing the parameters) is the API that this database serves. As part of this ticket, we expanded the database with a few more languages and section titles. The suggestions API should work as it did before; that is the only user-facing thing to check.
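For anyone who wants to script this check, here is a minimal Python sketch of calling the endpoint above. The path layout (title, then source language, then target language) is read off the example URL. The helper names, the `User-Agent` string, and the timeout are assumptions made for illustration, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

CXSERVER = "https://cxserver.wikimedia.org"


def section_suggestion_url(title, source_lang, target_lang):
    """Build the suggestion URL; the title is percent-encoded as one path segment."""
    encoded_title = quote(title, safe="")
    return f"{CXSERVER}/v2/suggest/sections/{encoded_title}/{source_lang}/{target_lang}"


def fetch_section_suggestions(title, source_lang, target_lang, timeout=10):
    """Fetch suggestions and return the parsed JSON body as-is.

    The User-Agent value is a placeholder (assumption), not a required value.
    """
    request = Request(
        section_suggestion_url(title, source_lang, target_lang),
        headers={"User-Agent": "section-suggestion-check/0.1 (example script)"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_section_suggestions("Salt", "en", "es")` requests the same URL as the link above; changing the three arguments corresponds to changing the params.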

In T279045#7051907, @santhosh wrote:

https://cxserver.wikimedia.org/v2/suggest/sections/Salt/en/es (you can try changing the parameters) is the API that this database serves. As part of this ticket, we expanded the database with a few more languages and section titles. The suggestions API should work as it did before; that is the only user-facing thing to check.

The only issue I could find is this, although I'm not sure if it is a real issue.
When I call the API after copying the name of the article from the URL, I get this error:

It seems to have trouble decoding the URL, which should be this one:

Is this a real issue, or is it just me not using it properly?

I just used my browser to test this. Both links below gave results without error:

Not just with a single diacritic mark, but with multiple diacritic marks too. Example:

The behavior you noticed above may be related to the API testing tool you are using.
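One way a testing tool can produce this kind of decoding error is by percent-encoding a title that was already encoded when copied from the address bar. This is only a possible cause and is not confirmed in this thread. The sketch below uses a hypothetical title to show how single and double encoding differ.

```python
from urllib.parse import quote, unquote

# Hypothetical article title containing a diacritic (not taken from this task).
title = "São_Paulo"

# Encoding once yields the form normally seen in a copied URL.
encoded_once = quote(title, safe="")

# Encoding the already-encoded form escapes the "%" signs themselves.
encoded_twice = quote(encoded_once, safe="")

print(encoded_once)   # S%C3%A3o_Paulo
print(encoded_twice)  # S%25C3%25A3o_Paulo

# One round of decoding recovers the title only from the singly encoded form.
print(unquote(encoded_once) == title)   # True
print(unquote(encoded_twice) == title)  # False
```

If the tool encodes path segments automatically, pasting the decoded title rather than the encoded one avoids the second round of escaping.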

It's possible that it is the tool, thanks.

Pginer-WMF closed this task as Resolved. Jul 1 2021, 2:23 PM

	F34453783: image.png
	May 14 2021, 10:43 AM

	F34453785: image.png
	May 14 2021, 10:43 AM
