
Create basic Mirandese analysis chain
Closed, Resolved · Public

Description

The goal is to create a basic Mirandese (mwl) analysis chain, with elision processing for d', l', and qu' and a basic stop word list, and set it up running on Mirandese Wikipedia data on RelForge.
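As a rough illustration of what such a chain could look like, here is a minimal sketch of Elasticsearch index analysis settings, written as a Python dict. It uses Elasticsearch's built-in `elision` and `stop` token filters. The filter and analyzer names (`mwl_elision`, `mwl_stop`, `mwl_text`) and the four stop words are hypothetical placeholders. This is not the actual CirrusSearch patch, and the stop words are not @Athena's list.

```python
import json

# Hypothetical, illustrative subset only; the real chain uses the full
# Mirandese stop word list adapted from Portuguese.
MWL_STOP_WORDS = ["de", "i", "l", "la"]

mwl_index_settings = {
    "analysis": {
        "filter": {
            # Built-in elision filter: strips a leading d', l', or qu'
            # (Lucene's ElisionFilter accepts both ' and the curly apostrophe).
            "mwl_elision": {
                "type": "elision",
                "articles_case": True,  # also match D', L', Qu'
                "articles": ["d", "l", "qu"],
            },
            # Built-in stop filter using the (placeholder) Mirandese list.
            "mwl_stop": {
                "type": "stop",
                "stopwords": MWL_STOP_WORDS,
            },
        },
        "analyzer": {
            # Hypothetical analyzer name: elision, then lowercasing, then stop words.
            "mwl_text": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["mwl_elision", "lowercase", "mwl_stop"],
            }
        },
    }
}

if __name__ == "__main__":
    # Print the body you might send when creating a test index.
    print(json.dumps({"settings": mwl_index_settings}, indent=2))
```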

See the discussion on the Mirandese Village Pump for more. @Athena has created a Mirandese stop word list (adapted from a Portuguese one), available on GitHub.

Once it is up on RelForge, if everything looks good, we'll work on getting it deployed to prod and then reindex the Mirandese Wikipedia!

Details

Related Gerrit Patches:
mediawiki/extensions/CirrusSearch (master): Create basic Mirandese analysis chain


Event Timeline

TJones created this task. May 18 2018, 12:52 PM
Restricted Application added a subscriber: Aklapper. May 18 2018, 12:52 PM

We have a working prototype on RelForge! Note that the prototype only includes the index, not the content of the Mirandese Wikipedia, so all links on the search results page are red. It's running in WMF Labs, so it has the unicorn logo instead of the Wikipedia logo.

The elision processing (handling l', d', and qu') improves recall: searching for acupa gives 115 results in prod and 122 in labs. Searching for l'acupa gives 0 results in prod, but the same 122 results in labs.
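To show why l'acupa and acupa can end up matching the same documents, here is a minimal Python sketch of the elision step. It strips an elided d', l', or qu' (straight or curly apostrophe) from the front of a token. The helper names `strip_elision` and `analyze` are hypothetical, and the regex tokenizer is a crude stand-in for a real one. It loosely mirrors how Lucene's elision filter checks the text before the first apostrophe.

```python
import re

ELIDED_ARTICLES = {"d", "l", "qu"}   # from the task description
APOSTROPHES = {"'", "\u2019"}        # straight and curly apostrophes


def strip_elision(token: str) -> str:
    """Drop an elided article plus its apostrophe from the start of a token."""
    for i, ch in enumerate(token):
        if ch in APOSTROPHES:
            if token[:i].lower() in ELIDED_ARTICLES:
                return token[i + 1:]
            break  # only the first apostrophe is considered
    return token


def analyze(text: str) -> list[str]:
    # Hypothetical tokenizer: runs of word characters and apostrophes.
    tokens = re.findall(r"[\w'\u2019]+", text)
    return [strip_elision(t).lower() for t in tokens]


print(analyze("l'acupa"))  # ['acupa']
print(analyze("acupa"))    # ['acupa']
```

Because both forms reduce to the same indexed term, a query for either one finds the same documents.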

The stop words improve recall and change the scoring, and thus the ranking, of results. In prod, la almanha gives 278 results; in labs, it gives 282, and the article "Seclo XX" moves up from 6th to 3rd on the list, presumably because it has more matches for almanha and la no longer contributes to the full-text score.
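Here is a minimal sketch of the stop word effect on the query side. Once la is treated as a stop word, the query la almanha is left with only almanha, so only that term can contribute to matching and scoring. The helper `query_terms` and the four-word stop set are hypothetical illustrations, not the actual Mirandese list or CirrusSearch code.

```python
import re

# Hypothetical, illustrative subset of Mirandese stop words.
MWL_STOP_WORDS = {"de", "i", "l", "la"}


def query_terms(query: str, stop_words: set[str] = MWL_STOP_WORDS) -> list[str]:
    """Lowercase, split on word characters, and drop stop words."""
    tokens = re.findall(r"\w+", query.lower())
    return [t for t in tokens if t not in stop_words]


print(query_terms("la almanha"))  # ['almanha']
```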

Change 441253 had a related patch set uploaded (by Tjones; owner: Tjones):
[mediawiki/extensions/CirrusSearch@master] Create basic Mirandese analysis chain

https://gerrit.wikimedia.org/r/441253

Change 441253 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Create basic Mirandese analysis chain

https://gerrit.wikimedia.org/r/441253

Vvjjkkii renamed this task from Create basic Mirandese analysis chain to ascaaaaaaa. Jul 1 2018, 1:09 AM
Vvjjkkii removed TJones as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
TJones renamed this task from ascaaaaaaa to Create basic Mirandese analysis chain. Jul 2 2018, 2:56 PM
TJones claimed this task.
TJones raised the priority of this task from High to Needs Triage.
TJones updated the task description. (Show Details)
TJones added subscribers: Aklapper, Gerrit.
TJones edited subscribers, added: GerritBot; removed: Gerrit.
TJones edited subscribers, added: gerritbot; removed: GerritBot.

So many things labelled "gerrit" and "gerritbot". Sorry for the extra notifications.

debt closed this task as Resolved. Jul 9 2018, 11:54 PM
debt added a subscriber: debt.

Closing this as it rides the train this week. The follow-up ticket is T194941 to re-index the Mirandese Wikipedia site.