Extension to provide access via the dict protocol
Open, Low, Public

Description

After talking with someone on Wiktionary, I had a great idea for an
extension: a MediaWiki implementation of the dict protocol
http://en.wikipedia.org/wiki/DICT

I think this would involve writing an API client that could listen on
port 2628 for dict queries, ask mediawiki to fulfill the request, and
then send a reply to the dict client. Really, just a SMOP. ;)

Cf. https://www.mediawiki.org/wiki/Content_translation/FAQ#What_dictionaries_will_be_available.3F for an "internal" use case.


Version: unspecified
Severity: normal

Details

Reference
bz29229
bzimport raised the priority of this task from to Low.
bzimport set Reference to bz29229.
bzimport added a subscriber: Unknown Object (MLST).

It appears that this protocol expects responses in text/plain (unless OPTION MIME is used, in which case there could be multiple types; the impression I get, after not very much reading, so I could be wrong, is that dict servers should support plain text).

Converting wiktionary definitions to plain text is not exactly trivial.

(In reply to comment #2)

Cough cough http://puszcza.gnu.org.ua/software/dico/modules.html

Case in point:

From http://dicoweb.gnu.org.ua/?q=food&db=en-wiktionary&define=1 :

External links

(2, '{{')

projectlinks

pedia

}}commons


Seems as if the plugin still has something to be desired.
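For illustration only, a naive way to keep stray template braces like the ones above out of the output is to drop every balanced {{...}} span before emitting text. The function name `strip_templates` and the sample inputs are made up here, and this is not how the dico module works as far as this thread shows. It also hints at why the conversion is not trivial: on Wiktionary a template may carry part of the definition itself, so dropping templates wholesale can lose content.

```python
def strip_templates(wikitext):
    """Drop every {{...}} template call, including nested ones."""
    out = []
    depth = 0  # how many templates we are currently inside
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])  # keep only text outside templates
            i += 1
    return "".join(out)
```

An unbalanced opening {{ makes this sketch swallow the rest of the page, which is one more case a real converter would have to handle.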

Converting wiktionary definitions to plain text is not exactly
trivial.

Which is why I wouldn't want the module to do it, necessarily. The
output of

links -dump http://en.wiktionary.org/wiki/quotidian

looks at least partly usable.

How significant is the interest in this, and what is the target audience? Are there any use cases not covered by the web interface or the API?

*** Bug 57800 has been marked as a duplicate of this bug. ***

I wonder if https://www.mediawiki.org/wiki/Extension:TextExtracts has some code which can be recycled for this purpose (i.e. if this API could be provided within that extension).

Cf. https://www.mediawiki.org/wiki/Content_translation/FAQ#What_dictionaries_will_be_available.3F for an "internal" use case.

Qgil added a subscriber: Qgil.
Sumit added a subscriber: Sumit. Feb 10 2015, 2:53 PM

I'm interested in helping develop this. I feel the feature of instant word lookup would greatly help mobile users by avoiding page reload. I have very rough ideas related to implementing it both in mobile and desktop. I'd like to know if this could be taken up as a GSoC project, as I'm willing to bring this feature to MediaWiki.

I'm interested in helping develop this. I feel the feature of instant word lookup would greatly help mobile users by avoiding page reload. I have very rough ideas related to implementing it both in mobile and desktop. I'd like to know if this could be taken up as a GSoC project, as I'm willing to bring this feature to MediaWiki.

We need a mentor. You could try and contact @MaxSem to figure out whether the feature fits TextExtracts and/or whether he's interested in mentoring.

I feel the feature of instant word lookup would greatly help mobile users by avoiding page reload.

Eh, we already have nice HTTP APIs for that, about billion times more useful and powerful than the outdated DICT protocol.

whether the feature fits TextExtracts

You mean whether TextExtracts can be used as a text provider for DICT? It can; however, it is currently only able to extract from Wikipedia, not Wiktionary.

and/or whether he's interested in mentoring.

I'm not unless someone provides a real use case that will benefit a significant number of end users.

Sumit added a comment. Edited Feb 10 2015, 7:52 PM

My idea is slightly different, in the sense that I intend to use the existing MediaWiki API (http://en.wiktionary.org/w/api.php) to query for a piece of text when a user selects it and asks for its meaning. The meaning could be fetched with an AJAX request, without reloading any content, and displayed in something like a tooltip or a floating popup, thereby providing the meaning in place. The main focus here lies in deciding the best possible meaning or extract to display in the small space available. We can do away with developing on the dict protocol if the API already provides a robust mechanism for fetching data, and maybe, if possible, get meanings in other languages too.
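A rough sketch of the two pieces mentioned above: the request a popup would send for the selected text, and trimming the result to the small space available. It is written in Python rather than browser JavaScript purely for illustration. The function names `build_lookup_url` and `fit_to_popup` and the 200-character limit are assumptions; the query parameters are the core API's page-content query, which returns raw wikitext, so choosing the "best" meaning out of it remains the open problem.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wiktionary.org/w/api.php"  # endpoint quoted in the comment above


def build_lookup_url(selected_text, endpoint=API_ENDPOINT):
    """Build the GET URL asking the core API for the page named by the selection."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",  # returns wikitext, not a ready-made definition
        "titles": selected_text.strip(),
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def fit_to_popup(text, max_chars=200):
    """Collapse whitespace and cut at a word boundary so the text fits a small popup."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= max_chars:
        return collapsed
    cut = collapsed[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```

For example, `build_lookup_url("quotidian")` yields a URL that can be fetched without a page reload, and `fit_to_popup` would be applied to whatever plain text is extracted from the response.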

Qgil added a comment. Feb 10 2015, 8:55 PM

@MaxSem, just a bit of context. This project idea was listed at https://www.mediawiki.org/wiki/Outreach_programs/Possible_projects#Make_Wiktionary_definitions_available_via_the_dict_protocol, and this is why @Sumit got interested. Maybe this project is not a good candidate for GSoC after all, but this won't be Sumit's fault. :)

Let's decide whether this is a good GSoC 2015 project or not, and if not whether we should close this proposal as Declined or not.

Yeah. My point is that there has been nothing in this request that indicates why we should spend on it even the amount of time we're spending on chatting in this bug.

Nemo_bis updated the task description. Feb 11 2015, 8:50 AM
Nemo_bis set Security to None.
Qgil added a subscriber: Yurik. Feb 11 2015, 12:33 PM
Qgil added a comment. Feb 11 2015, 1:44 PM

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

People wanted dict protocol so that applications that support it could use wiktionary as a data source without any modifications.

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

There was already T59800 which I marked as duplicate. I was not aware that separate applications needed separate tasks :-(

Qgil added a comment. Mar 18 2015, 1:39 PM

If different possible-tech-projects are in fact the same, it is good to merge them. GSoC / Outreachy applications from candidates should be filed as subtasks and not merged, but this was not the case with T59800. You did right there.

mxn added a subscriber: mxn. Apr 6 2015, 2:07 PM

That's what Wikidata should do, but we are still far from it.

The next best thing in the meantime would be to reuse existing Wiktionary parsers, create dict databases (or other formats, e.g. T93340 for fr.wikt) and simply host them on Toolforge.
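To make the "create dict databases" step concrete, here is a minimal sketch that writes parsed entries in the layout dictd reads: a .dict file holding the definition text and an .index file with one tab-separated line per headword giving the offset and length as base-64 numbers. The names `b64_number` and `build_dict_files` are made up, the input mapping is assumed to come from an existing Wiktionary parser, and the simple case-insensitive sort is an assumption that may not match dictd's own ordering rules exactly; the dictfmt tool is the usual way to produce these files.

```python
import string

# Digit alphabet dictd uses for numbers in the index file.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_number(n):
    """Encode a non-negative integer as a base-64 number (not RFC 4648 byte encoding)."""
    if n == 0:
        return B64[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 64)
        digits.append(B64[rem])
    return "".join(reversed(digits))


def build_dict_files(entries):
    """entries maps headword -> plain-text definition; returns (dict_bytes, index_text)."""
    body = bytearray()
    index_lines = []
    for headword in sorted(entries, key=str.lower):
        data = (entries[headword].rstrip("\n") + "\n").encode("utf-8")
        # Offset and length are byte counts into the .dict file.
        index_lines.append("%s\t%s\t%s" % (headword, b64_number(len(body)), b64_number(len(data))))
        body += data
    return bytes(body), "\n".join(index_lines) + "\n"
```

The two return values would be written to, say, `wikt.dict` and `wikt.index` (hypothetical file names) and served by an existing dict server, so no new server code is needed for this route.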

That's what Wikidata should do, but we are still far from it.

I don't think Wikidata is supposed to work as DICT server (at most, WikibaseRepo could, but on OmegaWiki)
@Lydia_Pintscher please correct me if I'm wrong.

@Ricordisamoa: I used a shortcut.

Currently Wiktionary pages only have a plain-text structure, and since one page can include many different languages, homographs and meanings, a parser is necessary to extract each piece of information.

With Wikidata it could be possible to get those data (e.g. a list of meanings for a word) without the extra parser step, because the data would be stored in a structured way.

The DICT server part would be a separate matter (via an extension or Tool).

Agreed.

With whom? On what?

Sorry. Should have been more verbose. I agree that we currently don't plan to provide a dict server. And that it would be possible with Wikidata support for Wiktionary to provide this -- potentially as a standalone service -- without having to do all the parsing.

Sorry. Should have been more verbose. I agree that we currently don't plan to provide a dict server. And that it would be possible with Wikidata support for Wiktionary to provide this -- potentially as a standalone service -- without having to do all the parsing.

Thanks :)

Qgil added a comment. Sep 23 2015, 9:10 AM

This is a message posted to all tasks under "Re-check in September 2015" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

Restricted Application added a subscriber: Aklapper. Sep 23 2015, 9:10 AM
Qgil added a comment. Sep 23 2015, 9:36 AM

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

This is the last call for Possible-Tech-Projects missing mentors. The application deadline for Outreachy-Round-11 is 2015-11-02. If this proposal doesn't have two mentors assigned by the end of Thursday, October 22, it will be moved as a candidate for the next round.

Interested in mentoring? Check the documentation for possible mentors.

As previously mentioned, this task is moved to 'Recheck in February 2016' as it doesn't have two mentors assigned to it as of today, October 23, 2015. The project will be included in the discussion of the next iteration of GSoC/Outreachy, and is excluded from #Outreachy-11. Potential candidates are discouraged from submitting proposals for this task for #Outreachy-11, as it lacks mentors in this round.

Sumit added a comment. Sep 11 2016, 4:20 PM

Any consensus to provide access via dict protocol? Can we have this for Outreachy-13 as an internship project?

I don't think it makes sense to do this without the more important work being done first: making structured data for Wiktionary with the help of Wikidata work.

Sumit added a comment. Sep 11 2016, 4:29 PM

I don't think it makes sense to do this without the more important work being done first: making structured data for Wiktionary with the help of Wikidata work.

@Lydia_Pintscher any task tracking the above?

I don't think it makes sense to do this without the more important work being done first: making structured data for Wiktionary with the help of Wikidata work.

Since that is a big task, I think it is better to remove this one from Possible-Tech-Projects in the meantime.