
Provide an efficient API for Lexeme read usage
Open, Needs Triage, Public, 0 Estimated Story Points

Description

Use case: Given a grapheme (a string of text) and, optionally, a contextual language, I'd like to get a "sense" response from an efficient Wikidata.org API.

Current situation:

First, call ?action=wbsearchentities &type=lexeme &language=en &search=first; the search response is:

"search": [
    {
        "repository": "",
        "id": "L2",
        "concepturi": "http://www.wikidata.org/entity/L2",
        "title": "Lexeme:L2",
        "pageid": 54386964,
        "url": "//www.wikidata.org/wiki/Lexeme:L2",
        "label": "first",
        "description": "English, noun",
        "match": {
            "type": "label",
            "language": "en",
            "text": "first"
        }
    },
    {
        "repository": "",
        "id": "L14410",
        "concepturi": "http://www.wikidata.org/entity/L14410",
        "title": "Lexeme:L14410",
        "pageid": 56141960,
        "url": "//www.wikidata.org/wiki/Lexeme:L14410",
        "label": "firstly",
        "description": "English, adverb",
        "match": {
            "type": "label",
            "language": "en",
            "text": "firstly"
        }
    },
    {
        "repository": "",
        "id": "L34200",
        "concepturi": "http://www.wikidata.org/entity/L34200",
        "title": "Lexeme:L34200",
        "pageid": 57898177,
        "url": "//www.wikidata.org/wiki/Lexeme:L34200",
        "label": "firsthand",
        "description": "English, adjective",
        "match": {
            "type": "label",
            "language": "en",
            "text": "firsthand"
        }
    }
],

Then pass the resulting id to ?action=wbgetentities &language=en &ids=L2; the senses part of the response (truncated here) is:

"senses": [
	{
		"id": "L2-S1",
		"glosses": {
			"en": {
				"language": "en",
				"value": "Element in an ordered list which comes before all others according to the ordering"
			},
			"de": {
				"language": "de",
				"value": "einer Ordnung folgend das Element vor allen anderen"
			},
			"es": {
				"language": "es",
				"value": "Elemento que se ubica antes que todos los dem\u00e1s en una lista ordenada."
			},
			"eu": {
				"language": "eu",
				"value": "Ordenatutako zerrenda batean besteen aurretik dagoen elementua"
			},
			"te": {
				"language": "te",
				"value": "\u0c2e\u0c4a\u0c26\u0c1f\u0c3f "
			},
			"pt-br": {
				"language": "pt-br",
				"value": "Elemento em uma lista ordenada que precede todos os demais conforme o ordenamento"
			}
		},

Problems:

  • Two API calls which the user has to daisy-chain.
  • Lots of extraneous data I don't care about.
  • (?) There appears to be no filter for limiting the language in which the lexeme/language/sense tuple is described, so glosses come back in every language.
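For reference, the daisy-chained workflow described above can be sketched in Python roughly as follows. This is a minimal sketch rather than a recommended client: the helper names (`search_url`, `entities_url`, `lexeme_ids`, `senses_for`) and the User-Agent string are made up for illustration, and only the parameters shown in the requests above are used.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"
# Hypothetical User-Agent string; a real client should identify itself properly.
USER_AGENT = "lexeme-sense-sketch/0.1 (example only)"


def search_url(text, language):
    """Build the first request: wbsearchentities restricted to lexemes."""
    params = {
        "action": "wbsearchentities",
        "type": "lexeme",
        "language": language,
        "search": text,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def entities_url(ids):
    """Build the second request: wbgetentities for the ids found by the search."""
    params = {"action": "wbgetentities", "ids": "|".join(ids), "format": "json"}
    return API + "?" + urlencode(params)


def lexeme_ids(search_response):
    """Pull the lexeme ids (e.g. "L2") out of a wbsearchentities response."""
    return [hit["id"] for hit in search_response.get("search", [])]


def fetch_json(url):
    """Perform one HTTP GET and decode the JSON body."""
    with urlopen(Request(url, headers={"User-Agent": USER_AGENT})) as resp:
        return json.load(resp)


def senses_for(text, language):
    """Chain the two calls: search first, then fetch the full entities."""
    ids = lexeme_ids(fetch_json(search_url(text, language)))
    if not ids:
        return {}
    entities = fetch_json(entities_url(ids))["entities"]
    # Everything except "senses" is fetched and then thrown away here.
    return {lexeme_id: entities[lexeme_id].get("senses", []) for lexeme_id in ids}
```

Calling `senses_for("first", "en")` would issue both requests in sequence. The caller still has to pick the wanted gloss language out of each sense afterwards, which is the overhead this task is about.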

Desired outcome:

  • A single, lightweight API call that, given a grapheme and optionally a contextual language, responds with one or more lexeme/sense results in a given output language.

e.g. ?action=wblgetsenses &grapheme=first &glanguage=en &language=de

"lexemes": {
	"L2": {
		"pageid": 54386964,
		"id": "L2",
		"lemmas": {
			"en": {
				"language": "en",
				"value": "first"
			}
		},
		"lexicalCategory": "Q1084",
		"language": "Q1860",
		"senses": [
			{
				"id": "L2-S1",
				"glosses": {
					"language": "de",
					"value": "einer Ordnung folgend das Element vor allen anderen"
				}
			}
		]
	}
}
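To make the intended response shape concrete, here is a rough client-side sketch that trims a full lexeme entity (as returned by wbgetentities) down to the structure proposed above. The function name `light_lexeme` is hypothetical, and dropping senses that have no gloss in the output language is my assumption; the proposal does not say what should happen in that case.

```python
def light_lexeme(entity, out_language):
    """Reduce a full lexeme entity to the proposed lightweight shape.

    Each sense keeps only its gloss in `out_language`. Senses with no
    gloss in that language are skipped (an assumption, not part of the
    proposal).
    """
    senses = []
    for sense in entity.get("senses", []):
        gloss = sense.get("glosses", {}).get(out_language)
        if gloss is None:
            continue
        senses.append(
            {
                "id": sense["id"],
                # A single gloss object instead of a map keyed by language.
                "glosses": {"language": gloss["language"], "value": gloss["value"]},
            }
        )
    return {
        "pageid": entity.get("pageid"),
        "id": entity["id"],
        "lemmas": entity.get("lemmas", {}),
        "lexicalCategory": entity.get("lexicalCategory"),
        "language": entity.get("language"),
        "senses": senses,
    }
```

Doing this on the client still means downloading every gloss in every language first, so it illustrates the output format only, not the efficiency gain a server-side implementation would bring.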

Or, for a more complex situation where no glanguage is given and lexemes from more than one language match, e.g. ?action=wblgetsenses &grapheme=Gift &language=de

"lexemes": {
	"L16827": {
		"pageid": 56340177,
		"id": "L16827",
		"lemmas": {
			"en": {
				"language": "en",
				"value": "gift"
			}
		},
		"lexicalCategory": "Q24905",
		"language": "Q1860",
		"senses": [
			{
				"id": "L16827-S1",
				"glosses": {
					"language": "de",
					"value": "Als Geschenk oder Spende etwas geben."
				}
			}
		]
	},
	"L7166": {
		"pageid": 55653109,
		"id": "L7166",
		"lemmas": {
			"en": {
				"language": "en",
				"value": "gift"
			}
		},
		"lexicalCategory": "Q1084",
		"language": "Q1860",
		"senses": [
			{
				"id": "L7166-S1",
				"glosses": {
					"language": "de",
					"value": "etwas das ohne Gegenleistung freiwillig gegeben wird"
				}
			}
		]
	},		
	"L<I just made this up>": {
		"pageid": 123456,
		"id": "L<I just made this up>",
		"lemmas": {
			"de": {
				"language": "de",
				"value": "Gift"
			}
		},
		"lexicalCategory": "Q1084",
		"language": "Q188",
		"senses": [
			{
				"id": "L<I just made this up>-S1",
				"glosses": {
					"language": "de",
					"value": "gesundheitsschädliche oder potenziell tödliche Substanz"
				}
			}
		]
	}
}
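The "Gift" example implies two matching rules that are worth spelling out: the grapheme is compared to lemmas without regard to case ("Gift" also finds "gift"), and leaving out glanguage means lexemes in any language can match. A small sketch of that matching, assuming glanguage is compared against the language codes that key the lemmas (the helper name `lemma_matches` is made up):

```python
def lemma_matches(entity, grapheme, glanguage=None):
    """Return True if any lemma of the lexeme equals `grapheme`, ignoring case.

    If `glanguage` is given, only lemmas stored under that language code
    are considered; otherwise lemmas in every language are checked.
    """
    wanted = grapheme.casefold()
    for code, lemma in entity.get("lemmas", {}).items():
        if glanguage is not None and code != glanguage:
            continue
        if lemma["value"].casefold() == wanted:
            return True
    return False


# Trimmed-down lexemes from the example above; the German id is a placeholder.
english_gift = {"id": "L7166", "lemmas": {"en": {"language": "en", "value": "gift"}}}
german_gift = {"id": "L-placeholder", "lemmas": {"de": {"language": "de", "value": "Gift"}}}

# Without glanguage both lexemes match; with glanguage="de" only the German one does.
matches_any = [e["id"] for e in (english_gift, german_gift) if lemma_matches(e, "Gift")]
matches_de = [e["id"] for e in (english_gift, german_gift) if lemma_matches(e, "Gift", "de")]
```

Whether the real API should match case-insensitively, or also match against forms rather than only lemmas, is an open design question for whoever picks this up.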

Is this a sane request? :-)
