I've been using Claude Code to help me do functioneering on Wikifunctions, mostly building composition functions based on Wikidata relationships as a way to explore and improve modeling on Wikidata (see https://github.com/ragesoss/wikifunctioneering). My focus so far has been music theory.
I got stuck trying to make a function that uses Wikidata relationships to go from one word to another. https://www.wikifunctions.org/view/en/Z26184 ("solfege to sargam") is intended to take a solfege syllable (do, re, mi) and return the corresponding sargam syllable from Indian classical music, as a way to validate how the relevant Lexemes, Items, and concept relationships in this area are modeled (and just because I think it's interesting).
I'm currently stuck with a hard-coded Python function to map a solfege syllable to the corresponding Sense ID: https://www.wikifunctions.org/view/en/Z29515. The missing builtin that would unlock a fully Wikidata-based 'solfege to sargam' (just as a pet example of a general pattern) would be 'find lexemes by lemma'.
I used Claude Code to create an implementation, and it seems to closely parallel other builtins that go through the Wikidata API to fetch objects. This might just be slop; I don't know enough about Wikifunctions internals to judge, but I thought I'd give it a shot.
-Sage
Claude Code's explanation
Summary
Proposing a new Wikifunctions built-in that maps a lemma string to the
Wikidata lexemes that have it as their lemma in a given language. This is
the inverse of the existing lexeme → lemma helpers and a companion to
Z6830 ("find lexemes for a Wikidata item"), and it plugs a gap that
currently forces composition functions to fall back on hardcoded Python
dicts.
A tested implementation (orchestrator + function-schemata) is already
complete and ready to submit as MRs pending ZID allocation — see
"Implementation status" at the bottom.
Motivation
Today there is no way to go from a string to a Wikidata lexeme inside
a composition. Z6830 goes item → lexemes; Z22138, Z21806 and friends
go lexeme → lemma; but nothing goes lemma → lexemes.
Concretely, this blocks Z26184 ("solfege to sargam"). Its current code
implementation, Z29517, uses a hardcoded Python dict mapping each
solfège syllable to a sense ID (e.g. 'sol' → 'L328094-S2'). Every new
solfège variant (Italian do/ut, the si/ti split, foreign-language
syllables) requires a Wikifunctions code edit, even when the relevant
Wikidata lexeme and sense already exist. With this primitive, Z26184 can
be rewritten as a pure composition: the syllable string resolves to a
lexeme via this new function, the P5137 ("item for this sense") statement
on its sense is followed to the scale-degree item, and the existing Z6830
covers the rest of the pipeline to the sargam sense. After that,
supporting a new variant is a Wikidata edit only.
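To make the intended data flow concrete, here is a minimal Python sketch of that pipeline run against a tiny in-memory stand-in for Wikidata. Every lexeme, sense, item and language ID in it is a hypothetical placeholder (not real Wikidata data), and find_lexemes_by_lemma / find_lexemes_for_item are illustrative stand-ins for the proposed built-in and for Z6830, not their actual implementations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    sense_id: str
    item_for_this_sense: Optional[str]  # value of P5137, if the sense has one


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    language: str  # language item ID
    senses: List[Sense] = field(default_factory=list)


# Hypothetical placeholder data: none of these IDs are real Wikidata entities.
LEXEMES = [
    Lexeme("L_SOL", "sol", "Q_ITALIAN", [Sense("L_SOL-S1", "Q_DEGREE_5")]),
    Lexeme("L_PA", "pa", "Q_HINDI", [Sense("L_PA-S1", "Q_DEGREE_5")]),
]


def find_lexemes_by_lemma(lemma: str, language: str) -> List[Lexeme]:
    """Stand-in for the proposed built-in: exact lemma match within one language."""
    return [lx for lx in LEXEMES if lx.lemma == lemma and lx.language == language]


def find_lexemes_for_item(item: str, language: str) -> List[Lexeme]:
    """Stand-in for a Z6830-style lookup: lexemes with a sense whose P5137 is `item`."""
    return [
        lx
        for lx in LEXEMES
        if lx.language == language
        and any(s.item_for_this_sense == item for s in lx.senses)
    ]


def solfege_to_sargam(syllable: str, source_lang: str, target_lang: str) -> List[str]:
    """Follow lemma -> lexeme -> sense -> P5137 item -> target-language sense IDs."""
    results = []
    for lexeme in find_lexemes_by_lemma(syllable, source_lang):
        for sense in lexeme.senses:
            item = sense.item_for_this_sense
            if item is None:
                continue
            for target in find_lexemes_for_item(item, target_lang):
                results.extend(
                    s.sense_id for s in target.senses if s.item_for_this_sense == item
                )
    return results


print(solfege_to_sargam("sol", "Q_ITALIAN", "Q_HINDI"))  # ['L_PA-S1']
```

In this shape, supporting a new syllable variant means adding data (a lexeme whose sense carries the right P5137), not editing code.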
Proposed signature
Z???? find lexemes by lemma
  K1: Z6 (lemma)
  K2: Z60 (language)
  returns: List<Z6095> (lexeme references)
Parallels Z6830 deliberately: the K2: Z60 language convention matches,
and the return type is the same list-of-lexeme-refs shape so results
can flow straight into existing lexeme-walking helpers.
Implementation approach
No new Wikidata-side work required — the CirrusSearch keywords this
needs are already in production via T271776 ("Allow limiting lexeme
searches by language"):
- haslemma:"<lemma>" — exact lemma match
- haslang:Q<id> — language filter (via the Z60's language item)
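As a rough illustration of the request these keywords imply, the following Python sketch builds MediaWiki API search parameters for one lookup. action=query, list=search, srsearch and srnamespace are standard MediaWiki search API parameters; the helper name, the choice to reject lemmas containing double quotes (rather than guess at CirrusSearch escaping rules), and the example language item are assumptions for illustration, not what the orchestrator code does.

```python
from urllib.parse import urlencode

LEXEME_NAMESPACE = 146  # Lexeme namespace on Wikidata


def build_lemma_search_params(lemma: str, language_qid: str) -> dict:
    """Build list=search parameters for an exact-lemma, single-language lexeme search.

    Hypothetical helper: it mirrors the haslemma:/haslang: keywords above,
    but it is not the orchestrator's actual code.
    """
    if '"' in lemma:
        # Assumption: this sketch sidesteps CirrusSearch quote-escaping rules entirely.
        raise ValueError("lemmas containing double quotes are not handled in this sketch")
    if not (language_qid.startswith("Q") and language_qid[1:].isdigit()):
        raise ValueError(f"expected an item ID like Q652, got {language_qid!r}")
    return {
        "action": "query",
        "list": "search",
        "srnamespace": LEXEME_NAMESPACE,
        "srsearch": f'haslemma:"{lemma}" haslang:{language_qid}',
        "format": "json",
    }


params = build_lemma_search_params("sol", "Q652")  # Q652: Italian
print("https://www.wikidata.org/w/api.php?" + urlencode(params))
```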
The orchestrator handler is a thin wrapper over the same
findEntitiesByStatements / dereferenceWithCaching path Z6830 uses,
just with a different srsearch keyword (haslemma: instead of
haswbstatement:) and a distinct cache-key prefix in the shared
ReferenceType.WIKIDATA_SEARCH namespace.
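The point of a distinct cache-key prefix is that both search kinds share one cache namespace, so their keys must never collide. A minimal Python sketch of that idea follows; the prefixes and key format are hypothetical and are not the format produced by the real keyForWikidataLemmaSearchID.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the shared ReferenceType.WIKIDATA_SEARCH cache namespace.
SearchCache = Dict[str, List[str]]


def statement_search_key(prop: str, value: str) -> str:
    """Hypothetical key for a haswbstatement: search (the Z6830-style path)."""
    return f"statement|{prop}={value}"


def lemma_search_key(lemma: str, language_qid: str) -> str:
    """Hypothetical key for a haslemma: search; the lemma goes last because it is free text."""
    return f"lemma|{language_qid}|{lemma}"


def cached_search(cache: SearchCache, key: str, run: Callable[[], List[str]]) -> List[str]:
    """Return the cached result for `key`, running the search only on a cache miss."""
    if key not in cache:
        cache[key] = run()
    return cache[key]


cache: SearchCache = {}
calls: List[str] = []


def fake_lemma_search() -> List[str]:
    calls.append("lemma")
    return ["L328094"]


cached_search(cache, lemma_search_key("sol", "Q652"), fake_lemma_search)
cached_search(cache, lemma_search_key("sol", "Q652"), fake_lemma_search)
print(len(calls))  # 1: the second lookup is served from the cache
```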
Each invocation makes one MediaWiki API call against namespace 146
(Lexeme), and the response size is bounded by Cirrus, so no pagination
should be needed for the queries people will realistically make.
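Because the result set is small and unpaginated, turning the single API response into lexeme references is a straightforward mapping. The sketch below assumes the standard list=search JSON shape (query.search[].title, with lexeme titles of the form Lexeme:L123) and represents each Z6095 reference as a canonical-style dict with a Z6095K1 key holding the lexeme ID; that key name is an assumption here and should be checked against function-schemata.

```python
import json
import re
from typing import Any, Dict, List

LEXEME_TITLE = re.compile(r"^Lexeme:(L[1-9]\d*)$")


def lexeme_refs_from_search(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Map a list=search response to Z6095-style lexeme references (assumed shape)."""
    refs = []
    for hit in response.get("query", {}).get("search", []):
        match = LEXEME_TITLE.match(hit.get("title", ""))
        if match is None:
            continue  # skip anything that is not a lexeme page title
        refs.append({"Z1K1": "Z6095", "Z6095K1": match.group(1)})
    return refs


# A hand-written sample response; L42 is an illustrative ID only.
sample = json.loads('''{
  "query": {"search": [
    {"ns": 146, "title": "Lexeme:L328094"},
    {"ns": 146, "title": "Lexeme:L42"}
  ]}
}''')
print(lexeme_refs_from_search(sample))
```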
Scope and non-scope (v1)
In scope:
- Exact lemma match only (not prefix / fuzzy).
- Filter by language.
- Returns lexeme references (Z6095); callers fetch full lexemes only as needed.
Out of scope for v1 (can follow as separate primitives):
- Lexical category filter. v1 returns all matching lexemes across categories; composition callers can filter downstream by inspecting each candidate's lexical category. If downstream filtering turns out to be painful in practice, the CirrusSearch haswbstatement: path is available for a v2 that takes a category Z6091.
- Prefix / fuzzy matching (haslemmaprefix: etc.).
- Form-level lemmas — only head lemma is considered.
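For the lexical-category case left out of v1, downstream filtering amounts to inspecting each fetched candidate and keeping those whose lexicalCategory matches. This Python sketch is for illustration only (a real caller would do the same in a composition); it works on already-fetched lexeme JSON in the shape Wikidata's entity data uses (a top-level lexicalCategory item ID), omits the fetch step, and uses made-up lexeme IDs plus a placeholder for the non-noun category.

```python
from typing import Any, Dict, Iterable, List


def filter_by_lexical_category(lexemes: Iterable[Dict[str, Any]], category_qid: str) -> List[str]:
    """Keep the IDs of lexemes whose lexicalCategory equals category_qid."""
    return [lx["id"] for lx in lexemes if lx.get("lexicalCategory") == category_qid]


# Hypothetical candidates; Q1084 is "noun", Q_SOME_OTHER_CATEGORY is a placeholder.
candidates = [
    {"id": "L_A", "lexicalCategory": "Q1084"},
    {"id": "L_B", "lexicalCategory": "Q_SOME_OTHER_CATEGORY"},
]
print(filter_by_lexical_category(candidates, "Q1084"))  # ['L_A']
```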
Implementation status
A complete, tested implementation exists on local feature branches,
ready to push as MRs once this task has confirmed ZID allocation.
Summary of what's ready:
- function-schemata: two new definition files (Z????.json function shell + Z????.json built-in impl) and two new dependencies.json entries.
- function-orchestrator:
  - src/fetchObject.js: new findLexemesByLemma, findLexemesByLemmaQuery, fetchLexemesByLemma(s), and keyForWikidataLemmaSearchID methods on ReferenceResolver, parallel to the Z6830 path.
  - src/builtins.js (v1): new BUILTIN_FIND_LEXEMES_BY_LEMMA_ handler registered in builtinFunctions and added to implementationZIDs / functionZIDs.
  - src/transpilation/builtins.js (v2): builtinFindLexemesByLemma handler, which unwraps the bare Z6 argument via getNestedValueOrThrow rather than realizeStringMemberOrThrow. The latter assumes the 2-level nesting of Z6091/Z6092 wrappers and throws Z516 on a bare Z6, a subtle pitfall worth noting for anyone adding bare-Z6 builtins in the future.
  - test/utils/mockUtils.js: extends WikidataQueryStub and mockWikidataActionAPI to dispatch haslemma: searches alongside the existing haswbstatement: ones.
  - Unit tests (5) and end-to-end tests across v1 and v2 (6), paralleling the Z6830 suite.
The test count goes from a baseline of 816 to 827 passing, with no
regressions; npm test is clean (including lint under the zero-warning policy).
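The bare-Z6 unwrapping pitfall in the v2 handler is easiest to see with the two argument shapes side by side. The Python sketch below is a hypothetical analogue of the two unwrapping strategies, not a port of getNestedValueOrThrow or realizeStringMemberOrThrow. It assumes a simplified shape in which a bare Z6 is {"Z1K1": "Z6", "Z6K1": ...} and a reference wrapper adds one more level (e.g. Z6091K1 holding a Z6).

```python
from typing import Any, Dict


class UnwrapError(ValueError):
    """Raised when an argument does not have the expected nesting."""


def unwrap_wrapped_string(arg: Dict[str, Any], wrapper_key: str) -> str:
    """Assumes two levels: wrapper -> Z6 -> string (like a Z6091 reference)."""
    inner = arg.get(wrapper_key)
    if not isinstance(inner, dict) or "Z6K1" not in inner:
        raise UnwrapError(f"expected {wrapper_key} to hold a Z6")
    return inner["Z6K1"]


def unwrap_bare_string(arg: Any) -> str:
    """Accepts a bare Z6: either a plain string or {"Z1K1": "Z6", "Z6K1": ...}."""
    if isinstance(arg, str):
        return arg
    if isinstance(arg, dict) and "Z6K1" in arg:
        return arg["Z6K1"]
    raise UnwrapError("expected a Z6")


item_ref = {"Z1K1": "Z6091", "Z6091K1": {"Z1K1": "Z6", "Z6K1": "Q652"}}
lemma_arg = {"Z1K1": "Z6", "Z6K1": "sol"}

print(unwrap_wrapped_string(item_ref, "Z6091K1"))  # Q652
print(unwrap_bare_string(lemma_arg))  # sol
try:
    # The two-level strategy fails on a bare Z6, analogous to the Z516 error above.
    unwrap_wrapped_string(lemma_arg, "Z6091K1")
except UnwrapError as err:
    print("error:", err)
```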
Request
- Please confirm or assign the user-facing Z8 ZID and the built-in Z14 ZID. Slots Z6832 and Z6932 are unused in the current function-schemata/data/definitions/ and would sit naturally next to Z6830/Z6930 and Z6831/Z6931, but happy to use whatever range the team prefers.
- Any concerns about the signature (especially the decision to drop category filtering from v1) are best raised here before the MRs land.
Happy to rework the signature if there's a better shape; the code
changes are small and easy to adjust.
Related
- T271776 — "Allow limiting lexeme searches by language" (productionised the haslemma: / haslang: CirrusSearch keywords this depends on).
- T370072 — parent task for the existing Wikidata-lexeme built-ins (Z6820/Z6825/Z6826, Z6830/Z6931). This is the natural follow-up.
- T230833 — wbsearchentities lexeme language-filter bug; the reason we go through CirrusSearch rather than wbsearchentities.