Add per-language Caption columns (multi-lang captions) (!52)
Closed, Resolved · Public

Description

Scope: Let the user choose which language(s) the Caption column edits, by adding additional Caption columns each bound to a different language; never two columns for the same language; choices persist across reloads.

What's wanted

  • The Caption column is currently single-language (English, hard-coded). The header reads "Caption EN" via a placeholder DEFAULT_DESCRIPTION_LANG = "EN" constant in src/table.jsx:1266 whose own comment flags this as the missing piece.
  • Users should be able to (a) change the active Caption column's language, or (b) add another Caption column for a different language. Two visible Caption columns for the same language must be impossible.
  • The set of languages the user has enabled (and the column count) is part of user preferences, persisted across reloads and across devices via the existing Preferences.json user-store page (no new persistence schema required — see Persistence below).

Where in the code

  • src/table.jsx
    • TABLE_COLUMNS (~line 70-90): the single description column descriptor with its caption-specific header info popover. Per-language variants need column descriptors that share the same look but each carry a lang field.
    • Cell() case "description": (~line 1841) and CellEditor case "description": (~line 2198) read/write item.description. New per-lang cells need to read/write item.descriptions[lang].
    • HeaderCell (~line 1268) currently appends DEFAULT_DESCRIPTION_LANG to the label. Replace with the column's lang (uppercased, e.g. "EN", "NL").
    • EDITABLE_KEYS (~line 141): the editable check is currently keyed on a static Set<string>. Per-lang columns will need a per-key suffix (description:nl) or a more general predicate that recognises caption keys.
    • getAllColumns(customProps) (~line 203): today returns [...TABLE_COLUMNS, ...customProps]. It should also expand the canonical description template into one descriptor per active caption language, so visibility/order/widths flow through without special-casing.
  • src/api/normalize.js
    • normalizeStashFile (~line 213): descriptions: { en: '', nl: '' } is the existing partial shape. Initialise descriptions as {} and populate from SDC labels in normalizePublishedFile (~line 489) for every language present, not just en.
  • src/wikitext-templates.js formatDescription (~line 161): currently emits {{en|...}} and {{nl|...}} only. Make it emit one block per language present in item.descriptions. Keep the legacy fallback for item.description (treat as en).
  • src/api/user-store.js DRAFT_FIELDS (~line 395): add descriptions so per-language drafts persist alongside the legacy description string.
  • src/columns-modal.jsx: add a way to pick which language to show when adding a Caption column (so two columns can't share a language).
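
The column-key changes listed above could look roughly like the following. This is a minimal sketch, not the code from !52: captionLangFromKey, captionKeyForLang, isEditableKey and the captionLangs parameter are hypothetical names, and TABLE_COLUMNS / STATIC_EDITABLE_KEYS here are small stand-ins for the real tables in src/table.jsx.

```javascript
// Hypothetical sketch. Only the "description" / "description:nl" key shapes
// and the getAllColumns name come from the task description.
const CAPTION_BASE_KEY = "description";

// "description" -> "en" (legacy key), "description:nl" -> "nl", anything else -> null.
function captionLangFromKey(key) {
  if (key === CAPTION_BASE_KEY) return "en";
  const prefix = CAPTION_BASE_KEY + ":";
  if (!key.startsWith(prefix)) return null;
  const lang = key.slice(prefix.length);
  return lang ? lang : null;
}

// Assumption: English keeps the unsuffixed legacy key for backward compatibility.
function captionKeyForLang(lang) {
  return lang === "en" ? CAPTION_BASE_KEY : `${CAPTION_BASE_KEY}:${lang}`;
}

// Stand-ins for the real EDITABLE_KEYS and TABLE_COLUMNS.
const STATIC_EDITABLE_KEYS = new Set(["title"]);
const TABLE_COLUMNS = [
  { key: "title", label: "Title" },
  { key: "description", label: "Caption" },
];

// A general predicate instead of a static Set lookup: any caption key is editable.
function isEditableKey(key) {
  return STATIC_EDITABLE_KEYS.has(key) || captionLangFromKey(key) !== null;
}

// Expand the single caption template into one descriptor per active language.
function getAllColumns(customProps, captionLangs) {
  // Assumption: fall back to English when no caption language is active.
  const langs = captionLangs.length > 0 ? captionLangs : ["en"];
  const columns = [];
  for (const col of TABLE_COLUMNS) {
    if (col.key !== CAPTION_BASE_KEY) {
      columns.push(col);
      continue;
    }
    const seen = new Set();
    for (const lang of langs) {
      if (seen.has(lang)) continue; // never two columns for the same language
      seen.add(lang);
      columns.push({ ...col, key: captionKeyForLang(lang), lang });
    }
  }
  return [...columns, ...customProps];
}
```

With this shape, HeaderCell could render the suffix as col.lang.toUpperCase(), and Cell / CellEditor could read item.descriptions[col.lang] without further special-casing.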

Persistence model

Per the maintainer's "Don't persist derived data" rule, only the user's choices belong in Preferences.json:

  • The list of caption-column descriptors (which languages, which order, which widths) is already stored as part of columnState.visible / columnState.order / columnState.widths. No new pref shape needed — the column key is the language identifier (e.g. description:nl).
  • The legacy "description" key (no suffix) continues to mean "English caption" for backward compat with stored prefs / drafts.
  • Per-language draft text (descriptions[lang]) goes through setDraft like any other field, into Metadata.json (user-authored content).
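
To illustrate why no new pref shape is needed: the language set, its order and the column count can all be derived from the persisted list of visible column keys. The sketch below assumes columnState.visible is a plain string array, as the task describes; captionLangsFromVisible is a hypothetical helper name.

```javascript
// Hypothetical helper: derive the active caption languages (in display order)
// from the persisted visible-column keys. Nothing extra is stored.
function captionLangsFromVisible(visibleKeys) {
  const langs = [];
  for (const key of visibleKeys) {
    let lang = null;
    if (key === "description") {
      lang = "en"; // legacy unsuffixed key means English
    } else if (key.startsWith("description:")) {
      lang = key.slice("description:".length);
    }
    if (lang && !langs.includes(lang)) langs.push(lang);
  }
  return langs;
}

// Simulated Preferences.json round trip: only the user's choices are stored.
const storedPrefs = JSON.stringify({
  columnState: { visible: ["title", "description:nl", "description", "license"] },
});
const restoredPrefs = JSON.parse(storedPrefs);
const restoredLangs = captionLangsFromVisible(restoredPrefs.columnState.visible);
console.log(restoredLangs);
```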

Default language

  • The base column key description keeps meaning English (zero migration cost, matches the existing data flow).
  • Adding a new caption column defaults to the user's browser locale (navigator.language) two-letter prefix, falling back to en. The duplicate-language guard ensures the picker shows only languages not already on screen.
  • The set of offered languages comes from a small static list (the most common Commons languages: en, nl, de, fr, es, it, pt, pl, ru, ja, zh, ar). The list is curated, not exhaustive — this matches the column-defaults approach for licences (a curated catalog).
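
The default-language rule above can be sketched as a pure function. The language list is the curated one from this task; the function names, and the choice to fall back to the first remaining language when both the browser language and en are already in use, are assumptions.

```javascript
// Curated list from the task description.
const CAPTION_LANGUAGES = ["en", "nl", "de", "fr", "es", "it", "pt", "pl", "ru", "ja", "zh", "ar"];

// Duplicate-language guard: offer only languages not already on screen.
function availableCaptionLangs(usedLangs) {
  const used = new Set(usedLangs);
  return CAPTION_LANGUAGES.filter((lang) => !used.has(lang));
}

// Pick the default for a newly added caption column.
// browserLocale is expected to be navigator.language, e.g. "nl-NL".
function defaultNewCaptionLang(usedLangs, browserLocale) {
  const available = availableCaptionLangs(usedLangs);
  if (available.length === 0) return null; // every curated language is in use
  const prefix = String(browserLocale || "").split(/[-_]/)[0].toLowerCase();
  if (available.includes(prefix)) return prefix;
  if (available.includes("en")) return "en";
  return available[0]; // assumption: first remaining curated language
}
```

In the browser this would be called as defaultNewCaptionLang(captionUsedLangs, navigator.language); passing the locale in as an argument keeps the function testable outside a browser.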

Acceptance criteria

  • User can change the language of the existing Caption column from a header control or the columns modal.
  • User can add a second (third, …) Caption column with a different language.
  • The duplicate-language guard prevents two visible Caption columns from sharing a language (the picker hides already-used languages).
  • Edits to one language do not affect the other.
  • The choice (language set, order, count) persists across page reloads in the same browser.
  • The choice persists across browsers/devices for the same user (via Preferences.json round-trip).
  • The published wikitext emits one {{<lang>|1=...}} block per non-empty caption language, in the order shown in the table.
  • Existing rows with the legacy description: "..." field still render under the English column (no data loss).
  • npm run build (which runs the undefined-identifier scanner) passes.
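
The wikitext criterion above (one block per non-empty language, in table order, with the legacy field treated as English) could be met along these lines. This is a sketch under assumptions: the real formatDescription signature is not shown in this task, so the function is named formatDescriptionSketch and the langOrder parameter is hypothetical.

```javascript
// Hypothetical sketch of a language-agnostic caption formatter.
// langOrder: caption languages in the order their columns appear in the table.
function formatDescriptionSketch(item, langOrder) {
  const descriptions = { ...(item.descriptions || {}) };
  // Legacy fallback: a flat item.description counts as the English caption
  // unless descriptions.en is already set.
  if (!descriptions.en && item.description) {
    descriptions.en = item.description;
  }
  const blocks = [];
  for (const lang of langOrder) {
    const text = String(descriptions[lang] || "").trim();
    if (text) blocks.push(`{{${lang}|1=${text}}}`); // skip empty captions
  }
  return blocks.join("");
}

const sampleItem = { description: "Old caption", descriptions: { nl: "Hallo", de: "  " } };
console.log(formatDescriptionSketch(sampleItem, ["nl", "en", "de"]));
```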

Out of scope

  • Pushing captions as SDC labels via wbeditentity.labels (currently only claims are pushed via addStructuredData). The on-wiki rendering still relies on the wikitext {{lang|1=...}} blocks. SDC label publish is a separate task.
  • Per-row "this row uses these languages" overrides — every visible column applies to every row.
  • Unbounded language picker (the curated list is the v1; an autocomplete over MediaWiki's site list can come later if requested).

Event Timeline

Daanvr moved this task from To do to Doing on the Tool-upload-workbench board.

Grooming pass (AI)

Description rewritten with investigation findings (see task description above). Original preserved here verbatim:

enable the user to change the language of the caption column or to add an other caption column with a different language. it should never be possible to have to columns visible with the same language. the choice of language and how many should be saved in user preferences. and persist through reloads and sessions.

Investigation summary:

  • Today the Caption column has key description and is hard-coded as English. src/table.jsx:1266 even has a DEFAULT_DESCRIPTION_LANG = "EN" placeholder with a comment saying "a real per-column language picker is blocked on the multi-language description data model (separate task)" — this is that separate task.
  • The underlying data is multilingual by spec (Commons SDC labels). src/api/normalize.js:319 already has an sdcLabel(entity, prefLang) helper that picks the best-language label, but currently only stores one string under item.description.
  • A multi-language shape item.descriptions = { en, nl, ... } already exists half-wired in src/api/normalize.js:213, src/data.js:20, src/wikitext-templates.js:163-167. formatDescription already prefers descriptions[lang] over the flat string when present, and emits {{en|1=...}}{{nl|1=...}} blocks. Today only en and nl are hardcoded.
  • Column visibility/order is string[] of column keys, persisted in Preferences.json via setPref('columnState', …) in src/app.jsx:155-158 (and localStorage stashhub.columns.v9 as fast-path). So "the choice of language and how many" is just the visible columns list — no new persistence schema needed.
  • SDC label publish (i.e. actually pushing labels to wbeditentity.labels so other tools see them) is out of scope for this task. addStructuredData in src/api/commons.js:244 only sends claims today; captions only land in wikitext via formatDescription. Wikitext multi-lang output is already supported, just needs to be language-agnostic.
Daanvr renamed this task from "implement the caption language" to "Add per-language Caption columns (multi-lang captions)". Fri, May 15, 3:57 PM
Daanvr updated the task description.
Daanvr renamed this task from "Add per-language Caption columns (multi-lang captions)" to "Add per-language Caption columns (multi-lang captions) (!52)". Fri, May 15, 4:23 PM
Daanvr moved this task from Doing to Reviewing on the Tool-upload-workbench board.

It should not be possible to remove the last caption column. Also, removing a column that contains values should warn the user that those values will be deleted.
Re-uploading a file to the stash with pre-existing caption data (in the user-namespace metadata) should add the caption columns with their existing values.
In short, the user should not be able to have caption values linked to a file that is not visible in the table.

Addressed feedback on !52:

  1. At least one caption column must always be visible. Both removal paths now enforce the invariant: the header-menu "Remove this caption column" entry hides itself when there is only one visible caption column (captionUsedLangs.length > 1 gate), and the columns-modal eye toggle on any caption column refuses with an explanatory alert when toggling it off would leave zero caption columns. The English column is no longer hard-coded as un-removable: the rule is "always at least one", not "always English", so a user can switch to a setup without an English column if they want to.
  2. Removing a caption column with stored values now warns first. The header-menu "Remove this caption column" entry shows the affected count inline ("discards N caption values") and switches to the destructive-confirm style; on click it runs a confirm() dialog naming the language and the file count. The columns-modal eye toggle follows the same flow (a separate confirm() path with parallel copy). On confirm, the values are cleared from each item via the new clearCaptionFromItem helper, which deletes the language slot from item.descriptions and (for English) zeroes the legacy item.description field. This is necessary so that the auto-promote sweep below doesn't see them as user-typed content and re-add the column.
  3. Re-uploading a file with pre-existing captions auto-promotes the missing language columns. A new App-level useEffect on items runs after every items mutation (bootstrap, drag-and-drop upload, draft merges, history refresh) and ensures every language carrying non-empty caption text has a corresponding visible caption column. The effect bails out by returning the same colState reference when there is nothing to add, per the cell-commit-freeze lesson in CLAUDE.md, so there is no infinite-loop risk. This satisfies the maintainer's invariant: "the user should not be able to have caption values linked to a file that is not visible in the table."
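
For reviewers, the interplay of the three points above can be sketched as follows. The helper names clearCaptionFromItem, collectCaptionLangsFromItems and countItemsWithCaption are the ones added in this MR, but the bodies and signatures below are assumptions written for illustration, as are promoteCaptionColumns and the { visible: string[] } shape of colState.

```javascript
// Sketch only: bodies and signatures are assumptions, not the code from the MR.
const captionKey = (lang) => (lang === "en" ? "description" : `description:${lang}`);
const hasText = (value) => typeof value === "string" && value.trim() !== "";

// Every language that carries non-empty caption text on at least one item.
function collectCaptionLangsFromItems(items) {
  const langs = new Set();
  for (const item of items) {
    if (hasText(item.description)) langs.add("en"); // legacy field counts as English
    for (const [lang, text] of Object.entries(item.descriptions || {})) {
      if (hasText(text)) langs.add(lang);
    }
  }
  return [...langs];
}

// How many items would lose a value if this language's column were removed.
function countItemsWithCaption(items, lang) {
  return items.filter(
    (item) =>
      hasText((item.descriptions || {})[lang]) ||
      (lang === "en" && hasText(item.description))
  ).length;
}

// Returns a copy of the item without the caption for the given language.
function clearCaptionFromItem(item, lang) {
  const descriptions = { ...(item.descriptions || {}) };
  delete descriptions[lang];
  const next = { ...item, descriptions };
  if (lang === "en") next.description = ""; // also zero the legacy field
  return next;
}

// Auto-promote: make sure every language with text has a visible column.
function promoteCaptionColumns(colState, items) {
  const missing = collectCaptionLangsFromItems(items)
    .map(captionKey)
    .filter((key) => !colState.visible.includes(key));
  if (missing.length === 0) return colState; // same reference: nothing to add
  return { ...colState, visible: [...colState.visible, ...missing] };
}
```

Inside a useEffect on items this could be applied as setColState((prev) => promoteCaptionColumns(prev, items)). React skips the re-render when a state updater returns the previous value, which is why returning the same reference matters for the bailout.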

Verification:

  • npm run build passes (incl. undefined-identifier scanner, with three new window.X exports added to scripts/window-globals.json).
  • Deployed bundle at /mr-52/ contains the new logic — verified with curl + str.count() for countItemsWithCaption, collectCaptionLangsFromItems, clearCaptionFromItem, and the user-facing alert/confirm strings (all present, no misses).
  • Could not exercise the modal/header-menu interactions in a browser from the agent shell — please test interactively that the confirm dialogs fire with the right counts, that the alert appears when trying to hide the only caption column from the columns modal, and that uploading a file with a pre-existing draft in NL/DE/etc. auto-adds the matching column on first paint.

Surprising for the merger:

  • The English column can now be removed via the header menu when a sibling caption column exists. Previously canRemove={col.key !== 'description'} blocked it; now the gate is captionUsedLangs.length > 1. The auto-promote sweep brings English back if the user still has English values stored, so the only way to land English-less is to also confirm the discard.

CHANGELOG.md [Unreleased] entry was widened (single bullet under "Per-language Caption columns") to cover the new removal-guard + auto-promote behaviour. No version bump.

New commit (7679487) pushed to feat/T426422-caption-language; preview redeployed at https://upload-workbench.toolforge.org/mr-52/.

Instead of enabling the user to remove the last caption column and just adding a new default caption column at the end, let's just gray out the option with a small text indicating that removing captions entirely is not possible.

Rebased onto current main (v0.31.0) — merge resolved a 3-region conflict in src/table.jsx where T426428's Title-column "Restore from original filename" menu entries landed in the same HeaderMenuPopover prop list / locally-derived consts that T426422 was modifying for the per-language Caption menu entries. The two features are orthogonal in the popover (different columns), so both were taken; npm run build (including the undefined-identifier scanner) is clean against the merge result. CI green, preview verified live.

MR: https://gitlab.wikimedia.org/daanvr/upload-workbench/-/merge_requests/52
Preview: https://upload-workbench.toolforge.org/mr-52/