
Labels/aliases/descriptions in Toki Pona need to be removed
Closed, Resolved · Public · 5 Estimated Story Points

Description

Toki Pona was apparently removed as a supported language in Wikimedia some time ago. However, https://www.wikidata.org/wiki/Q11466925 and a number of other items still have labels, descriptions, or aliases in that language, and they can no longer be edited via the UI because validation of the language code fails.
We need someone to edit all these items via the API to remove those terms, so that everyone can edit the items again.


T200432#5065475 and the linked ticket describe how annoying this problem really is.

Acceptance criteria:

  • no Wikidata item or property has a label, description, or alias in Toki Pona (language code tokipona)
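The acceptance criterion can be checked entity by entity. A minimal sketch, assuming entity JSON in the format returned by wbgetentities (labels, descriptions and aliases keyed by language code); the helper names remaining_terms and meets_criteria are hypothetical and the sample item is made up:

```python
# Hypothetical helpers for checking the acceptance criterion on entity JSON
# (labels/descriptions/aliases keyed by language code, as in wbgetentities).
TERM_TYPES = ("labels", "descriptions", "aliases")


def remaining_terms(entity, lang="tokipona"):
    """Return the term types of `entity` that still have an entry for `lang`."""
    found = []
    for term_type in TERM_TYPES:
        # A missing term map is treated as having no entries.
        if lang in entity.get(term_type, {}):
            found.append(term_type)
    return found


def meets_criteria(entities, lang="tokipona"):
    """True if none of the given entities has any term in `lang`."""
    return all(not remaining_terms(e, lang) for e in entities)


if __name__ == "__main__":
    # Made-up sample item with a single tokipona label.
    q = {
        "id": "Q1",
        "labels": {"en": {"language": "en", "value": "example"},
                   "tokipona": {"language": "tokipona", "value": "ijo"}},
        "descriptions": {},
        "aliases": {},
    }
    print(remaining_terms(q))   # ['labels']
    print(meets_criteria([q]))  # False
```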

Note:

  • can likely be done using the wbeditentity API module and its clear option
  • use the user 'Maintenance script' for the API calls
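A minimal sketch of the request this note describes, assuming the entity JSON has already been fetched (for example with wbgetentities) and a CSRF token is available. It only builds the POST parameters and does not send them; the helper names and sample values are hypothetical. baserevid is added when lastrevid is present so that the API can detect an edit conflict.

```python
import copy
import json

TERM_TYPES = ("labels", "descriptions", "aliases")


def without_language(entity, lang="tokipona"):
    """Return a deep copy of the entity JSON with every term in `lang` removed."""
    cleaned = copy.deepcopy(entity)
    for term_type in TERM_TYPES:
        terms = cleaned.get(term_type)
        if isinstance(terms, dict):
            terms.pop(lang, None)
    return cleaned


def wbeditentity_params(entity, csrf_token, summary, lang="tokipona"):
    """Build POST parameters for action=wbeditentity. With clear set, the
    API replaces the stored entity with `data` instead of merging into it,
    so `data` must be the complete entity minus the unwanted terms."""
    params = {
        "action": "wbeditentity",
        "id": entity["id"],
        "clear": "1",
        "data": json.dumps(without_language(entity, lang)),
        "summary": summary,
        "token": csrf_token,
        "format": "json",
    }
    if "lastrevid" in entity:
        # Base the edit on the revision we read, for conflict detection.
        params["baserevid"] = entity["lastrevid"]
    return params
```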

Event Timeline

Restricted Application added a subscriber: Aklapper.

I just tried editing https://www.wikidata.org/wiki/Q11466925 and had no problems doing so. I could add and remove a statement, and remove and re-add an alias.

You can edit other languages, but you can't edit tokipona at all (and the alias is wrong). You also can't use any gadget that works on all languages, such as LabelLister, NameGuzzler, Empty, or namescript.

It is possible to remove the tokipona terms by undoing the edits in which they were added, e.g. https://www.wikidata.org/wiki/Special:Diff/716262791 .

Yeah, but that is not possible if, for example, a sitelink conflict has been introduced since then. This is the case for the item mentioned in the description; I'm not sure how many others are affected.

I managed to reduce this to around 40 instances, using action=wbeditentity&clear=1&data={entity_without_tokipona}. In some cases, however, this isn't possible because of inconsistent data (missing precision attributes on values, links to deleted pages, label/description conflicts, etc.).
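Because some saves fail on unrelated inconsistent data, a batch run is easier to follow up on if it records failures instead of aborting. A sketch under that assumption: save is a hypothetical callable standing in for whatever client performs the wbeditentity call, and entities are plain entity JSON dicts.

```python
from typing import Callable, Iterable, List, Tuple

TERM_TYPES = ("labels", "descriptions", "aliases")


def strip_language(entity: dict, lang: str) -> bool:
    """Remove `lang` terms from the entity JSON in place.
    Return True if anything was removed."""
    changed = False
    for term_type in TERM_TYPES:
        terms = entity.get(term_type)
        if isinstance(terms, dict) and lang in terms:
            del terms[lang]
            changed = True
    return changed


def clean_entities(entities: Iterable[dict],
                   save: Callable[[dict], None],
                   lang: str = "tokipona") -> List[Tuple[str, str]]:
    """Strip `lang` terms from each entity and save it with the hypothetical
    `save` callable. A failing save (e.g. the API rejecting other data in the
    entity) is recorded rather than stopping the run. Returns
    (entity id, error message) pairs left for manual follow-up."""
    failures = []
    for entity in entities:
        if not strip_language(entity, lang):
            continue  # no tokipona terms: skip to avoid a null edit
        try:
            save(entity)
        except Exception as exc:  # real error types depend on the client
            failures.append((entity.get("id", "?"), str(exc)))
    return failures
```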

For the curious, here is the Pywikibot script:

tokipona.py
# -*- coding: utf-8 -*-
import pywikibot

from pywikibot import pagegenerators

pywikibot.handle_args()

repo = pywikibot.Site('wikidata', 'wikidata')

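# Select the IDs of all items that still have a tokipona term in wb_terms.
# MySQLPageGenerator expects (namespace, title) rows, hence "0 AS ns";
# DISTINCT avoids yielding the same entity once per matching term.
query = ''
query += 'SELECT DISTINCT 0 AS ns, term_full_entity_id FROM wb_terms'
query += ' WHERE term_language = \'tokipona\' AND term_entity_type = \'item\''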

summary = 'remove tokipona terms (see [[phab:T200432]])'

gen = pagegenerators.MySQLPageGenerator(query, site=repo)

def handle_entity(entity):
    # Remove the tokipona label, description and aliases (where present)
    # from the entity's raw JSON, then save the whole entity with clear=True
    # so the API replaces the stored data rather than merging into it.
    for term_type in ('labels', 'descriptions', 'aliases'):
        if 'tokipona' in entity._content.get(term_type, {}):
            del entity._content[term_type]['tokipona']
    entity.editEntity(entity._content, clear=True, summary=summary)

for item in pagegenerators.PreloadingEntityGenerator(gen):
    handle_entity(item)

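# Repeat for properties. The query still reports them in namespace 0, so
# build PropertyPage objects from the bare IDs and load each one first.
query = query.replace("'item'", "'property'")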

for page in pagegenerators.MySQLPageGenerator(query, site=repo):
    p = pywikibot.PropertyPage(repo, page.title(with_ns=False))
    p.get()
    handle_entity(p)

This ticket came up during development of the new termbox in T219375.

Really, we should write a maintenance script that removes terms in no-longer-supported languages from current revisions, to avoid these issues.
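The core of such a script is a per-entity filter against the list of currently supported term languages. A Python sketch of that logic only, assuming the supported codes are passed in as a plain set (an actual Wikibase maintenance script would be PHP and read the configured language list); the function names are hypothetical:

```python
import copy

TERM_TYPES = ("labels", "descriptions", "aliases")


def unsupported_term_languages(entity, supported):
    """Return, sorted, the language codes used in the entity's terms that
    are not in `supported`."""
    langs = set()
    for term_type in TERM_TYPES:
        terms = entity.get(term_type)
        if isinstance(terms, dict):
            langs.update(terms)  # iterating a dict yields its language keys
    return sorted(langs - set(supported))


def drop_unsupported_terms(entity, supported):
    """Return (cleaned copy of the entity, list of removed languages).
    The input entity is left unchanged."""
    removed = unsupported_term_languages(entity, supported)
    cleaned = copy.deepcopy(entity)
    for term_type in TERM_TYPES:
        terms = cleaned.get(term_type)
        if isinstance(terms, dict):
            for lang in removed:
                terms.pop(lang, None)
    return cleaned, removed
```

Only entities for which the removed list is non-empty would then need a new revision.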

WMDE-leszek set the point value for this task to 5. · May 7 2019, 1:03 PM
noarave triaged this task as Medium priority. · May 7 2019, 1:28 PM
noarave updated the task description.

Mentioned in SAL (#wikimedia-operations) [2019-05-13T16:14:17Z] <Amir1> removing tokipona language terms from items using maintenance script (T200432)

I wrote a quick maintenance script, ran it on production, and removed all of those cases. Nothing is left so far: https://quarry.wmflabs.org/query/36132