
Cleaning via bot: Naming convention has inadequacies (field separator)
Open, High, Public

Description

Current state: LinguaLibre's file naming convention relies mainly on "-" as the field separator.
"(" and ")" also delimit fields.

Review

LinguaLibre's query service lets you check the current situation.

List of speakers by name, flagging those whose name contains "-":

```sparql
# List every speaker, flagging names that contain "-"
SELECT *
WHERE {
  ?id prop:P2 entity:Q3 .   # ?id is a speaker item
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?id rdfs:label ?name .
  }
  BIND (regex(STR(?name), "-") AS ?has_separator)
}
ORDER BY DESC(?has_separator)
```

Number of speakers with and without "-" in their name:

```sparql
# Count speakers whose name does / does not contain "-"
SELECT ?has_separator (COUNT(?id) AS ?found)
WHERE {
  ?id prop:P2 entity:Q3 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?id rdfs:label ?name .
  }
  BIND (regex(STR(?name), "-") AS ?has_separator)
}
GROUP BY ?has_separator
ORDER BY DESC(?has_separator)
```

51 of 1,000 speakers (5%) have a username containing "-", which makes regex-based parsing of their file names less predictable. A rarer field separator would be welcome.
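To illustrate the problem, here is a minimal Python sketch. It assumes file names follow the pattern used in this task's examples, LL-<language QID> (<ISO 639-3 code>)-<username>-<word>.<ext>; the `candidate_splits` helper is hypothetical. Because both the username and the word may contain "-", a single file name can be split in several equally valid ways.

```python
import re

# Assumed current pattern: LL-<QID> (<ISO 639-3 code>)-<username>-<word>.<ext>
PREFIX = re.compile(r"^LL-Q\d+ \([^)]+\)-(?P<rest>.+)\.\w+$")


def candidate_splits(filename):
    """Return every (username, word) pair compatible with "-" as separator.

    Hypothetical helper: more than one result means the name is ambiguous.
    """
    m = PREFIX.match(filename)
    if not m:
        return []
    parts = m.group("rest").split("-")
    # Try each "-" in the remaining text as the username/word boundary.
    return [("-".join(parts[:i]), "-".join(parts[i:])) for i in range(1, len(parts))]


print(candidate_splits("LL-Q150 (fra)-Roll-Morton-vert.wav"))
# [('Roll', 'Morton-vert'), ('Roll-Morton', 'vert')]
```

With two candidates, any regex has to guess where the username ends, and nothing in the file name tells it which guess is right.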

Suggestion

We can (and should) pick a rarer separator, at minimum one that is not among the characters on our keyboards.
For example, U+FF0D FULLWIDTH HYPHEN-MINUS (－): LL－Q150 (fra)－Roll-Morton－vert.wav
Alternatively, double the separator to create a distinctive convention: LL--Q150 (fra)--Roll-Morton--vert.wav
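A minimal Python sketch of how a bot could build and parse names under either proposal, assuming the four fields LL, language, username and word from the examples above. `build_name` and `parse_name` are hypothetical helpers, not existing LinguaLibre code. One caveat shows up with the doubled separator: a field that begins or ends with "-" (e.g. a username "Roll-") would produce "---" and split wrongly, so the sketch rejects such fields at build time.

```python
FULLWIDTH_HYPHEN = "\uff0d"  # U+FF0D FULLWIDTH HYPHEN-MINUS
DOUBLE_HYPHEN = "--"


def build_name(lang, user, word, ext="wav", sep=DOUBLE_HYPHEN):
    """Join the fields with sep, refusing fields that would be ambiguous."""
    fields = ["LL", lang, user, word]
    for field in fields:
        # A field containing sep, or touching it with a matching character
        # (e.g. "Roll-" followed by "--" gives "---"), cannot be split back.
        if sep in field or field.startswith(sep[0]) or field.endswith(sep[-1]):
            raise ValueError(f"field {field!r} conflicts with separator {sep!r}")
    return sep.join(fields) + "." + ext


def parse_name(filename, sep=DOUBLE_HYPHEN):
    """Split a name produced by build_name back into its fields."""
    stem, dot, ext = filename.rpartition(".")
    parts = stem.split(sep)
    if not dot or len(parts) != 4 or parts[0] != "LL":
        raise ValueError(f"unexpected file name: {filename!r}")
    _, lang, user, word = parts
    return {"lang": lang, "user": user, "word": word, "ext": ext}


name = build_name("Q150 (fra)", "Roll-Morton", "vert")
print(name)                      # LL--Q150 (fra)--Roll-Morton--vert.wav
print(parse_name(name)["user"])  # Roll-Morton
```

Both separators round-trip usernames such as Roll-Morton; the fullwidth one avoids the "---" edge case entirely, at the cost of a character that is hard to type.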

Event Timeline

Yug updated the task description.
Yug triaged this task as High priority. (Jul 6 2022, 11:20 AM)
Yug renamed this task from Naming convention has inadequacies (field separator) to Cleaning via bot: Naming convention has inadequacies (field separator). (Jul 7 2022, 8:22 PM)