
Detect and remove unneeded translations in things like en-ca and en-gb
Open, Lowest, Public

Description

I was just looking at en-ca, and noticed there are many messages identical to en (I've started deleting some on translatewiki).

The same can be said for en-gb.

I'm sure the same is true of other languages...

The "translations" are unnecessary and should be removed.

It could also be useful to do this all the way down the translation (fallback) chains.

Event Timeline

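<?php

// Compare en-gb.json against en.json and list the message keys whose text is identical.
// Decode to associative arrays so messages can be looked up by key.
$en = json_decode( file_get_contents( 'en.json' ), true );
$en_gb = json_decode( file_get_contents( 'en-gb.json' ), true );

$messagesIdentical = [];

foreach ( $en_gb as $k => $v ) {
	// Strict comparison: only count keys present in both files with exactly the same value.
	if ( isset( $en[$k] ) && $en[$k] === $v ) {
		$messagesIdentical[] = $k;
	}
}

echo count( $messagesIdentical ) . " identical messages.\n";
echo implode( "\n", $messagesIdentical ) . "\n";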

There are too many in en-gb to fix manually. There are not so many in en-ca (so I have fixed those).

en-gb

399 identical messages.
tog-underline
tog-hideminor
tog-hidepatrolled
tog-newpageshidepatrolled
tog-extendwatchlist
tog-usenewrc
tog-numberheadings
tog-showtoolbar
tog-editondblclick
tog-editsectiononrightclick
tog-watchcreations
tog-watchdefault
tog-watchmoves
tog-watchdeletion
tog-watchuploads
tog-watchrollback
tog-minordefault
tog-previewontop
tog-previewonfirst
tog-enotifminoredits
tog-enotifrevealaddr
tog-shownumberswatching
tog-oldsig
tog-fancysig
tog-uselivepreview
tog-forceeditsummary
tog-watchlisthideown
tog-watchlisthidebots
tog-watchlisthideminor
tog-watchlisthideliu
tog-watchlisthideanons
tog-watchlisthidepatrolled
tog-ccmeonemails
tog-diffonly
tog-showhiddencats
tog-norollbackdiff
tog-useeditwarning
tog-prefershttps
underline-always
underline-never
underline-default
editfont-style
editfont-default
editfont-monospace
editfont-sansserif
editfont-serif
sunday
monday
tuesday
wednesday
thursday
friday
saturday
sun
mon
tue
wed
thu
fri
sat
january
february
march
april
may_long
june
july
august
september
october
november
december
january-gen
february-gen
march-gen
april-gen
may-gen
june-gen
july-gen
august-gen
september-gen
october-gen
november-gen
december-gen
jan
feb
mar
apr
may
jun
jul
aug
sep
oct
nov
dec
january-date
february-date
march-date
april-date
may-date
june-date
july-date
august-date
september-date
october-date
november-date
december-date
pagecategories
subcategories
category-empty
hidden-categories
hidden-category-category
category-subcat-count
category-subcat-count-limited
category-article-count
category-article-count-limited
category-file-count
category-file-count-limited
listingcontinuesabbrev
index-category
noindex-category
broken-file-category
about
article
newwindow
cancel
moredotdotdot
mypage
mytalk
anontalk
navigation
and
qbfind
qbbrowse
qbedit
qbpageoptions
qbmyoptions
faq
faqpage
actions
namespaces
variants
navigation-heading
errorpagetitle
returnto
tagline
help
search
searchbutton
go
searcharticle
history
history_short
updatedmarker
printableversion
permalink
print
view
view-foreign
edit
edit-local
create
create-local
editthispage
create-this-page
delete
deletethispage
undeletethispage
undelete_short
viewdeleted_short
protect
protect_change
protectthispage
unprotect
unprotectthispage
newpage
talkpage
talkpagelinktext
specialpage
personaltools
articlepage
talk
views
toolbox
userpage
projectpage
imagepage
mediawikipage
templatepage
viewhelppage
categorypage
viewtalkpage
otherlanguages
redirectedfrom
redirectpagesub
redirectto
lastmodifiedat
viewcount
protectedpage
jumpto
jumptonavigation
jumptosearch
view-pool-error
generic-pool-error
pool-timeout
pool-queuefull
pool-errorunknown
pool-servererror
poolcounter-usage-error
aboutsite
aboutpage
copyright
copyrightpage
currentevents
currentevents-url
disclaimers
disclaimerpage
edithelp
helppage-top-gethelp
mainpage
mainpage-description
policy-url
portal
portal-url
privacy
privacypage
badaccess
badaccess-group0
badaccess-groups
versionrequired
versionrequiredtext
ok
youhavenewmessages
youhavenewmessagesfromusers
youhavenewmessagesmanyusers
newmessageslinkplural
newmessagesdifflinkplural
youhavenewmessagesmulti
editsection
editold
viewsourceold
editlink
viewsourcelink
editsectionhint
toc
showtoc
hidetoc
collapsible-collapse
collapsible-expand
confirmable-confirm
confirmable-yes
confirmable-no
thisisdeleted
viewdeleted
restorelink
feedlinks
feed-invalid
feed-unavailable
site-rss-feed
site-atom-feed
red-link-title
sort-descending
sort-ascending
nstab-main
nstab-user
nstab-media
nstab-special
nstab-project
nstab-image
nstab-mediawiki
nstab-template
nstab-help
nstab-category
mainpage-nstab
nosuchaction
nosuchactiontext
nosuchspecialpage
nospecialpagetext
error
protectedinterface
cascadeprotected
pt-login
pt-createaccount
savearticle
loginreqlink
noarticletext
template-protected
revisionasof
previousrevision
cur
lineno
editundo
searchresults
searchresults-title
prevn
nextn
shown-title
searchprofile-articles
searchprofile-images
searchprofile-everything
searchprofile-advanced
searchprofile-articles-tooltip
searchprofile-images-tooltip
searchprofile-everything-tooltip
searchprofile-advanced-tooltip
search-result-size
prefs-i18n
right-writeapi
newuserlogpage
recentchanges
recentchanges-legend
recentchanges-label-newpage
recentchanges-label-minor
recentchanges-label-bot
rclistfrom
rcshowhidebots
rcshowhideliu
rclinks
diff
hist
minoreditletter
newpageletter
boteditletter
rc-change-size-new
recentchangeslinked-toolbox
recentchangeslinked-summary
upload
file-anchor-link
filehist
filehist-help
filehist-current
filehist-datetime
filehist-thumb
filehist-thumbtext
filehist-user
filehist-dimensions
filehist-comment
imagelinks
linkstoimage
sharedupload-desc-here
randompage
nbytes
newpages
allpagessubmit
rollbacklink
namespace
invert
tooltip-invert
namespace_association
tooltip-namespace_association
blanknamespace
whatlinkshere
blocklink
contribslink
databaselocked
movecategorypage-warning
allmessagestext
thumbnail-more
tooltip-pt-login
tooltip-pt-createaccount
tooltip-ca-talk
tooltip-ca-edit
tooltip-ca-addsection
tooltip-ca-history
tooltip-ca-watch
tooltip-search
tooltip-search-go
tooltip-p-logo
tooltip-n-mainpage
tooltip-n-mainpage-description
tooltip-n-currentevents
tooltip-n-recentchanges
tooltip-n-randompage
tooltip-n-help
tooltip-t-whatlinkshere
tooltip-t-recentchangeslinked
tooltip-feed-atom
tooltip-t-upload
tooltip-t-specialpages
tooltip-t-print
tooltip-t-permalink
tooltip-ca-nstab-main
tooltip-ca-nstab-special
tooltip-ca-nstab-image
tooltip-ca-nstab-category
pageinfo-toolboxlink
file-info-size
show-big-image
show-big-image-preview
show-big-image-other
show-big-image-size
metadata
namespacesall
specialpages
tag-filter
tag-list-wrapper
logentry-newusers-create
searchsuggest-search

en-ca (which I've fixed already)

6 identical messages.
talk
aboutsite
mainpage
editsection
editsectionhint
red-link-title

At least for en-gb, am I ok to deleteBatch these on the translatewiki server as my user account?

A quick look at https://translatewiki.net/w/i.php?title=Special:Translate&group=mediawiki&language=en-ca&filter=translated&action=translate suggests a lot more en-ca messages have been translated, but not exported in the last 3 years... T162009#3149673

So a load more would actually need deleting

At least for en-gb, am I ok to deleteBatch these on the translatewiki server as my user account?

Should be ok for MediaWiki. Please use the -r option to link some page (like this report) where translators can find more information about the reason.

I should probably turn this into a proper maintenance script (probably living in Translate?) that follows fallback chains
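As a rough idea of what following fallback chains could look like, here is a minimal standalone sketch. It is not a real maintenance script and does not use any MediaWiki or Translate API: the function name findRedundantMessages, the $fallbacks map and the sample messages are all made up for illustration.

```php
<?php
// Hypothetical helper: none of these names come from MediaWiki or Translate.

/**
 * Return the keys in $lang whose text equals what the fallback chain
 * would supply anyway.
 *
 * @param string $lang Language code to check, e.g. 'en-gb'
 * @param array $fallbacks Language code => ordered list of fallback codes (assumed data)
 * @param array $messages Language code => [ key => text ]
 * @return string[] Redundant message keys
 */
function findRedundantMessages( string $lang, array $fallbacks, array $messages ): array {
	$redundant = [];
	foreach ( $messages[$lang] ?? [] as $key => $text ) {
		// Walk the chain and stop at the first language that defines the key.
		foreach ( $fallbacks[$lang] ?? [] as $fallback ) {
			if ( isset( $messages[$fallback][$key] ) ) {
				if ( $messages[$fallback][$key] === $text ) {
					$redundant[] = $key;
				}
				break;
			}
		}
	}
	return $redundant;
}

// Made-up sample data, only to show the shape of the inputs.
$fallbacks = [ 'en-ca' => [ 'en' ], 'en-gb' => [ 'en' ] ];
$messages = [
	'en' => [ 'talk' => 'Discussion', 'color' => 'Color' ],
	'en-gb' => [ 'talk' => 'Discussion', 'color' => 'Colour' ],
];

echo implode( "\n", findRedundantMessages( 'en-gb', $fallbacks, $messages ) ) . "\n";
```

With the sample data only talk is reported, because color differs from the en text. Comparing against the first language in the chain that defines the key (rather than always against en) is what makes this work for longer chains.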

Nemo_bis triaged this task as Lowest priority. Oct 15 2017, 7:27 PM

We could do two things:

  • A message checker that marks identical translations for these special languages as fuzzy - ideally preventing saving in the first place.
  • A script that deletes those - ideally also deleting those from the source repositories.
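The first option above (a check at save time) might boil down to something like the sketch below. This is only an assumption about the logic, not the Translate extension's checker API: shouldFlagTranslation, the $variantLanguages list and the sample data are hypothetical.

```php
<?php
// Hypothetical sketch: not the Translate extension's message checker interface.

/**
 * Decide whether a proposed translation should be flagged
 * (marked fuzzy, or rejected before saving).
 *
 * @param string $lang Language being translated into
 * @param string $key Message key
 * @param string $proposed Text the translator wants to save
 * @param string[] $variantLanguages The "special" languages to check (assumed list)
 * @param array $baseMessages Base language messages, key => text
 */
function shouldFlagTranslation(
	string $lang, string $key, string $proposed,
	array $variantLanguages, array $baseMessages
): bool {
	// Only the listed variant languages get this check.
	if ( !in_array( $lang, $variantLanguages, true ) ) {
		return false;
	}
	// Flag when the text is exactly what the base language already provides.
	return isset( $baseMessages[$key] ) && $baseMessages[$key] === $proposed;
}

// Made-up sample data.
$variants = [ 'en-ca', 'en-gb' ];
$en = [ 'talk' => 'Discussion' ];

var_dump( shouldFlagTranslation( 'en-gb', 'talk', 'Discussion', $variants, $en ) );
var_dump( shouldFlagTranslation( 'de', 'talk', 'Discussion', $variants, $en ) );
```

Restricting the check to an explicit list keeps it from firing for languages where matching the English text is legitimate.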

This probably doesn't "need to be done": if the translations are merely "unneeded", they aren't hurting anything. Defining "unneeded" may also be hard. Finding "unused" messages could be slightly useful (i.e. local sub-language variants of messages that no longer have a base language message). But simply being identical (e.g. X/en == X/en-ca) is not a good enough test. Because of problems called out in tasks such as T229992, identical variants may be necessary in this scenario:

  1. A project wants to localize a message for some reason other than translation, so it creates MediaWiki:Foo
  2. Its readers get Foo
  3. Readers whose interface is set to a variant of the project's primary language (e.g. en --> en-ca, en-au) don't fall back to the base when the variant hasn't been created; they get the default
  4. The project either copies the base to the variants or transcludes the base into the variants; in either case the variants are now identical to each other, and possibly to the base
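The scenario above can be modelled with a small sketch. This is a deliberately simplified model of the behaviour described in the steps, not MediaWiki's real message lookup code: resolveMessage and the sample texts are hypothetical.

```php
<?php
// Simplified model of the scenario; not MediaWiki's actual lookup logic.

/**
 * @param string $key Message page name, e.g. 'Foo'
 * @param string $userLang Interface language of the reader
 * @param string $contentLang Primary language of the project
 * @param array $localOverrides Local MediaWiki: pages, page name => text
 * @param array $softwareDefaults Shipped defaults, key => text
 */
function resolveMessage(
	string $key, string $userLang, string $contentLang,
	array $localOverrides, array $softwareDefaults
): string {
	// Local override pages: "Foo" for the primary language, "Foo/xx" otherwise.
	$page = $userLang === $contentLang ? $key : "$key/$userLang";
	if ( isset( $localOverrides[$page] ) ) {
		return $localOverrides[$page];
	}
	// As in step 3: no fallback to the base override, straight to the default.
	return $softwareDefaults[$key];
}

$defaults = [ 'Foo' => 'Default text' ];
$overrides = [ 'Foo' => 'Project text' ];

echo resolveMessage( 'Foo', 'en', 'en', $overrides, $defaults ) . "\n";
echo resolveMessage( 'Foo', 'en-ca', 'en', $overrides, $defaults ) . "\n";

// Step 4: the project copies the base override to the variant.
$overrides['Foo/en-ca'] = $overrides['Foo'];
echo resolveMessage( 'Foo', 'en-ca', 'en', $overrides, $defaults ) . "\n";
```

In this model the en-ca reader only sees the project's text once Foo/en-ca exists, even though it is identical to Foo, which is why a pure "identical to base" test would delete something the project needs.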