Page MenuHomePhabricator

Autocomment-like snippets in labels, descriptions or aliases are rendered like additional autocomments
Open, LowPublic

Description

MediaWiki core apparently supports multiple autocomments in a single edit message, which means that if you set an entity label to something like

deutsche Beschriftung /* wbsetlabel-set:1|en */ english Label /* wbcreateclaim-create:1| */ [[Property:P31]]: [[Q5]] a b

then the final summary will be

/* wbsetlabel-set:1|de */ deutsche Beschriftung /* wbsetlabel-set:1|en */ english Label /* wbcreateclaim-create:1| */ [[Property:P31]]: [[Q5]] a b

which will render like

Changed [de] label: deutsche Beschriftung Changed [en] label: english Label ‎Created claim: instance of (P31): human (Q5) a b

which can be slightly misleading :)

(See here for such an example edit on test.wikidata.org.)

I’m not sure if there’s an easy solution to this. We should escape the label before appending it to the summary when creating the revision, but I couldn’t find any escaping format that wouldn’t still show up in the rendered summary (i. e., changing /* to /\* breaks the autocomment interpretation, but also shows the \ in the rendered message even though it’s not part of the actual label).

Are there any upcoming or planned core changes (perhaps part of MCR) that would split up the summary into autocomment and free-text field, or something else that might help here?

NOTE: it seems like we’ll use multiple autocomments for T184702 or T220696, so we probably shouldn’t completely disable that feature even if it was in our power.

Event Timeline

I can not think of a good way that utilizes escaping to avoid this.

What we could do is to apply some validation to user-provided summaries, and deny to accept user-provided text that looks like one of our autosummaries.

An other possibility is to add this to our FormatAutocomments hook handler at the top of RepoHooks::onFormat:

if ( $pre ) {
	return;
}

With this, only the first autocomment in a summary line gets parsed and localized. Since all summaries our code produces conform to this (see SummaryFormatter::assembleSummaryString), I believe introducing this limitation would actually be consistent.

Note that T184702 might start using the fact such a limitation was never implemented as a feature.

At https://www.wikidata.org/wiki/Special:Diff/626986649 I created a live example by submitting /* wbsetaliases-set:2|a totally unrelated bunch of */ but no, these are not aliases.

image.png (76×636 px, 20 KB)

User-provided summaries are meant to be appended at the end of the localizable "automatic summary". This actually happens as expected. But when displaying such a summary, the localization does not stop after the first localizable element.

Edit: NEVERMIND, Thiemo had all this figured out already 🤦


There is a possible workaround we can apply. The Linker tells us whether there’s any comment text before or after the autocomment currently being formatted, and since as far as I’m aware we only ever use autocomments at the very beginning of the summary, we can skip all others when formatting:

diff --git a/repo/RepoHooks.php b/repo/RepoHooks.php
index 71dcf27da..16cd20ca2 100644
--- a/repo/RepoHooks.php
+++ b/repo/RepoHooks.php
@@ -694,6 +694,11 @@ public static function onFormat( &$comment, $pre, $auto, $post, $title, $local )
             return;
         }
 
+        if ( $pre ) {
+            $comment = '/* ' . $auto . ' */';
+            return;
+        }
+
         $namespaceLookup = WikibaseRepo::getDefaultInstance()->getEntityNamespaceLookup();
         $entityType = $namespaceLookup->getEntityType( $title->getNamespace() );
         if ( $entityType === null ) {

(Setting $comment skips Linker::formatAutocomments’s usual logic, which would otherwise turn this into a normal, hyperlinked autocomment.) This doesn’t quite faithfully preserve the label/description/alias – Linker::formatAutocomments’s regex allows any amount of whitespace, including none at all, within the /* … */, whereas we reconstruct a single space on each side – but since the output is HTML anyways, it barely makes a difference.

On the other hand, RepoHooks::onFormat and AutoCommentFormatter apparently take great care to handle the $pre case correctly, so I’m skeptical if we can really throw that part away and skip formatting of such autocomments. Do we use additional or non-initial autocomments anywhere?