Page MenuHomePhabricator

Use proper messages for log entries
Open, LowPublic20 Estimated Story Points

Description

Instead of just showing the type and action, use the existing messages to clarify
Additionally, use the log parameters (when relevant)

Rather than trying to implement a version of LogFormatter in javascript, just send the display to use in the api

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
DannyS712 set the point value for this task to 4.Aug 12 2020, 6:13 AM
DannyS712 moved this task from Unsorted to Next on the User-DannyS712 board.
DannyS712 updated the task description. (Show Details)

...so it turns out that each log type has its own handling of the log parameters, etc. Rather than recreating the handling in javascript, I suggest adding an additional property to the watchlist api for the log display instead.

DannyS712 changed the task status from Open to Stalled.Aug 12 2020, 7:24 AM
DannyS712 triaged this task as Low priority.
DannyS712 moved this task from Next to Later on the User-DannyS712 board.

Stalled on subtask

DannyS712 renamed this task from Use `logentry-*` messages for log entries to Use proper messages for log entries.Aug 12 2020, 7:30 AM
DannyS712 updated the task description. (Show Details)
DannyS712 changed the task status from Stalled to Open.Nov 22 2020, 5:24 AM
DannyS712 moved this task from Backlog to Later on the MediaWiki-extensions-GlobalWatchlist board.

Ready to go once extension requires 1.36+

Hmm, started looking into doing this now, and it seems that its not the most compatible with how we're currently formatting things - doesn't give a good place to add the links to the log entry and pages logs, or to unwatch. Might not be able to use the logdisplay returned by the api after all

Dumped a bunch of notes at T287929#7265674, but some more:
There are some log entry messages that will be easy to deal with, because they don't have any more than the 3 basic parameters (user, user's gender, target page), and others that will be much harder. It'll probably start with slowly adding support for some log entries, and using the existing format for the others.

We'll want to load the messages we need via the api, not via ResourceLoader modules, at least for non-core log entries, to avoid a bunch of warnings for missing messages. We should add some logic to avoid trying to load a missing message multiple times, based on the existing messages stuff at https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/8d5ed3572dbd642ebd0788c6441e683d460b2da4/resources/src/mediawiki.api/messages.js

Log entries that will be easy (just those three params) (basically https://codesearch.wmcloud.org/deployed/?q=%22logentry-%5CS%2B%22%3A%20%22%5B%5E4%5D%2B%22&i=nope&files=%5C%2Fen%5C.json%24&excludeFiles=&repos= but only listing the most common ones for now)

  • logentry-delete-delete
  • logentry-delete-delete_redir
  • logentry-delete-delete_redir2
  • logentry-protect-unprotect
  • logentry-upload-upload
  • logentry-upload-overwrite
  • logentry-upload-revert
  • logentry-pagetranslation-mark
  • logentry-pagetranslation-unmark
  • logentry-pagetriage-curation-reviewed
  • logentry-pagetriage-curation-unreviewed
  • logentry-pagetriage-curation-enqueue

More complicated

  • Move logs with $4 (logentry-move-move, logentry-move-move-noredirect, logentry-move-move_redir, logentry-move-move_redir-noredirect, logentry-protect-move_prot)
  • blocks with restrictions
  • revision deletion
  • protection settings
  • flagged revs

We will also need some logic to figure out which message to use (move over redirect or without leaving a redirect is for the same log type and action)

DannyS712 changed the point value for this task from 4 to 20.Aug 6 2021, 11:41 PM

Basic sketch of gender support (needed for all log entries):

Since we only need it for log entries, filter entries to get unique names of users having performed a log action. Ideally, we will cache this between sites in MultiSiteWrapper, with the assumption that a user's gender is consistent across wikis, but at least should be cached for a single site. Anonymous users are assumed to have unknown.

Processing (examples use $2 but this will need to be done for other variables too in some messages) (https://gerrit.wikimedia.org/g/mediawiki/core/+/8d5ed3572dbd642ebd0788c6441e683d460b2da4/languages/Language.php#3965):

Strip out translations where there is no form specified and replace with empty string
// '{{GENDER:$2}}' -> ''
if ( message.match( /{{GENDER:\$2}}/ ) ) {
	return message.replace( /{{GENDER:\$2}}/, '' );
}
Strip out translations that only specify one form and use that
// '{{GENDER:$2|foo}}' -> 'foo'
if ( message.match( /{{GENDER:\$2\|[^|]+}}/ ) ) {
	return message.replace( /{{GENDER:\$2\|([^|]+)}}/, '$1' );
}
Convert 2 form to 3 form using male for unknown
// '{{GENDER:$2|bar|baz}}' -> '{{GENDER:$2|bar|baz|bar}}'
if ( message.match( /{{GENDER:\$2\|[^|]+\|[^|]+}}/ ) ) {
	message = message.replace( /{{GENDER:\$2\|([^|]+)\|([^|]+)}}/, '{{GENDER:$$2|$1|$2|$1}}' );
}
Replace 3 form with correct one based on gender
// '{{GENDER:$2|foo|bar|baz}}' -> 'foo' or 'bar' or 'baz'
return message.replace(
	/{{GENDER:\$2\|([^|]+)\|([^|]+)\|([^|]+)}}/,
	function ( match, form1, form2, form3, offset, string ) {
		switch ( gender ) {
			case 'male':
				return form1;
			case 'female':
				return form2;
			default:
				return form3;
		}
	}
);

Initial sketch of PLURAL support:

  1. We substitute count variables directly into the text, those aren't going to be jQuery objects or VNode instances, but rather plain text. We do this for all occurances in the message, not just the first, because some times the variable is reused. Eg for English logentry-delete-revision (focusing just on the plural part $5, not the much more complicated $4)
"$1 {{GENDER:$2|changed}} visibility of {{PLURAL:$5|a revision|$5 revisions}} on page $3: $4"
= "$1 {{GENDER:$2|changed}} visibility of {{PLURAL:1|a revision|1 revisions}} on page $3: $4"
   OR
= "$1 {{GENDER:$2|changed}} visibility of {{PLURAL:3|a revision|3 revisions}} on page $3: $4"

extract the relevant part

"{{PLURAL:1|a revision|1 revisions}}" or "{{PLURAL:3|a revision|3 revisions}}"

and convert to forms like mediawiki.jqueryMsg.js, then call mw.language.convertPlural which should handle things.
Looking at some of the more complicated uses of PLURAL in deployed log entry messages (https://codesearch.wmcloud.org/deployed/?q=%22logentry-.*PLURAL&i=nope&files=&excludeFiles=&repos=) it seems Arabic has the most forms and we should try and run tests with that too:

"logentry-delete-revision" -> "غيّر{{GENDER:$2||ت}} $1 إمكانية مشاهدة {{PLURAL:$5||مراجعة واحدة|مراجعتين|$5 مراجعات|$5 مراجعة}} في صفحة $3: $4"

For general collection of log entries to investigate parameters, viewers watchlist: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=watchlist&formatversion=2&wllimit=50&wlprop=ids%7Ctitle%7Cuser%7Ctimestamp%7Ccomment%7Cloginfo&wltype=log

Initial skitch of replacing non-string variables:

var message = "$1 deleted page $3"; // logentry-delete-delete for English after GENDER processing
var groups = message.split( /(\$\d+)/ );
// groups = [ "", "$1", " deleted page ", "$3", "" ]
// handle messages starting and ending with variables, don't create extra text nodes (TODO is this needed?)
if ( groups[0] === '' ) {
	groups.shift();
}
if ( groups[ groups.length - 1 ] === '' ) {
	groups.pop();
}
// groups = [ "$1", " deleted page ", "$3" ]
// TODO do we care about leading and trailing whitespace in the array elements?
if ( groups.indexOf( '$1' ) !== -1 ) {
	// userLink variable is what we want to have for $1
	groups[ groups.indexOf( '$1' ) ] = userLink;
}
if ( groups.indexOf( '$3' ) !== -1 ) {
	// targetPage variable is what we want to have for $3
	groups[ groups.indexOf( '$3' ) ] = targetPage;
}
// And so on for any other parameters, message-specific
return groups;
// returns array of string or (jQuery or VNode) objects to pass to jQuery.append() or a custom render function

Overall for the simple case:

  1. Replace string variables (gender and plural placeholders)
  2. Run updates for GENDER and PLURAL text
  3. Split and handle other variables

Danny, why don't you use the Watchlist messages, instead of inventing your own?

Danny, why don't you use the Watchlist messages, instead of inventing your own?

I'm not inventing my own, I'm going to use the existing messages, but they need to be formatted correctly - eg for page deletion the message is "$1 {{GENDER:$2|deleted}} page $3"

Sketch of gender fetching:

class GenderLookup {
	// Cache is for a specific wiki, at least for now - it would be pretty complicated to support a single cache for multiple wikis that are being processed in parallel, especially with dealing with users who may not have set their gender on all sites
	// Cache assumes gender doesn't change for any user during the viewing, including between refreshes - it is unlikely that a user, who has made a log entry that is being shown, changes their gender preference in the time between the initial fetching of their gender and any subsequent refresh, and it makes things a whole lot easier
	this.cache = mw.Map; // key is username, value is gender "male", "female", or "unknown"

	this.api = mw.ForeignApi; // api to use for this wiki

	saveResults: function ( apiResult ) {
		// Convert raw api result to the format that mw.Map.set will accept
		toSet = {};
		apiResult.query.users.forEach( u => toSet[ u.name ] = u.gender )
		this.cache.set( toSet );
	}

	loadGenders: function ( usernames ) : Promise {
		// Like SiteBase.actuallyGetWatchlist() except simpler recursion, load the first 50 and
		// then call saveResults(), either resolve immediately or call recursively with the next,
		// Promise doesn't include the actual genders, this is called after we analyze all of the
		// log entries to figure out which genders we needed to fetch 
	}


	getGender: function ( username ) : string {
		// Fallback to "unknown" in case we missed a lookup
		return this.cache.get( username, 'unknown' );
	}

}

Basic sketch of gender support (needed for all log entries):

(... too long, see above for original)

I realized that in order to simplify things, and to easily handle variables other than $2 without needing to create RegExp objects, we should do some simplifications of the message when its first loaded, and then handle needing multiple variables with genders by substituting them and then using a regex with a global modifier:

Initial updates for all messages
// Handle {{GENDER}} tags that don't actually depend on the gender given
// 1. For translations where there is no form specified, replace with empty string: '{{GENDER:$2}}' -> ''
message = message.replace( /{{GENDER:\$\d}}/g, '' );
// 2. For translations where there is only one form specified, use that: '{{GENDER:$2|foo}}' -> 'foo'
message = message.replace( /{{GENDER:\$\d\|([^|]+)}}/g, '$1' );
// 3. For translations that only specify 2 forms, add the 3rd form using male for unknown: '{{GENDER:$2|bar|baz}}' -> '{{GENDER:$2|bar|baz|bar}}'
message = message.replace( /{{GENDER:\$(\d)\|([^|]+)\|([^|]+)}}/g, '{{GENDER:$$$1|$2|$3|$2}}' );
Actual updates with genders for rendering
// Starting from the pre-processed message above, substitute all gender variables, which should each have a value of "male", "female", or "unknown" (code should enforce this)
// Then:
message = message.replace(
	/{{GENDER:(male|female|unknown)\|([^|]+)\|([^|]+)\|([^|]+)}}/g,
	function ( match, gender, form1, form2, form3, offset, string ) {
		switch ( gender ) {
			case 'male':
				return form1;
			case 'female':
				return form2;
			case 'unknown':
				return form3;
		}
	}
);

This should be able to handle all valid cases, and works for messages with multiple gender variables, eg block logs, user right logs.

For substituting gender and plural variables, use

Replace all
// Where paramName is '$2', '$5' or similar, and paramValue is the result to replace it with, and message is the source string
// Need to replace all matches, string.replaceAll is not always available, so use a regex
var regex = new RegExp( mw.util.escapeRegExp( paramName ), 'g' );
return message.replace( regex, paramValue );

For extracting plural into a form usable by mw.language.convertPlural, based on jqueryMsg

// Where message is a string like '{{PLURAL:10|one edit|12=a dozen edits|10 edits}}' from {{PLURAL:$5|one edit|12=a dozen edits|$5 edits}}
message = message.replace(
	/{{PLURAL:(\d+)\|([^}]+)}}/g,
	function ( match, num, formsStr, offset, string ) {
		// We know that `num` should be a numerical string, because we just substituted it. Convert to a number for mw.language
		num = parseInt( num, 10 );

		// Split up form to either explicit (use a specific message if the number is exactly the one specified) or normal (use a specific message based on the plural rules for the language)
		var forms = formsStr.split( '|' );
		var explicitForms = {};
		var normalForms = [];
		forms.forEach(
			function ( form ) {
				if ( /^\d+=/.test( form ) ) {
					// Explicit plural forms like 12=a dozen
					var explicitFormNumber = parseInt( form.split( /=/ )[ 0 ], 10 );
					explicitForms[ explicitFormNumber ] = form.slice( form.indexOf( '=' ) + 1 );
				} else {
					normalForms.push( form );
				}
			}
		);

		return mw.language.convertPlural( num, normalForms, explicitForms );
	}
);

For block logs that are partial blocks for a namespace, we need to fetch the local names of each namespace, plus the local version of the blanknamespace message, since different wikis have different names for their main namespace and have different configured extra namespaces. Use the query

{
	"action": "query",
	"meta": "siteinfo|allmessages",
	"siprop": "namespaces",
	"ammessages": "blanknamespace",
	"format": "json",
	"formatversion": "2"
}

eg https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&meta=siteinfo%7Callmessages&siprop=namespaces&ammessages=blanknamespace&format=json&formatversion=2

This should be the only message we need to fetch per wiki, the rest we will use the central wiki for. Will probably add block log support not in the first round since this will take some extra work. It appears that namespace names other than the main namespace are always in English? Based on looking at zhwiki's Special:BlockList for partial blocks in Chinese, even if the rest of the description is in Chinese the namespace names are in English.

Change 714147 had a related patch set uploaded (by DannyS712; author: DannyS712):

[mediawiki/extensions/GlobalWatchlist@master] Handle rendering edits and log entries separately

https://gerrit.wikimedia.org/r/714147

Hacks to test new javascript in production by bypassing ResourceLoader's logic

1) Visit w/load.php?modules=ext.globalwatchlist.specialglobalwatchlist on patchdemo or wherever the new code is, and copy
2) Visit https://meta.wikimedia.org/w/load.php?modules=ext.globalwatchlist.specialglobalwatchlist, identify the current version (ext.globalwatchlist.specialglobalwatchlist@12789 -> 12789) and in the new code change the implemented version
3) delete from current module registry: `delete mw.loader.moduleRegistry['ext.globalwatchlist.specialglobalwatchlist']`
4) delete from current store: `delete mw.loader.store.items['ext.globalwatchlist.specialglobalwatchlist@12789']`
5) run the implement script from 1) with the changed version from 2) in the console
6) save the changed version `mw.loader.store.add( 'ext.globalwatchlist.specialglobalwatchlist' )`

But, that seems to have broken the messages, and I'm not sure how to load those properly...
Add a snippet (new mw.Api()).loadMessages(Object.keys(mw.loader.moduleRegistry['ext.globalwatchlist.specialglobalwatchlist'].messages)); to fix that, plus a similar snippet to mw.load dependencies just added (mw.loader.load(['mediawiki.language','mediawiki.util']);), both put directly at the start of the declaration of MultiSiteWrapper or similar, also remove the line only enabling it in dev mode

Reveals that

  • protection description from api is not identical to what is shown onwiki ("autoconfirmed" and "sysop" vs "Allow only autoconfirmed users" and "Allow only administrators") - low priority
  • comment should not be included if empty (we do this currently, but not in the patch)
  • italics for log reason - low priority

https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=logevents&formatversion=2&leaction=protect%2Fprotect

Part of the reason I was looking at re-implementing the GENDER and PLURAL handling was to avoid needing to depend on the mediawiki.jqueryMsg ResourceLoader module. But, I just realized that the mediawiki.api module, which we already depend on and use heavily, itself loads the mediawiki.jqueryMsg, so we can add an explicit dependency with no additional cost and then make use of the message parsing there for the non-node replacements