Stream cc map should not be generated on every pageload
Closed, DeclinedPublic
Actions

Assigned To

None

Authored By

	mpopov
	Jun 23 2020, 7:33 PM

Description

Given an example stream config:

let CONFIG = {
	"edit": {},
	"edit.growth.experiments.help_panel": {},
	"edit.mobile.app": {},
	"edit.mobile.web": {}
};

a map of which streams should be cc'd (when any given stream is logged to) looks like

{
   "edit": [ "edit.growth", "edit.mobile" ],
   "edit.growth": [ "edit.growth.experiments" ],
   "edit.growth.experiments": [ "edit.growth.experiments.help_panel" ],
   "edit.mobile": [ "edit.mobile.app", "edit.mobile.web" ]
}

This map can be generated via the following:

/**
 * Generate a map for stream cc'ing.
 *
 * This enables us to determine which streams can be cc'd when logging an event
 * to a stream (e.g. edit ~> edit.growth).
 *
 * Stream cc-ing should be done only 1 level deep to avoid duplication. For
 * example, 'edit' shouldn't cc 'edits.growth' AND 'edits.growth.test', since
 * 'edits.growth' would, in turn, cc 'edits.growth.test'.
 *
 * Instead of working stream by stream to find children streams for each parent
 * stream, we work backwards by constructing parent streams from the children.
 * This enables children streams such as 'edits.growth.test' to be cc'd when
 * logging events to 'edits' even when parent streams 'edits' and/or
 * 'edits.growth' are not explicitly present in the stream configuration.
 */
mapCopyStreams = function( config ) {
	var cc_map = {}; // map of streams which are to be cc'd
	for (stream in config) {
		let s = stream.split( '.' );
		let n_prefixes = s.length - 1;
		if ( n_prefixes > 1 ) {
			for (i = 1; i <= n_prefixes; i++) {
				var child = s.slice(0, i + 1).join( '.' );
				var parent = s.slice(0, i).join ( '.' );
				if ( !cc_map.hasOwnProperty( parent ) ) {
					Object.defineProperty( cc_map, parent, {
						value: [ child ],
						writable: true,
						enumerable: true
					});
				} else {
					if (!cc_map[parent].includes( child ) ) {
						cc_map[parent].push( child );
					}
				}
			}
		} else if ( n_prefixes === 1) {
			if ( !cc_map.hasOwnProperty( s[0] ) ) {
				Object.defineProperty( cc_map, s[0], {
					value: [ stream ],
					writable: true,
					enumerable: true
				});
			} else {
				if ( !(cc_map[parent].includes( stream )) ) {
					cc_map[s[0]].push( stream );
				}
			}
		} else {
			// no work to be done
		}
	}
	return cc_map;
};

and can be verified by running

let COPIED = mapCopyStreams( CONFIG );
console.log( JSON.stringify(COPIED, null, 2) );

{
  "edit": [
    "edit.growth",
    "edit.mobile"
  ],
  "edit.growth": [
    "edit.growth.experiments"
  ],
  "edit.growth.experiments": [
    "edit.growth.experiments.help_panel"
  ],
  "edit.mobile": [
    "edit.mobile.app",
    "edit.mobile.web"
  ]
}

This is quite an expensive operation to perform at every pageload! It'd be great if, within EventLogging code, that map was simply just available and accessible the same way the stream config is.

Details

	Subject	Repo	Branch	Lines +/-
	[WIP] Generate Stream CC Map, CC streams in submit()	mediawiki/extensions/EventLogging	master	+173 -23

Customize query in gerrit

Related Objects
Search...

Status	Assigned	Task
Declined	None	T259734 BUOD-KR1-Q4+: Certify that analytics schema and instruments have been upgraded to use the MEP system (clearing the legacy system for sunsetting)
Declined	Ottomata	T259157 BUOD-KR1-Q3: Require that all new schema/instruments are created with the MEP system
Declined	None	T259704 BUOD-KR1-Q2: Upgrade MEP clients to full release status
Resolved	phuedx	T274175 Control instruments with Stream Configuration
Resolved	phuedx	T259708 MEP Client MediaWiki JS (non-essential features)
Declined	mpopov	T256165 EventLogging MEP Upgrade Phase 3 (Stream cc-ing)
Declined	None	T256169 Stream cc map should not be generated on every pageload

Event Timeline

mpopov created this task.Jun 23 2020, 7:33 PM

Restricted Application added subscribers: Masumrezarock100, Aklapper. · View Herald TranscriptJun 23 2020, 7:33 PM

mpopov mentioned this in T256165: EventLogging MEP Upgrade Phase 3 (Stream cc-ing).Jun 23 2020, 7:34 PM

mpopov moved this task from Inbox to Done! on the Better Use Of Data board.Jun 23 2020, 7:42 PM

mpopov moved this task from Incoming to Event Platform on the Analytics board.

mpopov updated the task description. (Show Details)Jun 23 2020, 7:57 PM

mpopov renamed this task from EventStreamConfig should generate and provide the stream cc map to Stream cc map should not be generated on every pageload.Jun 24 2020, 12:24 PM

@Ottomata: do you think we can incorporate that algorithm into EventLogging in a way that the cc map isn't generated client-side on every pageload? If I write the PHP version, can we put it in EventLoggingHooks.php similar to loadEventStreamConfigs? It would generate the map from $config->get( 'EventLoggingStreamNames' ) and make it available client-side as config.streamCCMap (or something like that).

I don't fully grasp exactly what you need, but putting this logic in EventLogging ext makes the most sense. When I first saw this ticket titled for EventStreamConfig I wasn't so sure, but in EventLogging that seems fine. EventLogging is the only place that will know what to do with these 'CC' streams anyway :).

Alternatively, could you just make a config setting that indicated the parent stream for a 'CCed' child stream?

	"edit.mobile.app": {"cc_by": "edit.mobile" },

Then in EventLogging JS when you get a submit for "edit.mobile", you could iterate over all stream configs and look for ones with the cc_by (name TBD) setting and also submit to those. Or you could build up the map in the JS from stream config itself. An explicit config like this sounds saner than inferring some hierarchy from some naming convention, which will probably not be held by other non EventLogging streams (e.g. mediawiki.revision-create, mediawiki.page-delete, etc.).

In T256169#6252704, @Ottomata wrote:
Alternatively, could you just make a config setting that indicated the parent stream for a 'CCed' child stream?
	"edit.mobile.app": {"cc_by": "edit.mobile" },
Then in EventLogging JS when you get a submit for "edit.mobile", you could iterate over all stream configs and look for ones with the cc_by (name TBD) setting and also submit to those.

It'd be faster to lookup an array of stream names in a dictionary by stream name than to loop through the whole thing on every call.

Or you could build up the map in the JS from stream config itself. An explicit config like this sounds saner than inferring some hierarchy from some naming convention

That's what mapCopyStreams in the task description does, and it does it in a way that a parent stream does not have to be configured.

For example, suppose you have streams "edit.growth" and "edit.mobile" in the stream config but not "edit". If you have instrumentation that logs to "edit" stream (which does not exist), we want events to be auto-logged to "edit.growth" and "edit.mobile" streams via the cc feature.

I just don't think it'd be good to build that map client-side on every pageload, especially as the list of streams grows.

For example, suppose you have streams "edit.growth" and "edit.mobile" in the stream config but not "edit".

What happens if an instrumentation produces directly to "edit.growth"? Is that ok? What happens if "edit.growth.experiment" is also in stream config?

Generating whatever this is in the EventLogging extension should be fine. I'd be careful about using some a naming convention to figure out what gets 'CCed' to what though. It'll probably work in most cases, but I'd predict you'll run into some unexpected bugs when people start doing weird things with it.

You could make up another EventLogging config variable for this where you keep an explicit map?

$wgEventLoggingStreamCCMap = [
   "edit" => [ "edit.growth", "edit.mobile" ],
   "edit.growth" => [ "edit.growth.experiments" ],
   "edit.growth.experiments" => [ "edit.growth.experiments.help_panel" ],
   "edit.mobile" => [ "edit.mobile.app", "edit.mobile.web" ]
]

And export this to EventLogging JS via RL hooks like other configs are. Then if a stream is not in stream config, EventLogging could look up the stream name to see if it is in the streamCCMap and submit to each of those streams instead. Or do both?

Change 618306 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[mediawiki/extensions/EventLogging@master] [WIP] Generate Stream CC Map, CC streams in submit()

https://gerrit.wikimedia.org/r/618306

gerritbot added a project: Patch-For-Review.Aug 4 2020, 1:03 PM

The WIP patch I uploaded is just so I don't lose the work I did on this, especially if we decide to keep the feature and use the implicit mapping then at least the PHP version of the JS algorithm in the description is already done.

Change 618306 abandoned by Bearloga:
[mediawiki/extensions/EventLogging@master] [WIP] Generate Stream CC Map, CC streams in submit()

Reason:
Likely going to use a manual, rather than automatic solution for stream cc-ing feature when the time comes

https://gerrit.wikimedia.org/r/618306

• Mholloway subscribed.Oct 3 2020, 2:57 AM

• sdkim moved this task from Done! to Inbox on the Better Use Of Data board.Oct 29 2020, 4:38 PM

• jlinehan moved this task from Inbox to Sign-off on the Better Use Of Data board.Feb 8 2021, 6:41 PM

Closed as abandoned

Stream cc map should not be generated on every pageloadClosed, DeclinedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Stream cc map should not be generated on every pageload
Closed, DeclinedPublic
Actions

Related Objects
Search...