For demo purposes it would be nice to prepare some Superset views that can show the relevant metrics we're observing.
Acceptance criteria
TBD
Looping @nettrom_WMF to give us some guidance on which views would be more interesting to create in the context of the experiment.
I've looked for other instances where we've had to generate fake event data from a JSON Schema and could not find any examples...
Maybe the easiest thing we can do is to grab an existing event database table based on the web/base schema (@phuedx do we already have one in the data lake?),
and use Spark SQL to INSERT ... SELECT into another table, modifying the desired fields as necessary. We could use Spark SQL's random functions (e.g. rand()) to generate random values.
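To make the idea above concrete, here is a rough sketch of what the statement could look like, written as a small Python helper that only builds the Spark SQL string. The table names, the column names (`dt`, `action`, `experiment_group`) and the helper itself are hypothetical and not taken from any real table; the target table is assumed to have exactly these three columns in this order.

```python
def build_fake_events_sql(source_table: str, target_table: str, groups: list[str]) -> str:
    """Build a Spark SQL INSERT ... SELECT that copies rows and randomizes one field.

    All table and column names are placeholders (assumptions), not real schema fields.
    """
    if not groups:
        raise ValueError("need at least one experiment group")
    group_array = ", ".join(f"'{g}'" for g in groups)
    # rand() returns a value in [0, 1) for each row; scale it to a 1-based
    # index so element_at() picks one of the groups uniformly at random.
    random_group = (
        f"element_at(array({group_array}), "
        f"CAST(floor(rand() * {len(groups)}) AS INT) + 1)"
    )
    return (
        f"INSERT INTO {target_table}\n"
        f"SELECT dt, action, {random_group} AS experiment_group\n"
        f"FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    print(build_fake_events_sql("event.some_web_base_table",
                                "tmp.fake_community_updates",
                                ["control", "treatment"]))
```

The printed statement could then be run from a Spark SQL session; copying real rows keeps timestamps and other fields realistic, and only the overridden fields are random.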
Maybe the event generation could be a separate task from the creation of the Superset dashboard?
I think we can't really play with Superset until we have the data, no?
The @wikimedia/json-schema-tools package that we maintain (and is integrated into the schema repositories) uses the json-schema-faker package. Perhaps we could use that tool for this?
Given that we're focused on the instrumentation of the Community Updates module, I think being able to monitor the impression rate and CTR of that module (ref the leading indicators) would be the most interesting.
When it comes to what a Superset dashboard monitoring the experiment could look like, the Levelling Up experiment leading indicators dashboard can give some ideas. Splits by experiment group and also by platform are typically most useful.
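As a rough illustration of the CTR metric and the suggested splits, here is a minimal Python sketch. It assumes CTR means clicks divided by impressions, and the event field names (`action`, `experiment_group`, `platform`) and action values (`impression`, `click`) are hypothetical, not the actual instrument's fields.

```python
from collections import defaultdict


def ctr_by_split(events):
    """Return {(experiment_group, platform): CTR}, with CTR = clicks / impressions.

    Splits that have no impressions get None instead of a ratio.
    """
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for event in events:
        # Ignore any action that is neither an impression nor a click.
        if event["action"] in ("impression", "click"):
            key = (event["experiment_group"], event["platform"])
            counts[key][event["action"]] += 1
    return {
        key: (c["click"] / c["impression"] if c["impression"] else None)
        for key, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    sample = [
        {"action": "impression", "experiment_group": "control", "platform": "desktop"},
        {"action": "impression", "experiment_group": "control", "platform": "desktop"},
        {"action": "click", "experiment_group": "control", "platform": "desktop"},
        {"action": "impression", "experiment_group": "treatment", "platform": "mobile"},
        {"action": "click", "experiment_group": "treatment", "platform": "mobile"},
    ]
    for (group, platform), ctr in sorted(ctr_by_split(sample).items()):
        shown = "n/a" if ctr is None else f"{ctr:.1%}"
        print(group, platform, shown)
```

In Superset the same aggregation would normally be a grouped query over the events table; the sketch is only meant to pin down what each chart would be showing.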
The @wikimedia/json-schema-tools package that we maintain (and is integrated into the schema repositories) uses the json-schema-faker package. Perhaps we could use that tool for this?
Oh! Missed that. I ended up using existing data from another table to generate the fake data (I think it's a good approach, since it generates more realistic data). See T374699.
Here's an example Superset dashboard for the community updates fake data.
https://superset.wikimedia.org/superset/dashboard/550
Awesome example dashboard, great work!
Nitpicking, I'd change the format of the CTR charts to use percentages. For the Line chart, find the "Y Axis Format" in the "Customize" tab and change it to a percentage-based format (I prefer one decimal digit for these). For the table there's "Customize columns" in the "Customize" tab, choose the "ctr" column and then the "Number formatting" tab in the popup that comes up.
@Cyndymediawiksim: This task is open and its associated sprint project is archived. Please associate an active project tag to this task so it can be found on workboards, or set the task status to resolved if no further work is needed. Thanks!
Results of this experiment are captured in Metrics Platform Integration: Verify that we can query for the data points that were captured by Metrics Platform on Superset or Turnilo.