
Understanding first day: determine which instrumentation can be re-used
Closed, Resolved, Public


Other teams have probably already instrumented parts of the new editor experience that we care about, and we could re-use that instrumentation for our own purposes. This task is about discovering that instrumentation and figuring out whether the things we need are already being recorded (or whether there is code that can be revived to do so).

One use case that stands out is Visual Editor, because we are interested in knowing how many new editors begin, but do not complete, their edits. Assigning to @JTannerWMF to set up the right conversations to figure that out.

Visual Editor is only one example, and there may be others.

Event Timeline

Assigning to @kostajh so he can add notes and action items.

Looking at, here are my observations:

I'm probably missing something, so @nettrom_WMF please review the list closely and let me know your thoughts as well.

@kostajh -- thank you for doing such a thorough job. This gives me confidence that we're making sure to use whatever existing code and data could be relevant, instead of reinventing the wheel. In general, please see the evolving specifications on T205758 to evaluate which of the existing schemas can help us. One thought I had while reading your list:

Regarding "Thanks", I feel like this must be stored in the MediaWiki database, because this page exists in public:

@kostajh – Kudos from me too, excellent job on going through these!

I looked through the list of schemas and did my best to cross-reference them with the questions we're looking to answer, and I could not find any new candidates to be added.

Here are some comments on the list above:

  • PrefUpdate and email: I don't think this captures whether a user adds or updates their email address. That setting seems to be handled differently in the MediaWiki code, and I couldn't find anything in the logged data when I explored it. Email verification is available in the user table, which stores the timestamp of the verification. So, as suggested, we might want a schema that logs when a user adds or updates their email address.
  • Thanks: that's stored in the logging table with log_type = 'thanks' and log_action = 'thank' (hence the Special:Log page @MMiller_WMF mentioned)
  • Revert/undo: The SHA1 checksum in the revision table allows us to identify identity reverts after the fact. As far as I understand, those are most reverts, so I'm not sure if it's worth the effort to do more. I noticed that the "undo" link has a specific URL, so that could allow us to identify other types of reverts.
  • Block status: this can be inferred from the logging table, as blocks are logged there, but maybe it's useful for us to also capture this in the schema? It's a transient status, and I'm unsure about the cost of calculating it since I haven't worked with the blocking data.
  • PageCreation: these have been logged using an EventBus schema since mid-2017, so that is available for all wikis. I don't know where that schema is documented, but it logs a whole bunch of fields. See the mediawiki_page_create_3 table in the log database for the most recent data.
  • PageDeletion: page deletions are also logged in the logging table. Not sure if we want more information than what is captured there?
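
To illustrate the Revert/undo point, here is a minimal sketch of how identity reverts could be found after the fact from checksums. It assumes the revisions of a single page have already been pulled from the revision table as (rev_id, sha1) pairs in chronological order. The function name and the data layout are hypothetical, for illustration only, and are not existing code.

```python
from typing import Iterable, List, Tuple

# Hypothetical helper: each revision is a (rev_id, sha1) pair, and the
# input is assumed to be one page's history in chronological order.
Revision = Tuple[int, str]
Revert = Tuple[int, int, List[int]]


def find_identity_reverts(revisions: Iterable[Revision]) -> List[Revert]:
    """Return (reverting_rev_id, restored_rev_id, reverted_rev_ids) tuples.

    A revision counts as an identity revert when its checksum matches an
    earlier revision of the same page and at least one other revision
    sits in between (those are the reverted edits).
    """
    revs = list(revisions)
    last_seen = {}  # sha1 -> index of the most recent revision with it
    reverts: List[Revert] = []
    for i, (rev_id, sha1) in enumerate(revs):
        j = last_seen.get(sha1)
        # i - j == 1 would be two identical revisions in a row,
        # which undoes nothing, so it is not counted as a revert.
        if j is not None and i - j > 1:
            reverted = [r for r, _ in revs[j + 1:i]]
            reverts.append((rev_id, revs[j][0], reverted))
        last_seen[sha1] = i
    return reverts


history = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
print(find_identity_reverts(history))  # [(3, 1, [2])]
```

Reverts that do not restore an exact earlier state (for example a partial undo) would not be found this way, which is where the "undo" URL could help.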

I can't think of anything else. Thanks again for a great review, an excellent starting point @kostajh!

The final step on this task is to list the existing schemas we intend to use in our analysis. I think @nettrom_WMF should list them, and then we'll resolve this task and move on to T206802.

From what I can tell, the schemas that we intend to use (apart from the one we're developing) are:

  • Echo
  • EchoInteraction
  • Edit
  • GettingStartedRedirectImpression
  • GuidedTourButtonClick
  • GuidedTourExited
  • GuidedTourExternalLinkActivation
  • GuidedTourGuiderHidden
  • GuidedTourGuiderImpression
  • GuidedTourInternalLinkActivation
  • PrefUpdate
  • ServerSideAccountCreation

We also intend to use the page creation data that is captured through the EventBus and stored in the Data Lake (the mediawiki_page_create table in the event database).