Make contextual attributes information more clear in the metrics platform documentation
Closed, ResolvedPublic

Description

We need to answer user questions about contextual attributes in our Metrics Platform documentation:

See the Slack thread for additional information: https://wikimedia.slack.com/archives/C01DFMX6QLB/p1716256483547319

Acceptance Criteria

Required

Event Timeline

I'd also recommend making sure that definitions are documented for contextual attributes whose meaning may be unclear.

For example, the definition of performer_active_browsing_session_token is currently unclear to me, as there are various session-type tokens defined in current instruments. I believe it's the same as the activity session defined in this data hub entry, but it would be helpful to confirm.

Are they still called contextual attributes in the current version of Metrics Platform?

This is a good first question.

A couple of searches of the data-engineering/metrics-platform codebase indicate that we're referring to them as:

  • Context attributes
  • Contextual attributes
  • Context values
  • Contextual values
  • Context data fields
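The searches above could be repeated whenever the terminology is cleaned up. The following is a minimal sketch, not a tool from the metrics-platform repository: it assumes a local checkout, and the `count_variants` and `count_in_tree` helpers and the default file suffixes are hypothetical choices for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# The terminology variants found in the codebase. Words may be separated by a
# space or an underscore, so identifiers like "contextual_attributes" are
# counted alongside prose. Matching is case-insensitive.
VARIANTS = [
    "context attributes",
    "contextual attributes",
    "context values",
    "contextual values",
    "context data fields",
]


def _pattern(term: str) -> re.Pattern:
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"\b" + r"[ _]".join(words) + r"\b", re.IGNORECASE)


PATTERNS = {term: _pattern(term) for term in VARIANTS}


def count_variants(text: str) -> Counter:
    """Count how often each terminology variant appears in `text`."""
    counts = Counter()
    for term, pattern in PATTERNS.items():
        counts[term] = len(pattern.findall(text))
    return counts


def count_in_tree(root: str, suffixes=(".md", ".js", ".php", ".json", ".yaml")) -> Counter:
    """Sum the variant counts over every file under `root` with a matching suffix."""
    total = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += count_variants(path.read_text(encoding="utf-8", errors="ignore"))
    return total
```

Because "context attributes" must be followed directly by a space or underscore, it does not also match inside "contextual attributes", so each variant is counted separately.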

I think this task could be extended to include an MR that makes the terminology consistent once we decide on a term.

While I originally settled on contextual attributes, my preferences are: 1) context fields; and 2) context values. I have a slight preference for 1 because it matches the verbiage when talking about events and event schemas (events have fields, event schemas have fields).

To be honest, I don't have a strong opinion about this. Everything sounds too similar to me (probably because of the way my mind translates).

Anyway, I prefer the term field over attribute or value, for the same reason @phuedx mentioned before. I guess folks who work with the generated data always think of them as fields and, in the end, they are the ones who have to deal with all this. As for context vs. contextual, they are almost the same to me, but I'm going to vote for context (does it sound better?). So my preference is context field.

Ah, naming rears its head once again.

Renaming contextual_attributes and propagating that change throughout all the schemas, client libraries, MPIC, and current instruments in prod feels like adding to an already full plate atm, which is also why I'm fine with leaving it as is and giving it a clear definition in the docs. But maybe there's never a good time to do a name change like this, and better now than later when we have dozens more instruments in prod.

So if we are committed to changing it, I'm fine with context fields (seems simpler to grok imho). Would the timing be better, though, once the pending buy/build decision is made? I suppose this is ultimately a product decision?

The cost of propagating the change, and this being a low priority (which may become even lower in the coming days), are good points. FWIW, the usage seems mostly limited to inline documentation and variable names. I'm going to double-check this and document the cases where non-documentation and/or backwards-incompatible changes would be needed. This task can then be reprioritised.

Renaming will fall towards tech debt clean up.

@apaskulin, do you feel they are adequately covered in the user docs as they stand?

Thanks for the ping!

Are they still called contextual attributes in the current version of Metrics Platform?

I'm in favor of keeping "contextual attributes". I agree that "context fields" is simpler, but I don't think the benefit is enough to justify the work of changing it. As long as the documentation is clear, I think the current name is ok.

Where can one find out which contextual attributes are currently supported?

Currently in the docs, contextual attributes are documented as part of the step to create an instrument specification. There are two lists of available contextual attributes, one for JS and one for PHP. The docs link to each of these files in the code and point users to the schemas to look up the descriptions of each field. Even though this is a multi-tab flow, I think this is the best option for now, so we avoid the work of manually duplicating the information between the three places. In the future, maybe we can leverage some type of MPIC-related automation to show this info all in one place. The reason this is presented so early in the overall create-an-instrument flow is that selected contextual attributes can impact the risk tier as defined by the data collection guidelines.
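Because the JS and PHP lists are separate, an instrument's selected contextual attributes need to be checked against the list for each client library it targets. Below is a minimal sketch of that check, assuming the lists have already been loaded from the linked source files; the function names and the example lists are hypothetical and invented for illustration, and they do not reflect what either client actually supports.

```python
from typing import Dict, Iterable, List


def unsupported_attributes(selected: Iterable[str], supported: Iterable[str]) -> List[str]:
    """Return the selected contextual attributes missing from `supported`, sorted."""
    return sorted(set(selected) - set(supported))


def check_instrument(selected: Iterable[str],
                     supported_by_platform: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each client library to the selected attributes it cannot provide."""
    selected = list(selected)  # allow a one-shot iterator to be reused per platform
    return {
        platform: unsupported_attributes(selected, supported)
        for platform, supported in supported_by_platform.items()
    }


# Hypothetical, abbreviated lists: NOT the real JS/PHP lists.
example_lists = {
    "js": ["page_title", "performer_is_logged_in"],
    "php": ["page_title"],
}
print(check_instrument(["page_title", "performer_is_logged_in"], example_lists))
# {'js': [], 'php': ['performer_is_logged_in']}
```

Keeping the lists as inputs rather than hard-coding them mirrors the docs' approach of treating the client source files as the single source of truth.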

Should https://wikitech.wikimedia.org/wiki/Metrics_Platform/Event_Schema be marked as deprecated or just updated?

This is a question I have as well (tracked as part of T372689). I still need to read through this page and see how much of its content is unique rather than duplicated in the schema descriptions, but ideally I'd like to be able to delete this page and use the schemas as the source of truth. I'm hoping to find a visualization tool to make the schemas easier to read (T372680).