
Create grafana dashboards using Prometheus
Closed, Resolved (Public)

Description

To finalize the statslib migration, new dashboards using Prometheus as the data source need to be created. See the steps from T350592: EPIC: migrate in use metrics and dashboards to statslib. Most of the code has been migrated and merged; the remaining steps are (6), (7) and (8):

  1. Identify the metric (or group of metrics) that will be converted.
  2. Create/assign a Phabricator subtask linked to this task (with granularity of individual metric or group of metrics) and update task description to reflect which task(s) have been created for which metric(s).
  3. Follow the migration process as outlined below.
  4. Secure/Conduct code review(s).
  5. Deploy the changes to production via the train (https://wikitech.wikimedia.org/wiki/Deployments/Train).
  6. Verify that the changes have been successfully implemented.
  7. Place the metrics subtask in a 2-3 week waiting period to allow Prometheus time to establish 2-3 weeks of metric history.
  8. After the 2-3 week waiting period is complete, update the dashboard:
    • Save a copy of the dashboard using legacy metrics as-is into the Legacy grafana dashboard folder
    • Replace the old Graphite metric(s) with the new Prometheus metric(s) and save/update the live dashboard
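
As a rough illustration of step 8, the two sub-steps can be sketched as plain transformations on a dashboard's JSON model. This is a minimal sketch under assumptions, not the process used for the Growth dashboards: the metric names, the folder UID and the datasource UID are hypothetical placeholders, the payload shape is modelled on Grafana's save-dashboard HTTP API and should be checked against the Grafana version in use, and panels nested inside collapsed rows are not handled.

```python
import copy

# Hypothetical mapping from a legacy Graphite target to its Prometheus
# replacement. Both names are placeholders, not real Growth metrics.
EXPR_BY_GRAPHITE_TARGET = {
    "MediaWiki.example.legacy_metric.count":
        "sum(rate(example_new_metric_total[5m]))",
}


def make_legacy_copy(dashboard, legacy_folder_uid):
    """Step 8, first bullet: build a payload that saves an as-is copy
    of the dashboard into the legacy folder."""
    legacy = copy.deepcopy(dashboard)
    # Clearing id and uid asks Grafana to create a new dashboard
    # instead of overwriting the live one.
    legacy["id"] = None
    legacy["uid"] = None
    legacy["title"] = f"{dashboard['title']} (legacy Graphite)"
    return {
        "dashboard": legacy,
        "folderUid": legacy_folder_uid,  # assumed UID of the legacy folder
        "overwrite": False,
    }


def swap_to_prometheus(dashboard, expr_by_target, prometheus_uid):
    """Step 8, second bullet: return a copy of the dashboard in which every
    mapped Graphite target is replaced by a Prometheus query."""
    updated = copy.deepcopy(dashboard)
    # Only top-level panels are visited; nested row panels are ignored here.
    for panel in updated.get("panels", []):
        new_targets = []
        for target in panel.get("targets", []):
            expr = expr_by_target.get(target.get("target"))
            if expr is None:
                # No mapping known: keep the legacy target untouched.
                new_targets.append(target)
                continue
            new_targets.append({
                "refId": target.get("refId", "A"),
                "expr": expr,
                "datasource": {"type": "prometheus", "uid": prometheus_uid},
            })
        panel["targets"] = new_targets
    return updated
```

Both functions leave the input dictionary unchanged, so the legacy copy can be built from the same object before or after the swap.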

Dashboards and Sections to migrate:

  • Special:Homepage and Suggested Edits
    • Page rendering (server-side)
    • Page rendering (client-side) (except for that one transfer-size panel => T383563)
    • Impact module
    • Loading suggested task (client-side)
    • Image recommendation service
    • Errors
    • Task pool
    • Task suggester
    • UserImpact API
    • [WIP] Time to editor read for suggested edits tasks
  • Growth team product KPI
    • Clicks/Save/Reverts
    • Questions posted to mentors or help desk
    • Mentorship
    • Account registration

Event Timeline

Sgs triaged this task as Medium priority. Jan 16 2025, 12:47 PM

Should we create a new dashboard that we can all edit, or re-use the existing one and tag WIP sections until finalized? I'm leaning towards the second option, since we would be able to look at both data sources at the same time. Thoughts? cc @Michael @Cyndymediawiksim

Edit: If we re-use the existing one, we should first do step 8.1 ("Save a copy of the dashboard using legacy metrics as-is into the Legacy grafana dashboard folder").

Yes, I agree with using the existing dashboard. That makes the progress more visible.

On the KPI board, "Questions asked to mentors via mentorship module on Special:Homepage or help panel" still needs migrating.

The panels in the "Mentorship" section still need to be double-checked.

Also, do we still need the two panels for the "Users (not) opted into Growth features" at the very bottom? As far as I can tell, we are providing our features to 100% of our users. Do we intend to change that again in the future? If not, then I'd propose we remove the GEHomepageNewAccountEnablePercentage config option and also this tracking.

@KStoller-WMF for consideration.

This should then probably be considered as part of T367724: Reconsider GrowthExperiments feature flags, at least in spirit.

Michael updated the task description.

On the KPI board, "Questions asked to mentors via mentorship module on Special:Homepage or help panel" still needs migrating.

I did that one. I'm observing fewer events in Prometheus than in Graphite, though.

The panels in the "Mentorship" section still need to be double-checked.

These are looking good, very similar values and trends
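
One way to put a number on a gap like the one noted for the mentor-questions panel is to compare per-day totals from both data sources. The sketch below is only an illustration: the function names and the 5% tolerance are assumptions, and the totals would have to be exported from Graphite and Prometheus by hand, since no actual figures are given in this task.

```python
def relative_gap(graphite_total, prometheus_total):
    """Fraction by which the Prometheus total falls short of the Graphite
    total (negative when Prometheus reports more)."""
    if graphite_total == 0:
        raise ValueError("Graphite total is zero; relative gap is undefined")
    return (graphite_total - prometheus_total) / graphite_total


def flag_gaps(daily_totals, tolerance=0.05):
    """Return the days whose totals differ by more than `tolerance`.

    `daily_totals` maps a day label to a (graphite, prometheus) pair.
    Days with a Graphite total of zero are skipped.
    """
    flagged = []
    for day, (graphite, prometheus) in sorted(daily_totals.items()):
        if graphite and abs(relative_gap(graphite, prometheus)) > tolerance:
            flagged.append(day)
    return flagged
```

A consistent gap across all days would point at a systematic difference in how events are counted, while isolated days would point at gaps in collection.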

Also, do we still need the two panels for the "Users (not) opted into Growth features" at the very bottom? As far as I can tell, we are providing our features to 100% of our users. Do we intend to change that again in the future? If not, then I'd propose we remove the GEHomepageNewAccountEnablePercentage config option and also this tracking.

Good point—this was a leftover from when we had a control group without Growth features on pilot wikis. We removed that group because it confused experienced editors helping newcomers, and because newcomers clearly did better with Growth features enabled.

We should continue A/B testing new features, but in my view, the control group should never be "no Growth features." I think it's safe to remove those two panels unless we believe they might help us detect bugs—like users not being opted into Growth features—but that seems unlikely.

Thanks for clarifying! I created T394435: Remove GEHomepageNewAccountEnablePercentage config option to remove that.

Michael claimed this task.

T394435: Remove GEHomepageNewAccountEnablePercentage config option has been done.

I'll close this task as resolved. What still needs to be done for the overall statslib migration work:

  • prune the now-unused Graphite tracking in our code: it doesn't do anything anymore
  • (later) remove the (hidden) instructions that read from (now read-only) Graphite in our Grafana boards. That could happen once the read-only Graphite has been removed as well.