
Follow-Up Ticket for QA: Validate Sample Rate Adjustments
Closed, Resolved | Public | 3 Estimated Story Points

Description

Background

  • The purpose of this ticket is to validate the sample rate adjustments made for events originating through the Metrics Platform on both mobile and desktop platforms. These adjustments are necessary to align the sample rates with the current configurations of the Event Platform. Ensuring accurate sample rates is crucial for maintaining data integrity and consistency across platforms, which ultimately impacts the quality of data analysis.

User story

  • As an engineer, I want to validate the sample rate adjustments for desktop and mobile platforms, so that the sample rates reflect the configurations of the Event Platform and ensure data consistency and accuracy.

Requirements

  • Ensure that the sample rates for Metrics Platform desktop and mobile events are set as per the Event Platform configurations.
  • Validate that the local setup reflects the new sample rates accurately.
  • Obtain confirmation from a member of the Data Engineering/MP team on the correctness of the implementation.

BDD

  • For QA engineer to fill out

Test Steps

Go to https://superset.wikimedia.org/ and run the following queries:

SELECT sample, session_id
FROM event.mediawiki_web_ui_actions
WHERE year = 2024 AND month = 4 AND day = 5;

SELECT sample, performer.session_id
FROM event.mediawiki_web_ui_actions
WHERE year = 2024 AND month = 6 AND day = 3;

Verify that sampling rates are now at 20%.
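A rough way to make the 20% check concrete, assuming the query results have been exported from Superset as (sample, session_id) rows in which sample is a [rate, unit] pair such as [0.2, "session"]. The export shape, the function name rate_breakdown, and the example rows are illustrative assumptions, not part of any existing tooling:

```python
from collections import Counter


def rate_breakdown(rows):
    """Return the share of rows observed at each (rate, unit) pair.

    `rows` is assumed to be an iterable of (sample, session_id) tuples,
    where `sample` is a two-item sequence: [rate, unit].
    """
    counts = Counter(
        (float(sample[0]), sample[1]) for sample, _session_id in rows
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


# Made-up rows for illustration only; real rows come from the queries above.
rows = [
    ([0.2, "session"], "a"),
    ([0.2, "session"], "b"),
    ([1.0, "pageview"], "c"),
    ([0.2, "session"], "d"),
]
breakdown = rate_breakdown(rows)
```

If every row is at the new rate, the only key in the result is (0.2, "session"). Any remaining (1.0, "pageview") share points at events still logged with the old rate.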

Design

  • Not applicable (no mockups or design requirements for this validation task)

Acceptance criteria

  • The sample rates for both Metrics Platform web mobile and desktop events reflect the configurations set by the Event Platform.
  • Successful verification and validation of sample rates locally and across different instances.
  • Confirmation from Data Engineering/MP team on the correctness of the sample rate adjustments.
  • No discrepancies found during post-implementation testing.

Communication criteria - does this need an announcement or discussion?

  • This task may require discussion with the Data Engineering/MP team for confirmation and validation purposes but does not need a formal announcement.

Rollback plan

  • If the sample rate adjustments are found to be incorrect, revert to the previous sample rate configurations as documented before the changes were made. Coordinate with the Data Engineering/MP team to ensure the rollback is completed smoothly.

This task was created by Version 1.0.0 of the Web team task template using phabulous

Event Timeline

ovasileva subscribed.

Moving to sprint 5 to continue QA. @KSarabia-WMF - would this task require engineering effort, or would it be QA done by @jwang? Also, any chance we can convert the description to the task template?

@ovasileva The plan was that I would generate a quick report and have another engineer on the team sign off on it, because @jwang is close to starting her leave.

ovasileva triaged this task as Medium priority. May 23 2024, 5:27 PM
ovasileva raised the priority of this task from Medium to High.
SToyofuku-WMF set the point value for this task to 3. May 23 2024, 5:36 PM

Report on SQL Query Analysis and Sampling Rate Fix

Summary of Conversation and Context

This task is a result of the conversation and SQL query analysis discussed in Slack between Kim and Jennifer. The discussion was initiated to analyze data from the mediawiki_web_ui_actions table following the Metrics Platform adoption of the Web team's instruments detailed in T351298. The key focus was on verifying and correcting the sample rates.

Participants:

  • Jennifer Wang
  • Kim Sarabia

Conversation Highlights:

  1. Initial Query Review:
    • Jennifer shared an SQL query to retrieve samples from the mediawiki_web_ui_actions table.
    • The query results from April showed a sample rate of 100%, which was incorrect.
  2. Sampling Rate Clarification:
    • Kim verified the sampling rate, leading to the clarification that the sample rate should match the old schema, which was not 100%.
    • The sample rate has now been corrected to 20%.

SQL Query Analysis

April 2024 Query Analysis

The April query:

SELECT sample, session_id
FROM event.mediawiki_web_ui_actions
WHERE year = 2024 AND month = 4 AND day = 5;

Data Summary:

  • Total Entries: 1000 (Automatically limited to 1000)
  • Sample Rate: 100%
  • Value Structure: Lists containing a numerical value of 1.0 and the string "pageview".

June 2024 Query Analysis

The June query:

SELECT sample, performer.session_id
FROM event.mediawiki_web_ui_actions
WHERE year = 2024 AND month = 6 AND day = 3;

Data Summary:

  • Total Entries: 100
  • Sample Rate: 20%
  • Value Structure: Lists containing a numerical value of 0.2 and the string "session".

Comparison of April vs. June Query Results

April 2024 Query:

  • Sample Rate: 100%
  • Entries: 1000 rows
  • Example Entry: [1.0, "pageview"], indicating all pageviews were included in the sample, which was incorrect and led to overestimation.

June 2024 Query:

  • Sample Rate: 20%
  • Entries: 100 rows
  • Example Entry: [0.2, "session"], indicating only 20% of sessions were included, which aligns with the expected behavior based on the old schema.

Key Observations and Conclusion

  1. Sample Rate Discrepancy:
    • April 2024 Query: The sample rate was incorrectly set at 100%, causing overestimation.
    • June 2024 Query: The sample rate has been corrected to 20%, ensuring a representative subset of user sessions.
  2. Corrective Measures:
    • The sampling logic was reviewed and fixed as per the comments in the Phabricator task T361962.
  3. Verification:
    • The inclusion of performer.session_id in the June query allows for confirmation that each row represents a different session.
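The session check in the verification step can be sketched as follows, assuming the June query results are available as (sample, session_id) rows. The function name duplicate_sessions and the example session IDs are hypothetical:

```python
from collections import Counter


def duplicate_sessions(rows):
    """Return the session IDs that appear in more than one row, sorted."""
    counts = Counter(session_id for _sample, session_id in rows)
    return sorted(sid for sid, n in counts.items() if n > 1)


# Made-up rows for illustration only.
rows = [
    ([0.2, "session"], "s1"),
    ([0.2, "session"], "s2"),
    ([0.2, "session"], "s1"),
]
dupes = duplicate_sessions(rows)
```

An empty result is consistent with each row coming from a different session. A non-empty result lists the sessions that contributed more than one row.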

Conclusion

The sampling rates have been fixed from 100% in the April query to 20% in the June query. This adjustment ensures accurate data representation and prevents overestimation. The changes made in task T361962 corrected the sampling logic, and the current query accurately reflects the intended sample rate and data structure.

To verify the data, use the above queries in https://superset.wikimedia.org/

KSarabia-WMF updated the task description.

@SToyofuku-WMF

Bringing this conversation to Phab in the spirit of our earlier retro today on avoiding DMs :)

TL;DR I worked with Jennifer to look into your query, and we wondered whether the concerns about too many 100% sample rates came from data logged before the patch was merged on May 20, 2024. Jennifer asked about the patch date and potential cache impact. We agreed that the cache takes a while to clear (around a couple of weeks), which could affect results. Jennifer provided the following query and confirmed that there is a drop, but some cached sample rates are definitely still present.

Query

SELECT sample.rate, sample.unit, month, day,
       substr(meta.dt, 1, 10) AS event_date,
       count(DISTINCT performer.session_id) AS sessions
FROM event.mediawiki_web_ui_actions
WHERE year = 2024 AND ((month = 5 AND day > 20) OR month = 6)
GROUP BY sample.rate, sample.unit, month, day, substr(meta.dt, 1, 10)
ORDER BY month DESC, day DESC;

Screenshot 2024-06-05 at 3.31.19 PM.png (1×1 px, 101 KB)

Link
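To track the drop over time, the grouped output of the query above could be reduced to a per-day share of sessions still at the old rate. This is only a sketch: it assumes the result is exported as tuples in the query's column order (rate, unit, month, day, event_date, sessions), and the function name legacy_share_by_date and the example numbers are made up for illustration:

```python
from collections import defaultdict


def legacy_share_by_date(rows, legacy_rate=1.0):
    """Map each event_date to the fraction of sessions at `legacy_rate`."""
    totals = defaultdict(int)
    legacy = defaultdict(int)
    for rate, _unit, _month, _day, event_date, sessions in rows:
        totals[event_date] += sessions
        if rate == legacy_rate:
            legacy[event_date] += sessions
    return {
        date: legacy[date] / totals[date]
        for date in sorted(totals)
        if totals[date]
    }


# Made-up counts for illustration only; not taken from the real table.
rows = [
    (1.0, "pageview", 5, 21, "2024-05-21", 80),
    (0.2, "session", 5, 21, "2024-05-21", 20),
    (1.0, "pageview", 6, 3, "2024-06-03", 10),
    (0.2, "session", 6, 3, "2024-06-03", 90),
]
shares = legacy_share_by_date(rows)
```

A share that shrinks from one date to the next would match the expectation that cached 100% sample rates clear over a couple of weeks. Because sessions are counted per (rate, unit) group, a session seen at both rates on the same day is counted in both groups.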

Hope that's helpful and let me know if you need anything else in the QA process. Thanks again!

SToyofuku-WMF subscribed.

Ah, gotcha! If there's cache impact here, then I'm very comfortable signing off: I can see the introduction of the new sample rate on the 20th, I can confirm there is a significant drop, and it's only been a couple of weeks, so this will likely still be propagating. Moving to sign off. Thanks for following up on this ☺️

Value continues to trend down, gonna mark this as resolved