[CIM] All-edit-types option not aggregating properly
Open, Needs Triage | Public | 3 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Make a GET request using the all-edit-types option.

What happens?:

  • The response body does not return aggregate edit types.

What should have happened instead?:

  • The response body should return data with aggregate edit types.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):
CIM and AQS 2.0 services

Details

Title: Update test data to make sure we include examples of 'all-edit-types'
Reference: repos/generated-data-platform/aqs/aqs-docker-cassandra-test-env!23
Author: xcollazo
Source Branch: add-all-edit-categories
Dest Branch: main

Event Timeline

EChukwukere-WMF renamed this task from [CIM} All-edit-types option not aggregating properly to [CIM] All-edit-types option not aggregating properly. Fri, Jun 14, 12:25 PM
EChukwukere-WMF assigned this task to SGupta-WMF.
EChukwukere-WMF updated Other Assignee, added: SGupta-WMF.
EChukwukere-WMF set the point value for this task to 3.
EChukwukere-WMF updated Other Assignee, removed: SGupta-WMF.

@EChukwukere-WMF What are the endpoints that show this error? Can you give an example URL?

@SGupta-WMF I imagine the problem is not aggregation, right? Since the underlying data is already aggregated?

@mforns and @EChukwukere-WMF The problem is that we do not have any aggregation in the test environment. The previous AQS 2.0 services had some rows like this:
"analytics.wikimedia.org","all-wikipedia-projects","all-sites","daily","20180101","13814000-1dd2-11b2-8080-808080808080","","128209697","33568105","94641592"
We got this from the production Cassandra instances. I am unsure how we should handle this. There are several possible approaches:

  • Copy the aggregation logic from prod to the test environment.
  • Add multiple rows having all-edit-types as edit-type.
  • Copy aggregated data from the Cassandra DB.

@mforns @xcollazo What do you suggest?

@SGupta-WMF Do you mean that the CSV data in the test environment does not have the aggregated rows?

This is highly likely, since we only included a limited number of rows in the test data. From me on Slack:

I figured that the top_* tables could use 10,000 rows, and the rest 1,000 rows so that they are easy to add to git. If you’d rather have more rows, let me know.

So it would be easy for those rows to miss the special case of all-edit-types.

I can regenerate the datasets that would be affected by this, and make sure to include rows of the special case.
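A quick way to confirm that a regenerated dataset includes the special case is to scan its CSV for rows carrying the all-edit-types value. The following is only a minimal sketch: the function name and the sample rows are hypothetical, and the real test datasets have their own columns and file locations.

```python
import csv
import io

MARKER = "all-edit-types"


def count_marker_rows(csv_text, marker=MARKER):
    """Count the rows in which any field is exactly equal to the marker."""
    reader = csv.reader(io.StringIO(csv_text))
    return sum(1 for row in reader if marker in row)


# Hypothetical sample rows, not taken from the actual test data.
sample = (
    '"commons","Some_category","shallow","create","2023-11-01","5"\n'
    '"commons","Some_category","shallow","all-edit-types","2023-11-01","12"\n'
)

# A dataset that misses the special case would report 0 here.
print(count_marker_rows(sample))
```

Matching on whole fields rather than substrings avoids false positives from values that merely contain the marker text.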

Never mind! There were actually 2 datasets that did not include all-edit-types. I will link the MR in a sec.

Test status: QA PASS

Tested the requests for top-edited-categories-monthly and top-editors-monthly:

url_full = 'http://localhost:8096/metrics/commons-analytics/top-editors-monthly/Allah_medallion_in_Hagia_Sophia/shallow/all-edit-types/2023/11'

url_full = 'http://localhost:8096/metrics/commons-analytics/top-edited-categories-monthly/deep/all-edit-types/2023/11'

Both requests now return a 200 status code and appropriate data.
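
The check above can be repeated with a small script. This is a sketch only, assuming the service is running locally on port 8096 as in the URLs above; the helper names `http_status` and `check_all` are illustrative and not part of any AQS tooling.

```python
import urllib.error
import urllib.request

BASE = "http://localhost:8096/metrics/commons-analytics"

# The two requests used in the QA pass above.
URLS = [
    BASE + "/top-editors-monthly/Allah_medallion_in_Hagia_Sophia/shallow/all-edit-types/2023/11",
    BASE + "/top-edited-categories-monthly/deep/all-edit-types/2023/11",
]


def http_status(url):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; report their code instead.
        return err.code


def check_all(urls, get_status=http_status):
    """Map each URL to its status code; get_status is injectable for testing."""
    return {url: get_status(url) for url in urls}


# With the local service running, each entry is expected to be 200:
#   for url, status in check_all(URLS).items():
#       print(status, url)
```

A status of 200 alone does not prove that the body contains the expected rows, so the response data still needs to be inspected as was done in the manual test.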