AQS 2.0: Edit Analytics: Implement endpoints
Closed, Resolved | Public | 1 Estimated Story Points

Description

Implement endpoints for the Edit Analytics service. These endpoints are Druid-based.

Completion criteria: the following endpoints are implemented:

  • edits/aggregate
  • edits/per-page
  • bytes_difference/net/aggregate
  • bytes_difference/net/per-page
  • bytes_difference/absolute/aggregate
  • bytes_difference/absolute/per-page
  • edited_pages/new
  • edited_pages/aggregate
  • edited_pages/top-by-edits
  • edited_pages/top-by-net-bytes-difference
  • edited_pages/top-by-absolute-bytes-difference

See the parent task for discussion on a reusable package for commonalities between endpoints in this service and endpoints in the Editor Analytics service. There is additional related discussion in T288301: AQS 2.0:Wikistats 2 service.

Extremely rough proof-of-concept code for querying Druid can be found here.

Keep in mind that the Editor Analytics endpoints (the other Druid-based service) have already been implemented in T327829, so that service is a good reference for how to query Druid. As far as we have explored, the query types are very similar in both services.
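For anyone picking this up, here is a minimal sketch of what building a Druid native timeseries query could look like in Go. It is not the code from either service. The datasource name, the "project" dimension, and the "events" metric are assumptions made for illustration, and BuildEditsQuery is a hypothetical helper. Only the top-level JSON shape (queryType, dataSource, granularity, intervals, filter, aggregations) follows Druid's native query format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TimeseriesQuery mirrors the top-level fields of a Druid native timeseries query.
type TimeseriesQuery struct {
	QueryType    string        `json:"queryType"`
	DataSource   string        `json:"dataSource"`
	Granularity  string        `json:"granularity"`
	Intervals    []string      `json:"intervals"`
	Filter       *Filter       `json:"filter,omitempty"`
	Aggregations []Aggregation `json:"aggregations"`
}

// Filter is a Druid "selector" filter: keep rows where Dimension == Value.
type Filter struct {
	Type      string `json:"type"`
	Dimension string `json:"dimension"`
	Value     string `json:"value"`
}

// Aggregation describes one aggregator, for example a longSum over a metric.
type Aggregation struct {
	Type      string `json:"type"`
	Name      string `json:"name"`
	FieldName string `json:"fieldName"`
}

// BuildEditsQuery is a hypothetical helper that returns the JSON body of a
// timeseries query summing an assumed "events" metric for a single project.
// start and end are ISO 8601 dates that form the query interval.
func BuildEditsQuery(dataSource, project, granularity, start, end string) ([]byte, error) {
	q := TimeseriesQuery{
		QueryType:   "timeseries",
		DataSource:  dataSource,
		Granularity: granularity,
		Intervals:   []string{start + "/" + end},
		// Assumed dimension name; check the real Druid schema.
		Filter: &Filter{Type: "selector", Dimension: "project", Value: project},
		// Assumed metric name; check the real Druid schema.
		Aggregations: []Aggregation{{Type: "longSum", Name: "edits", FieldName: "events"}},
	}
	return json.Marshal(q)
}

func main() {
	body, err := BuildEditsQuery("example_datasource", "en.wikipedia", "day", "2023-01-01", "2023-02-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body would be POSTed to the Druid broker's native query endpoint. Keeping the query as typed structs rather than string templates is one way to make the shared parts reusable across endpoints.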

Useful information about the Druid schema can be found here

Some information about which data is available in the Druid test environment can be found in T336405.

Event Timeline

Sfaci updated the task description.
Sfaci removed FGoodwin as the assignee of this task.
Sfaci triaged this task as Medium priority.
Sfaci edited projects, added AQS2.0 (Sprint 10); removed AQS2.0.
Sfaci added a subscriber: FGoodwin.
Sfaci moved this task from Next Up to In Progress on the AQS2.0 (Sprint 10) board.
Sfaci removed a subscriber: FGoodwin.

Change 937043 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[generated-data-platform/aqs/edit-analytics@main] [DNM] All endpoints are done and seem to work using reusable Druid functions to reduce code duplication. The purpose of this DNM is review the reusable functions for querying Druid to see if they are ok or could be improved. Or even if code duplication could be reduce more. I'll keep working on it but I think it's the right time to listen to other opinions.

https://gerrit.wikimedia.org/r/937043

Change 937043 merged by BPirkle:

[generated-data-platform/aqs/edit-analytics@main] All endpoints are done and seem to work using reusable Druid functions to reduce code duplication. The purpose of this change is review the reusable functions for querying Druid to see if they are ok or could be improved. Or even if code duplication could be reduce more. I'll keep working on it but I think it's the right time to listen to other opinions.

https://gerrit.wikimedia.org/r/937043

Sfaci added a subscriber: SGupta-WMF.

After the last merge (which includes endpoints using the definitive Druid code), the common Druid code still needs to be refactored and moved to aqsassist so that it can be shared by this service and the editor service. Moving to "Next Up" to keep working on that.

Sfaci removed Sfaci as the assignee of this task. Aug 9 2023, 1:14 PM

There is a bug in the ProcessEditedPagesAggregateQuery function in the edited_pages_aggregate_data.go file:

  • activityLevel is not included when building the query parameters that are later passed to the ProcessTimeseriesQuery function. The fix is to add the following field when populating the query parameters:

var parameters = DruidQueryParams{
    // ... other fields unchanged ...
    ActivityLevel: activityLevel,
    // ... other fields unchanged ...
}

Keep this in mind so that it gets fixed when we refactor all endpoints for this service.
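To show why this kind of omission is easy to miss, here is a small sketch under stated assumptions. The DruidQueryParams below is a reduced, hypothetical stand-in for the real struct (only ActivityLevel comes from this task), and BuildFilters, SelectorFilter, the "activity_level" dimension name, and the sample value are all invented for the example. If filters are only added for non-empty fields, a missing ActivityLevel produces no error, just a query that is silently not restricted by activity level.

```go
package main

import "fmt"

// DruidQueryParams is a hypothetical, reduced stand-in for the real struct.
// Only the ActivityLevel field name is taken from the task discussion.
type DruidQueryParams struct {
	Project       string
	ActivityLevel string
}

// SelectorFilter is a hypothetical dimension == value filter.
type SelectorFilter struct {
	Dimension string
	Value     string
}

// BuildFilters turns the populated parameters into filters. Empty fields are
// skipped, so a parameter that was never set is dropped without any error.
func BuildFilters(p DruidQueryParams) []SelectorFilter {
	filters := []SelectorFilter{{Dimension: "project", Value: p.Project}}
	if p.ActivityLevel != "" {
		// Assumed dimension name, for illustration only.
		filters = append(filters, SelectorFilter{Dimension: "activity_level", Value: p.ActivityLevel})
	}
	return filters
}

func main() {
	activityLevel := "5..24-edits" // sample value, assumed for the example

	// Buggy call site: activityLevel is available but never copied into the params.
	buggy := DruidQueryParams{Project: "en.wikipedia"}
	// Fixed call site: the value is passed through.
	fixed := DruidQueryParams{Project: "en.wikipedia", ActivityLevel: activityLevel}

	fmt.Println(len(BuildFilters(buggy)), len(BuildFilters(fixed)))
}
```

A unit test that asserts on the generated filters for each endpoint parameter would likely catch this class of bug during the refactor.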

Change 984802 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[generated-data-platform/aqs/edit-analytics@main] Passing activity-level value to the query for edited_pages_aggregate

https://gerrit.wikimedia.org/r/984802

The last MR is intended to fix the bug I described in my previous comment.

Sfaci set the point value for this task to 1. Dec 21 2023, 9:13 AM
Sfaci moved this task from AQS 2.0 Backlog to DONE on the AQS2.0 board.

Change 984802 merged by jenkins-bot:

[generated-data-platform/aqs/edit-analytics@main] Passing activity-level value to the query for edited_pages_aggregate

https://gerrit.wikimedia.org/r/984802

VirginiaPoundstone claimed this task.
VirginiaPoundstone moved this task from DONE to RESOLVED on the AQS2.0 board.