Global Editor Metrics - HTTP API endpoints
Closed, Resolved (Public)

Description

For WE3.3.7, we need to serve global editor metrics from Druid and Cassandra. After T405039: Global Editor Metrics - Data Pipeline and T405040: Global Editor Metrics - backfill pageview metric data, metrics will be in Cassandra, ready for querying.

This task is about designing, coding and deploying HTTP API service endpoints to query editor metric data in Druid and Cassandra.

Data Persistence prefers that Data Gateway is used for access to the underlying data. However, Data Gateway does not support aggregations (or filtering) that we need for these metrics. Another service will have to do the aggregations and filtering. This service is likely to be AQS.
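
To illustrate the kind of filtering and aggregation that would have to live in a service in front of Data Gateway, here is a minimal Python sketch. It assumes the gateway hands back flat daily rows; the row fields (`day`, `page_type`, `edits`), the "non-content" value and the function name `rollup_monthly` are all hypothetical, not the actual Data Gateway or AQS schema.

```python
from collections import defaultdict
from datetime import date


def rollup_monthly(rows, page_type=None):
    """Filter daily rows by page type, then sum edits per calendar month.

    `rows` is an iterable of dicts with hypothetical keys:
    "day" (datetime.date), "page_type" (str) and "edits" (int).
    """
    totals = defaultdict(int)
    for row in rows:
        # Filtering step: skip rows of other page types when a filter is given.
        if page_type is not None and row["page_type"] != page_type:
            continue
        # Aggregation step: bucket each day into the first day of its month.
        month = row["day"].replace(day=1)
        totals[month] += row["edits"]
    # Sort by month so the output is a stable timeseries.
    return [{"month": m, "edits": totals[m]} for m in sorted(totals)]


# Invented sample rows, for illustration only.
daily_rows = [
    {"day": date(2021, 4, 2), "page_type": "content", "edits": 8},
    {"day": date(2021, 4, 3), "page_type": "content", "edits": 1},
    {"day": date(2021, 4, 4), "page_type": "non-content", "edits": 5},
    {"day": date(2021, 5, 1), "page_type": "content", "edits": 3},
]
monthly = rollup_monthly(daily_rows, page_type="content")
```

In this sketch the gateway stays a plain row store, and the rollup logic sits in the calling service, which is the split described above.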

Done is

  • Data Gateway HTTP endpoints to access underlying metric tables in Druid and Cassandra are deployed and accessible.
  • AQS HTTP endpoints to serve product requirements for metric rollups are deployed and accessible.

Details

Related Changes in Gerrit:
Repo                                          Branch  Lines +/-
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/page-analytics    main    +28 -28
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/page-analytics    main    +13 -13
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/edit-analytics    main    +6 -2
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/page-analytics    main    +60 -9
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/page-analytics    main    +1 K -0
operations/deployment-charts                  master  +11 -0
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/page-analytics    main    +1 -1
generated-data-platform/aqs/page-analytics    main    +1 K -130
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/edit-analytics    main    +17 -18
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/edit-analytics    main    +1 -1
operations/deployment-charts                  master  +1 -1
generated-data-platform/aqs/edit-analytics    main    +1 K -9
generated-data-platform/aqs/editor-analytics  main    +968 -3

Event Timeline

There are a very large number of changes, so older changes are hidden.

WIP OpenAPI spec.

See it here
https://app.swaggerhub.com/apis/wikimedia-3f9/global_editor_metrics

(for the next 25 days until my trial ends ;) )

Change #1194731 had a related patch set uploaded (by Ottomata; author: Ottomata):

[generated-data-platform/aqs/editor-analytics@main] Add /editor/edited-pages endpoint

https://gerrit.wikimedia.org/r/1194731

Change #1195045 had a related patch set uploaded (by Ottomata; author: Ottomata):

[generated-data-platform/aqs/edit-analytics@main] Add edits/per-user timeseries endpoint

https://gerrit.wikimedia.org/r/1195045

Add edits/per-user timeseries endpoint (1195045) is ready for review. api doc:

GET /edits/per-user/{user-central-id}/{page-type}/{granularity}/{start}/{end}

Example granularity=daily response

{
  "items": [
    {
      "userCentralId": 93359232,
      "pageType": "content",
      "granularity": "daily",
      "results": [
        {
          "timestamp": "2021-04-02T00:00:00.000Z",
          "edits": 8
        },
        {
          "timestamp": "2021-04-03T00:00:00.000Z",
          "edits": 1
        },
        {
          "timestamp": "2021-04-04T00:00:00.000Z",
          "edits": 82
        }
      ]
    }
  ]
}
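
As a rough illustration of how a client might consume this response shape, here is a small Python sketch. The body is copied from the example above; the helper name `edits_by_day` is hypothetical, and the timestamp format is inferred from the example values rather than from a published spec.

```python
import json
from datetime import datetime

# Response body copied from the example granularity=daily response.
body = """
{
  "items": [
    {
      "userCentralId": 93359232,
      "pageType": "content",
      "granularity": "daily",
      "results": [
        {"timestamp": "2021-04-02T00:00:00.000Z", "edits": 8},
        {"timestamp": "2021-04-03T00:00:00.000Z", "edits": 1},
        {"timestamp": "2021-04-04T00:00:00.000Z", "edits": 82}
      ]
    }
  ]
}
"""


def edits_by_day(payload):
    """Map each result date to its edit count, across all items."""
    counts = {}
    for item in payload["items"]:
        for result in item["results"]:
            # Assumed format, based on the example: 2021-04-02T00:00:00.000Z
            ts = datetime.strptime(result["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
            counts[ts.date()] = counts.get(ts.date(), 0) + result["edits"]
    return counts


counts = edits_by_day(json.loads(body))
total = sum(counts.values())
```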

We are waffling/bikeshedding over 'user' vs 'editor' for this API. I'm going to paste @Sfaci's comments from Slack and then continue the discussion here.


@Sfaci wrote:
Regarding the naming thing, I would say we should focus on what the endpoint returns in order to find a proper name for it. Why not use just edits as the endpoint itself (and remove the per-user part)? Focusing on what the endpoint returns, they are just edit metrics, right? The rest of the URL is just path variables (which probably should be query params, but everything is already designed that way) that filter the response not only by user but also by other fields like page-type, granularity, and start and end dates.

But the existing ones are created as you proposed, so maybe adding per-user is the best way here. We already have edits/per-page, for example.

Naming is hard in AQS. In this case, it seems we should have one endpoint instead of two, filtering by combining different query parameters: a single edits endpoint with all the different parameters we have for per-user and per-page.

Indeed! If we could refactor AQS and make edit count be a single endpoint with parameters to slice and dice to filter around dimensions, that would be quite nice :)

For our purposes, I'd like to stay consistent with what AQS is already doing. The endpoints I'm currently working on are:

GET /metrics/edits/per-user/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/per-user/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/top-by-user/{user_central_id}/{page_type}/{year}/{month}

I'm basing these on existent AQS endpoints like edits/per-page, pageviews/per-article and pageviews/top-per-country.

So, what I'm really waffling on is 'user' vs 'editor', e.g. per-user or per-editor?

More examples:
editors-analytics endpoints use both 'users' and 'editors', depending on the context, e.g. registered-users vs editor-type. commons analytics endpoints mostly use 'editors', but there is the edits-per-user-monthly which kind of is like what I am building now, but across all wikis instead of just commons.

I started with user because the main param for filtering is user_central_id. However, there will be no results for users with 0 edits, and I suppose any user with at least one edit is an 'editor'?

Also, the new pageviews endpoints are already confusing, and using 'user' in them might make them more confusing. pageviews/per-user shows the total number of pageviews to all pages ever edited by the user. pageviews/per-user reads more like it is pageviews by the user, which is incorrect.
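
The distinction can be made concrete with a tiny Python sketch. The data is invented and `pageviews_per_editor` is a hypothetical helper, not how the pipeline actually computes the metric; it only shows that the number counts views of pages the editor has edited, regardless of who viewed them.

```python
def pageviews_per_editor(edited_pages, pageviews_by_page):
    """Sum pageviews over every page the editor has ever edited."""
    return sum(pageviews_by_page.get(page, 0) for page in edited_pages)


# Invented example: the editor has edited Page_A and Page_B, never Page_C.
edited_pages = {"Page_A", "Page_B"}
pageviews_by_page = {"Page_A": 120, "Page_B": 30, "Page_C": 999}

# Views of Page_C are excluded, even if this same user was the one reading it.
total_views = pageviews_per_editor(edited_pages, pageviews_by_page)
```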

So, with all that said, I'm leaning towards 'editor'. That would change the endpoints to:

GET /metrics/edits/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/top-by-editor/{user_central_id}/{page_type}/{year}/{month}

Thoughts/objections?

Looking a bit more at existent Analytics API endpoints, perhaps the aggregate edits and pageviews endpoints should have aggregate in them? E.g. pageviews/aggregate, edits/aggregate etc. If I followed that pattern, the endpoints would maybe be:

GET /metrics/edits/aggregate/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/aggregate/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/top/per-editor/{user_central_id}/{page_type}/{year}/{month}

pageviews/top/per-editor doesn't really follow any existent pattern, but it seems much more consistent than 'top-by-editor'. The pattern is then

{metric_name}/{aggregation_type}/{primary_filter_key}/{primary_filter_value}/...{params}

@mforns @Sfaci Whatcha think?

Change #1196495 had a related patch set uploaded (by Ottomata; author: Ottomata):

[generated-data-platform/aqs/page-analytics@main] Add pageviews/aggregate/per-editor endpoint

https://gerrit.wikimedia.org/r/1196495

Change #1194731 abandoned by Ottomata:

[generated-data-platform/aqs/editor-analytics@main] Add /editor/edited-pages endpoint

Reason:

I70b9f69ec6426d8f1e462592605e8225dabbf234

https://gerrit.wikimedia.org/r/1194731

Looking a bit more at existent Analytics API endpoints, perhaps the aggregate edits and pageviews endpoints should have aggregate in them? E.g. pageviews/aggregate, edits/aggregate etc. If I followed that pattern, the endpoints would maybe be:

GET /metrics/edits/aggregate/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/aggregate/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/top/per-editor/{user_central_id}/{page_type}/{year}/{month}

Agreed. I have been reviewing in more depth the specification you shared and the rest of the existing endpoints, which I didn't remember very well, and what you have proposed looks good and consistent with the patterns we already have. I have nothing to add.

Discussed with @mforns today, and we aren't so sure about the aggregate/top thing I was proposing. What does it add for the user? Not a lot. They are both 'aggregates' (all of these endpoints are). We briefly considered something like 'edits/timeseries/per-editor/...' but we didn't love it. We ended up going back to simpler matching of existing AQS endpoints with:

GET /metrics/edits/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/per-editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/top-pages-per-editor-monthly/{user_central_id}/{page_type}/{year}/{month}

I'll proceed with this for now. Let us know if there are objections!

@mforns @Sfaci and @APizzata-WMF met today to discuss what to do about certain inconsistencies in AQS endpoints.

We decided:

AQS already does all of these things in many different ways. (see: T342018: compile list of known issues for triage post AQS 2.0 launch). These decisions are intended to be the 'right way' of resolving those API inconsistencies.

  • In order to facilitate migration of existent endpoints to these decisions, we will host the new Global Editor Metrics endpoints at a v2 URL.

Change #1195045 merged by Ottomata:

[generated-data-platform/aqs/edit-analytics@main] Add metrics/v3/edits/per_editor timeseries endpoint

https://gerrit.wikimedia.org/r/1195045

Change #1199337 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] AQS edit-analytics - deploy new edits/per_editor endpoint

https://gerrit.wikimedia.org/r/1199337

Change #1199337 merged by jenkins-bot:

[operations/deployment-charts@master] AQS edit-analytics - deploy new edits/per_editor endpoint

https://gerrit.wikimedia.org/r/1199337

Change #1199461 had a related patch set uploaded (by Ottomata; author: Ottomata):

[generated-data-platform/aqs/edit-analytics@main] Build on bookworm instead of bullsye

https://gerrit.wikimedia.org/r/1199461

Change #1199461 merged by jenkins-bot:

[generated-data-platform/aqs/edit-analytics@main] Build on bookworm instead of bullsye

https://gerrit.wikimedia.org/r/1199461

Change #1199464 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] edit-analytics - bump to build on bookworm

https://gerrit.wikimedia.org/r/1199464

Change #1199464 merged by jenkins-bot:

[operations/deployment-charts@master] edit-analytics - bump to build on bookworm

https://gerrit.wikimedia.org/r/1199464

Change #1199477 had a related patch set uploaded (by Ottomata; author: Ottomata):

[generated-data-platform/aqs/edit-analytics@main] edits/v3/per_editor instead of v3/edits/per_editor

https://gerrit.wikimedia.org/r/1199477

Change #1199477 merged by jenkins-bot:

[generated-data-platform/aqs/edit-analytics@main] edits/v3/per_editor instead of v3/edits/per_editor

https://gerrit.wikimedia.org/r/1199477

Change #1199479 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] edit-analytics - image bump to fix path route

https://gerrit.wikimedia.org/r/1199479

Change #1199479 merged by jenkins-bot:

[operations/deployment-charts@master] edit-analytics - image bump to fix path route

https://gerrit.wikimedia.org/r/1199479

Change #1196495 merged by jenkins-bot:

[generated-data-platform/aqs/page-analytics@main] Add pageviews/v3/per_editor endpoint

https://gerrit.wikimedia.org/r/1196495

Change #1200098 had a related patch set uploaded (by Ottomata; author: Ottomata):

[generated-data-platform/aqs/page-analytics@main] Bump base image to bookworm

https://gerrit.wikimedia.org/r/1200098

Change #1200098 merged by jenkins-bot:

[generated-data-platform/aqs/page-analytics@main] Bump base image to bookworm

https://gerrit.wikimedia.org/r/1200098

Change #1200099 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] page-analytics - bump image to get pageviews/v3/per_editor

https://gerrit.wikimedia.org/r/1200099

Change #1200099 merged by jenkins-bot:

[operations/deployment-charts@master] page-analytics - bump image to get pageviews/v3/per_editor

https://gerrit.wikimedia.org/r/1200099

I just deployed the pageviews/v3/per_editor endpoint. It will not work because there is no data behind it.

But, as requested, I added a ?mock_data=true GET param. This will make it return 200 and dummy data.

https://wikimedia.org/api/rest_v1/metrics/pageviews/v3/per_editor/11878393/monthly/20250101/20260101?mock_data=true
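
For anyone wanting to script against the mock data, here is a minimal Python sketch that builds the URL above and, optionally, fetches it. The helper names `per_editor_url` and `fetch` are hypothetical, and the User-Agent string is a placeholder to be replaced with real contact details; the sketch makes no assumption about the shape of the returned dummy data.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/v3/per_editor"


def per_editor_url(user_central_id, granularity, start, end, mock_data=False):
    """Build the pageviews/v3/per_editor URL, optionally asking for mock data."""
    path = f"{BASE}/{user_central_id}/{granularity}/{start}/{end}"
    if mock_data:
        return f"{path}?{urlencode({'mock_data': 'true'})}"
    return path


def fetch(url):
    """GET the URL and decode the JSON body (performs a network request)."""
    # Placeholder User-Agent; substitute something that identifies your script.
    request = Request(url, headers={"User-Agent": "example-script/0.1 (placeholder)"})
    with urlopen(request) as response:
        return json.load(response)


url = per_editor_url(11878393, "monthly", "20250101", "20260101", mock_data=True)
# data = fetch(url)  # uncomment to actually call the endpoint
```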

Update for most recent API endpoints:

GET /metrics/edits/v3/per_editor/{user_central_id}/{page_type}/{granularity}/{start}/{end}
GET /metrics/pageviews/v3/per_editor/{user_central_id}/{granularity}/{start}/{end}
GET /metrics/pageviews/v3/top_pages_per_editor/{user_central_id}/{granularity}/{start}/{end} (with only granularity=monthly supported)

Notable changes since T405041#11278588:

  • underscores instead of hyphens
  • 'v3' API (See also T407863)
  • no {page_type} parameter in pageviews APIs. This would require more data source and data pipeline work to look up "page type". See also T409462: mediawiki.page_change.v1 event - add a page type field
  • top_pages_per_editor now has a {granularity} parameter and accepts {start} and {end}. Only "monthly" {granularity} is supported at this time. This will allow us to possibly support e.g. "yearly" in the future.
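
The three current paths can be summarised in a small Python sketch. The function names are hypothetical, the client-side granularity check is an illustration of the "monthly only" note rather than the service's actual validation, and the YYYYMMDD date format is taken from the mock-data URL posted earlier.

```python
PREFIX = "/metrics"

# Only "monthly" is supported for top pages at the time of writing.
TOP_PAGES_GRANULARITIES = {"monthly"}


def edits_per_editor_path(user_central_id, page_type, granularity, start, end):
    """Edits timeseries path; the only one of the three that takes {page_type}."""
    return (
        f"{PREFIX}/edits/v3/per_editor/"
        f"{user_central_id}/{page_type}/{granularity}/{start}/{end}"
    )


def pageviews_per_editor_path(user_central_id, granularity, start, end):
    """Pageviews timeseries path; no {page_type} parameter."""
    return (
        f"{PREFIX}/pageviews/v3/per_editor/"
        f"{user_central_id}/{granularity}/{start}/{end}"
    )


def top_pages_per_editor_path(user_central_id, granularity, start, end):
    """Top pages path; rejects granularities other than "monthly"."""
    if granularity not in TOP_PAGES_GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    return (
        f"{PREFIX}/pageviews/v3/top_pages_per_editor/"
        f"{user_central_id}/{granularity}/{start}/{end}"
    )


edits_path = edits_per_editor_path(11878393, "content", "daily", "20250101", "20250201")
top_path = top_pages_per_editor_path(11878393, "monthly", "20250101", "20260101")
```

Keeping the allowed granularities in a set means that supporting e.g. "yearly" later would be a one-line change in this sketch.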

Change #1217523 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/deployment-charts@master] Adjust page-analytics values to access the data-gateway

https://gerrit.wikimedia.org/r/1217523

Change #1217523 merged by jenkins-bot:

[operations/deployment-charts@master] Adjust page-analytics values to access the data-gateway

https://gerrit.wikimedia.org/r/1217523

Change #1219207 had a related patch set uploaded (by Mforns; author: Mforns):

[generated-data-platform/aqs/page-analytics@main] Add pageviews/v3/top_pages_per_editor endpoint

https://gerrit.wikimedia.org/r/1219207

Change #1219207 merged by jenkins-bot:

[generated-data-platform/aqs/page-analytics@main] Add pageviews/v3/top_pages_per_editor endpoint

https://gerrit.wikimedia.org/r/1219207

Change #1219607 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/deployment-charts@master] Add new image to page-analytics service

https://gerrit.wikimedia.org/r/1219607

Change #1219607 merged by jenkins-bot:

[operations/deployment-charts@master] Add new image to page-analytics service

https://gerrit.wikimedia.org/r/1219607

Change #1219846 had a related patch set uploaded (by Mforns; author: Mforns):

[generated-data-platform/aqs/page-analytics@main] Sort top_pages_per_editor results by timestamp too

https://gerrit.wikimedia.org/r/1219846

Change #1219846 merged by jenkins-bot:

[generated-data-platform/aqs/page-analytics@main] Sort top_pages_per_editor results by timestamp too

https://gerrit.wikimedia.org/r/1219846

Change #1219866 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/deployment-charts@master] Bump up version of page-analytics service

https://gerrit.wikimedia.org/r/1219866

Change #1219866 merged by jenkins-bot:

[operations/deployment-charts@master] Bump up version of page-analytics service

https://gerrit.wikimedia.org/r/1219866

Change #1219867 had a related patch set uploaded (by Mforns; author: Mforns):

[generated-data-platform/aqs/edit-analytics@main] Change the Druid datasource for edits_per_editor endpoint

https://gerrit.wikimedia.org/r/1219867

Change #1219867 merged by jenkins-bot:

[generated-data-platform/aqs/edit-analytics@main] Change the Druid datasource for edits_per_editor endpoint

https://gerrit.wikimedia.org/r/1219867

Change #1219909 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/deployment-charts@master] Update edit-analytics to use the new image

https://gerrit.wikimedia.org/r/1219909

Change #1219909 merged by jenkins-bot:

[operations/deployment-charts@master] Update edit-analytics to use the new image

https://gerrit.wikimedia.org/r/1219909

Change #1219916 had a related patch set uploaded (by Mforns; author: Mforns):

[generated-data-platform/aqs/page-analytics@main] Correct top_pages_per_editor entity example values for better docs

https://gerrit.wikimedia.org/r/1219916

Change #1219916 merged by jenkins-bot:

[generated-data-platform/aqs/page-analytics@main] Correct top_pages_per_editor entity example values for better docs

https://gerrit.wikimedia.org/r/1219916

Change #1219921 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/deployment-charts@master] Bump up page-analytics version to include doc improvements

https://gerrit.wikimedia.org/r/1219921

Change #1219921 merged by jenkins-bot:

[operations/deployment-charts@master] Bump up page-analytics version to include doc improvements

https://gerrit.wikimedia.org/r/1219921

Change #1220387 had a related patch set uploaded (by Mforns; author: Mforns):

[generated-data-platform/aqs/page-analytics@main] Apply minor corrections to anotations for documentation site

https://gerrit.wikimedia.org/r/1220387

Change #1220387 merged by jenkins-bot:

[generated-data-platform/aqs/page-analytics@main] Apply minor corrections to anotations for documentation site

https://gerrit.wikimedia.org/r/1220387

Change #1220390 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/deployment-charts@master] Bump up the page-analytics service image

https://gerrit.wikimedia.org/r/1220390

Change #1220390 merged by jenkins-bot:

[operations/deployment-charts@master] Bump up the page-analytics service image

https://gerrit.wikimedia.org/r/1220390