compile list of known issues for triage post AQS 2.0 launch
Open, Needs Triage, Public

Description

Goal

Collect all inconsistencies found in the AQS 1.0 services, plus some existing issues in the AQS 2.0 services, for possible future improvements.

Current issues
{
    "detail": "no full months found in specified date range",
    "method": "get",
    "status": 400,
    "title": "Bad Request",
    "type": "about:blank",
    "uri": "/metrics/edits/aggregate/ab.wikipedia/user/content/monthly/20200201/20200229"
}
{
    "items": [
        {
            "project": "en.wikipedia",
            "access-site": "all-sites",
            "granularity": "monthly",
            "timestamp": "20200201",
            "devices": 875114780,
            "offset": 309778523,
            "underestimate": 565336257
        }
    ]
}

This issue is covered in T355517: [Breaking change] Unify the way to specify a date range for all services
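
One possible reading of the "no full months found" error above is that some services treat the end date of a range as exclusive while others treat it as inclusive. The following is only a sketch under that assumption, not the services' actual logic; the function names full_months and next_month are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def next_month(first_of_month: date) -> date:
    """Return the first day of the month following a first-of-month date."""
    if first_of_month.month == 12:
        return date(first_of_month.year + 1, 1, 1)
    return date(first_of_month.year, first_of_month.month + 1, 1)


def full_months(start: str, end: str, end_inclusive: bool) -> List[str]:
    """List the months (as YYYYMMDD of their first day) fully covered by a range.

    start and end are YYYYMMDD strings. end_inclusive selects the convention:
    True means the end day belongs to the range, False means it does not.
    """
    first = date(int(start[:4]), int(start[4:6]), int(start[6:8]))
    last = date(int(end[:4]), int(end[4:6]), int(end[6:8]))
    # Convert both conventions to a single exclusive upper bound.
    end_exclusive = last + timedelta(days=1) if end_inclusive else last
    # First month boundary at or after the start date.
    month = first if first.day == 1 else next_month(first.replace(day=1))
    months = []
    while next_month(month) <= end_exclusive:
        months.append(month.strftime("%Y%m%d"))
        month = next_month(month)
    return months


print(full_months("20200201", "20200229", end_inclusive=True))   # ['20200201']
print(full_months("20200201", "20200229", end_inclusive=False))  # []
```

With an exclusive end date the same request that works under the inclusive convention covers no full month, which is the kind of difference a unified date-range convention would remove.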

{
    "items": [
        {
            "referer": "all-referers",
            "media_type": "all-media-types",
            "agent": "all-agents",
            "granularity": "monthly",
            "timestamp": "2020010100",
            "requests": 77922346640
        },
        {
            "referer": "all-referers",
            "media_type": "all-media-types",
            "agent": "all-agents",
            "granularity": "monthly",
            "timestamp": "2020020100",
            "requests": 74434493109
        }
    ]
}
{
    "items": [
        {
            "project": "ab.wikipedia",
            "editor-type": "user",
            "page-type": "non-content",
            "activity-level": "1..4-edits",
            "granularity": "daily",
            "results": [
                {
                    "timestamp": "2023-04-01T00:00:00.000Z",
                    "editors": 0
                },

This issue is covered in T355515: [Breaking change] Fix issues in naming conventions in the response structure
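
The samples above use three timestamp styles (YYYYMMDD, YYYYMMDDHH and ISO 8601) and mix snake_case with kebab-case keys. As a rough client-side illustration of what a unified structure could look like, the sketch below normalizes an item; the target conventions (kebab-case keys, ISO timestamps) and the helper names are assumptions, not a decision recorded in this task.

```python
from datetime import datetime, timezone

# Timestamp layouts seen in the samples above, keyed by string length.
_FORMATS_BY_LENGTH = {
    8: "%Y%m%d",                    # e.g. 20200201
    10: "%Y%m%d%H",                 # e.g. 2020010100
    24: "%Y-%m-%dT%H:%M:%S.%fZ",    # e.g. 2023-04-01T00:00:00.000Z
}


def parse_timestamp(value: str) -> datetime:
    """Parse any of the three observed timestamp styles into a UTC datetime."""
    fmt = _FORMATS_BY_LENGTH.get(len(value))
    if fmt is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def normalize_item(item: dict) -> dict:
    """Return a copy with kebab-case keys and an ISO 8601 timestamp (assumed target)."""
    normalized = {}
    for key, value in item.items():
        key = key.replace("_", "-")
        if key == "timestamp":
            value = parse_timestamp(value).isoformat()
        normalized[key] = value
    return normalized
```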

  • aqsassist common code for messaging, problems and loggers: [Refactoring task] [Done] During AQS 2.0 development, some common code was added to aqsassist to manage messages, problems and loggers (some details at T343907: AQS 2.0: aqsassist - Extract some repeated common code from services to this module). Some services were deployed before that work was completed, so we should apply it later to unify all services. This work is pending review and hasn't been applied to any service yet. Keep in mind that the Druid refactoring, also included in that task, is already done for both Druid-based services. It's a good opportunity to unify messages across all the services. For example, at the moment there are three different styles of messaging:
    • device-analytics detail message when wrong granularity is requested: Invalid granularity
    • geo-analytics detail message when wrong activity-level is requested: Activity-level should be equal to one of the allowed values: [5..99-edits, 100..-edits]
    • media-analytics (and page, edit and editor) detail message when wrong granularity is requested: granularity should be equal to one of the allowed values: [daily, monthly]
  • Automated way to test that /metrics endpoint is generating metrics properly: [QA improvement] The /metrics endpoint can be present without really working. The presence of this endpoint and real metrics generation are separate steps. Just adding the endpoint creates a default summary when you access it, but no live metrics are shown there. We need to be sure of both things: the endpoint is present, and it is generating metrics for every request.

This issue is covered in T355514: Automated way to test that /metrics endpoint is generating metrics properly
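
A minimal sketch of how such a check might work, assuming /metrics serves the Prometheus text exposition format without trailing timestamps: take a snapshot of /metrics, send a request to the service, take a second snapshot, and verify that some request-related series grew. The metric name used in the usage example (http_requests_total) is hypothetical, and fetching the snapshots over HTTP is left out.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its value, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank, HELP and TYPE lines carry no samples
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def metrics_grew(before: str, after: str, metric_prefix: str) -> bool:
    """True if any series starting with metric_prefix is new or has a larger value."""
    old = parse_metrics(before)
    new = parse_metrics(after)
    return any(
        series.startswith(metric_prefix) and value > old.get(series, 0.0)
        for series, value in new.items()
    )


# Usage with hand-written snapshots (hypothetical metric name):
before = "# TYPE http_requests_total counter\n"
after = before + 'http_requests_total{code="200",method="get"} 1\n'
print(metrics_grew(before, after, "http_requests_total"))  # True
```

A test built this way fails when the endpoint only returns its default summary, because no series changes between the two snapshots.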

  • Automated way to test that /api-spec.json endpoint is present and working: [QA improvement] For some services, integration and unit test cases for this endpoint are also missing (geo, page, edit and editor)

This issue is covered in T355511: Add integration test case for the /api-spec.json endpoint in edit-analytics

This issue is covered in T355510: How to manage/document the lack of data in mediarequest_top_files (media-analytics)

run_tests:
    image: python:3.9-slim-bullseye
    before_script:
        - apt-get update && apt-get install make
    script: 
        - make test

This issue is covered in T355508: Review (and fix/remove?) the pipeline in the AQS 2.0 QA test suite repository

{
    "type": "https://mediawiki.org/wiki/HyperSwitch/errors/bad_request",
    "title": "Invalid parameters",
    "method": "get",
    "detail": "data.params['page-type'] should be equal to one of the allowed values: [all-page-types, content, non-content]",
    "uri": "/wikimedia.org/v1/metrics/editors/aggregate/ab.wikipedia/user/non-constent/1..4-edits/daily/2024040112/2024050112"
}
{
    "type": "https://mediawiki.org/wiki/HyperSwitch/errors/invalid_request",
    "method": "get",
    "detail": [
        "end timestamp is invalid, must be a valid date in YYYYMMDD format"
    ],
    "uri": "/analytics.wikimedia.org/v1/mediarequests/aggregate/all-referers/all-media-types/all-agents/daily/20200101/2023053000a"
}
  • No data found error: [Breaking change] [Done] e.g.: empty 200 OK vs 404 between edit-analytics & device-analytics. Druid-based services considered an empty results array a valid response for non-existent projects, while Cassandra-based ones responded with a 404 error. Either behaviour could be considered acceptable, but it should be the same for all services. In the end we decided to change how the Druid-based services behave, and now all services return a 404 Not Found error when a non-existent project is requested.
{
    "items": [
        {
            "project": "ab.wikipedia",
            "editor-type": "user",
            "page-type": "content",
            "granularity": "daily",
            "results": []
        }
    ]
}
{
    "type": "https://mediawiki.org/wiki/HyperSwitch/errors/not_found",
    "title": "Not found.",
    "method": "get",
    "detail": "The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet.  Please check https://wikimedia.org/api/rest_v1/?doc for more information.",
    "uri": "/analytics.wikimedia.org/v1/unique-devices/en.wikipedia.org/all-sites/daily/20240101/20240101"
}
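
The unified behaviour can be pictured with a small sketch. It assumes an empty results array is the signal for "no data" and reuses the problem fields seen in AQS 2.0 error samples; build_response is a hypothetical helper, not code from any service.

```python
from typing import List, Tuple

NOT_FOUND_DETAIL = (
    "The date(s) you used are valid, but we either do not have data for those "
    "date(s), or the project you asked for is not loaded yet. "
    "Please check documentation for more information."
)


def build_response(uri: str, items: List[dict]) -> Tuple[int, dict]:
    """Return (status, body): a 404 problem when no item carries any results."""
    has_data = any(item.get("results") for item in items)
    if not has_data:
        problem = {
            "detail": NOT_FOUND_DETAIL,
            "method": "get",
            "status": 404,
            "title": "Not Found",
            "type": "about:blank",
            "uri": uri,
        }
        return 404, problem
    return 200, {"items": items}
```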
  • Removed artificial zero-values in AQS 2.0 Druid-based services: AQS 1.0 Druid services, when any data is found for the current request, insert an artificial zero-value for each date that has no data. After discussing it, we decided not to include those zero-values in AQS 2.0 Druid services responses and to leave the results empty for those dates (TODO: add a sample here)
  • Documentation: [Non breaking change] kebab-case vs lower_snake_case when specifying path parameters in some services (see AQS 1.0 OpenAPI Spec). This inconsistency only affects the OpenAPI specification document and is being fixed while writing the docs for the new AQS 2.0 services.
  • Documentation: [Non breaking change] [Done] How to use the all-projects and all-[family]-projects keywords is not always clear. Some endpoints explicitly explain that you can use them and others don't, even though they do accept them. For example: it can be used when requesting edits/per-page/ab.wikipedia/Абжьыуа_жәлақәа/all-editor-types/daily/20230301/20230601 but it's not explained in the documentation for that endpoint.
  • Prepare device-analytics to be dockerized and add it to the aqs-docker-test-env-qa: [QA improvement] [Done] Starting from geo-analytics, we made some changes to every service's repository to be able to dockerize them and run the QA test suite using docker (some changes were done as well in aqs-docker-test-env project to create a docker-compose project with cassandra and all its services). Device analytics was deployed before that so it has not been included yet.
  • Project, referer and page-title validation/normalization: [Missing validation/normalization feature] [Done] During the first part of the development we missed some validation/normalization steps that AQS 1.0 performed using common functions. This was done for page, edit and editor but is pending for geo, device and media. We created new aqsassist functions for these tasks, and they are called via a middleware function in each service project. It could be useful to look at the page, edit or editor services to see how this was addressed there, and also at T346598: [Pageviews_top_by_country] requests with projects with special characters returns 404 instead of 400
    • aqsassist.ValidateProject: Validate and normalize a project
    • aqsassist.ValidateReferer: Validate and normalize a referer
    • aqsassist.NormalizePageTitle: Normalize a page-title replacing all occurrences of spaces with underscores
  • Remove indentation from response / Minify response: [Refactoring task] [Done] All 2.0 services responded with an indented response body, and the Traffic team warned us that responses were bigger than before (66% bigger in some cases). That's why we decided to remove indentation from responses for all services. Because some of them were already deployed, this work is still pending for device and geo. See T346202: AQS 2.0: Remove indentation from the response body for all services for details.
  • Include versioning for aqsassist package: [Refactoring task] [Done] We should start tagging the aqsassist repository to add versioning to this package.
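
As a possible sample for the artificial zero-values item in the list above, the sketch below contrasts the two behaviours for a daily request. It assumes that "leaving the results empty" means the date is simply absent from the results array; the helper names and the "editors" counts are illustrative only.

```python
from datetime import date, timedelta
from typing import Dict, List


def stamp(day: date) -> str:
    """Format a day the way the Druid-based samples show timestamps."""
    return day.strftime("%Y-%m-%dT00:00:00.000Z")


def zero_filled(found: Dict[date, int], start: date, days: int) -> List[dict]:
    """AQS 1.0 style: if any data exists, emit every date, using 0 where none was found."""
    if not found:
        return []
    return [
        {"timestamp": stamp(start + timedelta(days=i)),
         "editors": found.get(start + timedelta(days=i), 0)}
        for i in range(days)
    ]


def sparse(found: Dict[date, int]) -> List[dict]:
    """AQS 2.0 style (assumed): emit only the dates that actually have data."""
    return [{"timestamp": stamp(day), "editors": n} for day, n in sorted(found.items())]


found = {date(2023, 4, 1): 3, date(2023, 4, 3): 1}
print(zero_filled(found, date(2023, 4, 1), 3))  # three entries, 2023-04-02 with editors 0
print(sparse(found))                            # two entries, 2023-04-02 absent
```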
{
    "detail": "all-...` project values are not accepted for this metric",
    "method": "get",
    "status": 400,
    "title": "Bad Request",
    "type": "about:blank",
    "uri": "/metrics/editors/aggregate/all-projects/name-bot/all-page-types/all-activity-levels/monthly/20210302/20220901"
}
{
    "detail": "The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet. Please check documentation for more information.",
    "method": "get",
    "status": 404,
    "title": "Not Found",
    "type": "about:blank",
    "uri": "/metrics/pageviews/top/all-projects/all-access/2019/01/01"
}
Documentation/References

Event Timeline

Sfaci updated the task description.
Sfaci renamed this task from Inconsistences between services to Inconsistencies between services. Jul 19 2023, 7:40 AM

Unexpected behaviour in the AQS 1.0 media-analytics service: if I request "all-days" data for March, April or May 2019, I get a 404 Not Found error. For example:
https://wikimedia.org/api/rest_v1/metrics/mediarequests/top/all-referers/all-media-types/2019/05/all-days
responds with:
{
    "type": "https://mediawiki.org/wiki/HyperSwitch/errors/not_found",
    "title": "Not found.",
    "method": "get",
    "detail": "The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet.  Please check https://wikimedia.org/api/rest_v1/?doc for more information.",
    "uri": "/analytics.wikimedia.org/v1/mediarequests/top/all-referers/all-media-types/2019/05/all-days"
}

Most of the endpoints that accept year/month/day support this filter (the "all-days" value). I think there is one in page-analytics that doesn't support it, and some others only accept year/month, so they have no day parameter.
You can take a look at the documentation here -> https://wikimedia.org/api/rest_v1/#/Mediarequests%20data/get_metrics_mediarequests_top__referer___media_type___year___month___day_

Full slack thread: https://wikimedia.slack.com/archives/C05FLGR3MCJ/p1691417371576499
There are pre-existing Phab tickets about these known gaps in the data.

VirginiaPoundstone renamed this task from Inconsistencies between services to compile list of known issues for triage post AQS 2.0 launch. Aug 9 2023, 12:05 PM

Regarding the previous comment about the unexpected behaviour in the AQS 1.0 media-analytics service: in the end, we figured out that the problem described there is caused by a lack of data for the following dates: 2019.3, 2019.4 and 2019.5 when using the "all-days" value as "day" in the top_files dataset. The issue is about the data, not about the code.
