Page MenuHomePhabricator

Sort inconsistency in AQS timestamp behavior
Open, MediumPublic

Description

In AQS's hourly and daily endpoints, end timestamps are inclusive, whereas in the monthly endpoints they are exclusive. I think all endpoints should be consistent, either all end timestamps inclusive or all exclusive.

Details

Related Gerrit Patches:

Event Timeline

mforns created this task.Mar 13 2017, 5:52 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 13 2017, 5:52 AM
Nuria added a subscriber: Nuria.Mar 13 2017, 3:45 PM

To recap:

In monthly endpoint 2015010100 -2015040100 should return jan, feb, march but not april

In daily: 2015010100-2015010301 -> this should not return a full day for 03/01 but should return full days 01/01 and 01/02

Nuria triaged this task as Medium priority.Mar 13 2017, 3:46 PM

Change 342601 had a related patch set uploaded (by Joal):
[analytics/aqs] Correct per-project endpoint date check

https://gerrit.wikimedia.org/r/342601

I did a quick check:

  • Top - Different approach, no boundary issue

Change 342601 merged by Fdans:
[analytics/aqs] Correct per-project endpoint date check

https://gerrit.wikimedia.org/r/342601

Lasdt update after daily bug correction:
There is inconsistency in monthly behavior:

  • Pattern is consistent in every endpoint except monthly ones: we include last date boundary, for instance, specifying a non-full day in daily request still provides you with data for that day.
  • Monthly endpoints behave as follow:
    • project - Boundary excluded.
    • per-article - Boundary excluded.
    • uniques - Boundary included.

The inconsistency seems to have roots in the change provided when aggregating daily article (see fullMonths option in code). Instead of forcing having last day of the provided month (include boundary), we provide the last explicitly provided full month (boundary excluded),
We did the mistake of reusing that pattern when changing per-project monthly endpoint, but we should correct both to be back in boundary inclusion in my opinion.

Another comment: My check was for END boundaries --> Our pattern is to include them (due to alphabetical sorting of timestamps).
For START boundaries, we actually exclude them (for the same exact reason).

Nuria added a comment.Mar 16 2017, 4:21 PM

Pattern is consistent in every endpoint except monthly ones: we include last date boundary, for instance, specifying a non-full day in daily request still provides you with data for that day.

We talked as a team and we agreed (can revisit) that when asking for monthly data and end date of 20150910 shoul NOT return october, our agreement was that this is the behaviour that all end points should have.

I don't mind if we decide to include or exclude end boundary but I'd like us to be consistent accross granularities.
For instance, if we decide to exclude end boundary when it's incomplete, we should also do so for daily:

https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Barack_Obama/daily/2017010100/2017010300

In that specific example this query should return 2 days, not three (since the last day is not finished).

Nuria added a comment.Mar 16 2017, 5:25 PM

In that specific example this query should return 2 days, not three (since the last day is not finished).

Agreed.

Nuria added a comment.Jun 15 2017, 4:44 PM

Do we need to version api for such a change? (it will be a breaking change)

Nuria moved this task from Wikistats Production to Dashiki on the Analytics board.Jun 15 2017, 4:45 PM
Nuria moved this task from Dashiki to Wikistats Production on the Analytics board.Jul 13 2017, 4:06 PM