Summary: the existing production AQS 1.0 code does various timestamp string manipulations during validation. In AQS 2.0, we mirrored part of this, but not all of it, resulting in AQS 2.0 using an incorrectly formatted timestamp for Device Analytics queries. This can, if we choose, be as simple as a one-line fix (see the end of this very long task description).
Details:
The Cassandra tables queried by AQS store timestamps as strings. In some cases, these strings are stored as YYYYMMDD and in other cases they are stored as YYYYMMDDHH. Relevant to this task, the table queried by Device Analytics uses YYYYMMDD. (There are other cases as well, but they're not relevant to this task.)
The AQS 1.0 code uses common functions for timestamp validation and manipulation. These functions accept various options, which control their behavior. In particular, for the case of unique devices, the AQS 1.0 code makes this call:
aqsUtil.validateStartAndEnd(rp, { // YYYYMMDD dates are allowed, but need an hour to pass validation fakeHour: true, // YYYYMMDDHH dates are also allowed, but the hour should be stripped // to match how cassandra stores records stripHour: true, });
This invokes this function, which in turn invokes this function. The two relevant bits of code (which are split between the two functions) are:
const hour = opts.fakeHour ? '00' : timestamp.substring(8, 10);
and
if (opts.zeroHour || opts.stripHour) { rp.start = rp.start.substring(0, 8); rp.end = rp.end.substring(0, 8); }
Observe that the code first adds a "fake hour" of "00" to the timestamp, and then later strips it back off. This results in a YYYYMMDD timestamp.
When we created AQS 2.0, we started with tables in the Page Analytics part of the dataset, where the hour is stored in Cassandra in YYYMMDDHH format. We therefore added this code in the aqsassist ValidateTimestamp function:
if len(param) == 8 { timestamp = fmt.Sprintf("%s00", param) } else { timestamp = param }
This code adds the fake "00" hour if necessary, to make the timestamp into a YYYYMMDDHH format. But we did not add corresponding code to strip that fake hour back off after validation, likely because Page Analytics didn't need that, and we didn't realize that other datasets were formatted differently.
Fix:
There are several ways we could fix this, including:
- adding more options to aqsassist.ValidateTimestamp, similar to the existing AQS 1.0 code, to give callers fine-grained control over timestamp validation and manipulation. This is powerful, but could be confusing, and also involves the most code changes.
- requiring callers to pass timestamps in the length they want them returned. aqsassist.ValidateTimestamp would check the length of the input and return an output of equal length (or an error, if the input was malformed). This is simpler, but a little subtle and it might not be obvious to callers what was happening. I'm also not sure it would handle all cases we'll encounter in the rest of the services.
- always return a YYYYMMDDHH format, and make the caller strip the HH part if that's not appropriate for a given situation
The last one is my favorite. It involves adding literally one line to unique-devices_handler.go. It keeps the aqsassist.ValidateTimestamp contract simple and straightforward: the same thing is always returned, and if particular callers want to manipulate that, they can.
However, I'm open to discussion, happy to hear thoughts from @SGupta-WMF or others, and will be glad for this to be fixed, regardless of how it is done.