Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | leila | T171561 Why we read Wikipedia
Resolved | | leila | T131949 Repeat the big English reader survey in one or two more languages
Resolved | | None | T165364 Test data quality of the test surveys
Resolved | | schana | T165678 clientIP needs to be collected as part of the schema or ...
Event Timeline
From the email thread:
Here’s a more fleshed-out query for extracting the relevant pieces of the event while retaining the webrequest data. The extracted event can then be compared with the MySQL EventLogging data to check its validity.
@leila, does this satisfy the requirements for getting the client IP?
select
  *,
  get_json_object(json_event, '$.event.surveySessionToken') as survey_session_token
from (
  select
    *,
    reflect("java.net.URLDecoder", "decode", substr(uri_query, 2)) as json_event
  from webrequest
  where uri_path like '%beacon/event'
    and uri_query like '%QuickSurvey%'
    and year=2017 and month=05 and day=15 and hour=15
  limit 1
) q1;
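For anyone who wants to check the decoding step outside Hive, here is a minimal Python sketch of what the query does to a single `uri_query` value: drop the leading `?`, URL-decode the rest, parse it as JSON, and read `event.surveySessionToken`. The function name, the sample payload, and the token value are made up for illustration; real beacon query strings may differ in detail, so treat this as an approximation rather than the actual pipeline.

```python
import json
from urllib.parse import quote, unquote_plus


def extract_survey_session_token(uri_query):
    """Approximate the Hive expression for one row (hypothetical helper)."""
    # substr(uri_query, 2) in Hive drops the first character (the "?").
    encoded = uri_query[1:]
    # java.net.URLDecoder.decode treats "+" as a space, so unquote_plus
    # is the closest standard-library match.
    json_event = unquote_plus(encoded)
    try:
        payload = json.loads(json_event)
    except json.JSONDecodeError:
        # Not valid JSON after decoding: no token can be extracted.
        return None
    if not isinstance(payload, dict):
        return None
    event = payload.get("event")
    if not isinstance(event, dict):
        return None
    # Equivalent of get_json_object(json_event, '$.event.surveySessionToken').
    return event.get("surveySessionToken")


# Made-up sample payload, encoded the way a query string would carry it.
sample_payload = {
    "schema": "QuickSurveyExample",
    "event": {"surveySessionToken": "hypothetical-token-123"},
}
sample_query = "?" + quote(json.dumps(sample_payload))
token = extract_survey_session_token(sample_query)
malformed = extract_survey_session_token("?not-json")
```

With the sample above, `token` holds the made-up token string and `malformed` is `None`.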
@schana per our follow-up conversation on IRC and your suggestion: joining on the survey session token may be a more accurate way to link the two datasets (EL and webrequest logs) anyway, since we don't have to rely on approximations based on IP+UA.
@flemmerich can you look into updating the code based on this new information and confirm whether this approach allows us to link EL and webrequest logs? If it does, we don't have a blocker.
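As a rough illustration of the linking step discussed above, the sketch below joins EL rows to webrequest rows on the session token instead of on IP+UA. It assumes both datasets have already been loaded as lists of dicts; the field names (`surveySessionToken`, `survey_session_token`, `client_ip`, `response`) and all sample values are hypothetical and would need to match the real schemas.

```python
from collections import defaultdict


def link_by_token(el_rows, webrequest_rows):
    """Pair each EL row with every webrequest row sharing its token."""
    # Index webrequest rows by token so each EL row is a dictionary lookup.
    by_token = defaultdict(list)
    for row in webrequest_rows:
        token = row.get("survey_session_token")
        if token:
            by_token[token].append(row)

    linked = []
    for el_row in el_rows:
        token = el_row.get("surveySessionToken")
        # .get avoids creating empty entries for tokens with no match.
        for wr_row in by_token.get(token, []):
            linked.append((el_row, wr_row))
    return linked


# Made-up rows: only "tok-a" appears in both datasets.
el_rows = [
    {"surveySessionToken": "tok-a", "response": "yes"},
    {"surveySessionToken": "tok-b", "response": "no"},
]
webrequest_rows = [
    {"survey_session_token": "tok-a", "client_ip": "192.0.2.1"},
    {"survey_session_token": "tok-c", "client_ip": "192.0.2.2"},
]
linked = link_by_token(el_rows, webrequest_rows)
```

Here `linked` contains a single pair, the `tok-a` EL row together with its webrequest row, which also carries the client IP without it having to be part of the EL schema.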
@leila I'm moving this to done based on the latest emails. Let me know if anything further is required.
@schana closing this per your comment, and because we now know that we can link the data without this information.