Page MenuHomePhabricator

Nested EventLogging data doesn't get copied to MySQL
Closed, ResolvedPublic

Description

See this exception: https://www.irccloud.com/pastebin/P7P6BdpM/

EL records containing nested data are being logged to disk, but are not mirrored into MySQL. @AndyRussG says that it should be as simple as quoting something during json encoding.

@ellery: Please note that you'll be pulling banner history records from the EL log files in the near term, rather than querying MySQL.

See also T112926

Event Timeline

awight raised the priority of this task from to Medium.
awight updated the task description. (Show Details)
awight added subscribers: DStrine, Aklapper, awight and 2 others.
awight set Security to None.

FYI, although not 100% announced yet, EventLogging data is going into HDFS now, so you should be able to query them using Hive there. There are two ways to do this, one of which is just creating a table with single string field and then parsing JSON using a hive function:

# Create a table with a single string field
CREATE EXTERNAL TABLE `CentralNoticeBannerHistory` ( `json_data` string)
PARTITIONED BY (year int, month int, day int, hour int)
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/wmf/data/raw/eventlogging/eventlogging_CentralNoticeBannerHistory';

# Add a partition
ALTER TABLE CentralNoticeBannerHistory ADD PARTITION (year=2015, month=9, day=17, hour=16)
LOCATION '/wmf/data/raw/eventlogging/eventlogging_CentralNoticeBannerHistory/hourly/2015/09/17/16';

# Parse the single string field as JSON and select a nested key out of it
SELECT get_json_object(json_data, '$.event.l.b') as banner_name
FROM CentralNoticeBannerHistory
WHERE year=2015;

banner_name
["WMES_General_Assembly","WMES_General_Assembly"]
["WMES_General_Assembly","WMES_General_Assembly","WMES_General_Assembly","WMES_General_Assembly"]
["WMES_General_Assembly"]
["WMES_General_Assembly","WMES_General_Assembly","WMES_General_Assembly"]

The other would be to create a hive table that somehow maps to your JSON schema. Not sure how easy that is with nested data

Let us know if this is still a problem . Thus far we have not supported nested schemas but if workarround does not suffice we can do needed changes.

Thanks!! @ellery, now that the data is going into HDFS, is this task still relevant? Do you have plans to query banner history logs via MySQL sometimes?