
Investigate where Kafka records with almost all null fields are coming from
Closed, Resolved
Public

Description

Where are records such as this one coming from? They currently appear in the webrequest table.

{
"hostname":null,"sequence":null,"dt":null,"time_firstbyte":null,"ip":null,"cache_status":null,"http_status":null,"response_size":null,"http_method":null,

"uri_host":"www.wikipedia.org",
"uri_path":"http://www.wikipedia.org",

"uri_query":"","content_type":null,"referer":null,"x_forwarded_for":null,"user_agent":null,"accept_language":null,"x_analytics":null,"range":null,"is_pageview":null,"record_version":null,"client_ip":null,"geocoded_data":null,"x_cache":null,"user_agent_map":null,"x_analytics_map":null,"ts":null,"access_method":null,"agent_type":"user","is_zero":null,"referer_class":null,"normalized_host":null,"pageview_info":null,"page_id":null,

"webrequest_source":"text","year":2016,"month":5,"day":30,"hour":1
}

Event Timeline

elukey renamed this task from Investigate where records will al null fields are coming from to Investigate where Kafka records will almost all null fields are coming from. Jun 3 2016, 10:46 AM
elukey updated the task description.

Today I tried the following beeline/Hive query but found nothing in either wmf_raw or wmf:

select * from webrequest where hostname is null and webrequest_source = "text" and year = 2016 and month = 5 and day = 30 and hour = 1 ;

@madhuvishy - Would you mind syncing with me to find some examples of these inconsistencies? Thanks :)
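
The query above keys only on hostname being null. As a complementary check, here is a minimal Python sketch that scans JSON records (one per line, in the shape shown in the description) and flags those where most fields are null. The function names and the 0.8 threshold are hypothetical, chosen for illustration; this is not part of the existing refinery tooling.

```python
import json

# Hypothetical threshold: flag a record when at least this share of its
# fields is null. 0.8 is an illustrative value, not one taken from this task.
NULL_FRACTION_THRESHOLD = 0.8


def null_fraction(record):
    """Return the share of fields in `record` whose value is None."""
    if not record:
        return 0.0
    nulls = sum(1 for value in record.values() if value is None)
    return nulls / len(record)


def is_mostly_null(record, threshold=NULL_FRACTION_THRESHOLD):
    """Return True when the record's null share reaches the threshold."""
    return null_fraction(record) >= threshold


def find_mostly_null(lines, threshold=NULL_FRACTION_THRESHOLD):
    """Yield parsed records from JSON lines that are mostly null."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if is_mostly_null(record, threshold):
            yield record
```

Feeding it a dump of the suspect hour (for example, lines read from a file of exported records) would list candidates even when hostname happens to be populated, which the hostname-only filter would miss.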

Milimetric triaged this task as Medium priority. Jun 6 2016, 4:36 PM
Milimetric moved this task from Incoming to Dashiki on the Analytics board.
Milimetric claimed this task.
Milimetric subscribed.

Haven't been able to reproduce this; it might have been a fluke.