
Hive Runtime Error - Query on event.MobileWikiAppDailyStats failing with errors
Closed, Resolved · Public

Description

I run this query monthly and have never encountered this error before. I have tried it from Jupyter on stat1007 and on Hue, and the failure is the same. I also tried increasing SET mapreduce.map.memory.mb from 4096 to 8192.
Logs on Hue:
job_1614951600082_44176
job_1614951600082_44093

Query:

SET mapreduce.map.memory.mb=4096;
SELECT
      app_install_id,
      is_anon,
      languages,
      days_installed
    FROM (
      SELECT
        event.app_install_id,
        IF(event.is_anon, 'TRUE', 'FALSE') AS is_anon,
        event.languages,
        event.appInstallAgeDays AS days_installed,
        RANK() OVER (PARTITION BY event.app_install_id ORDER BY event.appInstallAgeDays DESC) AS stat_rank
      FROM event.MobileWikiAppDailyStats
      WHERE year = 2021 AND month = 02 AND day >= 1
        AND revision = 18115101
        AND useragent.os_family = 'Android'
        AND INSTR(useragent.wmf_app_version, '-r-') > 0
    ) AS ranked_stats
    WHERE stat_rank = 1 -- latest stats from each device

Error from stderr:

Log Type: stderr
Log Upload Time: Sat Mar 13 00:11:26 +0000 2021
Log Length: 1728
Mar 13, 2021 12:10:15 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Mar 13, 2021 12:10:15 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Mar 13, 2021 12:10:15 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Mar 13, 2021 12:10:15 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Mar 13, 2021 12:10:15 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Mar 13, 2021 12:10:16 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Mar 13, 2021 12:10:16 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

From syslog:

2021-03-12 23:41:13,269 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1614951600082_44072_r_000001_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"00000de9-3b67-4266-933f-90c36d7f304c","reducesinkkey1":543},"value":{"_col1":{"appinstallagedays":543,"appinstallid":null,"ts":null,"is_anon":true,"app_install_id":"[redacted]","client_dt":"2021-02-27T20:43:40-0500","languages":"[\"en\"]"}}}
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:243)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"00000de9-3b67-4266-933f-90c36d7f304c","reducesinkkey1":543},"value":{"_col1":{"appinstallagedays":543,"appinstallid":null,"ts":null,"is_anon":true,"app_install_id":"[redacted]","client_dt":"2021-02-27T20:43:40-0500","languages":"[\"en\"]"}}}
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:231)
	... 7 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to org.apache.hadoop.io.LongWritable
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.getPrimitiveJavaObject(WritableLongObjectInspector.java:46)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:412)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.copyToStandardObject(GenericUDAFRank.java:219)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$GenericUDAFAbstractRankEvaluator.iterate(GenericUDAFRank.java:154)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:192)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.processRow(WindowingTableFunction.java:407)
	at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
	at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
	... 7 more

Event Timeline

Also note that I am getting the same stderr for a different query, on table wmf.mediawiki_history:

SELECT
  COUNT(1) AS n_cumulative_total,
  SUM(IF(event_timestamp >= '${yyyy_mm}-01', 1, 0)) AS n_new_this_month
FROM wmf.mediawiki_history
WHERE snapshot = '2021-02'
  AND wiki_db = 'commonswiki'
  AND event_entity = 'revision'
  AND ARRAY_CONTAINS(revision_tags, 'android app edit')
  AND INSTR(event_comment, 'add-depicts') > 0
-- Revisions that haven't been reverted as of the snapshot:
  AND NOT revision_is_identity_reverted
  AND NOT revision_is_deleted_by_page_deletion

Error on Hue: https://hue-next.wikimedia.org/hue/jobbrowser/#!id=job_1614951600082_60486

Log Type: stderr
Log Upload Time: Mon Mar 15 21:36:23 +0000 2021
Log Length: 1721
Mar 15, 2021 9:35:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Mar 15, 2021 9:35:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Mar 15, 2021 9:35:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Mar 15, 2021 9:35:02 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Mar 15, 2021 9:35:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Mar 15, 2021 9:35:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Mar 15, 2021 9:35:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Hiya, just FYI: reading this task, it looks like the offending error is

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to org.apache.hadoop.io.LongWritable

(The "No appenders could be found for logger" one is just a warning and can be ignored.)

@JAllemandou @elukey is it possible that some part of the Bigtop Hive upgrade could influence this? It looks like Hive is generating a struct key as part of a reduce... but is expecting a Long?

@SNowick_WMF, I am curious whether this query could be run in Spark instead of Hive.

Hi @SNowick_WMF - I apologize for not having looked into your problem sooner.

First, a simple one: the second query (in the comment above) about mediawiki_history ran fine for me with the larger map memory setting (SET mapreduce.map.memory.mb=4096;).

Then the less simple one. The query in the task description (the one using the windowing function) hits this bug: https://issues.apache.org/jira/browse/HIVE-21104. Our new version of Hive has problems running windowing functions over struct fields... :(
I have two possible solutions for you:
1 - Add a step to your query so that the data fed to the window function is no longer a struct (please note the comment on the GROUP BY in the first CTE, and the LIMIT at the end for the output :):

WITH base_data AS (
      SELECT
        event.app_install_id,
        IF(event.is_anon, 'TRUE', 'FALSE') AS is_anon,
        event.languages,
        event.appInstallAgeDays AS days_installed
      FROM event.MobileWikiAppDailyStats
      WHERE year = 2021 AND month = 02 AND day >= 1
        AND revision = 18115101
        AND useragent.os_family = 'Android'
        AND INSTR(useragent.wmf_app_version, '-r-') > 0
      -- This GROUP BY is important: without it, Hive optimizes the query into a single step
      -- and hits the same problem. With it, Hive needs two steps and the query succeeds.
      GROUP BY
        event.app_install_id,
        IF(event.is_anon, 'TRUE', 'FALSE'),
        event.languages,
        event.appInstallAgeDays
),

ranked_data AS (
SELECT
      app_install_id,
      is_anon,
      languages,
      days_installed,
      RANK() OVER (PARTITION BY app_install_id ORDER BY days_installed DESC) AS stat_rank
FROM base_data
)

SELECT
      app_install_id,
      is_anon,
      languages,
      days_installed
FROM ranked_data   
WHERE stat_rank = 1 -- latest stats from each device
LIMIT 10
;

2 - As suggested by @Ottomata, use spark-sql:

spark2-sql --master yarn --driver-memory 4G --executor-memory 16G --executor-cores 4 --conf spark.dynamicAllocation.maxExecutors=32

-- Your original query runs fine unchanged (I added a LIMIT for the output)
SELECT
      app_install_id,
      is_anon,
      languages,
      days_installed
    FROM (
      SELECT
        event.app_install_id AS app_install_id,
        IF(event.is_anon, 'TRUE', 'FALSE') AS is_anon,
        event.languages AS languages,
        event.appInstallAgeDays AS days_installed,
        RANK() OVER (PARTITION BY event.app_install_id ORDER BY event.appInstallAgeDays DESC) AS stat_rank
      FROM event.MobileWikiAppDailyStats
      WHERE year = 2021 AND month = 02 AND day >= 1
        AND revision = 18115101
        AND useragent.os_family = 'Android'
        AND INSTR(useragent.wmf_app_version, '-r-') > 0
    ) AS ranked_stats
    WHERE stat_rank = 1 -- latest stats from each device
    LIMIT 10
;

Again, please excuse me for not chiming in sooner.

Thank you for the thorough response. I used the solution 1 query successfully, and will try the spark-sql query next time I need to run these queries.

> will try the spark-sql query next time I need to run these queries.

@SNowick_WMF for what it's worth, I think you would get the same benefit using wmfdata-python's spark.run function (you may need to pass the parameter session_type = "large", which tells Spark to put its back into it 😊).
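To illustrate, here is a minimal sketch of that wmfdata-python call, reusing the solution 2 query from above. The spark.run function and its session_type parameter are as named in the comment above; the assumption that it returns the query results directly (a pandas DataFrame by default) is mine, so treat this as a sketch rather than a tested snippet:

import wmfdata

# Solution 2 query from above, unchanged.
query = """
SELECT
      app_install_id,
      is_anon,
      languages,
      days_installed
    FROM (
      SELECT
        event.app_install_id AS app_install_id,
        IF(event.is_anon, 'TRUE', 'FALSE') AS is_anon,
        event.languages AS languages,
        event.appInstallAgeDays AS days_installed,
        RANK() OVER (PARTITION BY event.app_install_id ORDER BY event.appInstallAgeDays DESC) AS stat_rank
      FROM event.MobileWikiAppDailyStats
      WHERE year = 2021 AND month = 02 AND day >= 1
        AND revision = 18115101
        AND useragent.os_family = 'Android'
        AND INSTR(useragent.wmf_app_version, '-r-') > 0
    ) AS ranked_stats
    WHERE stat_rank = 1 -- latest stats from each device
    LIMIT 10
"""

# session_type="large" asks wmfdata for a bigger Spark session than the
# default, per the suggestion above.
results = wmfdata.spark.run(query, session_type="large")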