
Mediawiki History Druid indexing failed
Closed, Resolved | Public | 5 Estimated Story Points

Description

I don't know how to find out what went wrong, though; logs don't seem to be available in either:

https://hue.wikimedia.org/oozie/list_oozie_workflow/0026548-170621131133576-oozie-oozi-W/

or:

https://yarn.wikimedia.org/cluster

UPDATE: Nuria thinks it's because of the new fields, and that makes perfect sense. Assigning to her.

Event Timeline

Actually, I take it back. I think it should work even when inserting only a subset of fields. Maybe some fields have changed type? Rerunning to see.

Reran it and it failed:


0038299-170621131133576-oozie-oozi-W@:start:   OK   -   OK   -
0038299-170621131133576-oozie-oozi-W@generate_json_mediawiki_history   OK   job_1498042433999_103005   SUCCEEDED   -
0038299-170621131133576-oozie-oozi-W@mark_json_mediawiki_history_dataset_done   OK   0038313-170621131133576-oozie-oozi-W   SUCCEEDED   -
0038299-170621131133576-oozie-oozi-W@index_druid   ERROR   0038314-170621131133576-oozie-oozi-W   KILLED   -
0038299-170621131133576-oozie-oozi-W@send_error_email   OK   0038320-170621131133576-oozie-oozi-W   SUCCEEDED   -
0038299-170621131133576-oozie-oozi-W@kill   OK   -   OK   E0729

Note to self: Druid jobs run as the druid user, not hdfs, so logs need to be fetched like this: sudo -u druid yarn logs -applicationId application_blah
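For repeated debugging, the command in the note above could be wrapped in a small helper. This is only a sketch: the function names `yarn_logs_command` and `fetch_yarn_logs` are hypothetical, `application_blah` is the placeholder from the note, and it assumes the caller is allowed to `sudo` to the druid user on a host where the `yarn` CLI is installed.

```python
import subprocess
from typing import List


def yarn_logs_command(application_id: str, run_as: str = "druid") -> List[str]:
    """Build the argv for fetching YARN logs as another user (hypothetical helper)."""
    return ["sudo", "-u", run_as, "yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id: str, run_as: str = "druid") -> str:
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        yarn_logs_command(application_id, run_as),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; replace the placeholder with a real application id
    # before calling fetch_yarn_logs.
    print(" ".join(yarn_logs_command("application_blah")))
```

Keeping the command construction separate from its execution makes it possible to check the argv without access to the cluster.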

Change 366327 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery@master] [WIP] Modifying ingestion spec after additions to edit history

https://gerrit.wikimedia.org/r/366327

Nuria set the point value for this task to 5. Jul 21 2017, 3:17 PM

Exception (from logs at /var/lib/druid/indexing-logs/)

[{"wiki_db":"abwiki","event_entity":"revision","event_type":"create","event_timestamp":"2006-03-28 05:23:11.0","event_comment":null,"event_user_id":0,"event_user_text":"MediaWiki default","event_user_text_latest":null,"event_user_blocks":null,"event_user_blocks_latest":null,"event_user_groups":null,"event_user_groups_latest":null,"event_user_is_created_by_self":0,"event_user_is_created_by_system":0,"event_user_is_created_by_peer":0,"event_user_is_anonymous":1,"event_user_is_bot_by_name":0,"event_user_creation_timestamp":null,"event_user_revision_count":null,"event_user_seconds_to_previous_revision":null,"page_id":null,"page_title":null,"page_title_latest":"Revdelete-hide-restricted","page_namespace":null,"page_namespace_is_content":0,"page_namespace_latest":8,"page_namespace_is_content_latest":0,"page_is_redirect_latest":0,"page_creation_timestamp":null,"page_revision_count":null,"page_seconds_to_previous_revision":null,"user_id":null,"user_text":null,"user_text_latest":null,"user_blocks":null,"user_blocks_latest":null,"user_groups":null,"user_groups_latest":null,"user_is_created_by_self":0,"user_is_created_by_system":0,"user_is_created_by_peer":0,"user_is_anonymous":0,"user_is_bot_by_name":0,"user_creation_timestamp":null,"revision_id":7571,"revision_parent_id":null,"revision_minor_edit":0,"revision_text_bytes":52,"revision_text_bytes_diff":52,"revision_text_sha1":"9m1n4hpvujlklvby4a8dg5dirrkkual","revision_content_model":"","revision_content_format":"","revision_is_deleted":1,"revision_deleted_timestamp":"2006-03-28 05:23:11.0","revision_is_identity_reverted":0,"revision_first_identity_reverting_revision_id":null,"revision_seconds_to_identity_revert":null,"revision_is_identity_revert":0}]
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:88)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:421)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.metamx.common.parsers.ParseException: Unparseable timestamp found!
at io.druid.data.input.impl.MapInputRowParser.parse(MapInputRowParser.java:72)
at io.druid.data.input.impl.StringInputRowParser.parseMap(StringInputRowParser.java:136)
at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131)
at io.druid.indexer.HadoopDruidIndexerMapper.parseInputRow(HadoopDruidIndexerMapper.java:98)
at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:69)
... 8 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "2006-03-28 05:23:11.0" is malformed at "-03-28 05:23:11.0"
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:899)
at com.metamx.common.parsers.TimestampParser$5.apply(TimestampParser.java:97)
at com.metamx.common.parsers.TimestampParser$5.apply(TimestampParser.java:92)
at com.metamx.common.parsers.TimestampParser$9.apply(TimestampParser.java:159)
at com.metamx.common.parsers.TimestampParser$9.apply(TimestampParser.java:150)
at io.druid.data.input.impl.TimestampSpec.extractTimestamp(TimestampSpec.java:81)
at io.druid.data.input.impl.MapInputRowParser.parse(MapInputRowParser.java:60)

Hmm... operator error: the timestamp format is yyyy-MM-dd HH:mm:ss.S!
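A quick way to sanity-check a timestamp format against a sample row before kicking off a full indexing job is sketched below. It uses Python's `strptime` rather than Joda-Time, so the pattern `%Y-%m-%d %H:%M:%S.%f` is an assumed counterpart of `yyyy-MM-dd HH:mm:ss.S`, and the helper names are hypothetical. The sample value is the `event_timestamp` from the failing row in the log above.

```python
from datetime import datetime

# Assumed strptime counterpart of the Joda-Time pattern "yyyy-MM-dd HH:mm:ss.S".
# %f accepts one to six fractional-second digits, so ".0" parses fine.
EVENT_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_event_timestamp(raw: str, fmt: str = EVENT_TIMESTAMP_FORMAT) -> datetime:
    """Parse a timestamp string; raises ValueError if it does not match fmt."""
    return datetime.strptime(raw, fmt)


def matches_format(raw: str, fmt: str) -> bool:
    """Return True if raw parses with fmt, False otherwise."""
    try:
        datetime.strptime(raw, fmt)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    sample = "2006-03-28 05:23:11.0"  # event_timestamp from the failing row
    print(parse_event_timestamp(sample))
    # A pattern that expects a "T" separator and no fraction rejects the sample.
    print(matches_format(sample, "%Y-%m-%dT%H:%M:%S"))
```

The same check could be run over a handful of rows from the generated JSON to catch a format mismatch before the Hadoop mappers do.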

Change 366327 merged by Mforns:
[analytics/refinery@master] Modifying ingestion spec after additions to edit history

https://gerrit.wikimedia.org/r/366327