Paste P6440

example log of MLR training executor killed by yarn

Authored by EBernhardson on Dec 7 2017, 5:03 AM.

Logs for container_e54_1512469367986_4908_01_000007

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/var/lib/hadoop/data/g/yarn/local/usercache/ebernhardson/filecache/492/__spark_libs__5186771349499828915.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/07 04:23:20 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 25933@analytics1049
17/12/07 04:23:20 INFO SignalUtils: Registered signal handler for TERM
17/12/07 04:23:20 INFO SignalUtils: Registered signal handler for HUP
17/12/07 04:23:20 INFO SignalUtils: Registered signal handler for INT
17/12/07 04:23:20 INFO SecurityManager: Changing view acls to: yarn,ebernhardson
17/12/07 04:23:20 INFO SecurityManager: Changing modify acls to: yarn,ebernhardson
17/12/07 04:23:20 INFO SecurityManager: Changing view acls groups to:
17/12/07 04:23:20 INFO SecurityManager: Changing modify acls groups to:
17/12/07 04:23:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, ebernhardson); groups with view permissions: Set(); users with modify permissions: Set(yarn, ebernhardson); groups with modify permissions: Set()
17/12/07 04:23:21 INFO TransportClientFactory: Successfully created connection to /10.64.53.30:40931 after 103 ms (0 ms spent in bootstraps)
17/12/07 04:23:21 WARN SparkConf: The configuration key 'spark.yarn.jar' has been deprecated as of Spark 2.0 and may be removed in the future. Please use the new key 'spark.yarn.jars' instead.
17/12/07 04:23:21 INFO SecurityManager: Changing view acls to: yarn,ebernhardson
17/12/07 04:23:21 INFO SecurityManager: Changing modify acls to: yarn,ebernhardson
17/12/07 04:23:21 INFO SecurityManager: Changing view acls groups to:
17/12/07 04:23:21 INFO SecurityManager: Changing modify acls groups to:
17/12/07 04:23:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, ebernhardson); groups with view permissions: Set(); users with modify permissions: Set(yarn, ebernhardson); groups with modify permissions: Set()
17/12/07 04:23:21 INFO TransportClientFactory: Successfully created connection to /10.64.53.30:40931 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/b/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-fc46b9ad-2511-4261-88a5-d2dd2aa58fd7
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/c/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-31afa154-6155-4a5c-97ce-05e7b466aacf
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/d/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-27f1a9e3-a92a-4f9b-a2d5-22c7ebf44fce
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/e/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-c6164a5a-d8e0-4691-b13a-84ff0bb75a82
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/f/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-7f119622-3e17-4d68-89ff-0bee0156f0a0
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/g/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-06648fcf-99bc-4ef3-a21c-012d74e2d514
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/h/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-451a4fe0-87ff-49d3-8872-277242971db9
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/i/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-b1851e36-6ebe-4691-849c-d1be6ba5823c
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/j/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-20bd9f76-b393-4bf7-bbef-8e3ac57c0e93
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/k/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-a3f58022-adc9-49f0-b07e-3ee72e944344
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/l/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-730c13db-1af3-465a-9ff2-7f5b250d9cf3
17/12/07 04:23:21 INFO DiskBlockManager: Created local directory at /var/lib/hadoop/data/m/yarn/local/usercache/ebernhardson/appcache/application_1512469367986_4908/blockmgr-76e61f85-b535-4ba2-ab47-7d4b64fa48c4
17/12/07 04:23:21 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
17/12/07 04:23:21 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.64.53.30:40931
17/12/07 04:23:21 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
17/12/07 04:23:22 INFO Executor: Starting executor ID 6 on host analytics1049.eqiad.wmnet
17/12/07 04:23:22 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46181.
17/12/07 04:23:22 INFO NettyBlockTransferService: Server created on analytics1049.eqiad.wmnet:46181
17/12/07 04:23:22 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/12/07 04:23:22 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(6, analytics1049.eqiad.wmnet, 46181, None)
17/12/07 04:23:22 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(6, analytics1049.eqiad.wmnet, 46181, None)
17/12/07 04:23:22 INFO BlockManager: external shuffle service port = 7337
17/12/07 04:23:22 INFO BlockManager: Registering executor with local external shuffle service.
17/12/07 04:23:22 INFO TransportClientFactory: Successfully created connection to analytics1049.eqiad.wmnet/10.64.21.108:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:22 INFO BlockManager: Initialized BlockManager: BlockManagerId(6, analytics1049.eqiad.wmnet, 46181, None)
17/12/07 04:23:30 INFO CoarseGrainedExecutorBackend: Got assigned task 161
17/12/07 04:23:30 INFO Executor: Running task 23.0 in stage 5.0 (TID 161)
17/12/07 04:23:30 INFO MapOutputTrackerWorker: Updating epoch to 2 and clearing cache
17/12/07 04:23:30 INFO TorrentBroadcast: Started reading broadcast variable 8
17/12/07 04:23:30 INFO TransportClientFactory: Successfully created connection to analytics1052.eqiad.wmnet/10.64.5.15:37429 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:30 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 11.8 KB, free 2004.6 MB)
17/12/07 04:23:31 INFO TorrentBroadcast: Reading broadcast variable 8 took 158 ms
17/12/07 04:23:31 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 26.9 KB, free 2004.6 MB)
17/12/07 04:23:31 INFO CodeGenerator: Code generated in 448.432572 ms
17/12/07 04:23:31 INFO CodeGenerator: Code generated in 12.224754 ms
17/12/07 04:23:31 INFO CodeGenerator: Code generated in 12.898548 ms
17/12/07 04:23:31 INFO CodeGenerator: Code generated in 13.493018 ms
17/12/07 04:23:31 INFO CodeGenerator: Code generated in 16.834078 ms
17/12/07 04:23:31 INFO CodeGenerator: Code generated in 12.784617 ms
17/12/07 04:23:32 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00023-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-23163525, partition values: [empty row]
17/12/07 04:23:32 INFO TorrentBroadcast: Started reading broadcast variable 7
17/12/07 04:23:32 INFO TransportClientFactory: Successfully created connection to analytics1064.eqiad.wmnet/10.64.36.104:33993 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:32 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 28.3 KB, free 1988.3 MB)
17/12/07 04:23:32 INFO TorrentBroadcast: Reading broadcast variable 7 took 277 ms
17/12/07 04:23:32 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 381.7 KB, free 1987.9 MB)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
17/12/07 04:23:33 INFO CodecPool: Got brand-new decompressor [.snappy]
17/12/07 04:23:34 INFO Executor: Finished task 23.0 in stage 5.0 (TID 161). 3586 bytes result sent to driver
17/12/07 04:23:35 INFO CoarseGrainedExecutorBackend: Got assigned task 565
17/12/07 04:23:35 INFO Executor: Running task 28.0 in stage 6.0 (TID 565)
17/12/07 04:23:35 INFO MapOutputTrackerWorker: Updating epoch to 3 and clearing cache
17/12/07 04:23:35 INFO TorrentBroadcast: Started reading broadcast variable 9
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1043.eqiad.wmnet/10.64.53.23:37507 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 10.9 KB, free 2004.2 MB)
17/12/07 04:23:35 INFO TorrentBroadcast: Reading broadcast variable 9 took 37 ms
17/12/07 04:23:35 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 23.0 KB, free 2004.1 MB)
17/12/07 04:23:35 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 2, fetching them
17/12/07 04:23:35 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:23:35 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:23:35 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1032.eqiad.wmnet/10.64.36.132:7337 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1031.eqiad.wmnet/10.64.36.131:7337 after 4 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1030.eqiad.wmnet/10.64.36.130:7337 after 3 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1064.eqiad.wmnet/10.64.36.104:7337 after 19 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1029.eqiad.wmnet/10.64.36.129:7337 after 14 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1046.eqiad.wmnet/10.64.21.105:7337 after 13 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1055.eqiad.wmnet/10.64.5.18:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1050.eqiad.wmnet/10.64.21.111:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1040.eqiad.wmnet/10.64.53.19:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1051.eqiad.wmnet/10.64.21.112:7337 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1039.eqiad.wmnet/10.64.53.18:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1066.eqiad.wmnet/10.64.36.106:7337 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1033.eqiad.wmnet/10.64.36.133:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1062.eqiad.wmnet/10.64.21.114:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1038.eqiad.wmnet/10.64.53.17:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1054.eqiad.wmnet/10.64.5.17:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1059.eqiad.wmnet/10.64.5.22:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1047.eqiad.wmnet/10.64.21.106:7337 after 0 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1044.eqiad.wmnet/10.64.53.24:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1060.eqiad.wmnet/10.64.5.23:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1048.eqiad.wmnet/10.64.21.107:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1036.eqiad.wmnet/10.64.53.15:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1028.eqiad.wmnet/10.64.36.128:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1049.eqiad.wmnet/10.64.21.108:7337 after 0 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1052.eqiad.wmnet/10.64.5.15:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1067.eqiad.wmnet/10.64.53.27:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1061.eqiad.wmnet/10.64.21.113:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1037.eqiad.wmnet/10.64.53.16:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1068.eqiad.wmnet/10.64.53.28:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1065.eqiad.wmnet/10.64.36.105:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1045.eqiad.wmnet/10.64.53.25:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1043.eqiad.wmnet/10.64.53.23:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1063.eqiad.wmnet/10.64.21.115:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1057.eqiad.wmnet/10.64.5.20:7337 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1056.eqiad.wmnet/10.64.5.19:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1042.eqiad.wmnet/10.64.53.22:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1053.eqiad.wmnet/10.64.5.16:7337 after 4 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1034.eqiad.wmnet/10.64.36.134:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1058.eqiad.wmnet/10.64.5.21:7337 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1041.eqiad.wmnet/10.64.53.20:7337 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1035.eqiad.wmnet/10.64.53.14:7337 after 0 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO TransportClientFactory: Successfully created connection to analytics1069.eqiad.wmnet/10.64.53.29:7337 after 0 ms (0 ms spent in bootstraps)
17/12/07 04:23:35 INFO ShuffleBlockFetcherIterator: Started 58 remote fetches in 230 ms
17/12/07 04:23:35 INFO CodeGenerator: Code generated in 48.723522 ms
17/12/07 04:23:36 INFO Executor: Finished task 28.0 in stage 6.0 (TID 565). 17275 bytes result sent to driver
17/12/07 04:23:36 INFO CoarseGrainedExecutorBackend: Got assigned task 632
17/12/07 04:23:36 INFO Executor: Running task 95.0 in stage 6.0 (TID 632)
17/12/07 04:23:36 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:23:36 INFO ShuffleBlockFetcherIterator: Started 58 remote fetches in 51 ms
17/12/07 04:23:36 INFO Executor: Finished task 95.0 in stage 6.0 (TID 632). 17313 bytes result sent to driver
17/12/07 04:23:36 INFO CoarseGrainedExecutorBackend: Got assigned task 710
17/12/07 04:23:36 INFO Executor: Running task 173.0 in stage 6.0 (TID 710)
17/12/07 04:23:36 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:23:36 INFO ShuffleBlockFetcherIterator: Started 58 remote fetches in 55 ms
17/12/07 04:23:36 INFO Executor: Finished task 173.0 in stage 6.0 (TID 710). 17114 bytes result sent to driver
17/12/07 04:23:45 INFO CoarseGrainedExecutorBackend: Got assigned task 799
17/12/07 04:23:45 INFO Executor: Running task 61.0 in stage 8.0 (TID 799)
17/12/07 04:23:45 INFO TorrentBroadcast: Started reading broadcast variable 14
17/12/07 04:23:45 INFO TransportClientFactory: Successfully created connection to analytics1059.eqiad.wmnet/10.64.5.22:45183 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:46 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 9.4 KB, free 2004.2 MB)
17/12/07 04:23:46 INFO TorrentBroadcast: Reading broadcast variable 14 took 59 ms
17/12/07 04:23:46 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 18.1 KB, free 2004.2 MB)
17/12/07 04:23:46 INFO CodeGenerator: Code generated in 15.857671 ms
17/12/07 04:23:46 INFO CodeGenerator: Code generated in 22.301075 ms
17/12/07 04:23:46 INFO PythonRunner: Times: total = 710, boot = 682, init = 24, finish = 4
17/12/07 04:23:47 INFO Executor: Finished task 61.0 in stage 8.0 (TID 799). 3141 bytes result sent to driver
17/12/07 04:23:47 INFO CoarseGrainedExecutorBackend: Got assigned task 856
17/12/07 04:23:47 INFO Executor: Running task 118.0 in stage 8.0 (TID 856)
17/12/07 04:23:47 INFO PythonRunner: Times: total = 50, boot = -214, init = 260, finish = 4
17/12/07 04:23:47 INFO Executor: Finished task 118.0 in stage 8.0 (TID 856). 2553 bytes result sent to driver
17/12/07 04:23:47 INFO CoarseGrainedExecutorBackend: Got assigned task 938
17/12/07 04:23:47 INFO Executor: Running task 200.0 in stage 8.0 (TID 938)
17/12/07 04:23:47 INFO PythonRunner: Times: total = 49, boot = -168, init = 213, finish = 4
17/12/07 04:23:47 INFO Executor: Finished task 200.0 in stage 8.0 (TID 938). 2553 bytes result sent to driver
17/12/07 04:23:47 INFO CoarseGrainedExecutorBackend: Got assigned task 1007
17/12/07 04:23:47 INFO Executor: Running task 269.0 in stage 8.0 (TID 1007)
17/12/07 04:23:47 INFO PythonRunner: Times: total = 47, boot = -106, init = 149, finish = 4
17/12/07 04:23:47 INFO Executor: Finished task 269.0 in stage 8.0 (TID 1007). 2553 bytes result sent to driver
17/12/07 04:23:47 INFO CoarseGrainedExecutorBackend: Got assigned task 1082
17/12/07 04:23:47 INFO Executor: Running task 344.0 in stage 8.0 (TID 1082)
17/12/07 04:23:47 INFO PythonRunner: Times: total = 50, boot = -105, init = 151, finish = 4
17/12/07 04:23:47 INFO Executor: Finished task 344.0 in stage 8.0 (TID 1082). 2553 bytes result sent to driver
17/12/07 04:23:47 INFO CoarseGrainedExecutorBackend: Got assigned task 1141
17/12/07 04:23:47 INFO Executor: Running task 23.0 in stage 9.0 (TID 1141)
17/12/07 04:23:47 INFO TorrentBroadcast: Started reading broadcast variable 15
17/12/07 04:23:47 INFO TransportClientFactory: Successfully created connection to analytics1035.eqiad.wmnet/10.64.53.14:38229 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:47 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 7.6 KB, free 2004.2 MB)
17/12/07 04:23:47 INFO TorrentBroadcast: Reading broadcast variable 15 took 81 ms
17/12/07 04:23:47 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 18.1 KB, free 2004.1 MB)
17/12/07 04:23:47 INFO CodeGenerator: Code generated in 53.440243 ms
17/12/07 04:23:47 INFO CodeGenerator: Code generated in 22.074297 ms
17/12/07 04:23:47 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00023-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-20222125, partition values: [empty row]
17/12/07 04:23:47 INFO TorrentBroadcast: Started reading broadcast variable 13
17/12/07 04:23:47 INFO TransportClientFactory: Successfully created connection to analytics1066.eqiad.wmnet/10.64.36.106:41389 after 2 ms (0 ms spent in bootstraps)
17/12/07 04:23:47 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 29.3 KB, free 2004.1 MB)
17/12/07 04:23:47 INFO TorrentBroadcast: Reading broadcast variable 13 took 54 ms
17/12/07 04:23:48 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 381.7 KB, free 2003.7 MB)
17/12/07 04:23:48 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:48 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:48 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:48 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:48 INFO CodeGenerator: Code generated in 33.523874 ms
17/12/07 04:23:48 INFO CodeGenerator: Code generated in 22.216389 ms
17/12/07 04:23:48 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 451923 records.
17/12/07 04:23:48 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:23:48 INFO InternalParquetRecordReader: block read in memory in 96 ms. row count = 451923
17/12/07 04:23:50 INFO Executor: Finished task 23.0 in stage 9.0 (TID 1141). 3061 bytes result sent to driver
17/12/07 04:23:50 INFO CoarseGrainedExecutorBackend: Got assigned task 1212
17/12/07 04:23:50 INFO Executor: Running task 143.0 in stage 9.0 (TID 1212)
17/12/07 04:23:50 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00143-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-20222125, partition values: [empty row]
17/12/07 04:23:50 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:50 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:50 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:50 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:50 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452514 records.
17/12/07 04:23:50 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:23:50 INFO InternalParquetRecordReader: block read in memory in 260 ms. row count = 452514
17/12/07 04:23:52 INFO Executor: Finished task 143.0 in stage 9.0 (TID 1212). 2156 bytes result sent to driver
17/12/07 04:23:52 INFO CoarseGrainedExecutorBackend: Got assigned task 1278
17/12/07 04:23:52 INFO Executor: Running task 174.0 in stage 9.0 (TID 1278)
17/12/07 04:23:52 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00174-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-20222125, partition values: [empty row]
17/12/07 04:23:52 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:52 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:52 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:52 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:52 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452439 records.
17/12/07 04:23:52 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:23:52 INFO InternalParquetRecordReader: block read in memory in 252 ms. row count = 452439
17/12/07 04:23:53 INFO Executor: Finished task 174.0 in stage 9.0 (TID 1278). 2083 bytes result sent to driver
17/12/07 04:23:53 INFO CoarseGrainedExecutorBackend: Got assigned task 1462
17/12/07 04:23:53 INFO Executor: Running task 232.0 in stage 9.0 (TID 1462)
17/12/07 04:23:53 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00093-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-34135930, partition values: [empty row]
17/12/07 04:23:53 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:53 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:53 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:53 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:53 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:23:53 INFO Executor: Finished task 232.0 in stage 9.0 (TID 1462). 1733 bytes result sent to driver
17/12/07 04:23:53 INFO CoarseGrainedExecutorBackend: Got assigned task 1476
17/12/07 04:23:53 INFO Executor: Running task 310.0 in stage 9.0 (TID 1476)
17/12/07 04:23:53 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-33991962, partition values: [empty row]
17/12/07 04:23:53 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:53 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:53 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:53 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:53 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:23:53 INFO Executor: Finished task 310.0 in stage 9.0 (TID 1476). 1733 bytes result sent to driver
17/12/07 04:23:53 INFO CoarseGrainedExecutorBackend: Got assigned task 1487
17/12/07 04:23:53 INFO Executor: Running task 43.0 in stage 12.0 (TID 1487)
17/12/07 04:23:53 INFO TorrentBroadcast: Started reading broadcast variable 16
17/12/07 04:23:53 INFO TransportClientFactory: Successfully created connection to analytics1046.eqiad.wmnet/10.64.21.105:46169 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:53 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 7.6 KB, free 2003.7 MB)
17/12/07 04:23:53 INFO TorrentBroadcast: Reading broadcast variable 16 took 11 ms
17/12/07 04:23:53 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 18.1 KB, free 2003.7 MB)
17/12/07 04:23:53 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-20222125, partition values: [empty row]
17/12/07 04:23:53 INFO TorrentBroadcast: Started reading broadcast variable 11
17/12/07 04:23:53 INFO TransportClientFactory: Successfully created connection to analytics1030.eqiad.wmnet/10.64.36.130:41203 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:53 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 29.3 KB, free 2003.7 MB)
17/12/07 04:23:53 INFO TorrentBroadcast: Reading broadcast variable 11 took 12 ms
17/12/07 04:23:53 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 381.7 KB, free 2003.3 MB)
17/12/07 04:23:53 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:53 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:53 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:53 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:53 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 453342 records.
17/12/07 04:23:53 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:23:53 INFO InternalParquetRecordReader: block read in memory in 45 ms. row count = 453342
17/12/07 04:23:55 INFO Executor: Finished task 43.0 in stage 12.0 (TID 1487). 3061 bytes result sent to driver
17/12/07 04:23:55 INFO CoarseGrainedExecutorBackend: Got assigned task 1587
17/12/07 04:23:55 INFO Executor: Running task 92.0 in stage 12.0 (TID 1587)
17/12/07 04:23:55 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00092-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-20222125, partition values: [empty row]
17/12/07 04:23:55 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:55 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:55 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:55 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:55 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 453906 records.
17/12/07 04:23:55 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:23:55 INFO InternalParquetRecordReader: block read in memory in 44 ms. row count = 453906
17/12/07 04:23:56 INFO Executor: Finished task 92.0 in stage 12.0 (TID 1587). 2156 bytes result sent to driver
17/12/07 04:23:56 INFO CoarseGrainedExecutorBackend: Got assigned task 1651
17/12/07 04:23:56 INFO Executor: Running task 143.0 in stage 12.0 (TID 1651)
17/12/07 04:23:56 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00143-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-20222125, partition values: [empty row]
17/12/07 04:23:56 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:56 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:23:56 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:23:56 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:23:56 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452514 records.
17/12/07 04:23:56 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:23:56 INFO InternalParquetRecordReader: block read in memory in 56 ms. row count = 452514
17/12/07 04:23:57 INFO Executor: Finished task 143.0 in stage 12.0 (TID 1651). 2083 bytes result sent to driver
17/12/07 04:23:57 INFO CoarseGrainedExecutorBackend: Got assigned task 1712
17/12/07 04:23:57 INFO Executor: Running task 60.0 in stage 10.0 (TID 1712)
17/12/07 04:23:57 INFO MapOutputTrackerWorker: Updating epoch to 5 and clearing cache
17/12/07 04:23:57 INFO TorrentBroadcast: Started reading broadcast variable 20
17/12/07 04:23:57 INFO TransportClientFactory: Successfully created connection to analytics1049.eqiad.wmnet/10.64.21.108:33693 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:23:57 INFO MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 18.4 KB, free 2003.3 MB)
17/12/07 04:23:57 INFO TorrentBroadcast: Reading broadcast variable 20 took 26 ms
17/12/07 04:23:57 INFO MemoryStore: Block broadcast_20 stored as values in memory (estimated size 41.3 KB, free 2003.3 MB)
17/12/07 04:23:58 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 9, fetching them
17/12/07 04:23:58 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:23:58 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:23:58 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:23:58 INFO ShuffleBlockFetcherIterator: Started 64 remote fetches in 24 ms
17/12/07 04:23:58 INFO CodeGenerator: Code generated in 21.091631 ms
17/12/07 04:23:58 INFO CodeGenerator: Code generated in 14.87779 ms
17/12/07 04:23:58 INFO CodeGenerator: Code generated in 18.717434 ms
17/12/07 04:23:58 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 10, fetching them
17/12/07 04:23:58 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:23:58 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:23:58 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 378 blocks
17/12/07 04:23:58 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 2 ms
17/12/07 04:23:58 INFO CodeGenerator: Code generated in 20.455553 ms
17/12/07 04:23:58 INFO CodeGenerator: Code generated in 59.217063 ms
17/12/07 04:23:58 INFO CodeGenerator: Code generated in 17.806493 ms
17/12/07 04:23:59 INFO Executor: Finished task 60.0 in stage 10.0 (TID 1712). 4785 bytes result sent to driver
17/12/07 04:23:59 INFO CoarseGrainedExecutorBackend: Got assigned task 1806
17/12/07 04:23:59 INFO Executor: Running task 142.0 in stage 10.0 (TID 1806)
17/12/07 04:23:59 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:23:59 INFO ShuffleBlockFetcherIterator: Started 64 remote fetches in 29 ms
17/12/07 04:23:59 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:23:59 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
17/12/07 04:24:00 INFO Executor: Finished task 142.0 in stage 10.0 (TID 1806). 4197 bytes result sent to driver
17/12/07 04:24:00 INFO CoarseGrainedExecutorBackend: Got assigned task 1863
17/12/07 04:24:00 INFO Executor: Running task 181.0 in stage 10.0 (TID 1863)
17/12/07 04:24:00 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:00 INFO ShuffleBlockFetcherIterator: Started 64 remote fetches in 33 ms
17/12/07 04:24:00 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:00 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
17/12/07 04:24:00 INFO Executor: Finished task 181.0 in stage 10.0 (TID 1863). 4284 bytes result sent to driver
17/12/07 04:24:00 INFO CoarseGrainedExecutorBackend: Got assigned task 2033
17/12/07 04:24:00 INFO Executor: Running task 203.0 in stage 12.0 (TID 2033)
17/12/07 04:24:00 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00166-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-34217774, partition values: [empty row]
17/12/07 04:24:00 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:00 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:00 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:00 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:01 INFO Executor: Finished task 203.0 in stage 12.0 (TID 2033). 1733 bytes result sent to driver
17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2047
17/12/07 04:24:01 INFO Executor: Running task 221.0 in stage 12.0 (TID 2047)
17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00092-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-34159414, partition values: [empty row]
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:01 INFO Executor: Finished task 221.0 in stage 12.0 (TID 2047). 1733 bytes result sent to driver
17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2059
17/12/07 04:24:01 INFO Executor: Running task 232.0 in stage 12.0 (TID 2059)
17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00093-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-34135930, partition values: [empty row]
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:01 INFO Executor: Finished task 232.0 in stage 12.0 (TID 2059). 1733 bytes result sent to driver
17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2069
17/12/07 04:24:01 INFO Executor: Running task 283.0 in stage 12.0 (TID 2069)
17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00008-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-34073645, partition values: [empty row]
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:01 INFO Executor: Finished task 283.0 in stage 12.0 (TID 2069). 1820 bytes result sent to driver
17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2080
17/12/07 04:24:01 INFO Executor: Running task 310.0 in stage 12.0 (TID 2080)
17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-33991962, partition values: [empty row]
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:01 INFO Executor: Finished task 310.0 in stage 12.0 (TID 2080). 1733 bytes result sent to driver
17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2087
17/12/07 04:24:01 INFO Executor: Running task 339.0 in stage 12.0 (TID 2087)
17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00039-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-33944932, partition values: [empty row]
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:01 INFO Executor: Finished task 339.0 in stage 12.0 (TID 2087). 1733 bytes result sent to driver
745​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2095
746​17/12/07 04:24:01 INFO Executor: Running task 358.0 in stage 12.0 (TID 2095)
747​17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00143-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-33918996, partition values: [empty row]
748​17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
749​17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
750​17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
751​17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
752
753​Parquet form:
754​message spark_schema {
755​ optional binary wikiid (UTF8);
756​ optional binary query (UTF8);
757​ required int64 norm_query_id;
758​ optional int32 label;
759​ optional group features {
760​ required int32 type (INT_8);
761​ optional int32 size;
762​ optional group indices (LIST) {
763​ repeated group list {
764​ required int32 element;
765​ }
766​ }
767​ optional group values (LIST) {
768​ repeated group list {
769​ required double element;
770​ }
771​ }
772​ }
773​}
774
775​Catalyst form:
776​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
777
778​17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
779​17/12/07 04:24:01 INFO Executor: Finished task 358.0 in stage 12.0 (TID 2095). 1733 bytes result sent to driver
780​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2101
781​17/12/07 04:24:01 INFO Executor: Running task 369.0 in stage 12.0 (TID 2101)
782​17/12/07 04:24:01 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00174-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 20222125-33909092, partition values: [empty row]
783​17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
784​17/12/07 04:24:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
785​17/12/07 04:24:01 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
786​17/12/07 04:24:01 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
787
788​Parquet form:
789​message spark_schema {
790​ optional binary wikiid (UTF8);
791​ optional binary query (UTF8);
792​ required int64 norm_query_id;
793​ optional int32 label;
794​ optional group features {
795​ required int32 type (INT_8);
796​ optional int32 size;
797​ optional group indices (LIST) {
798​ repeated group list {
799​ required int32 element;
800​ }
801​ }
802​ optional group values (LIST) {
803​ repeated group list {
804​ required double element;
805​ }
806​ }
807​ }
808​}
809
810​Catalyst form:
811​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
812
813​17/12/07 04:24:01 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
814​17/12/07 04:24:01 INFO Executor: Finished task 369.0 in stage 12.0 (TID 2101). 1733 bytes result sent to driver
815​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2108
816​17/12/07 04:24:01 INFO Executor: Running task 63.0 in stage 13.0 (TID 2108)
817​17/12/07 04:24:01 INFO TorrentBroadcast: Started reading broadcast variable 17
818​17/12/07 04:24:01 INFO TransportClientFactory: Successfully created connection to analytics1048.eqiad.wmnet/10.64.21.107:36753 after 1 ms (0 ms spent in bootstraps)
819​17/12/07 04:24:01 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 9.4 KB, free 2003.3 MB)
820​17/12/07 04:24:01 INFO TorrentBroadcast: Reading broadcast variable 17 took 27 ms
821​17/12/07 04:24:01 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 18.1 KB, free 2003.2 MB)
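The TorrentBroadcast lines show the executor pulling a broadcast variable in pieces from a peer and caching both the serialized bytes and the deserialized values in its MemoryStore. A hedged sketch of the API behind them; the payload is hypothetical:

    # Values broadcast from the driver are fetched torrent-style on first use.
    bv = spark.sparkContext.broadcast({"feature_names": ["popularity", "..."]})
    bv.value  # first access triggers the piece fetch logged above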
822​17/12/07 04:24:01 INFO CodeGenerator: Code generated in 21.929687 ms
823​17/12/07 04:24:01 INFO PythonRunner: Times: total = 48, boot = -13673, init = 13717, finish = 4
824​17/12/07 04:24:01 INFO Executor: Finished task 63.0 in stage 13.0 (TID 2108). 3068 bytes result sent to driver
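The negative boot value in the PythonRunner timings is normal once Python workers are reused: boot is measured relative to this task's start, so a worker forked for an earlier task comes out negative and init absorbs the difference. Reuse is controlled by a standard setting, shown here purely as illustration since it must be fixed before the context starts:

    from pyspark.sql import SparkSession

    # Reuse is on by default; disabling it forks a fresh Python worker per
    # task, trading the odd-looking accounting for a real fork cost each time.
    spark = (SparkSession.builder
             .config("spark.python.worker.reuse", "true")
             .getOrCreate())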
825​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2150
826​17/12/07 04:24:01 INFO Executor: Running task 96.0 in stage 13.0 (TID 2150)
827​17/12/07 04:24:01 INFO PythonRunner: Times: total = 47, boot = -61, init = 104, finish = 4
828​17/12/07 04:24:01 INFO Executor: Finished task 96.0 in stage 13.0 (TID 2150). 2553 bytes result sent to driver
829​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2175
830​17/12/07 04:24:01 INFO Executor: Running task 118.0 in stage 13.0 (TID 2175)
831​17/12/07 04:24:01 INFO PythonRunner: Times: total = 49, boot = -41, init = 86, finish = 4
832​17/12/07 04:24:01 INFO Executor: Finished task 118.0 in stage 13.0 (TID 2175). 2553 bytes result sent to driver
833​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2202
834​17/12/07 04:24:01 INFO Executor: Running task 139.0 in stage 13.0 (TID 2202)
835​17/12/07 04:24:01 INFO PythonRunner: Times: total = 49, boot = -33, init = 78, finish = 4
836​17/12/07 04:24:01 INFO Executor: Finished task 139.0 in stage 13.0 (TID 2202). 2553 bytes result sent to driver
837​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2228
838​17/12/07 04:24:01 INFO Executor: Running task 160.0 in stage 13.0 (TID 2228)
839​17/12/07 04:24:01 INFO PythonRunner: Times: total = 47, boot = -31, init = 74, finish = 4
840​17/12/07 04:24:01 INFO Executor: Finished task 160.0 in stage 13.0 (TID 2228). 2553 bytes result sent to driver
841​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2263
842​17/12/07 04:24:01 INFO Executor: Running task 184.0 in stage 13.0 (TID 2263)
843​17/12/07 04:24:01 INFO PythonRunner: Times: total = 48, boot = -27, init = 71, finish = 4
844​17/12/07 04:24:01 INFO Executor: Finished task 184.0 in stage 13.0 (TID 2263). 2553 bytes result sent to driver
845​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2303
846​17/12/07 04:24:01 INFO Executor: Running task 212.0 in stage 13.0 (TID 2303)
847​17/12/07 04:24:01 INFO PythonRunner: Times: total = 48, boot = -27, init = 69, finish = 6
848​17/12/07 04:24:01 INFO Executor: Finished task 212.0 in stage 13.0 (TID 2303). 2553 bytes result sent to driver
849​17/12/07 04:24:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2350
850​17/12/07 04:24:01 INFO Executor: Running task 253.0 in stage 13.0 (TID 2350)
851​17/12/07 04:24:02 INFO PythonRunner: Times: total = 48, boot = -42, init = 86, finish = 4
852​17/12/07 04:24:02 INFO Executor: Finished task 253.0 in stage 13.0 (TID 2350). 2553 bytes result sent to driver
853​17/12/07 04:24:02 INFO CoarseGrainedExecutorBackend: Got assigned task 2396
854​17/12/07 04:24:02 INFO Executor: Running task 292.0 in stage 13.0 (TID 2396)
855​17/12/07 04:24:02 INFO PythonRunner: Times: total = 47, boot = -22, init = 66, finish = 3
856​17/12/07 04:24:02 INFO Executor: Finished task 292.0 in stage 13.0 (TID 2396). 2553 bytes result sent to driver
857​17/12/07 04:24:02 INFO CoarseGrainedExecutorBackend: Got assigned task 2439
858​17/12/07 04:24:02 INFO Executor: Running task 330.0 in stage 13.0 (TID 2439)
859​17/12/07 04:24:02 INFO PythonRunner: Times: total = 49, boot = -25, init = 71, finish = 3
860​17/12/07 04:24:02 INFO Executor: Finished task 330.0 in stage 13.0 (TID 2439). 2553 bytes result sent to driver
861​17/12/07 04:24:02 INFO CoarseGrainedExecutorBackend: Got assigned task 2488
862​17/12/07 04:24:02 INFO Executor: Running task 375.0 in stage 13.0 (TID 2488)
863​17/12/07 04:24:02 INFO PythonRunner: Times: total = 45, boot = -28, init = 70, finish = 3
864​17/12/07 04:24:02 INFO Executor: Finished task 375.0 in stage 13.0 (TID 2488). 2553 bytes result sent to driver
865​17/12/07 04:24:02 INFO CoarseGrainedExecutorBackend: Got assigned task 2532
866​17/12/07 04:24:02 INFO Executor: Running task 0.0 in stage 11.0 (TID 2532)
867​17/12/07 04:24:02 INFO MapOutputTrackerWorker: Updating epoch to 6 and clearing cache
868​17/12/07 04:24:02 INFO TorrentBroadcast: Started reading broadcast variable 21
869​17/12/07 04:24:02 INFO TransportClientFactory: Successfully created connection to /10.64.53.30:35539 after 1 ms (0 ms spent in bootstraps)
870​17/12/07 04:24:02 INFO MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 17.3 KB, free 2003.3 MB)
871​17/12/07 04:24:02 INFO TorrentBroadcast: Reading broadcast variable 21 took 9 ms
872​17/12/07 04:24:02 INFO MemoryStore: Block broadcast_21 stored as values in memory (estimated size 34.3 KB, free 2003.2 MB)
873​17/12/07 04:24:02 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 11, fetching them
874​17/12/07 04:24:02 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
875​17/12/07 04:24:02 INFO MapOutputTrackerWorker: Got the output locations
876​17/12/07 04:24:02 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
877​17/12/07 04:24:02 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 7 ms
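The 200-block shuffle matches Spark SQL's default shuffle parallelism, and the MapOutputTracker lines show the executor asking the driver where those blocks live before fetching them. The relevant knob, shown only as illustration:

    # 200 is the default; tune it when reduce tasks come out too fat or too tiny.
    spark.conf.set("spark.sql.shuffle.partitions", "200")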
878​17/12/07 04:24:02 INFO CodeGenerator: Code generated in 11.16376 ms
879​17/12/07 04:24:02 INFO CodeGenerator: Code generated in 18.003127 ms
880​17/12/07 04:24:13 INFO MemoryStore: Will not store rdd_73_0
881​17/12/07 04:24:13 WARN MemoryStore: Not enough space to cache rdd_73_0 in memory! (computed 240.2 MB so far)
882​17/12/07 04:24:13 INFO MemoryStore: Memory use = 1396.5 KB (blocks) + 230.1 MB (scratch space shared across 1 task(s)) = 231.5 MB. Storage limit = 308.6 MB.
883​17/12/07 04:24:13 WARN BlockManager: Persisting block rdd_73_0 to disk instead.
884​17/12/07 04:24:23 INFO MemoryStore: Block rdd_73_0 stored as values in memory (estimated size 949.9 MB, free 1053.4 MB)
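This is the stretch worth watching in a container that YARN later kills: a partition that materializes at roughly 950 MB does not fit the ~308 MB storage region, so the block is spilled and only lands in memory once unified memory frees up. A hedged sketch of the persist behavior and the container-sizing knobs usually involved; none of these values come from the job itself (df is from the schema sketch above):

    from pyspark import StorageLevel

    # MEMORY_AND_DISK produces exactly this log pattern: try memory first,
    # spill the partition to disk when the storage region cannot hold it.
    cached = df.persist(StorageLevel.MEMORY_AND_DISK)

    # If YARN kills the container for exceeding its limit, the usual
    # Spark-2-era levers are the executor heap and the off-heap overhead:
    #   spark-submit --executor-memory 4g \
    #                --conf spark.yarn.executor.memoryOverhead=1024 ...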
885​17/12/07 04:24:23 INFO CodeGenerator: Code generated in 5.893662 ms
886​17/12/07 04:24:23 INFO CodeGenerator: Code generated in 21.329799 ms
887​17/12/07 04:24:23 INFO CodeGenerator: Code generated in 8.402581 ms
888​17/12/07 04:24:25 INFO MemoryStore: Block taskresult_2532 stored as bytes in memory (estimated size 4.8 MB, free 1048.6 MB)
889​17/12/07 04:24:25 INFO Executor: Finished task 0.0 in stage 11.0 (TID 2532). 4994683 bytes result sent via BlockManager
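"Sent via BlockManager" means the ~4.8 MB result was too large to return inline in the RPC reply, so the executor registered it as a block for the driver to pull. The thresholds involved, for illustration (both must be set before the context starts; the values shown are the defaults of this Spark era):

    from pyspark.sql import SparkSession

    # Results above maxDirectResultSize travel via the BlockManager; results
    # above maxResultSize are refused by the driver outright.
    spark = (SparkSession.builder
             .config("spark.task.maxDirectResultSize", "1m")
             .config("spark.task.maxResultSize", "1g")
             .getOrCreate())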
890​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 3740
891​17/12/07 04:24:31 INFO Executor: Running task 59.0 in stage 20.0 (TID 3740)
892​17/12/07 04:24:31 INFO MapOutputTrackerWorker: Updating epoch to 12 and clearing cache
893​17/12/07 04:24:31 INFO TorrentBroadcast: Started reading broadcast variable 27
894​17/12/07 04:24:31 INFO TransportClientFactory: Successfully created connection to analytics1039.eqiad.wmnet/10.64.53.18:34883 after 1 ms (0 ms spent in bootstraps)
895​17/12/07 04:24:31 INFO MemoryStore: Block broadcast_27_piece0 stored as bytes in memory (estimated size 9.3 KB, free 1053.4 MB)
896​17/12/07 04:24:31 INFO TorrentBroadcast: Reading broadcast variable 27 took 11 ms
897​17/12/07 04:24:31 INFO MemoryStore: Block broadcast_27 stored as values in memory (estimated size 18.0 KB, free 1053.4 MB)
898​17/12/07 04:24:31 INFO CodeGenerator: Code generated in 22.48162 ms
899​17/12/07 04:24:31 INFO PythonRunner: Times: total = 49, boot = -29033, init = 29078, finish = 4
900​17/12/07 04:24:31 INFO Executor: Finished task 59.0 in stage 20.0 (TID 3740). 3068 bytes result sent to driver
901​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 3771
902​17/12/07 04:24:31 INFO Executor: Running task 90.0 in stage 20.0 (TID 3771)
903​17/12/07 04:24:31 INFO PythonRunner: Times: total = 48, boot = -45, init = 89, finish = 4
904​17/12/07 04:24:31 INFO Executor: Finished task 90.0 in stage 20.0 (TID 3771). 2553 bytes result sent to driver
905​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 3829
906​17/12/07 04:24:31 INFO Executor: Running task 148.0 in stage 20.0 (TID 3829)
907​17/12/07 04:24:31 INFO PythonRunner: Times: total = 45, boot = -21, init = 63, finish = 3
908​17/12/07 04:24:31 INFO Executor: Finished task 148.0 in stage 20.0 (TID 3829). 2553 bytes result sent to driver
909​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 3885
910​17/12/07 04:24:31 INFO Executor: Running task 204.0 in stage 20.0 (TID 3885)
911​17/12/07 04:24:31 INFO PythonRunner: Times: total = 50, boot = -21, init = 67, finish = 4
912​17/12/07 04:24:31 INFO Executor: Finished task 204.0 in stage 20.0 (TID 3885). 2553 bytes result sent to driver
913​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 3946
914​17/12/07 04:24:31 INFO Executor: Running task 265.0 in stage 20.0 (TID 3946)
915​17/12/07 04:24:31 INFO PythonRunner: Times: total = 48, boot = -21, init = 65, finish = 4
916​17/12/07 04:24:31 INFO Executor: Finished task 265.0 in stage 20.0 (TID 3946). 2553 bytes result sent to driver
917​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 4003
918​17/12/07 04:24:31 INFO Executor: Running task 322.0 in stage 20.0 (TID 4003)
919​17/12/07 04:24:31 INFO PythonRunner: Times: total = 48, boot = -20, init = 64, finish = 4
920​17/12/07 04:24:31 INFO Executor: Finished task 322.0 in stage 20.0 (TID 4003). 2553 bytes result sent to driver
921​17/12/07 04:24:31 INFO CoarseGrainedExecutorBackend: Got assigned task 4063
922​17/12/07 04:24:31 INFO Executor: Running task 8.0 in stage 21.0 (TID 4063)
923​17/12/07 04:24:31 INFO TorrentBroadcast: Started reading broadcast variable 28
924​17/12/07 04:24:31 INFO TransportClientFactory: Successfully created connection to analytics1055.eqiad.wmnet/10.64.5.18:32823 after 1 ms (0 ms spent in bootstraps)
925​17/12/07 04:24:31 INFO MemoryStore: Block broadcast_28_piece0 stored as bytes in memory (estimated size 7.6 KB, free 1053.3 MB)
926​17/12/07 04:24:31 INFO TorrentBroadcast: Reading broadcast variable 28 took 9 ms
927​17/12/07 04:24:31 INFO MemoryStore: Block broadcast_28 stored as values in memory (estimated size 18.1 KB, free 1053.3 MB)
928​17/12/07 04:24:31 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00008-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
929​17/12/07 04:24:31 INFO TorrentBroadcast: Started reading broadcast variable 26
930​17/12/07 04:24:31 INFO MemoryStore: Block broadcast_26_piece0 stored as bytes in memory (estimated size 29.3 KB, free 1053.3 MB)
931​17/12/07 04:24:31 INFO TorrentBroadcast: Reading broadcast variable 26 took 7 ms
932​17/12/07 04:24:31 INFO MemoryStore: Block broadcast_26 stored as values in memory (estimated size 381.7 KB, free 1052.9 MB)
933​17/12/07 04:24:31 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
934​17/12/07 04:24:31 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
935​17/12/07 04:24:31 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
936​17/12/07 04:24:31 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
937
938​Parquet form:
939​message spark_schema {
940​ optional binary wikiid (UTF8);
941​ optional binary query (UTF8);
942​ required int64 norm_query_id;
943​ optional int32 label;
944​ optional group features {
945​ required int32 type (INT_8);
946​ optional int32 size;
947​ optional group indices (LIST) {
948​ repeated group list {
949​ required int32 element;
950​ }
951​ }
952​ optional group values (LIST) {
953​ repeated group list {
954​ required double element;
955​ }
956​ }
957​ }
958​}
959
960​Catalyst form:
961​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
962
963​17/12/07 04:24:31 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452480 records.
964​17/12/07 04:24:31 INFO InternalParquetRecordReader: at row 0. reading next block
965​17/12/07 04:24:31 INFO InternalParquetRecordReader: block read in memory in 49 ms. row count = 452480
966​17/12/07 04:24:33 INFO Executor: Finished task 8.0 in stage 21.0 (TID 4063). 3061 bytes result sent to driver
967​17/12/07 04:24:33 INFO CoarseGrainedExecutorBackend: Got assigned task 4144
968​17/12/07 04:24:33 INFO Executor: Running task 93.0 in stage 21.0 (TID 4144)
969​17/12/07 04:24:33 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00093-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
970​17/12/07 04:24:33 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
971​17/12/07 04:24:33 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
972​17/12/07 04:24:33 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
973​17/12/07 04:24:33 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
974
975​Parquet form:
976​message spark_schema {
977​ optional binary wikiid (UTF8);
978​ optional binary query (UTF8);
979​ required int64 norm_query_id;
980​ optional int32 label;
981​ optional group features {
982​ required int32 type (INT_8);
983​ optional int32 size;
984​ optional group indices (LIST) {
985​ repeated group list {
986​ required int32 element;
987​ }
988​ }
989​ optional group values (LIST) {
990​ repeated group list {
991​ required double element;
992​ }
993​ }
994​ }
995​}
996
997​Catalyst form:
998​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
999
1000​17/12/07 04:24:33 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 453016 records.
1001​17/12/07 04:24:33 INFO InternalParquetRecordReader: at row 0. reading next block
1002​17/12/07 04:24:33 INFO InternalParquetRecordReader: block read in memory in 39 ms. row count = 453016
1003​17/12/07 04:24:34 INFO Executor: Finished task 93.0 in stage 21.0 (TID 4144). 2156 bytes result sent to driver
1004​17/12/07 04:24:34 INFO CoarseGrainedExecutorBackend: Got assigned task 4217
1005​17/12/07 04:24:34 INFO Executor: Running task 147.0 in stage 21.0 (TID 4217)
1006​17/12/07 04:24:34 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00147-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1007​17/12/07 04:24:34 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1008​17/12/07 04:24:34 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1009​17/12/07 04:24:34 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1010​17/12/07 04:24:34 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1011
1012​Parquet form:
1013​message spark_schema {
1014​ optional binary wikiid (UTF8);
1015​ optional binary query (UTF8);
1016​ required int64 norm_query_id;
1017​ optional int32 label;
1018​ optional group features {
1019​ required int32 type (INT_8);
1020​ optional int32 size;
1021​ optional group indices (LIST) {
1022​ repeated group list {
1023​ required int32 element;
1024​ }
1025​ }
1026​ optional group values (LIST) {
1027​ repeated group list {
1028​ required double element;
1029​ }
1030​ }
1031​ }
1032​}
1033
1034​Catalyst form:
1035​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1036
1037​17/12/07 04:24:34 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452470 records.
1038​17/12/07 04:24:34 INFO InternalParquetRecordReader: at row 0. reading next block
1039​17/12/07 04:24:34 INFO InternalParquetRecordReader: block read in memory in 37 ms. row count = 452470
1040​17/12/07 04:24:35 INFO Executor: Finished task 147.0 in stage 21.0 (TID 4217). 2243 bytes result sent to driver
1041​17/12/07 04:24:35 INFO CoarseGrainedExecutorBackend: Got assigned task 4331
1042​17/12/07 04:24:35 INFO Executor: Running task 203.0 in stage 21.0 (TID 4331)
1043​17/12/07 04:24:35 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00166-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34217774, partition values: [empty row]
1044​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1045​17/12/07 04:24:35 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1046​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1047​17/12/07 04:24:35 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1048
1049​Parquet form:
1050​message spark_schema {
1051​ optional binary wikiid (UTF8);
1052​ optional binary query (UTF8);
1053​ required int64 norm_query_id;
1054​ optional int32 label;
1055​ optional group features {
1056​ required int32 type (INT_8);
1057​ optional int32 size;
1058​ optional group indices (LIST) {
1059​ repeated group list {
1060​ required int32 element;
1061​ }
1062​ }
1063​ optional group values (LIST) {
1064​ repeated group list {
1065​ required double element;
1066​ }
1067​ }
1068​ }
1069​}
1070
1071​Catalyst form:
1072​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1073
1074​17/12/07 04:24:35 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
1075​17/12/07 04:24:35 INFO Executor: Finished task 203.0 in stage 21.0 (TID 4331). 1733 bytes result sent to driver
1076​17/12/07 04:24:35 INFO CoarseGrainedExecutorBackend: Got assigned task 4344
1077​17/12/07 04:24:35 INFO Executor: Running task 232.0 in stage 21.0 (TID 4344)
1078​17/12/07 04:24:35 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00093-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34135930, partition values: [empty row]
1079​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1080​17/12/07 04:24:35 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1081​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1082​17/12/07 04:24:35 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1083
1084​Parquet form:
1085​message spark_schema {
1086​ optional binary wikiid (UTF8);
1087​ optional binary query (UTF8);
1088​ required int64 norm_query_id;
1089​ optional int32 label;
1090​ optional group features {
1091​ required int32 type (INT_8);
1092​ optional int32 size;
1093​ optional group indices (LIST) {
1094​ repeated group list {
1095​ required int32 element;
1096​ }
1097​ }
1098​ optional group values (LIST) {
1099​ repeated group list {
1100​ required double element;
1101​ }
1102​ }
1103​ }
1104​}
1105
1106​Catalyst form:
1107​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1108
1109​17/12/07 04:24:35 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
1110​17/12/07 04:24:35 INFO Executor: Finished task 232.0 in stage 21.0 (TID 4344). 1733 bytes result sent to driver
1111​17/12/07 04:24:35 INFO CoarseGrainedExecutorBackend: Got assigned task 4356
1112​17/12/07 04:24:35 INFO Executor: Running task 298.0 in stage 21.0 (TID 4356)
1113​17/12/07 04:24:35 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00194-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34016885, partition values: [empty row]
1114​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1115​17/12/07 04:24:35 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1116​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1117​17/12/07 04:24:35 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1118
1119​Parquet form:
1120​message spark_schema {
1121​ optional binary wikiid (UTF8);
1122​ optional binary query (UTF8);
1123​ required int64 norm_query_id;
1124​ optional int32 label;
1125​ optional group features {
1126​ required int32 type (INT_8);
1127​ optional int32 size;
1128​ optional group indices (LIST) {
1129​ repeated group list {
1130​ required int32 element;
1131​ }
1132​ }
1133​ optional group values (LIST) {
1134​ repeated group list {
1135​ required double element;
1136​ }
1137​ }
1138​ }
1139​}
1140
1141​Catalyst form:
1142​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1143
1144​17/12/07 04:24:35 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
1145​17/12/07 04:24:35 INFO Executor: Finished task 298.0 in stage 21.0 (TID 4356). 1733 bytes result sent to driver
1146​17/12/07 04:24:35 INFO CoarseGrainedExecutorBackend: Got assigned task 4364
1147​17/12/07 04:24:35 INFO Executor: Running task 310.0 in stage 21.0 (TID 4364)
1148​17/12/07 04:24:35 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-33991962, partition values: [empty row]
1149​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1150​17/12/07 04:24:35 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1151​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1152​17/12/07 04:24:35 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1153
1154​Parquet form:
1155​message spark_schema {
1156​ optional binary wikiid (UTF8);
1157​ optional binary query (UTF8);
1158​ required int64 norm_query_id;
1159​ optional int32 label;
1160​ optional group features {
1161​ required int32 type (INT_8);
1162​ optional int32 size;
1163​ optional group indices (LIST) {
1164​ repeated group list {
1165​ required int32 element;
1166​ }
1167​ }
1168​ optional group values (LIST) {
1169​ repeated group list {
1170​ required double element;
1171​ }
1172​ }
1173​ }
1174​}
1175
1176​Catalyst form:
1177​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1178
1179​17/12/07 04:24:35 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
1180​17/12/07 04:24:35 INFO Executor: Finished task 310.0 in stage 21.0 (TID 4364). 1733 bytes result sent to driver
1181​17/12/07 04:24:35 INFO CoarseGrainedExecutorBackend: Got assigned task 4370
1182​17/12/07 04:24:35 INFO Executor: Running task 339.0 in stage 21.0 (TID 4370)
1183​17/12/07 04:24:35 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00039-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-33944932, partition values: [empty row]
1184​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1185​17/12/07 04:24:35 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1186​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1187​17/12/07 04:24:35 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1188
1189​Parquet form:
1190​message spark_schema {
1191​ optional binary wikiid (UTF8);
1192​ optional binary query (UTF8);
1193​ required int64 norm_query_id;
1194​ optional int32 label;
1195​ optional group features {
1196​ required int32 type (INT_8);
1197​ optional int32 size;
1198​ optional group indices (LIST) {
1199​ repeated group list {
1200​ required int32 element;
1201​ }
1202​ }
1203​ optional group values (LIST) {
1204​ repeated group list {
1205​ required double element;
1206​ }
1207​ }
1208​ }
1209​}
1210
1211​Catalyst form:
1212​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1213
1214​17/12/07 04:24:35 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
1215​17/12/07 04:24:35 INFO Executor: Finished task 339.0 in stage 21.0 (TID 4370). 1733 bytes result sent to driver
1216​17/12/07 04:24:35 INFO CoarseGrainedExecutorBackend: Got assigned task 4377
1217​17/12/07 04:24:35 INFO Executor: Running task 369.0 in stage 21.0 (TID 4377)
1218​17/12/07 04:24:35 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00174-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-33909092, partition values: [empty row]
1219​17/12/07 04:24:35 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1220​17/12/07 04:24:35 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1221​17/12/07 04:24:36 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1222​17/12/07 04:24:36 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1223
1224​Parquet form:
1225​message spark_schema {
1226​ optional binary wikiid (UTF8);
1227​ optional binary query (UTF8);
1228​ required int64 norm_query_id;
1229​ optional int32 label;
1230​ optional group features {
1231​ required int32 type (INT_8);
1232​ optional int32 size;
1233​ optional group indices (LIST) {
1234​ repeated group list {
1235​ required int32 element;
1236​ }
1237​ }
1238​ optional group values (LIST) {
1239​ repeated group list {
1240​ required double element;
1241​ }
1242​ }
1243​ }
1244​}
1245
1246​Catalyst form:
1247​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1248
1249​17/12/07 04:24:36 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
1250​17/12/07 04:24:36 INFO Executor: Finished task 369.0 in stage 21.0 (TID 4377). 1733 bytes result sent to driver
1251​17/12/07 04:24:37 INFO CoarseGrainedExecutorBackend: Got assigned task 4496
1252​17/12/07 04:24:38 INFO Executor: Running task 31.0 in stage 22.0 (TID 4496)
1253​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Updating epoch to 14 and clearing cache
1254​17/12/07 04:24:38 INFO TorrentBroadcast: Started reading broadcast variable 30
1255​17/12/07 04:24:38 INFO MemoryStore: Block broadcast_30_piece0 stored as bytes in memory (estimated size 18.3 KB, free 1052.9 MB)
1256​17/12/07 04:24:38 INFO TorrentBroadcast: Reading broadcast variable 30 took 13 ms
1257​17/12/07 04:24:38 INFO MemoryStore: Block broadcast_30 stored as values in memory (estimated size 41.3 KB, free 1052.9 MB)
1258​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 12, fetching them
1259​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
1260​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Got the output locations
1261​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
1262​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Started 60 remote fetches in 25 ms
1263​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 13, fetching them
1264​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
1265​17/12/07 04:24:38 INFO MapOutputTrackerWorker: Got the output locations
1266​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
1267​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
1268​17/12/07 04:24:38 INFO Executor: Finished task 31.0 in stage 22.0 (TID 4496). 4712 bytes result sent to driver
1269​17/12/07 04:24:38 INFO CoarseGrainedExecutorBackend: Got assigned task 4555
1270​17/12/07 04:24:38 INFO Executor: Running task 108.0 in stage 22.0 (TID 4555)
1271​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
1272​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Started 60 remote fetches in 27 ms
1273​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
1274​17/12/07 04:24:38 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
1275​17/12/07 04:24:39 INFO Executor: Finished task 108.0 in stage 22.0 (TID 4555). 4284 bytes result sent to driver
1276​17/12/07 04:24:39 INFO CoarseGrainedExecutorBackend: Got assigned task 4632
1277​17/12/07 04:24:39 INFO Executor: Running task 184.0 in stage 22.0 (TID 4632)
1278​17/12/07 04:24:39 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
1279​17/12/07 04:24:39 INFO ShuffleBlockFetcherIterator: Started 60 remote fetches in 15 ms
1280​17/12/07 04:24:39 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
1281​17/12/07 04:24:39 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 1 ms
1282​17/12/07 04:24:39 INFO Executor: Finished task 184.0 in stage 22.0 (TID 4632). 4197 bytes result sent to driver
1283​17/12/07 04:24:39 INFO CoarseGrainedExecutorBackend: Got assigned task 4733
1284​17/12/07 04:24:39 INFO Executor: Running task 76.0 in stage 24.0 (TID 4733)
1285​17/12/07 04:24:39 INFO TorrentBroadcast: Started reading broadcast variable 31
1286​17/12/07 04:24:39 INFO TransportClientFactory: Successfully created connection to analytics1053.eqiad.wmnet/10.64.5.16:34421 after 1 ms (0 ms spent in bootstraps)
1287​17/12/07 04:24:39 INFO MemoryStore: Block broadcast_31_piece0 stored as bytes in memory (estimated size 9.3 KB, free 1052.9 MB)
1288​17/12/07 04:24:39 INFO TorrentBroadcast: Reading broadcast variable 31 took 36 ms
1289​17/12/07 04:24:39 INFO MemoryStore: Block broadcast_31 stored as values in memory (estimated size 18.0 KB, free 1052.9 MB)
1290​17/12/07 04:24:39 INFO CodeGenerator: Code generated in 20.977209 ms
1291​17/12/07 04:24:40 INFO PythonRunner: Times: total = 51, boot = -8097, init = 8144, finish = 4
1292​17/12/07 04:24:40 INFO Executor: Finished task 76.0 in stage 24.0 (TID 4733). 3068 bytes result sent to driver
1293​17/12/07 04:24:40 INFO CoarseGrainedExecutorBackend: Got assigned task 4804
1294​17/12/07 04:24:40 INFO Executor: Running task 145.0 in stage 24.0 (TID 4804)
1295​17/12/07 04:24:40 INFO PythonRunner: Times: total = 48, boot = -40, init = 84, finish = 4
1296​17/12/07 04:24:40 INFO Executor: Finished task 145.0 in stage 24.0 (TID 4804). 2553 bytes result sent to driver
1297​17/12/07 04:24:40 INFO CoarseGrainedExecutorBackend: Got assigned task 4855
1298​17/12/07 04:24:40 INFO Executor: Running task 196.0 in stage 24.0 (TID 4855)
1299​17/12/07 04:24:40 INFO PythonRunner: Times: total = 49, boot = -21, init = 66, finish = 4
1300​17/12/07 04:24:40 INFO Executor: Finished task 196.0 in stage 24.0 (TID 4855). 2553 bytes result sent to driver
1301​17/12/07 04:24:40 INFO CoarseGrainedExecutorBackend: Got assigned task 4912
1302​17/12/07 04:24:40 INFO Executor: Running task 253.0 in stage 24.0 (TID 4912)
1303​17/12/07 04:24:40 INFO PythonRunner: Times: total = 49, boot = -18, init = 63, finish = 4
1304​17/12/07 04:24:40 INFO Executor: Finished task 253.0 in stage 24.0 (TID 4912). 2553 bytes result sent to driver
1305​17/12/07 04:24:40 INFO CoarseGrainedExecutorBackend: Got assigned task 4971
1306​17/12/07 04:24:40 INFO Executor: Running task 312.0 in stage 24.0 (TID 4971)
1307​17/12/07 04:24:40 INFO PythonRunner: Times: total = 46, boot = -63, init = 105, finish = 4
1308​17/12/07 04:24:40 INFO Executor: Finished task 312.0 in stage 24.0 (TID 4971). 2553 bytes result sent to driver
1309​17/12/07 04:24:40 INFO CoarseGrainedExecutorBackend: Got assigned task 5029
1310​17/12/07 04:24:40 INFO Executor: Running task 370.0 in stage 24.0 (TID 5029)
1311​17/12/07 04:24:40 INFO PythonRunner: Times: total = 49, boot = -72, init = 117, finish = 4
1312​17/12/07 04:24:40 INFO Executor: Finished task 370.0 in stage 24.0 (TID 5029). 2553 bytes result sent to driver
1313​17/12/07 04:24:40 INFO CoarseGrainedExecutorBackend: Got assigned task 5101
1314​17/12/07 04:24:40 INFO Executor: Running task 43.0 in stage 25.0 (TID 5101)
1315​17/12/07 04:24:40 INFO TorrentBroadcast: Started reading broadcast variable 32
1316​17/12/07 04:24:40 INFO MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 7.6 KB, free 1052.9 MB)
1317​17/12/07 04:24:40 INFO TorrentBroadcast: Reading broadcast variable 32 took 7 ms
1318​17/12/07 04:24:40 INFO MemoryStore: Block broadcast_32 stored as values in memory (estimated size 18.1 KB, free 1052.8 MB)
1319​17/12/07 04:24:40 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1320​17/12/07 04:24:40 INFO TorrentBroadcast: Started reading broadcast variable 29
1321​17/12/07 04:24:40 INFO MemoryStore: Block broadcast_29_piece0 stored as bytes in memory (estimated size 29.3 KB, free 1052.8 MB)
1322​17/12/07 04:24:40 INFO TorrentBroadcast: Reading broadcast variable 29 took 7 ms
1323​17/12/07 04:24:40 INFO MemoryStore: Block broadcast_29 stored as values in memory (estimated size 381.7 KB, free 1052.4 MB)
1324​17/12/07 04:24:40 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1325​17/12/07 04:24:40 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1326​17/12/07 04:24:40 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1327​17/12/07 04:24:40 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1328
1329​Parquet form:
1330​message spark_schema {
1331​ optional binary wikiid (UTF8);
1332​ optional binary query (UTF8);
1333​ required int64 norm_query_id;
1334​ optional int32 label;
1335​ optional group features {
1336​ required int32 type (INT_8);
1337​ optional int32 size;
1338​ optional group indices (LIST) {
1339​ repeated group list {
1340​ required int32 element;
1341​ }
1342​ }
1343​ optional group values (LIST) {
1344​ repeated group list {
1345​ required double element;
1346​ }
1347​ }
1348​ }
1349​}
1350
1351​Catalyst form:
1352​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1353
1354​17/12/07 04:24:40 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 453342 records.
1355​17/12/07 04:24:40 INFO InternalParquetRecordReader: at row 0. reading next block
1356​17/12/07 04:24:40 INFO InternalParquetRecordReader: block read in memory in 43 ms. row count = 453342
1357​17/12/07 04:24:41 INFO Executor: Finished task 43.0 in stage 25.0 (TID 5101). 3061 bytes result sent to driver
1358​17/12/07 04:24:41 INFO CoarseGrainedExecutorBackend: Got assigned task 5127
1359​17/12/07 04:24:41 INFO Executor: Running task 143.0 in stage 25.0 (TID 5127)
1360​17/12/07 04:24:41 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00143-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1361​17/12/07 04:24:41 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1362​17/12/07 04:24:41 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1363​17/12/07 04:24:41 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1364​17/12/07 04:24:41 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1365
1366​Parquet form:
1367​message spark_schema {
1368​ optional binary wikiid (UTF8);
1369​ optional binary query (UTF8);
1370​ required int64 norm_query_id;
1371​ optional int32 label;
1372​ optional group features {
1373​ required int32 type (INT_8);
1374​ optional int32 size;
1375​ optional group indices (LIST) {
1376​ repeated group list {
1377​ required int32 element;
1378​ }
1379​ }
1380​ optional group values (LIST) {
1381​ repeated group list {
1382​ required double element;
1383​ }
1384​ }
1385​ }
1386​}
1387
1388​Catalyst form:
1389​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1390
1391​17/12/07 04:24:41 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452514 records.
1392​17/12/07 04:24:41 INFO InternalParquetRecordReader: at row 0. reading next block
1393​17/12/07 04:24:41 INFO InternalParquetRecordReader: block read in memory in 35 ms. row count = 452514
1394​17/12/07 04:24:42 INFO Executor: Finished task 143.0 in stage 25.0 (TID 5127). 2156 bytes result sent to driver
1395​17/12/07 04:24:42 INFO CoarseGrainedExecutorBackend: Got assigned task 5166
1396​17/12/07 04:24:42 INFO Executor: Running task 147.0 in stage 25.0 (TID 5166)
1397​17/12/07 04:24:42 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00147-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1398​17/12/07 04:24:42 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1399​17/12/07 04:24:42 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1400​17/12/07 04:24:42 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1401​17/12/07 04:24:42 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1402
1403​Parquet form:
1404​message spark_schema {
1405​ optional binary wikiid (UTF8);
1406​ optional binary query (UTF8);
1407​ required int64 norm_query_id;
1408​ optional int32 label;
1409​ optional group features {
1410​ required int32 type (INT_8);
1411​ optional int32 size;
1412​ optional group indices (LIST) {
1413​ repeated group list {
1414​ required int32 element;
1415​ }
1416​ }
1417​ optional group values (LIST) {
1418​ repeated group list {
1419​ required double element;
1420​ }
1421​ }
1422​ }
1423​}
1424
1425​Catalyst form:
1426​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1427
1428​17/12/07 04:24:42 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452470 records.
1429​17/12/07 04:24:42 INFO InternalParquetRecordReader: at row 0. reading next block
1430​17/12/07 04:24:42 INFO InternalParquetRecordReader: block read in memory in 36 ms. row count = 452470
1431​17/12/07 04:24:43 INFO Executor: Finished task 147.0 in stage 25.0 (TID 5166). 2156 bytes result sent to driver
1432​17/12/07 04:24:43 INFO CoarseGrainedExecutorBackend: Got assigned task 5273
1433​17/12/07 04:24:43 INFO Executor: Running task 174.0 in stage 25.0 (TID 5273)
1434​17/12/07 04:24:43 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00174-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1435​17/12/07 04:24:43 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1436​17/12/07 04:24:43 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1437​17/12/07 04:24:43 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1438​17/12/07 04:24:43 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1439
1440​Parquet form:
1441​message spark_schema {
1442​ optional binary wikiid (UTF8);
1443​ optional binary query (UTF8);
1444​ required int64 norm_query_id;
1445​ optional int32 label;
1446​ optional group features {
1447​ required int32 type (INT_8);
1448​ optional int32 size;
1449​ optional group indices (LIST) {
1450​ repeated group list {
1451​ required int32 element;
1452​ }
1453​ }
1454​ optional group values (LIST) {
1455​ repeated group list {
1456​ required double element;
1457​ }
1458​ }
1459​ }
1460​}
1461
1462​Catalyst form:
1463​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1464
1465​17/12/07 04:24:43 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 452439 records.
1466​17/12/07 04:24:43 INFO InternalParquetRecordReader: at row 0. reading next block
1467​17/12/07 04:24:43 INFO InternalParquetRecordReader: block read in memory in 38 ms. row count = 452439
1468​17/12/07 04:24:45 INFO Executor: Finished task 174.0 in stage 25.0 (TID 5273). 2156 bytes result sent to driver
1469​17/12/07 04:24:45 INFO CoarseGrainedExecutorBackend: Got assigned task 5694
1470​17/12/07 04:24:45 INFO Executor: Running task 296.0 in stage 28.0 (TID 5694)
1471​17/12/07 04:24:45 INFO TorrentBroadcast: Started reading broadcast variable 34
1472​17/12/07 04:24:45 INFO MemoryStore: Block broadcast_34_piece0 stored as bytes in memory (estimated size 9.3 KB, free 1052.4 MB)
1473​17/12/07 04:24:45 INFO TorrentBroadcast: Reading broadcast variable 34 took 6 ms
1474​17/12/07 04:24:45 INFO MemoryStore: Block broadcast_34 stored as values in memory (estimated size 18.0 KB, free 1052.4 MB)
1475​17/12/07 04:24:45 INFO CodeGenerator: Code generated in 22.92996 ms
1476​17/12/07 04:24:45 INFO PythonRunner: Times: total = 49, boot = -4746, init = 4791, finish = 4
1477​17/12/07 04:24:45 INFO Executor: Finished task 296.0 in stage 28.0 (TID 5694). 3068 bytes result sent to driver
1478​17/12/07 04:24:45 INFO CoarseGrainedExecutorBackend: Got assigned task 5745
1479​17/12/07 04:24:45 INFO Executor: Running task 340.0 in stage 28.0 (TID 5745)
1480​17/12/07 04:24:45 INFO PythonRunner: Times: total = 47, boot = -39, init = 82, finish = 4
1481​17/12/07 04:24:45 INFO Executor: Finished task 340.0 in stage 28.0 (TID 5745). 2553 bytes result sent to driver
1482​17/12/07 04:24:45 INFO CoarseGrainedExecutorBackend: Got assigned task 5779
1483​17/12/07 04:24:45 INFO Executor: Running task 373.0 in stage 28.0 (TID 5779)
1484​17/12/07 04:24:45 INFO PythonRunner: Times: total = 48, boot = -16, init = 60, finish = 4
1485​17/12/07 04:24:45 INFO Executor: Finished task 373.0 in stage 28.0 (TID 5779). 2553 bytes result sent to driver
1486​17/12/07 04:24:45 INFO CoarseGrainedExecutorBackend: Got assigned task 5820
1487​17/12/07 04:24:45 INFO Executor: Running task 43.0 in stage 29.0 (TID 5820)
1488​17/12/07 04:24:45 INFO TorrentBroadcast: Started reading broadcast variable 35
1489​17/12/07 04:24:45 INFO MemoryStore: Block broadcast_35_piece0 stored as bytes in memory (estimated size 7.6 KB, free 1052.4 MB)
1490​17/12/07 04:24:45 INFO TorrentBroadcast: Reading broadcast variable 35 took 7 ms
1491​17/12/07 04:24:45 INFO MemoryStore: Block broadcast_35 stored as values in memory (estimated size 18.1 KB, free 1052.4 MB)
1492​17/12/07 04:24:45 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1493​17/12/07 04:24:45 INFO TorrentBroadcast: Started reading broadcast variable 33
1494​17/12/07 04:24:45 INFO MemoryStore: Block broadcast_33_piece0 stored as bytes in memory (estimated size 29.3 KB, free 1052.4 MB)
1495​17/12/07 04:24:45 INFO TorrentBroadcast: Reading broadcast variable 33 took 5 ms
1496​17/12/07 04:24:45 INFO MemoryStore: Block broadcast_33 stored as values in memory (estimated size 381.7 KB, free 1052.0 MB)
1497​17/12/07 04:24:45 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1498​17/12/07 04:24:45 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1499​17/12/07 04:24:45 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1500​17/12/07 04:24:45 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1501
1502​Parquet form:
1503​message spark_schema {
1504​ optional binary wikiid (UTF8);
1505​ optional binary query (UTF8);
1506​ required int64 norm_query_id;
1507​ optional int32 label;
1508​ optional group features {
1509​ required int32 type (INT_8);
1510​ optional int32 size;
1511​ optional group indices (LIST) {
1512​ repeated group list {
1513​ required int32 element;
1514​ }
1515​ }
1516​ optional group values (LIST) {
1517​ repeated group list {
1518​ required double element;
1519​ }
1520​ }
1521​ }
1522​}
1523
1524​Catalyst form:
1525​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1526
1527​17/12/07 04:24:45 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 453342 records.
1528​17/12/07 04:24:45 INFO InternalParquetRecordReader: at row 0. reading next block
1529​17/12/07 04:24:45 INFO InternalParquetRecordReader: block read in memory in 44 ms. row count = 453342
1530​17/12/07 04:24:46 INFO Executor: Finished task 43.0 in stage 29.0 (TID 5820). 3061 bytes result sent to driver
1531​17/12/07 04:24:46 INFO CoarseGrainedExecutorBackend: Got assigned task 5892
1532​17/12/07 04:24:46 INFO Executor: Running task 92.0 in stage 29.0 (TID 5892)
1533​17/12/07 04:24:46 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00092-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
1534​17/12/07 04:24:46 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1535​17/12/07 04:24:46 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
1536​17/12/07 04:24:46 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
1537​17/12/07 04:24:46 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:
1538
1539​Parquet form:
1540​message spark_schema {
1541​ optional binary wikiid (UTF8);
1542​ optional binary query (UTF8);
1543​ required int64 norm_query_id;
1544​ optional int32 label;
1545​ optional group features {
1546​ required int32 type (INT_8);
1547​ optional int32 size;
1548​ optional group indices (LIST) {
1549​ repeated group list {
1550​ required int32 element;
1551​ }
1552​ }
1553​ optional group values (LIST) {
1554​ repeated group list {
1555​ required double element;
1556​ }
1557​ }
1558​ }
1559​}
1560
1561​Catalyst form:
1562​StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
1563
17/12/07 04:24:46 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 453906 records.
17/12/07 04:24:46 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:24:46 INFO InternalParquetRecordReader: block read in memory in 37 ms. row count = 453906
17/12/07 04:24:48 INFO Executor: Finished task 92.0 in stage 29.0 (TID 5892). 2156 bytes result sent to driver
17/12/07 04:24:48 INFO CoarseGrainedExecutorBackend: Got assigned task 5979
17/12/07 04:24:48 INFO Executor: Running task 40.0 in stage 26.0 (TID 5979)
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Updating epoch to 18 and clearing cache
17/12/07 04:24:48 INFO TorrentBroadcast: Started reading broadcast variable 37
17/12/07 04:24:48 INFO TransportClientFactory: Successfully created connection to analytics1043.eqiad.wmnet/10.64.53.23:41817 after 1 ms (0 ms spent in bootstraps)
17/12/07 04:24:48 INFO MemoryStore: Block broadcast_37_piece0 stored as bytes in memory (estimated size 18.3 KB, free 1052.0 MB)
17/12/07 04:24:48 INFO TorrentBroadcast: Reading broadcast variable 37 took 12 ms
17/12/07 04:24:48 INFO MemoryStore: Block broadcast_37 stored as values in memory (estimated size 41.3 KB, free 1051.9 MB)
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 15, fetching them
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Started 61 remote fetches in 21 ms
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 16, fetching them
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:24:48 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
17/12/07 04:24:48 INFO Executor: Finished task 40.0 in stage 26.0 (TID 5979). 4712 bytes result sent to driver
17/12/07 04:24:48 INFO CoarseGrainedExecutorBackend: Got assigned task 6031
17/12/07 04:24:48 INFO Executor: Running task 119.0 in stage 26.0 (TID 6031)
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Started 61 remote fetches in 14 ms
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 378 blocks
17/12/07 04:24:48 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 0 ms
17/12/07 04:24:49 INFO Executor: Finished task 119.0 in stage 26.0 (TID 6031). 4197 bytes result sent to driver
17/12/07 04:24:49 INFO CoarseGrainedExecutorBackend: Got assigned task 6119
17/12/07 04:24:49 INFO Executor: Running task 181.0 in stage 26.0 (TID 6119)
17/12/07 04:24:49 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:49 INFO ShuffleBlockFetcherIterator: Started 61 remote fetches in 16 ms
17/12/07 04:24:49 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:49 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 1 ms
17/12/07 04:24:50 INFO Executor: Finished task 181.0 in stage 26.0 (TID 6119). 4197 bytes result sent to driver
17/12/07 04:24:50 INFO CoarseGrainedExecutorBackend: Got assigned task 6180
17/12/07 04:24:50 INFO Executor: Running task 148.0 in stage 29.0 (TID 6180)
17/12/07 04:24:50 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00148-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 0-19599906, partition values: [empty row]
17/12/07 04:24:50 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:50 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:50 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:50 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

Parquet form:
message spark_schema {
  optional binary wikiid (UTF8);
  optional binary query (UTF8);
  required int64 norm_query_id;
  optional int32 label;
  optional group features {
    required int32 type (INT_8);
    optional int32 size;
    optional group indices (LIST) {
      repeated group list {
        required int32 element;
      }
    }
    optional group values (LIST) {
      repeated group list {
        required double element;
      }
    }
  }
}

Catalyst form:
StructType(StructField(wikiid,StringType,true), StructField(query,StringType,true), StructField(norm_query_id,LongType,true), StructField(label,IntegerType,true), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))

17/12/07 04:24:50 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 451805 records.
17/12/07 04:24:50 INFO InternalParquetRecordReader: at row 0. reading next block
17/12/07 04:24:50 INFO InternalParquetRecordReader: block read in memory in 44 ms. row count = 451805
17/12/07 04:24:51 INFO Executor: Finished task 148.0 in stage 29.0 (TID 6180). 2156 bytes result sent to driver
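
[note] The FilterCompat lines above show Spark 2.x pushing an IsNotNull filter down into the Parquet reader as noteq(norm_query_id, null). A minimal PySpark sketch of the kind of read that produces this pattern; the HDFS path is taken from the FileScanRDD line, but the exact mjolnir code is an assumption:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = (spark.read
          .parquet('hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130')
          # with spark.sql.parquet.filterPushdown enabled (the default), this
          # filter reaches parquet-mr as the noteq(norm_query_id, null) predicate
          .where(col('norm_query_id').isNotNull()))
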
17/12/07 04:24:51 INFO CoarseGrainedExecutorBackend: Got assigned task 6284
17/12/07 04:24:51 INFO Executor: Running task 203.0 in stage 29.0 (TID 6284)
17/12/07 04:24:51 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00166-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34217774, partition values: [empty row]
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

[Parquet and Catalyst forms identical to the schema dump above; omitted.]

17/12/07 04:24:51 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:51 INFO Executor: Finished task 203.0 in stage 29.0 (TID 6284). 1733 bytes result sent to driver
17/12/07 04:24:51 INFO CoarseGrainedExecutorBackend: Got assigned task 6295
17/12/07 04:24:51 INFO Executor: Running task 232.0 in stage 29.0 (TID 6295)
17/12/07 04:24:51 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00093-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34135930, partition values: [empty row]
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

[Parquet and Catalyst forms identical to the schema dump above; omitted.]

17/12/07 04:24:51 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:51 INFO Executor: Finished task 232.0 in stage 29.0 (TID 6295). 1733 bytes result sent to driver
17/12/07 04:24:51 INFO CoarseGrainedExecutorBackend: Got assigned task 6305
17/12/07 04:24:51 INFO Executor: Running task 298.0 in stage 29.0 (TID 6305)
17/12/07 04:24:51 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00194-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34016885, partition values: [empty row]
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

[Parquet and Catalyst forms identical to the schema dump above; omitted.]

17/12/07 04:24:51 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:51 INFO Executor: Finished task 298.0 in stage 29.0 (TID 6305). 1733 bytes result sent to driver
17/12/07 04:24:51 INFO CoarseGrainedExecutorBackend: Got assigned task 6315
17/12/07 04:24:51 INFO Executor: Running task 304.0 in stage 29.0 (TID 6315)
17/12/07 04:24:51 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00158-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-34005072, partition values: [empty row]
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

[Parquet and Catalyst forms identical to the schema dump above; omitted.]

17/12/07 04:24:51 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:51 INFO Executor: Finished task 304.0 in stage 29.0 (TID 6315). 1733 bytes result sent to driver
17/12/07 04:24:51 INFO CoarseGrainedExecutorBackend: Got assigned task 6323
17/12/07 04:24:51 INFO Executor: Running task 310.0 in stage 29.0 (TID 6323)
17/12/07 04:24:51 INFO FileScanRDD: Reading File path: hdfs://analytics-hadoop/user/ebernhardson/mjolnir/20171130/part-00043-872e5784-36c8-4e99-a525-5e33b806f1cf.snappy.parquet, range: 19599906-33991962, partition values: [empty row]
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
17/12/07 04:24:51 INFO FilterCompat: Filtering using predicate: noteq(norm_query_id, null)
17/12/07 04:24:51 INFO ParquetReadSupport: Going to read the following fields from the Parquet file:

[Parquet and Catalyst forms identical to the schema dump above; omitted.]

17/12/07 04:24:51 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 0 records.
17/12/07 04:24:51 INFO Executor: Finished task 310.0 in stage 29.0 (TID 6323). 1733 bytes result sent to driver
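
[note] The run of "will read a total of 0 records" tasks above is expected rather than an error: each FileScanRDD task owns a byte range of one file (e.g. range: 19599906-34217774), and parquet-mr, as I understand its split handling, assigns a row group to the split containing the row group's midpoint. A split whose range holds no row-group midpoint therefore reads nothing. A sketch of that rule under those assumptions; the helper and the offsets are hypothetical, chosen to mirror the ranges logged above:

    def rows_for_split(split_start, split_end, row_groups):
        """row_groups: (midpoint_offset, num_rows) pairs from the file footer."""
        return sum(rows for mid, rows in row_groups
                   if split_start <= mid < split_end)

    # one ~34 MB file with a single big row group whose midpoint falls in the
    # first half of the file:
    row_groups = [(9800000, 451805)]
    print(rows_for_split(0, 19599906, row_groups))         # -> 451805
    print(rows_for_split(19599906, 34217774, row_groups))  # -> 0, as logged
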
17/12/07 04:24:53 INFO CoarseGrainedExecutorBackend: Got assigned task 6455
17/12/07 04:24:53 INFO Executor: Running task 96.0 in stage 30.0 (TID 6455)
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Updating epoch to 20 and clearing cache
17/12/07 04:24:53 INFO TorrentBroadcast: Started reading broadcast variable 39
17/12/07 04:24:53 INFO MemoryStore: Block broadcast_39_piece0 stored as bytes in memory (estimated size 18.3 KB, free 1051.9 MB)
17/12/07 04:24:53 INFO TorrentBroadcast: Reading broadcast variable 39 took 9 ms
17/12/07 04:24:53 INFO MemoryStore: Block broadcast_39 stored as values in memory (estimated size 41.3 KB, free 1051.9 MB)
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 18, fetching them
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Started 63 remote fetches in 26 ms
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 19, fetching them
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.64.53.30:40931)
17/12/07 04:24:53 INFO MapOutputTrackerWorker: Got the output locations
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 1 ms
17/12/07 04:24:53 INFO Executor: Finished task 96.0 in stage 30.0 (TID 6455). 4785 bytes result sent to driver
17/12/07 04:24:53 INFO CoarseGrainedExecutorBackend: Got assigned task 6474
17/12/07 04:24:53 INFO Executor: Running task 122.0 in stage 30.0 (TID 6474)
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Started 63 remote fetches in 19 ms
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:53 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 1 ms
17/12/07 04:24:54 INFO Executor: Finished task 122.0 in stage 30.0 (TID 6474). 4197 bytes result sent to driver
17/12/07 04:24:54 INFO CoarseGrainedExecutorBackend: Got assigned task 6511
17/12/07 04:24:54 INFO Executor: Running task 152.0 in stage 30.0 (TID 6511)
17/12/07 04:24:54 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:54 INFO ShuffleBlockFetcherIterator: Started 63 remote fetches in 17 ms
17/12/07 04:24:54 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:54 INFO ShuffleBlockFetcherIterator: Started 3 remote fetches in 1 ms
17/12/07 04:24:55 INFO Executor: Finished task 152.0 in stage 30.0 (TID 6511). 4197 bytes result sent to driver
17/12/07 04:24:55 INFO CoarseGrainedExecutorBackend: Got assigned task 6535
17/12/07 04:24:55 INFO Executor: Running task 156.0 in stage 30.0 (TID 6535)
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Started 63 remote fetches in 12 ms
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 0 ms
17/12/07 04:24:55 INFO Executor: Finished task 156.0 in stage 30.0 (TID 6535). 4197 bytes result sent to driver
17/12/07 04:24:55 INFO CoarseGrainedExecutorBackend: Got assigned task 6560
17/12/07 04:24:55 INFO Executor: Running task 179.0 in stage 30.0 (TID 6560)
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Started 63 remote fetches in 19 ms
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:24:55 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
17/12/07 04:24:56 INFO Executor: Finished task 179.0 in stage 30.0 (TID 6560). 4197 bytes result sent to driver
17/12/07 04:25:00 INFO CoarseGrainedExecutorBackend: Got assigned task 6613
17/12/07 04:25:00 INFO Executor: Running task 1.0 in stage 30.0 (TID 6613)
17/12/07 04:25:00 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 400 blocks
17/12/07 04:25:00 INFO ShuffleBlockFetcherIterator: Started 63 remote fetches in 23 ms
17/12/07 04:25:00 INFO ShuffleBlockFetcherIterator: Getting 3 non-empty blocks out of 378 blocks
17/12/07 04:25:00 INFO ShuffleBlockFetcherIterator: Started 2 remote fetches in 1 ms
17/12/07 04:25:01 INFO Executor: Finished task 1.0 in stage 30.0 (TID 6613). 4197 bytes result sent to driver
17/12/07 04:25:14 INFO CoarseGrainedExecutorBackend: Got assigned task 6625
17/12/07 04:25:14 INFO Executor: Running task 0.0 in stage 35.0 (TID 6625)
17/12/07 04:25:14 INFO MapOutputTrackerWorker: Updating epoch to 21 and clearing cache
17/12/07 04:25:14 INFO TorrentBroadcast: Started reading broadcast variable 41
17/12/07 04:25:14 INFO MemoryStore: Block broadcast_41_piece0 stored as bytes in memory (estimated size 2031.5 KB, free 1049.9 MB)
17/12/07 04:25:14 INFO TorrentBroadcast: Reading broadcast variable 41 took 40 ms
17/12/07 04:25:14 INFO MemoryStore: Block broadcast_41 stored as values in memory (estimated size 4.5 MB, free 1045.5 MB)
17/12/07 04:25:15 INFO BlockManager: Found block rdd_73_0 locally
17/12/07 04:25:15 INFO CodeGenerator: Code generated in 14.090169 ms
17/12/07 04:25:15 INFO CodeGenerator: Code generated in 14.58589 ms
17/12/07 04:25:15 INFO CodeGenerator: Code generated in 7.253869 ms
17/12/07 04:25:15 INFO BlockManager: Found block rdd_73_0 locally
17/12/07 04:25:15 INFO ResourceMonitorThread: RssAnon: 4631668 kB, RssFile: 54504 kB, RssShmem: 0 kB
17/12/07 04:25:15 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1604450072(1566845K) committed = 3817865216(3728384K) max = 3817865216(3728384K)
17/12/07 04:25:15 INFO ResourceMonitorThread: smapinfo mem: 4643544 size: 5546992 rss: 4698000 length: 1135
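
[note] ResourceMonitorThread is not a stock Spark or Hadoop class; it appears to be a diagnostic thread added for this investigation, logging three views of executor memory roughly every ten seconds: RssAnon/RssFile/RssShmem from /proc/self/status, the JVM heap MemoryUsage (init/used/committed/max), and an smaps-derived total. A sketch of where the first of those lines plausibly comes from; the field names are real /proc/self/status keys, the helper itself is hypothetical:

    def rss_fields(pid='self'):
        """Return RssAnon/RssFile/RssShmem as reported by /proc/<pid>/status."""
        wanted = ('RssAnon', 'RssFile', 'RssShmem')
        fields = {}
        with open('/proc/%s/status' % pid) as status:
            for line in status:
                key, _, value = line.partition(':')
                if key in wanted:
                    fields[key] = value.strip()  # e.g. '4631668 kB'
        return fields
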
17/12/07 04:25:25 INFO ResourceMonitorThread: RssAnon: 5216632 kB, RssFile: 583736 kB, RssShmem: 0 kB
17/12/07 04:25:25 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2404774176(2348412K) committed = 3955228672(3862528K) max = 3955228672(3862528K)
17/12/07 04:25:25 INFO ResourceMonitorThread: smapinfo mem: 5587552 size: 8316872 rss: 5915652 length: 1144
[04:25:30] Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.
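
[note] The bracketed, timestamp-only lines come from the native xgboost library running inside task 0.0 of stage 35, not from log4j. With tree_method='hist', xgboost builds per-feature gradient histograms in native (off-heap) memory, consistent with the RssAnon growth the monitor lines track. A hedged sketch of booster parameters matching what this log shows; tree_method and max_depth are visible in the output, while the objective is an assumption (mjolnir does learning-to-rank) and everything else is omitted:

    params = {
        'tree_method': 'hist',     # "Tree method is selected to be 'hist'" above
        'max_depth': 7,            # matches max_depth=7 in the updater_prune.cc lines
        'objective': 'rank:ndcg',  # assumption: not confirmed by this log
    }
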
17/12/07 04:25:36 INFO ResourceMonitorThread: RssAnon: 6179728 kB, RssFile: 1724320 kB, RssShmem: 0 kB
17/12/07 04:25:36 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1616532248(1578644K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:25:36 INFO ResourceMonitorThread: smapinfo mem: 6191092 size: 10012560 rss: 7941172 length: 1166
[04:25:36] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 242 extra nodes, 0 pruned nodes, max_depth=7
[04:25:40] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 202 extra nodes, 0 pruned nodes, max_depth=7
[04:25:42] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 184 extra nodes, 0 pruned nodes, max_depth=7
[04:25:45] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 196 extra nodes, 0 pruned nodes, max_depth=7
17/12/07 04:25:46 INFO ResourceMonitorThread: RssAnon: 6242556 kB, RssFile: 2919316 kB, RssShmem: 0 kB
17/12/07 04:25:46 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1655847848(1617038K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:25:46 INFO ResourceMonitorThread: smapinfo mem: 6253780 size: 11873596 rss: 9189440 length: 1171
[04:25:48] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 236 extra nodes, 0 pruned nodes, max_depth=7
[04:25:51] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 194 extra nodes, 0 pruned nodes, max_depth=7
[04:25:54] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 190 extra nodes, 0 pruned nodes, max_depth=7
17/12/07 04:25:56 INFO ResourceMonitorThread: RssAnon: 6245464 kB, RssFile: 4068244 kB, RssShmem: 0 kB
17/12/07 04:25:56 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1696392656(1656633K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:25:56 INFO ResourceMonitorThread: smapinfo mem: 6256740 size: 11873596 rss: 10340128 length: 1171
[04:25:57] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 182 extra nodes, 0 pruned nodes, max_depth=7
[04:26:00] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 192 extra nodes, 0 pruned nodes, max_depth=7
[04:26:03] /srv/xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 220 extra nodes, 0 pruned nodes, max_depth=7
17/12/07 04:26:07 INFO ResourceMonitorThread: RssAnon: 6245728 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:26:07 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1737383816(1696663K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:26:07 INFO ResourceMonitorThread: smapinfo mem: 6257232 size: 11873596 rss: 10340684 length: 1171
17/12/07 04:26:17 INFO ResourceMonitorThread: RssAnon: 6245728 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:26:17 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1776466032(1734830K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:26:17 INFO ResourceMonitorThread: smapinfo mem: 6257236 size: 11873596 rss: 10340688 length: 1171
17/12/07 04:26:27 INFO ResourceMonitorThread: RssAnon: 6245876 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:26:27 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1815537096(1772985K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:26:27 INFO ResourceMonitorThread: smapinfo mem: 6257012 size: 11873596 rss: 10340464 length: 1171
17/12/07 04:26:37 INFO ResourceMonitorThread: RssAnon: 6245876 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:26:37 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1855964928(1812465K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:26:37 INFO ResourceMonitorThread: smapinfo mem: 6257044 size: 11873596 rss: 10340488 length: 1171
17/12/07 04:26:47 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:26:47 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1895047848(1850632K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:26:47 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:26:58 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:26:58 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1934117600(1888786K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:26:58 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:27:08 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:27:08 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 1974877792(1928591K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:27:08 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:27:18 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:27:18 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2013961496(1966759K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:27:18 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:27:28 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:27:28 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2053032888(2004914K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:27:28 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:27:39 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:27:39 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2093461272(2044395K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:27:39 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:27:49 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:27:49 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2132594224(2082611K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:27:49 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:27:59 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:27:59 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2175963248(2124964K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:27:59 INFO ResourceMonitorThread: smapinfo mem: 6257048 size: 11873596 rss: 10340492 length: 1171
17/12/07 04:28:09 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:28:09 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2216390656(2164444K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:28:09 INFO ResourceMonitorThread: smapinfo mem: 6257052 size: 11873596 rss: 10340496 length: 1171
17/12/07 04:28:10 INFO Executor: Executor is trying to kill task 0.0 in stage 35.0 (TID 6625)
17/12/07 04:28:20 INFO ResourceMonitorThread: RssAnon: 6245908 kB, RssFile: 4076692 kB, RssShmem: 0 kB
17/12/07 04:28:20 INFO ResourceMonitorThread: init = 1054087168(1029382K) used = 2255993464(2203118K) committed = 3975675904(3882496K) max = 3975675904(3882496K)
17/12/07 04:28:20 INFO ResourceMonitorThread: smapinfo mem: 6257072 size: 11873596 rss: 10340516 length: 1171
17/12/07 04:28:29 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/07 04:28:29 INFO DiskBlockManager: Shutdown hook called
17/12/07 04:28:29 INFO ShutdownHookManager: Shutdown hook called
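
[note] The ending is the point of this paste: the JVM heap is capped around 3.7 GB (max = 3975675904), yet total process RSS climbs to roughly 10 GB (RssAnon ~6.0 GB plus RssFile ~3.9 GB), so most of the memory sits outside the heap in xgboost's native allocations and file-backed mappings. That is consistent with the YARN NodeManager killing the container for exceeding its physical-memory limit (as the paste's title says); the NodeManager's own "running beyond physical memory limits" message would land in the NM and driver logs, not in this container log, which only shows the resulting SIGTERM. Note also that the driver had already asked for the task to die ("Executor is trying to kill task 0.0") nineteen seconds before the TERM arrived. The usual mitigation in Spark 2.x on YARN is to reserve off-heap headroom at submit time; a hedged sketch, with illustrative values only:

    from pyspark import SparkConf

    # must be set before the SparkContext requests containers
    conf = (SparkConf()
            .set('spark.executor.memory', '4g')                  # heap; the ~3.7 GB max above is consistent with 4g
            .set('spark.yarn.executor.memoryOverhead', '6144'))  # MiB of headroom for native/xgboost memory
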