Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pyspark - Spark Exit Status 134. What does it mean?

Some of my tasks fail with the following error when I run my job, but the job as a whole finishes successfully and exits. What does this mean? Can I trust the results?

ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Container from a bad node: container_1610292825631_0097_01_000013 on host: ip-xx-xxx-xx-xx.us.aws.xxxx.com. Exit status: 134. Diagnostics: e 44.0 (TID 16633)

Container exited with a non-zero exit code 134. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 16507 Aborted
Last 4096 bytes of stderr :
 task 422.0 in stage 44.0 (TID 16633)
21/01/25 17:25:50 INFO ShuffleBlockFetcherIterator: Getting 56 non-empty blocks including 12 local blocks and 44 remote blocks
21/01/25 17:25:50 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 2 ms
21/01/25 17:25:50 INFO Executor: Finished task 422.0 in stage 44.0 (TID 16633). 6435 bytes result sent to driver
21/01/25 17:25:50 INFO CoarseGrainedExecutorBackend: Got assigned task 16639
21/01/25 17:25:50 INFO Executor: Running task 433.0 in stage 44.0 (TID 16639)
21/01/25 17:25:50 INFO ShuffleBlockFetcherIterator: Getting 95 non-empty blocks including 9 local blocks and 86 remote blocks
21/01/25 17:25:50 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 1 ms
21/01/25 17:25:51 INFO Executor: Finished task 383.0 in stage 44.0 (TID 16579). 6478 bytes result sent to driver
21/01/25 17:25:51 INFO CoarseGrainedExecutorBackend: Got assigned task 16661
21/01/25 17:25:51 INFO Executor: Running task 471.0 in stage 44.0 (TID 16661)
21/01/25 17:25:51 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks including 30 local blocks and 170 remote blocks
21/01/25 17:25:51 INFO ShuffleBlockFetcherIterator: Started 6 remote fetches in 1 ms
21/01/25 17:25:52 INFO Executor: Finished task 319.0 in stage 44.0 (TID 16555). 6478 bytes result sent to driver
21/01/25 17:25:52 INFO CoarseGrainedExecutorBackend: Got assigned task 16675
21/01/25 17:25:52 INFO Executor: Running task 482.0 in stage 44.0 (TID 16675)
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Getting 25 non-empty blocks including 5 local blocks and 20 remote blocks
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 1 ms
21/01/25 17:25:52 INFO Executor: Finished task 482.0 in stage 44.0 (TID 16675). 6435 bytes result sent to driver
21/01/25 17:25:52 INFO CoarseGrainedExecutorBackend: Got assigned task 16679
21/01/25 17:25:52 INFO Executor: Running task 491.0 in stage 44.0 (TID 16679)
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Getting 138 non-empty blocks including 19 local blocks and 119 remote blocks
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 1 ms
21/01/25 17:25:52 INFO Executor: Finished task 433.0 in stage 44.0 (TID 16639). 6521 bytes result sent to driver
21/01/25 17:25:52 INFO CoarseGrainedExecutorBackend: Got assigned task 16684
21/01/25 17:25:52 INFO Executor: Running task 493.0 in stage 44.0 (TID 16684)
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Getting 190 non-empty blocks including 29 local blocks and 161 remote blocks
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 1 ms
21/01/25 17:25:52 INFO Executor: Finished task 491.0 in stage 44.0 (TID 16679). 6435 bytes result sent to driver
21/01/25 17:25:52 INFO CoarseGrainedExecutorBackend: Got assigned task 16685
21/01/25 17:25:52 INFO Executor: Running task 500.0 in stage 44.0 (TID 16685)
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Getting 51 non-empty blocks including 12 local blocks and 39 remote blocks
21/01/25 17:25:52 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 1 ms
21/01/25 17:25:54 INFO Executor: Finished task 500.0 in stage 44.0 (TID 16685). 6478 bytes result sent to driver
21/01/25 17:25:54 INFO CoarseGrainedExecutorBackend: Got assigned task 16714
21/01/25 17:25:54 INFO Executor: Running task 524.0 in stage 44.0 (TID 16714)
21/01/25 17:25:54 INFO ShuffleBlockFetcherIterator: Getting 114 non-empty blocks including 17 local blocks and 97 remote blocks
21/01/25 17:25:54 INFO ShuffleBlockFetcherIterator: Started 7 remote fetches in 1 ms
21/01/25 17:25:59 INFO Executor: Finished task 471.0 in stage 44.0 (TID 16661). 6478 bytes result sent to driver
21/01/25 17:25:59 INFO CoarseGrainedExecutorBackend: Got assigned task 16767
21/01/25 17:25:59 INFO Executor: Running task 536.0 in stage 44.0 (TID 16767)
21/01/25 17:25:59 INFO ShuffleBlockFetcherIterator: Getting 110 non-empty blocks including 16 local blocks and 94 remote blocks
21/01/25 17:25:59 INFO ShuffleBlockFetcherIterator: Started 5 remote fetches in 1 ms
Question from: https://stackoverflow.com/questions/65889696/spark-exit-status-134-what-does-it-mean


1 Answer


TL;DR You can trust the results.

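Spark has built-in support for retrying failed tasks on other available executors or nodes, which is how it provides fault tolerance. The tasks that failed in your job would have been retried on another node/executor, and the results of those retries are what end up in your final result. Because the job as a whole finished successfully, every task eventually completed. So, yes, you can trust the result.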

Regarding the error, exit status 134 indicates that the process exited after receiving a SIGABRT signal (134 = 128 + 6, and SIGABRT is signal 6), which matches the "Aborted" line in prelaunch.err. As the error message says, this was probably because the container was launched on a blacklisted node (bad node). Blacklisted nodes are nodes marked unfit by YARN for running containers.
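
If you want to decode such exit statuses yourself: shells conventionally report 128 + N when a process is killed by signal N. Here is a minimal sketch using only Python's standard `signal` module. The helper name `describe_exit_status` is made up for illustration and is not part of Spark or YARN.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status (hypothetical helper).

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(134))
```

On Linux, where SIGABRT is signal 6, this should report that status 134 corresponds to SIGABRT. Signal numbers can differ on other platforms, so treat the mapping as platform-dependent.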

