Where are logs in Spark on YARN?
I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but there are no logs after execution.
The following command is used to run a Spark example, but the logs are not found in the history server as they would be for a normal MapReduce job.
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.2.1.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client --jar ./spark-example-1.0.0.jar \
--class SimpleApp --args yarn-standalone --num-workers 3 --master-memory 1g \
--worker-memory 1g --worker-cores 1
Where can I find the logs/stderr/stdout?
Is there somewhere to set this configuration? I did find output on the console saying:
14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
In this line, notice 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
Where can LOG_DIR be set?
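For context, <LOG_DIR> is not set directly: the NodeManager substitutes it with the container's log directory, which is controlled by yarn.log-dir/yarn.nodemanager.log-dirs in yarn-site.xml. A minimal sketch (the value shown is illustrative, not a default):

```xml
<!-- yarn-site.xml: base directory where the NodeManager places
     each container's stdout/stderr (value here is illustrative) -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/var/log/hadoop-yarn/containers</value>
</property>
```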
Solution 1:[1]
You can access logs through the command
yarn logs -applicationId <application ID> [OPTIONS]
general options are:
- -appOwner <Application Owner> - AppOwner (assumed to be the current user if not specified)
- -containerId <Container ID> - ContainerId (must be specified if a node address is specified)
- -nodeAddress <Node Address> - NodeAddress in the format nodename:port (must be specified if a container ID is specified)
Examples:
yarn logs -applicationId application_1414530900704_0003
// if the application owner is a different user, pass it explicitly
yarn logs -applicationId <appid> -appOwner <userid>
Solution 2:[2]
None of the answers makes it crystal clear where to look for the logs (although they do in pieces), so I am putting it together.
If log aggregation is turned on (with yarn.log-aggregation-enable in yarn-site.xml), then do this:
yarn logs -applicationId <app ID>
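If aggregation is off, it can be enabled in yarn-site.xml (the YARN daemons need a restart for it to take effect):

```xml
<!-- yarn-site.xml: aggregate container logs after the application finishes,
     making them available via `yarn logs` -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```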
However, if this is not turned on, one needs to log in to the DataNode machine and look at
$HADOOP_HOME/logs/userlogs/application_1474886780074_XXXX/
where application_1474886780074_XXXX is the application ID.
Solution 3:[3]
It logs to:
/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout
The logs are on every node that your Spark job runs on.
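Putting this together: with the application ID in hand, the per-node log location from this answer can be constructed directly. A minimal sketch (the application ID and base directory are placeholders; the actual base is whatever yarn.nodemanager.log-dirs points to on your cluster):

```shell
#!/bin/sh
# Build the expected non-aggregated container log path on a worker node.
# APP_ID and LOG_BASE are illustrative placeholders.
APP_ID="application_1474886780074_0001"
LOG_BASE="/var/log/hadoop-yarn/containers"
echo "${LOG_BASE}/${APP_ID}"
```

Each container under that directory has its own stdout/stderr files, and the directory exists on every node that ran a container for the job.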
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Liran Funaro |
| Solution 2 | Somum |
| Solution 3 | rado |
