'Where are logs in Spark on YARN?

I'm new to spark. Now I can run spark 0.9.1 on yarn (2.0.0-cdh4.2.1). But there is no log after execution.

The following command is used to run a spark example. But logs are not found in the history server as in a normal MapReduce job.

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.2.1.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client --jar ./spark-example-1.0.0.jar \
--class SimpleApp --args yarn-standalone  --num-workers 3 --master-memory 1g \
--worker-memory 1g --worker-cores 1

where can I find the logs/stderr/stdout?

Is there someplace to set the configuration? I did find an output from console saying:

14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

In this line, notice 1> $LOG_DIR/stdout 2> $LOG_DIR/stderr

Where can LOG_DIR be set?



Solution 1:[1]

You can access logs through the command

yarn logs -applicationId <application ID> [OPTIONS]

general options are:

  • appOwner <Application Owner> - AppOwner (assumed to be current user if not specified)
  • containerId <Container ID> - ContainerId (must be specified if node address is specified)
  • nodeAddress <Node Address> - NodeAddress in the format nodename:port (must be specified if container id is specified)

Examples:

yarn logs -applicationId application_1414530900704_0003                                      
yarn logs -applicationId application_1414530900704_0003 myuserid

// the user ids are different
yarn logs -applicationId <appid> -appOwner <userid>

Solution 2:[2]

None of the answers make it crystal clear where to look for logs ( although they do in pieces) so I am putting it together.

If log aggregation is turned on (with the yarn.log-aggregation-enable yarn-site.xml) then do this

yarn logs -applicationId <app ID>

However, if this is not turned on then one needs to go on the Data-Node machine and look at

$HADOOP_HOME/logs/userlogs/application_1474886780074_XXXX/

application_1474886780074_XXXX is the application id

Solution 3:[3]

It logs to:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

The logs are on every node that your Spark job runs on.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Liran Funaro
Solution 2 Somum
Solution 3 rado