I'm attempting to access Accumulo 1.6 from an Apache Spark job (written in Java) by using an AccumuloInputFormat with newAPIHadoopRDD. In order to do this, I have to tell the AccumuloInputFormat where to locate ZooKeeper by calling the setZooKeeperInstance method. This method takes a ClientConfiguration object which specifies various relevant properties.
I'm creating my ClientConfiguration object by calling the static loadDefault method. This method is supposed to look in various places for a client.conf file to load its defaults from. One of the places it's supposed to look is $ACCUMULO_CONF_DIR/client.conf.
Therefore, I am attempting to set the ACCUMULO_CONF_DIR environment variable in such a way that it will be visible when Spark runs the job (for reference, I'm attempting to run in the yarn-cluster deployment mode). I have not yet found a way to do that successfully.
So far, I've tried:
- Calling setExecutorEnv("ACCUMULO_CONF_DIR", "/etc/accumulo/conf") on the SparkConf
- Exporting ACCUMULO_CONF_DIR in spark-env.sh
- Setting spark.executorEnv.ACCUMULO_CONF_DIR in spark-defaults.conf
None of them have worked. When I print the environment before calling setZooKeeperInstance, ACCUMULO_CONF_DIR does not appear.
If it's relevant, I'm using the CDH5 versions of everything.
Here's an example of what I'm trying to do (imports and exception handling left out for brevity):
    public class MySparkJob
    {
        public static void main(String[] args)
        {
            SparkConf sparkConf = new SparkConf();
            sparkConf.setAppName("MySparkJob");
            sparkConf.setExecutorEnv("ACCUMULO_CONF_DIR", "/etc/accumulo/conf");
            JavaSparkContext sc = new JavaSparkContext(sparkConf);
            Job accumuloJob = Job.getInstance(sc.hadoopConfiguration());
            // Foreach loop to print environment, shows no ACCUMULO_CONF_DIR
            ClientConfiguration accumuloConfiguration = ClientConfiguration.loadDefault();
            AccumuloInputFormat.setZooKeeperInstance(accumuloJob, accumuloConfiguration);
            // Other calls to AccumuloInputFormat static functions to configure it properly.
            JavaPairRDD<Key, Value> accumuloRDD =
                sc.newAPIHadoopRDD(accumuloJob.getConfiguration(),
                                   AccumuloInputFormat.class,
                                   Key.class,
                                   Value.class);
        }
    }