I have a Spark (version 1.4.1) application on HDP 2.3. It works fine when running it in YARN-Client mode. However, when running it on YARN-Cluster mode none of my Hive tables can be found by the application.
I submit the application like so:
./bin/spark-submit
--class com.myCompany.Main
--master yarn-cluster
--num-executors 3
--driver-memory 4g
--executor-memory 10g
--executor-cores 1
--jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar /home/spark/apps/YarnClusterTest.jar
--files /etc/hive/conf/hive-site.xml
Here's a excerpt from the logs:
5/12/02 11:05:13 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/12/02 11:05:14 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/12/02 11:05:14 INFO metastore.ObjectStore: ObjectStore, initialize called
15/12/02 11:05:14 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/12/02 11:05:14 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/02 11:05:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker2.xxx.com:34697 with 5.2 GB RAM, BlockManagerId(1, worker2.xxx.com, 34697)
15/12/02 11:05:16 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/12/02 11:05:16 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
15/12/02 11:05:17 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:17 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/02 11:05:18 INFO metastore.ObjectStore: Initialized ObjectStore
15/12/02 11:05:19 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/12/02 11:05:19 INFO metastore.HiveMetaStore: Added admin role in metastore
15/12/02 11:05:19 INFO metastore.HiveMetaStore: Added public role in metastore
15/12/02 11:05:19 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/12/02 11:05:19 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/12/02 11:05:19 INFO parse.ParseDriver: Parsing command: SELECT * FROM streamsummary
15/12/02 11:05:20 INFO parse.ParseDriver: Parse Completed
15/12/02 11:05:20 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
15/12/02 11:05:20 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=streamsummary
15/12/02 11:05:20 INFO HiveMetaStore.audit: ugi=spark ip=unknown-ip-addr cmd=get_table : db=default tbl=streamsummary
15/12/02 11:05:20 DEBUG myCompany.Main$: no such table streamsummary; line 1 pos 14
I basically run into the same 'no such table' problem for any time my application needs to read from or write to the Hive tables.
Posting hive-site.xml here for reference.
<configuration>
<property>
<name>ambari.hive.db.schema.name</name>
<value>hive</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.cache.level2.type</name>
<value>none</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask.size</name>
<value>1968526677</value>
</property>
<property>
<name>hive.auto.convert.sortmerge.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.sortmerge.join.to.mapjoin</name>
<value>false</value>
</property>
<property>
<name>hive.cbo.enable</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>false</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.zookeeper.connectString</name>
<value>worker1.xxx.com:2181,worker3.xxx.com:2181,worker2.xxx.com:2181</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.zookeeper.znode</name>
<value>/hive/cluster/delegation</value>
</property>
<property>
<name>hive.compactor.abortedtxn.threshold</name>
<value>1000</value>
</property>
<property>
<name>hive.compactor.check.interval</name>
<value>300L</value>
</property>
<property>
<name>hive.compactor.delta.num.threshold</name>
<value>10</value>
</property>
<property>
<name>hive.compactor.delta.pct.threshold</name>
<value>0.1f</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>0</value>
</property>
<property>
<name>hive.compactor.worker.timeout</name>
<value>86400L</value>
</property>
<property>
<name>hive.compute.query.using.stats</name>
<value>true</value>
</property>
<property>
<name>hive.conf.restricted.list</name>
<value>hive.security.authenticator.manager,hive.security.authorization.manager,hive.users.in.admin.role</value>
</property>
<property>
<name>hive.convert.join.bucket.mapjoin.tez</name>
<value>false</value>
</property>
<property>
<name>hive.default.fileformat</name>
<value>TextFile</value>
</property>
<property>
<name>hive.default.fileformat.managed</name>
<value>TextFile</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>false</value>
</property>
<property>
<name>hive.enforce.sorting</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.sortmergebucketmapjoin</name>
<value>true</value>
</property>
<property>
<name>hive.exec.compress.intermediate</name>
<value>false</value>
</property>
<property>
<name>hive.exec.compress.output</name>
<value>false</value>
</property>
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.exec.failure.hooks</name>
<value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
<property>
<name>hive.exec.max.created.files</name>
<value>100000</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>5000</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions.pernode</name>
<value>2000</value>
</property>
<property>
<name>hive.exec.orc.compression.strategy</name>
<value>SPEED</value>
</property>
<property>
<name>hive.exec.orc.default.compress</name>
<value>ZLIB</value>
</property>
<property>
<name>hive.exec.orc.default.stripe.size</name>
<value>67108864</value>
</property>
<property>
<name>hive.exec.orc.encoding.strategy</name>
<value>SPEED</value>
</property>
<property>
<name>hive.exec.parallel</name>
<value>false</value>
</property>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
</property>
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
<property>
<name>hive.exec.pre.hooks</name>
<value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>
<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>67108864</value>
</property>
<property>
<name>hive.exec.reducers.max</name>
<value>1009</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
</property>
<property>
<name>hive.exec.submit.local.task.via.child</name>
<value>true</value>
</property>
<property>
<name>hive.exec.submitviachild</name>
<value>false</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
<property>
<name>hive.fetch.task.aggr</name>
<value>false</value>
</property>
<property>
<name>hive.fetch.task.conversion</name>
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…