Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
662 views
in Technique[技术] by (71.8m points)

hadoop - Can Hive recursively descend into subdirectories without partitions or editing hive-site.xml?

I have some web server logs that I'd like to query with Hive. The directory structure, in HDFS, looks like this:

/data/access/web1/2014/09
/data/access/web1/2014/09/access-20140901.log
[... etc ...]
/data/access/web1/2014/10
/data/access/web1/2014/10/access-20141001.log
[... etc ...]
/data/access/web2/2014/09
/data/access/web2/2014/09/access-20140901.log
[... etc ...]
/data/access/web2/2014/10
/data/access/web2/2014/10/access-20141001.log
[... etc ...]

I'm able to create an external table:

CREATE EXTERNAL TABLE access(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s")
LOCATION '/data/access/'

... though Hive doesn't descend into the subfolders unless I run the following commands before running the Hive query:

set hive.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
set hive.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

I've seen other posts set these properties at the table-level (e.g. Issue creating Hive External table using tblproperties):

TBLPROPERTIES ("hive.input.dir.recursive" = "TRUE", 
    "hive.mapred.supports.subdirectories" = "TRUE",
    "hive.supports.subdirectories" = "TRUE", 
    "mapred.input.dir.recursive" = "TRUE");

Unfortunately, this didn't work for me: the table doesn't return any records when I query it. I understand it's possible to set these properties in hive-site.xml, but I'd rather not make any changes that might impact other users if I don't need to.

Q) is there a way to create a table that descends into the subdirectories without using partitions, making site-wide changes, or running those 4 commands every time?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Using Hive in HDInsight, I set the following properties before I create my external table in the Hive query and it works for me.

SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...