You can do it using regular expressions. Like this:
--create test table
create table test_col(year_2018 string, year_2019 string);
set hive.support.quoted.identifiers=none;
set hive.cli.print.header=true;
--test select using hard-coded pattern
select year_2018, `(year_)2019` from test_col;
OK
year_2018 year_2019
Time taken: 0.862 seconds
--test pattern parameter
set hivevar:year_param=2019;
select year_2018, `(year_)${year_param}` from test_col;
OK
year_2018 year_2019
Time taken: 0.945 seconds
--two parameters
set hivevar:year_param1=2018;
set hivevar:year_param2=2019;
select `(year_)${year_param1}`, `(year_)${year_param2}` from test_col t;
OK
year_2018 year_2019
Time taken: 0.159 seconds
--parameter contains full column_name and using more strict regexp pattern
set hivevar:year_param2=year_2019;
select `^${year_param2}$` from test_col t;
OK
year_2019
Time taken: 0.053 seconds
--select all columns using single pattern year_ and four digits
select `^year_[0-9]{4}$` from test_col t;
OK
year_2018 year_2019
Parameter should be calculated and passed to the hive script, no functions like concat(), regexp_replace are supported in the column names.
Also column aliasing does not work for columns extracted using regular expressions:
select t.number_of_incidents, `^${year_param}$` as year1 from test_t t;
throws exception:
FAILED: SemanticException [Error 10004]: Line 1:30 Invalid table alias
or column reference '^year_2018$
': (possible column names are:
number_of_incidents, year_2016, year_2017, year_2018)
I found a workaround to alias a column using union all with empty dataset, see this test:
create table test_t(number_of_incidents int, year_2016 int, year_2017 int, year_2018 int);
insert into table test_t values(15, 12, 5, 1); --insert test data
insert into table test_t values(5,10,6,18);
--parameter, can be passed from outside the script from command line
set hivevar:year_param=year_2018;
--enable regex columns and print column names
set hive.support.quoted.identifiers=none;
set hive.cli.print.header=true;
--Alias column using UNION ALL with empty dataset
select sum(number_of_incidents*year1) incidents_year1
from
(--UNION ALL with empty dataset to alias columns extracted
select 0 number_of_incidents, 0 year1 where false --returns no rows because of false condition
union all
select t.number_of_incidents, `^${year_param}$` from test_t t
)s;
Result:
OK
incidents_year1
105
Time taken: 38.003 seconds, Fetched: 1 row(s)
First query in the UNION ALL
does not affect data because it returns no rows. But it's column names become the names of the whole UNION ALL dataset and can be used in the upper query. This trick works. If you will find a better workaround to alias columns extracted using regexp, please add your solution as well.
Update:
No need in regular expressions if you can pass full column_name as a parameter. Hive substitutes variables as is (does not calculate them) before query execution. Use regexp only if you can not pass full column name for some reason and like in the original query some pattern concatenation is needed. See this test:
--parameter, can be passed from outside the script from command line
set hivevar:year_param=year_2018;
select sum(number_of_incidents*${year_param}) incidents_year1 from test_t t;
Result:
OK
incidents_year1
105
Time taken: 63.339 seconds, Fetched: 1 row(s)