I am planning to write a query that pull relevant rows using the `in` keyword where the list is pretty long, like 20 different elements, what is even worse, I need to run this query against many different tables.
The query will look like this:
create table mytable as select * from ( select * from table1 where mycolumn in (value1, value2, value3, value4 ..) union all select * from table2 where mycolumn in (value1, value2, value3, value4 ..) union all select * from tablen where mycolumn in (value1, value2, value3, value4 ..) union all ... ) unionresult
Clearly, a professional developer will start thinking about how to optimize the query to remove the highly repetitive syntax. Clearly, there is a feature in Hive called “variable substitution” that will help me.
set mylist = (value1, value2, value3, value4 ..)
create table mytable as select * from ( select * from table1 where mycolumn in ${hiveconf:mylist} union all select * from table2 where mycolumn in ${hiveconf:mylist} union all select * from tablen where mycolumn in ${hiveconf:mylist} union all ... ) unionresult
Clearly, this approach will help whenever you think you need a variable. 🙂