SparkSQL error: collect_set() cannot have map type data
With Spark SQL on Hive, when I use named_struct in the query, it returns results:
SELECT id, collect_set(emp_info) AS employee_info
FROM
(
  SELECT t.id, named_struct("name", t.emp_name, "dept", t.emp_dept) AS emp_info
  FROM mytable t
) a
GROUP BY id
But when I replace named_struct with map, I get an error:
SELECT id, collect_set(emp_info) AS employee_info
FROM
(
  SELECT t.id, map("name", t.emp_name, "dept", t.emp_dept) AS emp_info
  FROM mytable t
) a
GROUP BY id
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'collect_set(a.`emp_info`)' due to data type mismatch: collect_set() cannot have map type data; line 36 pos 27;
'Distinct
I want to return a map of name and dept; how can I use map with collect_set?
FYI: the query with map runs without issues in Hive (Hue).
Solution 1:[1]
Try converting the map to JSON first using the to_json() function; then you can collect the set in the same query, for example:
SELECT id, collect_set(to_json(map("name", emp_name, "dept", emp_dept))) AS emp_info
FROM mytable
GROUP BY id
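Note that collect_set(to_json(...)) returns an array of JSON strings rather than actual map values. If you need real maps downstream, one option is to parse each string back with from_json after grouping. The following is a minimal sketch, assuming Spark 2.4 or later (for the transform higher-order function) and using a hypothetical alias emp_json_set:

SELECT id,
       -- parse each collected JSON string back into a map<string,string>
       transform(emp_json_set, s -> from_json(s, 'map<string,string>')) AS employee_info
FROM
(
  -- aggregate on JSON strings, which collect_set accepts
  SELECT id, collect_set(to_json(map("name", emp_name, "dept", emp_dept))) AS emp_json_set
  FROM mytable
  GROUP BY id
) grouped

This keeps the aggregation on string values, which collect_set can handle, and only rebuilds the map type after the GROUP BY.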
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | tiff sea |
