I am using Java 8 with Spark v2.4.1.
I am trying to use a UDF to do a lookup using a Map, as shown below.
Input data:
+-----+-----+-----+
|code1|code2|code3|
+-----+-----+-----+
|    1|    7|    5|
|    2|    7|    4|
|    3|    7|    3|
|    4|    7|    2|
|    5|    7|    1|
+-----+-----+-----+
Expected data:
+-----+-----+-----+
|code1|code2|code3|
+-----+-----+-----+
|    1|    7|   51|
|    2|    7|   41|
|    3|    7|   31|
|    4|    7|   21|
|    5|    7|   11|
+-----+-----+-----+
// Lookup table: code3 value -> replacement value.
Map<Integer, Integer> map = new HashMap<>();
map.put(1, 11);
map.put(2, 21);
map.put(3, 31);
map.put(4, 41);
map.put(5, 51);

// UDF that looks up a score in the map it receives as its first argument.
public static UDF2<java.util.Map<Integer, Integer>, Integer, Integer> userDefinedFunction =
        new UDF2<java.util.Map<Integer, Integer>, Integer, Integer>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Integer call(java.util.Map<Integer, Integer> map, Integer score) throws Exception {
        return map.get(score);
    }
};

// This is the call that fails: lit(map) is rejected.
Dataset<Row> resultDs = dataDs.withColumn("code3",
        functions.callUDF("userDefinedFunction", col("code3"), lit(map)));
Error:
java.lang.RuntimeException: Unsupported literal type class java.util.HashMap
What is wrong here? How can I pass a HashMap parameter to a UDF with the Java API?
Code used to build the input data:
List<String[]> stringAsList = new ArrayList<>();
stringAsList.add(new String[] { "1", "7", "5" });
stringAsList.add(new String[] { "2", "7", "4" });
stringAsList.add(new String[] { "3", "7", "3" });
stringAsList.add(new String[] { "4", "7", "2" });
stringAsList.add(new String[] { "5", "7", "1" });

JavaSparkContext sparkContext = new JavaSparkContext(sparkSession.sparkContext());
JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList)
        .map((String[] row) -> RowFactory.create(row));

StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("code1", DataTypes.StringType, false),
        DataTypes.createStructField("code2", DataTypes.StringType, false),
        DataTypes.createStructField("code3", DataTypes.StringType, false)
});

Dataset<Row> dataDf = sparkSession.sqlContext().createDataFrame(rowRDD, schema).toDF();
Dataset<Row> dataDs = dataDf
        .withColumn("code1", col("code1").cast(DataTypes.IntegerType))
        .withColumn("code2", col("code2").cast(DataTypes.IntegerType))
        .withColumn("code3", col("code3").cast(DataTypes.IntegerType));
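
One possible way around the "Unsupported literal type" error is to stop passing the map as a column at all: let the function capture the map and take only the score as its argument. The sketch below illustrates that idea in plain Java, without Spark, so that it runs on its own. The class name `MapLookupSketch` and the helpers `lookupFor`, `questionMap` and `applyToColumn` are hypothetical, and `java.util.function.Function` only stands in for a one-argument UDF such as Spark's `UDF1`. It is an assumption, not something verified here, that the same closure approach carries over once such a function is registered with Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-alone sketch; no Spark classes are used here.
final class MapLookupSketch {

    // Builds a one-argument lookup that captures the map
    // instead of receiving it as a second parameter.
    static Function<Integer, Integer> lookupFor(Map<Integer, Integer> map) {
        // Defensive copy so later changes to the caller's map do not leak in.
        Map<Integer, Integer> captured = new HashMap<>(map);
        return score -> captured.get(score); // null when the score has no entry
    }

    // Same lookup table as in the question.
    static Map<Integer, Integer> questionMap() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 11);
        map.put(2, 21);
        map.put(3, 31);
        map.put(4, 41);
        map.put(5, 51);
        return map;
    }

    // Applies the lookup to every value of a code3-like column.
    static List<Integer> applyToColumn(List<Integer> column,
                                       Function<Integer, Integer> lookup) {
        List<Integer> result = new ArrayList<>();
        for (Integer value : column) {
            result.add(lookup.apply(value));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> code3 = Arrays.asList(5, 4, 3, 2, 1);
        // Prints [51, 41, 31, 21, 11], matching the expected code3 column.
        System.out.println(applyToColumn(code3, lookupFor(questionMap())));
    }
}
```

Because the function takes a single argument, nothing like `lit(map)` is needed at the call site. Another option that is sometimes suggested is to build a map column from individual literals with `functions.map(...)` and pass that column instead; neither variant was run against Spark 2.4.1 here.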