When loading data from JDBC (Oracle) into Spark, there seems to be precision loss in a decimal field. As per my understanding, Spark supports up to DECIMAL(38,18).
The field in Oracle is DECIMAL(15,14), whereas Spark rounds off the last four digits, reducing it to DECIMAL(15,10).
This happens to only one field in the DataFrame, whereas another field in the same query gets the correct schema.
I tried passing spark.sql.decimalOperations.allowPrecisionLoss=false as a conf in spark-submit, but it didn't give the desired result.
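For reference, the conf was passed on the command line roughly like this (load_oracle.py is a placeholder, not the actual job script or submit arguments):

# load_oracle.py is a placeholder for the actual job script
spark-submit \
  --conf spark.sql.decimalOperations.allowPrecisionLoss=false \
  load_oracle.py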
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "ORACLE", "QUERY", "USERNAME", "PASSWORD" are placeholders for the real
# JDBC URL, the join query, and the credentials
jdbcDF = (spark.read
    .format("jdbc")
    .option("url", "ORACLE")
    .option("dbtable", "QUERY")
    .option("user", "USERNAME")
    .option("password", "PASSWORD")
    .load())
So, considering that Spark infers the schema from sample records, how does this work here?
Does it use the result of the query, i.e. (SELECT * FROM TABLE_NAME JOIN ...), or does it take a different route to infer the schema on its own?
Can someone shed some light on this and advise how to get the right decimal precision here without manipulating the query? Doing a CAST in the query does solve the issue, but I would prefer some alternatives.
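For context, the CAST workaround mentioned above looks roughly like this; the table, join, and column names are made-up placeholders, since the real query isn't shown:

# Hypothetical shape of the workaround: cast the problematic column inside the
# query itself so the JDBC metadata reports DECIMAL(15,14).
query = """
(SELECT CAST(t.amount AS DECIMAL(15,14)) AS amount,
        t.other_col
   FROM TABLE_NAME t
   JOIN OTHER_TABLE o ON t.id = o.id) q
"""
# ... and pass it as .option("dbtable", query) in the read shown above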