Apache Spark - How to efficiently find the count of null and NaN values for each column in a PySpark DataFrame?

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

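import numpy as np
from pyspark.sql import SparkSession

# Reuse the active session, or start one when running outside a Spark shell
spark = SparkSession.builder.getOrCreate()

# id2 mixes nulls (None) with NaN values (np.nan and float("nan"))
data = [
    (1, 1, None),
    (1, 2, float(5)),
    (1, 3, np.nan),
    (1, 4, None),
    (1, 5, float(10)),
    (1, 6, float("nan")),
    (1, 6, float("nan")),
]
df = spark.createDataFrame(data, ("session", "timestamp1", "id2"))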

Expected output

A DataFrame with the count of NaN/null values for each column.
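To make the target concrete, the per-column counts for the sample rows can be worked out without Spark. This is only a plain-Python sketch: float("nan") stands in for np.nan (both are the ordinary float NaN), and is_missing is a hypothetical helper name used for illustration.

```python
import math

# Same rows as the sample data; float("nan") replaces np.nan to avoid a NumPy import.
rows = [
    (1, 1, None),
    (1, 2, 5.0),
    (1, 3, float("nan")),
    (1, 4, None),
    (1, 5, 10.0),
    (1, 6, float("nan")),
    (1, 6, float("nan")),
]
columns = ("session", "timestamp1", "id2")


def is_missing(value):
    """Return True for None (a Spark null) or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Count missing values column by column.
missing_per_column = {
    name: sum(is_missing(row[i]) for row in rows)
    for i, name in enumerate(columns)
}
print(missing_per_column)  # {'session': 0, 'timestamp1': 0, 'id2': 5}
```

The id2 total of 5 is made up of 2 nulls and 3 NaN values, which is why a null-only check undercounts.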

Note: The previous questions I found on Stack Overflow only check for null, not NaN. That's why I have created a new question.

I know I can use the isnull() function in Spark to count the null values in a column, but how do I find the NaN values in a Spark DataFrame?

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:05:48+0000

You can use the method shown here, replacing isNull with isnan:

from pyspark.sql.functions import isnan, when, count, col

# when() is non-null only for NaN rows, so count() tallies the NaN values per column
df.select([count(when(isnan(c), c)).alias(c) for c in df.columns]).show()
+-------+----------+---+
|session|timestamp1|id2|
+-------+----------+---+
|      0|         0|  3|
+-------+----------+---+

Or, to count NaN and null values together:

df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns]).show()
+-------+----------+---+
|session|timestamp1|id2|
+-------+----------+---+
|      0|         0|  5|
+-------+----------+---+
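Both queries above pass every column to isnan. Since NaN is a floating-point value, one possible refinement is to apply isnan only to float and double columns and check the remaining columns for null alone. The sketch below assumes df.dtypes reports those columns as "float" or "double"; nan_capable and missing_counts are hypothetical helper names, not part of PySpark.

```python
def nan_capable(dtypes):
    """Return the names of columns whose Spark type can hold NaN (float or double).

    `dtypes` is a list of (column_name, type_string) pairs, as returned by df.dtypes.
    """
    return [name for name, dtype in dtypes if dtype in ("float", "double")]


def missing_counts(df):
    """Count null-or-NaN values per column, using isnan only on float/double columns."""
    # Imported here so nan_capable() can be used without a Spark installation.
    from pyspark.sql.functions import col, count, isnan, when

    float_cols = set(nan_capable(df.dtypes))
    exprs = []
    for c in df.columns:
        missing = col(c).isNull()
        if c in float_cols:
            # NaN can only occur in floating-point columns.
            missing = missing | isnan(col(c))
        # when() is null unless the row is missing, so count() tallies missing rows.
        exprs.append(count(when(missing, 1)).alias(c))
    return df.select(exprs)
```

For the sample DataFrame, missing_counts(df).show() should report the same counts as the second table, while leaving non-floating-point columns out of the isnan check.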
