I have a dataframe with the folloing schema:
root
|-- distanceValue: integer (nullable = true)
|-- timeOfMeasurement: timestamp (nullable = true)
|-- EventProcessedUtcTime: timestamp (nullable = true)
|-- latency: interval (nullable = true)
And the dataframe looks something like this:
distance |timeOfMeasurement |EventProcessedUtcTime |latency
---------+----------------------------+----------------------------+---------------------------------
15 |2021-01-04T07:07:45.098+0000|2021-01-04T07:07:45.676+0000|{"months": 0, "days": 0, "microseconds": 578885}
26 |2021-01-04T07:07:46.098+0000|2021-01-04T07:07:46.301+0000|{"months": 0, "days": 0, "microseconds": 203909}
23 |2021-01-04T07:07:47.113+0000|2021-01-04T07:07:47.353+0000|{"months": 0, "days": 0, "microseconds": 240287}
When trying to compare the distance with the distance from the previous row
import pandas as pd
df['same'] = df.distance.eq(df.distance.shift())
# OR
import numpy as np
df['same'] = np.where(df.distance == df.distance.shift())
I get an error saying: Could not parse datatype: interval
The distance however is integer... Is the value getting mixed up with the latency which is an interval?
Thank you for any help
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…