Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
566 views
in Technique[技术] by (71.8m points)

python - Merge dataframes on nearest datetime / timestamp

I have two data frames as follows:

A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/22/2014","07/02/2014","01/01/2015","01/01/1991","08/02/1999"]})

B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"], "value": ["3","5","1","7","8"] })

Which look like the following:

>>> A
  ID       date
0  A 2014-06-22
1  A 2014-07-02
2  C 2015-01-01
3  B 1991-01-01
4  B 1999-08-02

>>> B
  ID       date value
0  A 2015-02-15     3
1  A 2014-06-30     5
2  C 1999-07-02     1
3  B 1990-10-05     7
4  B 2014-06-24     8

I want to merge A with the values of B using the nearest date. In this example, none of the dates match, but it could the the case that some do.

The output should be something like this:

>>> C
  ID        date value
0  A  06/22/2014     8
1  A  07/02/2014     5
2  C  01/01/2015     3
3  B  01/01/1991     7
4  B  08/02/1999     1

It seems to me that there should be a native function in pandas that would allow this.

Note: as similar question has been asked here pandas.merge: match the nearest time stamp >= the series of timestamps

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use reindex with method='nearest' and then merge:

A['date'] = pd.to_datetime(A.date)
B['date'] = pd.to_datetime(B.date)
A.sort_values('date', inplace=True)
B.sort_values('date', inplace=True)

B1 = B.set_index('date').reindex(A.set_index('date').index, method='nearest').reset_index()
print (B1)

print (pd.merge(A,B1, on='date'))
  ID_x       date ID_y value
0    B 1991-01-01    B     7
1    B 1999-08-02    C     1
2    A 2014-06-22    B     8
3    A 2014-07-02    A     5
4    C 2015-01-01    A     3

You can also add parameter suffixes:

print (pd.merge(A,B1, on='date', suffixes=('_', '')))
  ID_       date ID value
0   B 1991-01-01  B     7
1   B 1999-08-02  C     1
2   A 2014-06-22  B     8
3   A 2014-07-02  A     5
4   C 2015-01-01  A     3

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...