Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

pandas - How to format date columns in Python so that they are weekly, based on each other?

I have a dataframe df that looks similar to this:

identity      Start        End     week
  E         6/18/2020   7/2/2020    1
  E         6/18/2020   7/2/2020    2
 2D         7/18/2020   8/1/2020    1
 2D         7/18/2020   8/1/2020    2
 A1          9/6/2020   9/20/2020   1
 A1          9/6/2020   9/20/2020   2

The problem is that when I extracted the data I only had the Start date and End date for every identity, repeated on every row, but the data itself is by week. All identities have the same number of weeks; sometimes they all have 5 or 6 weeks, but the count is always the same across identities. I want to make Start and End weekly, so that the first week ends 7 days after its Start, and each following week starts where the previous week ended. A representation would be:

identity      Start        End     week
   E       6/18/2020    6/25/2020   1
   E       6/25/2020    7/2/2020    2
  2D       7/18/2020    7/25/2020   1
  2D       7/25/2020    8/1/2020    2
  A1        9/6/2020    9/13/2020   1
  A1       9/13/2020    9/20/2020   2

I tried a simple method: creating a column of sevens and adding it to Start to get the end of the week. That gives me an error: "Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq". My plan was then to derive the next Start from that End minus seven, but I don't know how to get around this problem. Any help would be appreciated.

question from:https://stackoverflow.com/questions/65830784/how-to-format-columns-dates-in-python-that-they-are-weekly-based-on-eachother


1 Answer

Similar to your other question:

First convert to datetimes:

import pandas as pd

# Parse both date columns; the format matches strings such as 6/18/2020
df[["Start", "End"]] = df[["Start", "End"]].apply(pd.to_datetime, format="%m/%d/%Y")

df

    identity    Start       End         week
0   E           2020-06-18  2020-07-02  1
1   E           2020-06-18  2020-07-02  2
2   2D          2020-07-18  2020-08-01  1
3   2D          2020-07-18  2020-08-01  2
4   A1          2020-09-06  2020-09-20  1
5   A1          2020-09-06  2020-09-20  2
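As a side note on the error quoted in the question: once the values are real datetimes, a week can be added as a Timedelta rather than as a bare integer. The snippet below is only a minimal sketch; the `start` series is made-up sample data taken from the question's table, not the full dataframe.

```python
import pandas as pd

# Hypothetical sample: the three distinct Start dates from the question
start = pd.Series(
    pd.to_datetime(["6/18/2020", "7/18/2020", "9/6/2020"], format="%m/%d/%Y")
)

# A bare integer (start + 7) is rejected by recent pandas versions because
# the unit is ambiguous; a Timedelta states the unit explicitly.
one_week = pd.Timedelta(days=7)
end_of_first_week = start + one_week

print(end_of_first_week.dt.strftime("%Y-%m-%d").tolist())
# ['2020-06-25', '2020-07-25', '2020-09-13']
```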

Your identity is in groups of two, so I'll use that when selecting dates from the date_range:

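from itertools import chain

# One row per identity, so each Start/End pair is processed once
result = df.drop_duplicates(subset="identity")

# For each identity, build weekly dates from Start to End and keep
# the first two (one Start date per week)
date_range = (
    pd.date_range(start, end, freq="7D")[:2]
    for start, end in zip(result.Start, result.End)
)

# Flatten the per-identity dates into one sequence, in row order
date_range = chain.from_iterable(date_range)

# Each week's End is its Start plus seven days; assign() calls this
End = lambda df: df.Start.add(pd.Timedelta("7 days"))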
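To see what the slicing does for a single identity, here is a small sketch using the dates of identity E from the question. With a 7-day frequency the range includes the overall End date as a third entry, which is why only the first two values are kept as week starts.

```python
import pandas as pd

# Weekly steps between the original Start and End of identity E
weekly = pd.date_range("2020-06-18", "2020-07-02", freq="7D")
print(list(weekly.strftime("%m/%d/%Y")))
# ['06/18/2020', '06/25/2020', '07/02/2020']

# The first two entries are the Start dates of week 1 and week 2
first_two = weekly[:2]
print(list(first_two.strftime("%m/%d/%Y")))
# ['06/18/2020', '06/25/2020']
```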

Create new dataframe:

df.assign(Start=list(date_range), End=End)

    identity    Start       End         week
0   E           2020-06-18  2020-06-25  1
1   E           2020-06-25  2020-07-02  2
2   2D          2020-07-18  2020-07-25  1
3   2D          2020-07-25  2020-08-01  2
4   A1          2020-09-06  2020-09-13  1
5   A1          2020-09-13  2020-09-20  2
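Since the question mentions that identities may have 5 or 6 weeks rather than 2, a possible alternative (not the approach above, just a sketch) is to compute the dates directly from the week column. It assumes that week numbering starts at 1 and that every row of an identity carries that identity's original Start date, as in the sample data; the `offset` and `weekly` names are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "identity": ["E", "E", "2D", "2D", "A1", "A1"],
    "Start": ["6/18/2020"] * 2 + ["7/18/2020"] * 2 + ["9/6/2020"] * 2,
    "End": ["7/2/2020"] * 2 + ["8/1/2020"] * 2 + ["9/20/2020"] * 2,
    "week": [1, 2, 1, 2, 1, 2],
})

# Every row holds the identity's overall Start date (assumption)
first_start = pd.to_datetime(df["Start"], format="%m/%d/%Y")

# Week 1 starts at Start, week 2 seven days later, and so on
offset = pd.to_timedelta((df["week"] - 1) * 7, unit="D")

weekly = df.assign(
    Start=first_start + offset,
    End=first_start + offset + pd.Timedelta(days=7),
)
print(weekly)
```

Because the offset is derived from week, this version does not depend on there being exactly two rows per identity.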
