Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

r - finding overlapping time between start time and end time of individuals in a group

I have the following data:

     household       person     start time   end time
          1           1          07:45:00    21:45:00
          1           2          09:45:00    17:45:00
          1           3          22:45:00    23:45:00
          1           4          08:45:00    01:45:00
          1           1          06:45:00    19:45:00
          2           1          07:45:00    21:45:00
          2           2          16:45:00    22:45:00

I want to add a column that shows which family members have overlapping times.

For each row, that column should hold the index of the person or persons in the same household whose time interval intersects that row's interval.

In the example above, in the first family, the times of the first, second and fourth persons intersect.

Desired output:

      household       person     start time   end time      overlap
          1           1          07:45:00    21:45:00           2,4
          1           2          09:45:00    17:45:00           1,4
          1           3          22:45:00    23:45:00            NA
          1           4          08:45:00    01:45:00           1,2
          1           1          18:45:00    19:45:00            NA     
          2           1          07:45:00    21:45:00            2
          2           2          16:45:00    22:45:00            1

NA means no intersection with any other family member; it could equally be 0 or any other placeholder.

1 Answer

Left join the input DF to itself, matching each row with the other persons in the same household whose interval satisfies the overlap condition. Then group by row, concatenating the matched persons into a comma-separated string.

Since the question does not explain what constitutes overlap, we try three different definitions. The third comes closest to the output shown in the question.

  1. If end_time < start_time, then everything before end_time and everything after start_time is in the interval to be checked for overlap. The overlap condition then decomposes into 4 cases according to whether the left and right hand sides of the join satisfy this or not.

  2. If start_time > end_time on either the left or right hand side, we regard the two as not overlapping.

  3. If end_time < start_time, swap them and test for overlap as usual.
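
To make the three definitions concrete, here is a minimal base R sketch that tests a single pair of intervals. It is only an illustration: the helper names (to_sec, in_range, plain_overlap, overlap_def1, overlap_def2, overlap_def3) are made up for this example and are not part of sqldf, and it compares seconds since midnight whereas the SQL compares the "HH:MM:SS" strings directly.

```r
# Hypothetical helper: one "HH:MM:SS" string -> seconds since midnight.
to_sec <- function(x) {
  p <- as.integer(strsplit(x, ":", fixed = TRUE)[[1]])
  p[1] * 3600L + p[2] * 60L + p[3]
}

# TRUE if x lies in the closed range [lo, hi], like SQL's BETWEEN.
in_range <- function(x, lo, hi) x >= lo & x <= hi

# Ordinary overlap test; only meaningful when s1 <= e1 and s2 <= e2.
plain_overlap <- function(s1, e1, s2, e2) {
  in_range(s1, s2, e2) | in_range(s2, s1, e1)
}

# Definition 1: an interval with end < start covers everything after
# its start and everything before its end.
overlap_def1 <- function(s1, e1, s2, e2) {
  wrap1 <- s1 > e1
  wrap2 <- s2 > e2
  if (!wrap1 && !wrap2) {
    plain_overlap(s1, e1, s2, e2)
  } else if (!wrap1 && wrap2) {
    # No overlap only if interval 1 lies entirely in the gap of interval 2.
    !(in_range(s1, e2, s2) && in_range(e1, e2, s2))
  } else if (wrap1 && !wrap2) {
    !(in_range(s2, e1, s1) && in_range(e2, e1, s1))
  } else {
    TRUE  # both wrap, so they share the wrap-around point
  }
}

# Definition 2: an interval with start > end never overlaps anything.
overlap_def2 <- function(s1, e1, s2, e2) {
  if (s1 > e1 || s2 > e2) FALSE else plain_overlap(s1, e1, s2, e2)
}

# Definition 3: swap start and end when end < start, then test as usual.
overlap_def3 <- function(s1, e1, s2, e2) {
  plain_overlap(min(s1, e1), max(s1, e1), min(s2, e2), max(s2, e2))
}

# Person 2 (09:45-17:45) against person 4 (08:45-01:45) in household 1.
p2 <- c(to_sec("09:45:00"), to_sec("17:45:00"))
p4 <- c(to_sec("08:45:00"), to_sec("01:45:00"))
overlap_def1(p2[1], p2[2], p4[1], p4[2])  # TRUE
overlap_def2(p2[1], p2[2], p4[1], p4[2])  # FALSE
overlap_def3(p2[1], p2[2], p4[1], p4[2])  # FALSE
```

The only place the definitions disagree is on rows where end_time is earlier than start_time, which in the sample data is person 4 of household 1.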

First definition of overlap

library(sqldf)

sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b 
    on a.household = b.household and 
       a.person != b.person and
       (case 
          when a.start_time <= a.end_time and b.start_time <= b.end_time then 
               (a.start_time between b.start_time and b.end_time or
               b.start_time between a.start_time and a.end_time)
          when a.start_time <= a.end_time and b.start_time > b.end_time then
               not (a.start_time between b.end_time and b.start_time and
               a.end_time between b.end_time and b.start_time)
          when a.start_time > a.end_time and b.start_time <= b.end_time then
               not (b.start_time between a.end_time and a.start_time and
               b.end_time between a.end_time and a.start_time)
          else 1 end)
  group by a.rowid")

giving:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00     2,4
2         1      2   09:45:00 17:45:00     1,4
3         1      3   22:45:00 23:45:00       4
4         1      4   08:45:00 01:45:00   1,2,3
5         1      1   06:45:00 19:45:00     2,4
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1

Second definition of overlap

library(sqldf)

sqldf("select a.*, group_concat(distinct b.person) as overlap
  from DF a
  left join DF b 
    on a.household = b.household and 
       a.person != b.person and              
       (case
          when a.start_time <= a.end_time and b.start_time <= b.end_time then
               (a.start_time between b.start_time and b.end_time or
               b.start_time between a.start_time and a.end_time)
          else 0 end)
  group by a.rowid")

giving:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00       2
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00    <NA>
5         1      1   06:45:00 19:45:00       2
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1

Third definition of overlap

sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
  select rowid, *, 
    min(start_time, end_time) as st,
    max(start_time, end_time) as en
  from DF)

  select a.household, a.person, a.start_time, a.end_time, 
      group_concat(distinct b.person) as overlap
    from DF2 a
    left join DF2 b 
      on a.household = b.household and 
         a.person != b.person and                  
         (a.st between b.st and b.en or
          b.st between a.st and a.en)
    group by a.rowid")

giving:

  household person start_time end_time overlap
1         1      1   07:45:00 21:45:00     2,4
2         1      2   09:45:00 17:45:00       1
3         1      3   22:45:00 23:45:00    <NA>
4         1      4   08:45:00 01:45:00       1
5         1      1   06:45:00 19:45:00     2,4
6         2      1   07:45:00 21:45:00       2
7         2      2   16:45:00 22:45:00       1
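
For readers without sqldf, the third definition can be cross-checked with a rough base R sketch. This is not the query above, just the same idea written as a loop over rows. DF is rebuilt here from the data in the Note so that the block runs on its own, and the names st, en and overlap are chosen to mirror the SQL.

```r
# Same data as the DF in the Note, rebuilt so this block is self-contained.
DF <- data.frame(
  household  = c(1L, 1L, 1L, 1L, 1L, 2L, 2L),
  person     = c(1L, 2L, 3L, 4L, 1L, 1L, 2L),
  start_time = c("07:45:00", "09:45:00", "22:45:00", "08:45:00",
                 "06:45:00", "07:45:00", "16:45:00"),
  end_time   = c("21:45:00", "17:45:00", "23:45:00", "01:45:00",
                 "19:45:00", "21:45:00", "22:45:00"),
  stringsAsFactors = FALSE
)

# Zero-padded "HH:MM:SS" strings sort in the same order as the times they
# encode, so pmin/pmax on the strings play the role of min()/max() in the SQL.
st <- pmin(DF$start_time, DF$end_time)
en <- pmax(DF$start_time, DF$end_time)

# For each row, collect the other persons in the same household whose
# (swapped) interval overlaps this row's interval.
overlap <- vapply(seq_len(nrow(DF)), function(i) {
  hit <- DF$household == DF$household[i] &
    DF$person != DF$person[i] &
    ((st[i] >= st & st[i] <= en) | (st >= st[i] & st <= en[i]))
  if (any(hit)) {
    paste(sort(unique(DF$person[hit])), collapse = ",")
  } else {
    NA_character_
  }
}, character(1))

cbind(DF, overlap)
```

On this data the overlap column comes out as 2,4 / 1 / NA / 1 / 2,4 / 2 / 1, matching the table above. The loop compares every row against every other row, which is fine for small households but is an assumption worth revisiting for large inputs.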

Note

We assume that the input DF in reproducible form is:

DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L, 
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00", 
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"), 
    end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00", 
    "19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA, 
-7L))
