Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


group by - SQL to get 3 adjacent actions without duplicate from the flags

I have a question that is a little similar to question #66044663, but more complicated.

Here's my dummy data.

[image not available]

For each user, I want to get 3 adjacent actions (no duplicates), counting back from each flag.

Here's a chart that describes what I have in mind.

[image not available]

Here's what I want:

[image not available]

How can I implement this in SQL (I use Google BigQuery)? I know the LAG function could be part of a solution, but I have no idea how to avoid the duplicate actions.

I hope someone can enlighten me. Thanks a million!

Here's the code for generating the dataset.

WITH
src_table AS (
SELECT 'Jack' AS User, 1 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Jack' AS User, 2 AS Sequence, 'Work' AS Action, '' AS Flag UNION ALL
SELECT 'Jack' AS User, 3 AS Sequence, 'Sleep' AS Action, 'Flag A' AS Flag UNION ALL
SELECT 'Jack' AS User, 4 AS Sequence, 'Exercise' AS Action, 'Flag B' AS Flag UNION ALL
SELECT 'Kenny' AS User, 1 AS Sequence, 'Run' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 2 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 3 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 4 AS Sequence, 'Work' AS Action, 'Flag C' AS Flag UNION ALL
SELECT 'Kenny' AS User, 5 AS Sequence, 'Work' AS Action, 'Flag D' AS Flag UNION ALL
SELECT 'May' AS User, 1 AS Sequence, 'Work' AS Action, 'Flag A' AS Flag
)
Question from: https://stackoverflow.com/questions/66060358/sql-to-get-3-adjacent-actions-without-duplicate-from-the-flags


1 Answer


Consider the approach below:

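select user, actions.action_sequence, flag from (
  select *, (
    # collapse the window to distinct (action, grp) runs, then count and join them
    select as struct count(1) actions_count,
      string_agg(action, ' >> ' order by grp) action_sequence
    from (
      select action, grp from t.arr group by action, grp
    )) actions
  from (
    # collect every row of the user whose run number lies between grp - 2 and grp
    select *, array_agg(struct(action, grp))
      over(partition by user order by grp desc range between current row and 2 following) arr
    from (
      # grp is a running count of changes: consecutive equal actions share one run number
      select *, countif(change) over(partition by user order by sequence) grp
      from (
        # change is true when the action differs from the previous one (null on a user's first row)
        select *, action != lag(action) over(partition by user order by sequence) change
        from src_table
      )
    )
  ) t
)
where flag != ''
# keep only flagged rows that have a full 3 distinct runs in their window
and actions.actions_count = 3
# order by user, sequence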

Applied to the sample data in your question, the output is:

[image not available]
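
To make the two innermost subqueries easier to follow, here is a minimal plain-Python sketch (an illustration only, not BigQuery) of how `change` and `grp` are derived. The helper name `label_runs` is hypothetical; it mimics `action != lag(action)` followed by the running `countif(change)`, for a single user's actions already in sequence order.

```python
def label_runs(actions):
    """Return a run number per action; consecutive equal actions share a number."""
    grps = []
    grp = 0
    prev = None
    for i, action in enumerate(actions):
        # Like `action != lag(action)`: the first row has no predecessor,
        # so the comparison is NULL there and is not counted as a change.
        change = i > 0 and action != prev
        if change:
            grp += 1  # like the running countif(change)
        grps.append(grp)
        prev = action
    return grps


# Kenny's actions from the sample data, in sequence order
kenny = ["Run", "Eat", "Eat", "Work", "Work"]
print(list(zip(kenny, label_runs(kenny))))
# [('Run', 0), ('Eat', 1), ('Eat', 1), ('Work', 2), ('Work', 2)]
```

Because repeated actions share a run number, grouping by (action, grp) later removes the adjacent duplicates.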

NOTE: the solution above works for any number of adjacent actions (no duplicates). You just need to change the two constants (2 and 3) in their respective places; for n actions, use n - 1 in the window frame and n in the count filter:

over(partition by user order by grp desc range between current row and 2 following) arr    

and

and actions.actions_count = 3   
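
The role of the two constants can also be sketched in plain Python (an illustration of the logic, not the BigQuery query itself). The function `adjacent_actions` and its parameter `n` are hypothetical names: `n` plays the part of the 3 in the count filter, and taking the last `n` runs plays the part of the `n - 1 following` window frame.

```python
from itertools import groupby

# Sample rows from the question: (user, sequence, action, flag)
ROWS = [
    ("Jack", 1, "Eat", ""),
    ("Jack", 2, "Work", ""),
    ("Jack", 3, "Sleep", "Flag A"),
    ("Jack", 4, "Exercise", "Flag B"),
    ("Kenny", 1, "Run", ""),
    ("Kenny", 2, "Eat", ""),
    ("Kenny", 3, "Eat", ""),
    ("Kenny", 4, "Work", "Flag C"),
    ("Kenny", 5, "Work", "Flag D"),
    ("May", 1, "Work", "Flag A"),
]


def adjacent_actions(rows, n=3):
    """For each flagged row, join the last n runs of actions ending at that row."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user, user_rows in groupby(ordered, key=lambda r: r[0]):
        runs = []  # one entry per run of consecutive equal actions
        for _, _, action, flag in user_rows:
            if not runs or runs[-1] != action:
                runs.append(action)
            # Mirrors `flag != ''` and `actions.actions_count = n`
            if flag and len(runs) >= n:
                result.append((user, " >> ".join(runs[-n:]), flag))
    return result


for row in adjacent_actions(ROWS, n=3):
    print(row)
# ('Jack', 'Eat >> Work >> Sleep', 'Flag A')
# ('Jack', 'Work >> Sleep >> Exercise', 'Flag B')
# ('Kenny', 'Run >> Eat >> Work', 'Flag C')
# ('Kenny', 'Run >> Eat >> Work', 'Flag D')
```

Calling `adjacent_actions(ROWS, n=2)` corresponds to using `1 following` and `actions_count = 2` in the query. May is dropped in both cases because she has only a single run of actions.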
