Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
267 views
in Technique[技术] by (71.8m points)

SQL Server: Duplicates but based on specific criteria

I am trying to find duplicates based on forename, surname, and dateofbirth in my database. Below is what I have

Customers table:

custid  cust_refno  forename surname dateofbirth
1           10        David   John     10-02-1980
2           20        Peter   Broad    15-08-1978
3           30        Sarah   Holly    16-09-1982
4           40        Mathew  Mark     25-08-2001
5           50        Matt    Mark     25-08-2001

Address table:

addid cust_refno addresstype   line1 
1       10         address     No. 10, Mineview Road
2       10         address     No. 20, Mineview Lane
3       20         address     Rockview cottage, blackthorn
4       30         mobile      0504135864
5       40         address     No. 64, New Lane 
6       40         mobile      0504896532
7       50         address     No. 11, John's cottage 

Some customers have multiple addresses, so they are not duplicates. I am trying to find a way to avoid displaying those as duplicates. Can you advice how I can do that?

my query:

SELECT DISTINCT t.FORENAME, t.SURNAME, t.CUST_REFNO, t.DATE_OF_BIRTH , a.LINE1 FROM CUSTOMERS AS t 
LEFT OUTER JOIN dbo.ADDRESS a
ON t.CUST_REFNO = a.CUST_REFNO
    INNER JOIN (
        SELECT FORENAME, surname, DTTM_OF_BIRTH
        FROM CUSTOMERS GROUP BY FORENAME, SURNAME, DATE_OF_BIRTH
        HAVING COUNT(*) > 1) AS td 
    ON t.FORENAME = td.FORENAME AND t.DTTM_OF_BIRTH = td.DATE_OF_BIRTH
        AND t.SURNAME = td.SURNAME 
WHERE a.addresstype = 'address'

my result is:

Forename surname cust_refno dateofbirth line1 
David    John       10       10-02-1980  No. 10, Mineview Road
David    John       10       10-02-1980  No. 20, Mineview Lane

But in reality it is not a duplicate. Its just that the addresses are different. Is there a way to compare the cust_refno and see if it already exists so even if the address is different if the cust_refno is the same it does not show again?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use window functions to filter out customers with more than one address. Then aggregation can be used to return the duplicates:

select forename, surname, dateofbirth
from customers c join
     (select a.*,
             count(*) over (partition by cust_refno) as cnt
      from addresses a
      where addresstype = 'address'
     ) a
     on c.cust_refno = a.cust_refno
where cnt = 1
group by forename, surname, dateofbirth
having count(*) > 1;

If you want the full customer record, just use window functions twice:

select c.*
from (select c.*,
             count(*) over (partition by forename, surname, dateofbirth) as cnt
      from customers c 
     ) c join
     (select a.*,
             count(*) over (partition by cust_refno) as cnt
      from addresses a
      where addresstype = 'address'
     ) a
     on c.cust_refno = a.cust_refno
where a.cnt = 1 and c.cnt > 1;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...