Situation:
Our company just purchased 20 million rows from a vendor. The vendor inserted the data from Excel into TABLE1 (RDBMS is postgresql) and unfortunately TABLE1 doesn't have a PK. My manager wants me to insert all 20 million rows into TABLE2 but with distinct records only.
EMPCODE is the only column with unique values.
TABLE2 is empty prior to insertion.
TABLE1 doesn't have a primary key and last insert date or date_updated column.
I created TABLE2 with a primary key, int NOT NULL identity primary key.
What is the correct sql script to get the correct count to get the desired results?
insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
select distinct(EMPCODE), FN, LN, STATE, TYPE from table1
or
insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
select distinct(EMPCODE, FN, LN, STATE, TYPE) from table1
or
insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
select EMPCODE, FN, LN, LAST(STATE), TYPE from table1
group by EMPCODE, FN, LN, STATE, TYPE
TABLE1
EMPCODE FirstName LastName STATE TYPE
ID111 Fred Perry CO CONSULTANT
ID111 Fred Perry UT CONSULTANT
ID111 Fred Perry AZ CONSULTANT
ID215 Tommy Hilfiger IL INTERN
ID215 Tommy Hilfiger IN INTERN
ID215 Tommy Hilfiger OH INTERN
ID215 Tommy Hilfiger NY INTERN
ID467 David Abercrombie TX REGULAR
ID467 David Abercrombie CA REGULAR
ID875 Ezra Fitch NV TERMINATED
ID875 Ezra Fitch OR TERMINATED
ID875 Ezra Fitch WA TERMINATED
ID875 Ezra Fitch UT TERMINATED
ID875 Ezra Fitch AZ TERMINATED
Desired result in TABLE2
EMPCODE FirstName LastName STATE TYPE
ID111 Fred Perry CO CONSULTANT
ID215 Tommy Hilfiger IL INTERN
ID467 David Abercrombie TX REGULAR
ID875 Ezra Fitch NV TERMINATED
The problem is that I'm unable to see all data (RAM issues and pgadmin4 very slow) to decide which sql script works best.