I need to fill a column that is empty in DataFrame2 with the values available in DataFrame1; in effect, I am updating a column in DataFrame2.
The two DataFrames have two columns in common.
Is there a way to do this using Java, or is there a different approach?
Sample Input:
1) File1.csv
BILL_ID,BILL_NBR_TYPE_CD,BILL_NBR,VERSION,PRIM_SW
0501841898,BIN ,404154,1000,Y
0681220958,BIN ,735332,1000,Y
5992410180,BIN ,454680,1000,Y
6995270884,SREBIN ,1000252750295575,1000,Y
Here BILL_ID is the system ID and BILL_NBR is the external ID.
2) File2.csv
TXN_ID,TXN_TYPE,BILL_ID,BILL_NBR_TYPE_CD,BILL_NBR
01234, ABC ," ",BIN ,404154
22365, XYZ ," ",BIN ,735332
45890, LKJ ," ",BIN ,454680
23456, MPK ," ",SREBIN ,1000252750295575
Sample Output
As shown below, the BILL_ID value should be populated in File2.csv:
01234, ABC ,501841898,BIN ,404154
22365, XYZ ,681220958,BIN ,735332
45890, LKJ ,5992410180,BIN ,454680
23456, MPK ,6995270884,SREBIN ,1000252750295575
I have created two DataFrames and loaded each file's data into them, but I am not sure how to proceed from here.
EDIT
Basically, I want clarity on the three steps below:
- How do I get the BILL_NBR and BILL_NBR_TYPE_CD values from File2.csv?
  For this step I have written: file2Df.select("BILL_NBR_TYPE_CD","BILL_NBR");
- How do I get the BILL_ID values from File1.csv based on the values retrieved in step 1?
- How do I update the BILL_ID values accordingly in File2.csv?
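To make the intent of the three steps concrete, here is a minimal plain-Java sketch of the lookup logic, without Spark. It assumes each CSV row has already been split into a String[] in the column order shown above; the class name BillIdFiller and its methods are hypothetical names used only for illustration. BILL_ID is kept as a string, and the key columns are trimmed because the sample values carry trailing spaces ("BIN ").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only: fills BILL_ID in File2 rows
// by looking up (BILL_NBR_TYPE_CD, BILL_NBR) in File1 rows.
final class BillIdFiller {

    // Composite key built from the two shared columns; values are trimmed
    // so that "BIN " and "BIN" compare equal.
    static String key(String typeCd, String billNbr) {
        return typeCd.trim() + "|" + billNbr.trim();
    }

    // File1 column order: BILL_ID, BILL_NBR_TYPE_CD, BILL_NBR, VERSION, PRIM_SW
    static Map<String, String> indexBillIds(List<String[]> file1Rows) {
        Map<String, String> billIdByKey = new HashMap<>();
        for (String[] row : file1Rows) {
            billIdByKey.put(key(row[1], row[2]), row[0].trim());
        }
        return billIdByKey;
    }

    // File2 column order: TXN_ID, TXN_TYPE, BILL_ID, BILL_NBR_TYPE_CD, BILL_NBR
    // Returns copies of the rows; a row with no match keeps its original BILL_ID.
    static List<String[]> fillBillIds(List<String[]> file2Rows,
                                      Map<String, String> billIdByKey) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : file2Rows) {
            String[] copy = row.clone();
            String billId = billIdByKey.get(key(row[3], row[4]));
            if (billId != null) {
                copy[2] = billId;
            }
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> file1 = Arrays.asList(
            new String[] {"0501841898", "BIN ", "404154", "1000", "Y"},
            new String[] {"6995270884", "SREBIN ", "1000252750295575", "1000", "Y"});
        List<String[]> file2 = Arrays.asList(
            new String[] {"01234", " ABC ", " ", "BIN ", "404154"},
            new String[] {"23456", " MPK ", " ", "SREBIN ", "1000252750295575"});

        for (String[] row : fillBillIds(file2, indexBillIds(file1))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

On the DataFrames themselves, the same lookup would presumably be expressed as a left join: drop the blank BILL_ID column from file2Df, then join it with file1Df.select("BILL_ID", "BILL_NBR_TYPE_CD", "BILL_NBR") on the two shared columns, for example via Dataset.join(right, joinExprs, "left"). The padded values would likely need trimming before the join, as in the sketch.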
I am new to Spark and would appreciate it if someone could give me some pointers.