Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
947 views
in Technique[技术] by (71.8m points)

scala - Joining two dataframes without a common column

I have two dataframes which has different types of columns. I need to join those two different dataframe. Please refer the below example

val df1 has
Customer_name 
Customer_phone
Customer_age

val df2 has
Order_name
Order_ID

These two dataframe doesn't have any common column. Number of rows and Number of columns in the two dataframes also differs. I tried to insert a new dummy column to increase the row_index value as below val dfr=df1.withColumn("row_index",monotonically_increasing_id()).

But as i am using Spark 2, monotonically_increasing_id method is not supported. Is there any way to join two dataframe, so that I can create the value of two dataframe in a single sheet of excel file.

For example

val df1:
Customer_name  Customer_phone  Customer_age
karti           9685684551     24      
raja            8595456552     22

val df2:
Order_name Order_ID
watch       1
cattoy     2

My final excel sheet should be like this:

Customer_name  Customer_phone  Customer_age   Order_name  Order_ID

karti          9685684551      24             watch        1
   
raja           8595456552      22             cattoy      2
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

add an index column to both dataframe using the below code

df1.withColumn("id1",monotonicallyIncreasingId)
df2.withColumn("id2",monotonicallyIncreasingId)

then join both the dataframes using the below code and drop the index column

df1.join(df2,col("id1")===col("id2"),"inner")
   .drop("id1","id2")

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...