Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Apache Spark SQL - In PySpark, how do you add/concat a string to a column?

I would like to add a string to an existing column. For example, df['col1'] has the values '1', '2', '3', etc., and I would like to concatenate the string '000' to the left of col1 so that I get a column (new, or replacing the old one; it doesn't matter) with the values '0001', '0002', '0003'.

I thought I should use df.withColumn('col1', '000'+df['col1']), but of course it does not work, since PySpark DataFrames are immutable?

This should be an easy task, but I didn't find anything online. I hope someone can give me some help!

Thank you!


1 Answer

from pyspark.sql.functions import concat, col, lit

# concat() joins Column expressions; lit() wraps a plain Python string as a literal Column.
df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)
+------------------------------+
|concat(firstname,  , lastname)|
+------------------------------+
|                Emanuel Panton|
|              Eloisa Cayouette|
|                   Cathi Prins|
|             Mitchel Mozdzierz|
|               Angla Hartzheim|
+------------------------------+
only showing top 5 rows

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions



...