I have a number of files on Amazon S3, each segregated by date (date=yyyymmdd). The files go back 6 months, but I would like to restrict my script to use only the last 3 months of data. I am unsure whether I can use regular expressions to do something like sc.textFile("s3://path_to_dir/yyyy[m1,m2,m3]*"), where m1, m2, m3 represent the 3 months back from the current date that I would like to use.
One discussion also suggested using something like sc.textFile("s3://path_to_dir/yyyym1*","s3://path_to_dir/yyyym2*","s3://path_to_dir/yyyym3*"), but that doesn't seem to work for me.
Does sc.textFile() take regular expressions? I know you can use glob expressions, but I am unsure how to represent the above case as a glob expression.
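For reference, here is a minimal sketch of how the three-month case might be written as a glob rather than a regular expression. It relies on two things: Hadoop-style globs accept {a,b,c} alternation, and textFile takes a single string that may contain several comma-separated paths. The second argument of textFile is the minimum number of partitions, which is likely why passing three separate strings fails. The helper names (last_n_months, month_glob, month_paths) are hypothetical, the path is the placeholder from above, and counting the current month as one of the three is an assumption.

```python
from datetime import date


def last_n_months(today, n=3):
    """Return n year-month prefixes (yyyymm), newest first.

    Assumption: the current month counts as one of the n months.
    """
    prefixes = []
    year, month = today.year, today.month
    for _ in range(n):
        prefixes.append(f"{year:04d}{month:02d}")
        month -= 1
        if month == 0:  # roll back over the year boundary
            year, month = year - 1, 12
    return prefixes


def month_glob(base, today, n=3):
    """Build one glob using {a,b,c} alternation over the month prefixes."""
    return base + "{" + ",".join(last_n_months(today, n)) + "}*"


def month_paths(base, today, n=3):
    """Build one comma-separated string holding a separate glob per month."""
    return ",".join(base + prefix + "*" for prefix in last_n_months(today, n))


# Hypothetical usage, assuming `sc` is an existing SparkContext:
#   rdd = sc.textFile(month_glob("s3://path_to_dir/", date.today()))
# or, equivalently, with one comma-separated string:
#   rdd = sc.textFile(month_paths("s3://path_to_dir/", date.today()))
```

If the object names actually begin with date=, as in date=yyyymmdd, the prefix would need to be part of the base path (for example "s3://path_to_dir/date="); that detail depends on the real layout of the directory.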