Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
406 views
in Technique[技术] by (71.8m points)

r - Speedup conversion of 2 million rows of date strings to POSIX.ct

I have a csv which includes about 2 million rows of date strings in the format:

2012/11/13 21:10:00 

Lets call that csv$Date.and.Time

I want to convert these dates (and their accompanying data) to xts as fast as possible

I have written a script which performs the conversion just fine (see below), but it's terribly slow and I'd like to speed this up as much as possible.

Here is my current methodology. Does anyone have any suggestions on how to make this faster?

 dt <- as.POSIXct(csv$Date.and.Time,tz="UTC")

idx <- format(dt,tz=z,usetz=TRUE)

So the script converts these date strings to POSIX.ct. It then does a timezone conversion using format (z is a variable representing the TZ to which I am converting). I then do a regular xts call to make this an xts series with the rest of the data in the csv.

This works 100%. It's just very, very slow. I've tried running this in parallel (it doesn't do anything; if anything it makes it worse). What do I mean by 'slow'?

 user    system   elapsed 
155.246  16.430 171.650 

That's on a 3GhZ, 16GB ram 2012 mb pro. I can get about half that on a similar processor with 32GB RAM on a Win7 Machine

I'm sure someone has a better idea - I'm open to suggestions via Rcpp etc. However, ideally the solution works with the csv rather than some other method, like setting up a database. Having said that, I'm up to doing this via whatever method is going to give the fastest conversion.

I'd be super appreciative of any help at all. Thanks in advance.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You want the small and simple fasttime package by Simon which does this in the fastest possible way---by not calling time parsing functions but just using C-level string functions.

It does not support as many formats as strptime. In fact, it doesn't even have a format string. But well-formed ISO format variants, that is yyyy-mm-dd hh:mm:ss.fff will work, and your / separator may just work too.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...