Much has been written here about developing a workflow in R for statistical projects. The most popular workflow seems to be Josh Reich's LCFD (load, clean, func, do) model, in which a main.R file contains

    source('load.R')
    source('clean.R')
    source('func.R')
    source('do.R')

so that a single source('main.R') runs the entire project.
Q: Is there a reason to prefer this workflow to one in which the line-by-line interpretive work done in load.R, clean.R, and do.R is replaced by functions that are called by main.R?
I can't find the link now, but I once read on SO that when programming in R one must get over the desire to write everything in terms of function calls, and that R was MEANT to be written in this line-by-line interpretive form.
Q: Really? Why?
I've been frustrated with the LCFD approach and will probably write everything in terms of function calls. Before doing this, though, I'd like to hear from the good folks of SO whether this is a good idea.
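For comparison, a minimal sketch of the function-based alternative might look like the following. The stage names (load_data, clean_data, do_analysis, main) are hypothetical, and simulated prices stand in for reading a data file so the example runs on its own.

```r
# Hypothetical function-based counterpart of load.R / clean.R / do.R.
# Each stage takes explicit inputs and returns its output.

load_data <- function(n = 100, seed = 1) {
  # Stand-in for reading a file (assumption): simulated daily prices.
  set.seed(seed)
  data.frame(day = seq_len(n),
             price = 100 * exp(cumsum(rnorm(n, mean = 0, sd = 0.01))))
}

clean_data <- function(raw) {
  # Example cleaning: drop missing prices, then add log returns.
  raw <- raw[!is.na(raw$price), ]
  raw$ret <- c(NA, diff(log(raw$price)))
  raw[-1, ]  # the first row has no return
}

do_analysis <- function(clean) {
  # Example "do" step: summary statistics of the returns.
  c(mean = mean(clean$ret), sd = sd(clean$ret))
}

main <- function(...) {
  # Arguments in ... are forwarded to the loading stage.
  raw <- load_data(...)
  clean <- clean_data(raw)
  do_analysis(clean)
}

result <- main(n = 250)
print(result)
```

Because each stage receives its inputs as arguments and returns its output, intermediate objects such as raw and clean live only inside main(), instead of accumulating in the global environment as they do when scripts are run with source() and its default local = FALSE.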
EDIT: The project I'm working on right now is to (1) read in a set of financial data, (2) clean it (quite involved), (3) estimate some quantity associated with the data using my estimator, (4) estimate that same quantity using traditional estimators, and (5) report the results. My programs should be written so that it's a cinch to do this work (1) for different empirical data sets, (2) for simulation data, or (3) using different estimators. They should ALSO follow literate programming and reproducible research guidelines, so that a newcomer to the code can easily run the program, understand what's going on, and see how to tweak it.
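One way to make data sources and estimators swappable is to pass them around as functions. The sketch below assumes every data source is a zero-argument function returning a numeric vector of returns and every estimator maps such a vector to a single number; all names are hypothetical, and the trimmed mean is only a placeholder for the author's own estimator.

```r
# Hypothetical estimators: each maps a numeric vector to one number.
estimators <- list(
  my_estimator = function(x) mean(x, trim = 0.1),  # placeholder only
  sample_mean  = function(x) mean(x),
  median       = function(x) median(x)
)

# Hypothetical data sources: each returns a numeric vector of returns.
sources <- list(
  simulated      = function() { set.seed(42); rnorm(500, mean = 0.001, sd = 0.02) },
  empirical_stub = function() diff(log(c(100, 101, 99.5, 102, 103.1, 102.4)))
)

# Run one data source through a cleaning step and every estimator.
run_study <- function(source, estimators,
                      clean = function(x) x[is.finite(x)]) {
  x <- clean(source())
  vapply(estimators, function(est) est(x), numeric(1))
}

# One row per data source, one column per estimator.
results <- t(sapply(sources, run_study, estimators = estimators))
print(round(results, 5))
```

Adding a new empirical data set or a new estimator then means adding one entry to the relevant list, with no changes to run_study() itself.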