While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number of features in your data). My questions (for RandomForestRegressor) are:
1) What does max_features correspond to (m or p or something else)?
2) Are m variables selected at random from max_features variables (what is the value of m)?
3) If max_features corresponds to m, then why would I want to set it equal to p for regression (the default)? Where is the randomness with this setting (i.e., how is it different from bagging)?
Thanks.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…