When using batch normalization in my neural network model, does tuning the momentum actually make a big difference? From what I know, a smaller batch_size should use a higher momentum value and a larger batch_size a lower one, since the per-batch statistics are noisier with small batches. High momentum means more 'lag': the running statistics update slowly, and so on. How does it actually affect performance, though?
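For context on what I mean by 'lag', here is a rough sketch of the Keras-style running-mean update (note that PyTorch uses the opposite convention, where its `momentum` is one minus this value). The function below is a hypothetical toy simulation, not the library's actual implementation:

```python
import numpy as np

# Keras-style BatchNormalization update for the running (inference-time) mean:
#   running_mean <- momentum * running_mean + (1 - momentum) * batch_mean
# Higher momentum = heavier smoothing = more "lag" behind the data.
def simulate_running_mean(momentum, n_batches=100, true_mean=5.0, seed=0):
    rng = np.random.default_rng(seed)
    running = 0.0
    for _ in range(n_batches):
        # Each batch gives a noisy estimate of the true activation mean.
        batch_mean = true_mean + rng.normal(scale=1.0)
        running = momentum * running + (1.0 - momentum) * batch_mean
    return running

slow = simulate_running_mean(momentum=0.99)  # lags far behind true_mean
fast = simulate_running_mean(momentum=0.90)  # tracks true_mean more closely, but noisier
```

After 100 batches the momentum=0.99 estimate is still well below 5.0, while momentum=0.90 has essentially converged, which is the trade-off I'm asking about.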