The SparkContext is thread-safe, so it's possible to call it from many threads in parallel (I do this in production).
One thing to be aware of is to limit the number of threads you have running, because:
1. the executor memory is shared between all threads, so you might hit an OOM or constantly swap data in and out of the cache
2. the CPU is limited, so having more tasks than cores won't bring any improvement
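One way to enforce that limit is to submit the jobs through a bounded thread pool. Below is a minimal sketch in Python; `run_job` is a hypothetical stand-in for a real Spark action (e.g. something like `sc.parallelize(...).sum()` on a shared SparkContext), so the pattern is the point, not the body of the function:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a Spark action; in real code this would
# trigger a job on the shared SparkContext from the calling thread.
def run_job(n):
    return sum(range(n))

# Cap the pool size: more concurrent jobs than executor cores just queue
# tasks while still sharing the same executor memory.
MAX_CONCURRENT_JOBS = 4

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
    results = list(pool.map(run_job, [10, 100, 1000]))

print(results)  # [45, 4950, 499500]
```

The pool guarantees at most `MAX_CONCURRENT_JOBS` jobs hit the cluster at once; extra submissions simply wait for a free thread instead of oversubscribing executor memory and cores.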