Modin is a package that parallelizes pandas across multiple CPU cores. You can read more about it on its official documentation page.
The recommended way is to use ray as the backend. Install it via pip as follows:
pip install "modin[ray]"
If you prefer dask, you can install it as:
pip install "modin[dask]"
To use the package, just change the import; roughly 88% of the pandas functions you need are then available:
import modin.pandas as pd
import numpy as np
frame_data = np.random.randint(0, 100, size=(2**10, 2**8))
df = pd.DataFrame(frame_data)
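Because only the import changes, the rest of the code can stay as ordinary pandas code. Here is a minimal sketch of that idea. The try/except fallback to plain pandas is my own assumption so the snippet also runs where Modin is not installed, and the names col_means and summary are just illustrative. It only uses common calls (mean, describe) that exist in both libraries.

```python
import numpy as np

# Assumption for this sketch: fall back to plain pandas if Modin is missing,
# so the same code runs in either environment.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

# Same frame as in the answer: 1024 rows x 256 columns of ints in [0, 100).
frame_data = np.random.randint(0, 100, size=(2**10, 2**8))
df = pd.DataFrame(frame_data)

# Everything below is the usual pandas API; no Modin-specific calls.
col_means = df.mean(axis=0)   # one mean per column
summary = df.describe()       # count, mean, std, min, quartiles, max

print(df.shape)               # (1024, 256)
print(col_means.head())
```

If a particular call is not covered by Modin, you would have to check its documentation for how that case is handled; this sketch does not exercise that path.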