[D] How to load subset of large Oracle table into Dask dataframe?
Here’s what I tried:
dask_rf = dd.from_pandas(pd.read_sql('select ...', conn_cx_Oracle), npartitions=10)
This gives me a ‘large object’ warning that recommends using client.scatter. The problem is that client.scatter appears to require the data to already be loaded into a Pandas dataframe, which is exactly what I’m trying to avoid; I’m using Dask in the first place because of RAM limitations.
The Oracle table is too large to read with Dask’s read_sql_table, since read_sql_table pulls the whole table and does not let me filter it down to the subset I need.
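I also noticed that newer Dask releases (around 2021.10 onward, I believe) have dd.read_sql_query, which seems to accept a filtered SQLAlchemy select so the WHERE clause runs on the database side. A rough, untested sketch of what I think that would look like, with made-up table and column names:

import datetime
import sqlalchemy as sa
import dask.dataframe as dd

# Made-up URI, table, and column names; adjust for the real schema.
ORACLE_URI = "oracle+cx_oracle://user:password@host:1521/?service_name=MYDB"

engine = sa.create_engine(ORACLE_URI)
orders = sa.Table("orders", sa.MetaData(), autoload_with=engine)

# Only rows matching the WHERE clause ever leave the database.
query = sa.select(orders).where(orders.c.order_date >= datetime.date(2020, 1, 1))

# index_col should be an indexed, orderable column; Dask splits the query
# into per-partition range queries on it.
dask_rf = dd.read_sql_query(query, ORACLE_URI, index_col="order_id", npartitions=10)

I’m not sure whether my Dask version supports this, or whether it handles Oracle well.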
Ideas? Dask not applicable to my use case?
submitted by /u/Professionalsimracin