A Table is a Runhouse primitive used for abstracting a particular tabular data storage configuration.
Constructs a Table object, which can be used to interact with the table at the given path.
data – Data to be stored in the table.
name (Optional[str]) – Name for the table, to reuse it later on.
path (Optional[str]) – Full path to the data file.
system (Optional[str]) – File system. Currently this must be one of: [file, github, sftp, ssh, s3, gs, azure].
data_config (Optional[dict]) – The data config to pass to the underlying fsspec handler.
partition_cols (Optional[list]) – List of columns to partition the table by.
mkdir (bool) – Whether to create a remote folder for the table. (Default: False)
dryrun (bool) – Whether to create the Table if it doesn’t exist, or load a Table object as a dryrun. (Default: False)
stream_format (Optional[str]) – Format to stream the Table as. Currently this must be one of: [pyarrow, torch, tf, pandas].
metadata (Optional[dict]) – Metadata to store for the table.
The resulting Table object.
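To illustrate what partition_cols typically means on disk, here is a minimal sketch that assumes a Hive-style column=value directory layout, such as the one pyarrow's write_to_dataset produces. partition_paths is a hypothetical helper, not part of Runhouse, and it only computes which rows would land in which directory rather than writing Parquet files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_paths(rows, partition_cols, root="table"):
    """Hypothetical helper: group rows into Hive-style partition directories.

    Returns {relative_directory: [rows without the partition columns]}.
    """
    groups = defaultdict(list)
    for row in rows:
        # One "col=value" path segment per partition column, in the given order
        segments = [f"{col}={row[col]}" for col in partition_cols]
        directory = str(PurePosixPath(root, *segments))
        # Partition values are encoded in the path, so they are dropped from the stored row
        groups[directory].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)


rows = [
    {"year": 2023, "region": "us", "sales": 10},
    {"year": 2023, "region": "eu", "sales": 7},
    {"year": 2024, "region": "us", "sales": 12},
]
layout = partition_paths(rows, partition_cols=["year", "region"])
for directory, part_rows in sorted(layout.items()):
    print(directory, part_rows)
```

Each distinct combination of partition values becomes its own directory, which is why partitioned tables are stored as a folder of part files rather than a single file.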
Example
>>> import pandas as pd
>>> import runhouse as rh
>>> # Example data to store in the table
>>> data = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
>>> # Create and save (pandas) table
>>> rh.table(
...     data=data,
...     name="~/my_test_pandas_table",
...     path="table_tests/test_pandas_table.parquet",
...     system="file",
...     mkdir=True,
... ).save()
>>>
>>> # Load table from above
>>> reloaded_table = rh.table(name="~/my_test_pandas_table")
__init__(path: str, name: str | None = None, file_name: str | None = None, system: str | None = None, data_config: dict | None = None, dryrun: bool = False, partition_cols: List | None = None, stream_format: str | None = None, metadata: Dict | None = None, **kwargs)
The Runhouse Table object.
Note
To build a Table, please use the factory method table().
Get the table data. If data is not already cached, return a Ray dataset.
With the dataset object we can stream or convert to other types, for example:
data.iter_batches()
data.to_pandas()
data.to_dask()
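The cached-or-load behavior described above can be sketched with a plain Python property. SketchTable, _cached_data, and fake_loader are hypothetical names used only for illustration. In Runhouse the fallback produces a Ray dataset; here a list stands in for it so the snippet runs without Ray, and the sketch deliberately does not cache what it loads, since the description above only says cached data is returned when present.

```python
from typing import Any, Callable, Optional


class SketchTable:
    """Hypothetical stand-in illustrating the cached-or-load behavior of `data`."""

    def __init__(self, path: str, loader: Callable[[str], Any], cached_data: Optional[Any] = None):
        self.path = path
        self._loader = loader            # stands in for reading `path` as a Ray dataset
        self._cached_data = cached_data  # e.g. the object passed as data= at construction

    @property
    def data(self) -> Any:
        # Prefer in-memory data if the table already holds it
        if self._cached_data is not None:
            return self._cached_data
        # Otherwise load from storage on each access (this sketch does not cache the result)
        return self._loader(self.path)


def fake_loader(path: str) -> list:
    # Hypothetical loader standing in for building a Ray dataset from the path
    return [f"row from {path}"]


in_memory = SketchTable("t.parquet", fake_loader, cached_data=[1, 2, 3])
on_disk = SketchTable("t.parquet", fake_loader)
print(in_memory.data)
print(on_disk.data)
```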
Whether the table exists in the file system.
Example
>>> table.exists_in_system()
Returns the complete table contents.
Example
>>> table = rh.table(data)
>>> formatted_data = table.fetch()
Read a table from its path.
Example
>>> table = rh.table(path="path/to/table")
>>> table_data = table.read_table_from_file()
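When a table was written with partition_cols, reading it back means walking the partition directories and re-attaching the values encoded in the path. The sketch below assumes a Hive-style column=value layout and uses CSV part files from the standard library as a stand-in for Parquet, so every value comes back as a string (a Parquet reader would restore typed columns). read_partitioned is a hypothetical helper, not the Runhouse implementation.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def read_partitioned(root: Path) -> List[Dict[str, str]]:
    """Hypothetical reader: walk "col=value" directories under root and
    re-attach the partition values to every row read from the part files."""
    rows = []
    for part_file in sorted(root.rglob("*.csv")):
        # Recover partition columns from path segments such as "year=2024"
        partition_values = dict(
            seg.split("=", 1) for seg in part_file.relative_to(root).parent.parts if "=" in seg
        )
        with part_file.open(newline="") as f:
            for row in csv.DictReader(f):
                rows.append({**row, **partition_values})
    return rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny partitioned table: one CSV part file per year directory
    for year, sales in [("2023", "10"), ("2024", "12")]:
        part_dir = root / f"year={year}"
        part_dir.mkdir()
        (part_dir / "part-0.csv").write_text(f"sales\n{sales}\n")
    table_rows = read_partitioned(root)

print(table_rows)
```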
Delete the table, including its partitioned files where relevant.
Example
>>> table = rh.table(path="path/to/table")
>>> table.rm()
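The phrase "including its partitioned files" matters because a partitioned table is a directory tree rather than a single file. A minimal local-filesystem sketch, assuming that layout, is below; remove_table is a hypothetical helper, and the empty .parquet files are placeholders rather than real Parquet data.

```python
import shutil
import tempfile
from pathlib import Path


def remove_table(path: Path) -> None:
    """Hypothetical local delete: a single-file table is unlinked, while a
    partitioned table (a directory of part files) is removed recursively."""
    if path.is_dir():
        shutil.rmtree(path)
    elif path.exists():
        path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    # A single-file table (empty placeholder file)
    single = Path(tmp) / "table.parquet"
    single.write_bytes(b"")
    # A partitioned table: a directory containing a partition subdirectory and a part file
    partitioned = Path(tmp) / "partitioned"
    (partitioned / "year=2024").mkdir(parents=True)
    (partitioned / "year=2024" / "part-0.parquet").write_bytes(b"")

    remove_table(single)
    remove_table(partitioned)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(remaining)  # []
```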
Return a local batched iterator over the Ray dataset.
Example
>>> table = rh.table(data)
>>> batches = table.stream(batch_size=4)
>>> for batch in batches:
...     print(batch)
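Conceptually, batched streaming groups rows into fixed-size chunks and hands them out lazily. The generator below is a hypothetical, dependency-free stand-in for that idea; Runhouse streams from a Ray dataset instead, and the batch objects it yields depend on the stream format. As in this sketch, each item produced is a single batch, so the loop variable is the batch itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive lists of at most `batch_size` rows.

    The final batch may be shorter when the row count is not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        # Pull up to batch_size rows from the shared iterator
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


batches = list(iter_batches(range(10), batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because the generator is lazy, only one batch needs to be materialized at a time, which is the point of streaming a large table instead of fetching it whole.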
Copy and return the table on the given filesystem and path.
Example
>>> local_table = rh.table(data, path="local/path")
>>> s3_table = local_table.to("s3")
>>> cluster_table = local_table.to(my_cluster)
Write the underlying table data to an fsspec URL.
Example
>>> rh.table(data, path="path/to/write").write()