Storage Providers
OpenPO integrates with S3 and the Hugging Face Hub out of the box. Use a storage provider to upload and download datasets.
Usage
from openpo.providers.huggingface import HuggingFaceStorage
storage = HuggingFaceStorage(repo_id="my-dataset-repo", api_key="hf-token")
# Save data
preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
storage.save_data(data=preference, filename="my-data.json")
# Load data
data = storage.load_data(filename="my-data.json")
print(data)
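Before saving, it can help to check that a record has the shape used in the example above. The following is a minimal sketch, assuming the three keys shown (prompt, preferred, rejected) are the intended schema, which this page does not state explicitly. `validate_preference` and `REQUIRED_KEYS` are hypothetical names and are not part of OpenPO.

```python
from typing import Any, Dict

# Hypothetical helper, not part of OpenPO: checks the record shape used in the
# usage example before the record is handed to a storage provider.
REQUIRED_KEYS = ("prompt", "preferred", "rejected")


def validate_preference(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the record unchanged if every required field is a non-empty string."""
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in REQUIRED_KEYS:
        value = record[key]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key!r} must be a non-empty string")
    return record


preference = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
checked = validate_preference(preference)
# storage.save_data(data=checked, filename="my-data.json")  # as in the usage example
```

Raising early keeps malformed records out of the remote repository, where they are harder to spot and clean up.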
Hugging Face Storage
Initialization
from openpo.providers.huggingface import HuggingFaceStorage
storage = HuggingFaceStorage(
    repo_id="username/repo_name",  # The repository ID on HuggingFace
    api_key="hf-token"  # HuggingFace API token with write access
)
Methods
- save_data(data: Dict[str, Any], filename: str) -> bool
  - Parameters:
    - data: Dictionary containing the data to save
    - filename: Name of the file to save the data to
- load_data(filename: str) -> Dict[str, Any]
  - Parameters:
    - filename: Name of the file to load
- load_data_all() -> List[Dict[str, Any]]
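To exercise code that depends on these three methods without network access or a token, one option is an in-memory stand-in with the same signatures. The sketch below is an assumption-laden test double, not OpenPO code: `InMemoryStorage` is a hypothetical class, and its error behaviour (a `KeyError` for unknown filenames) is chosen for illustration and may differ from the real provider.

```python
import copy
from typing import Any, Dict, List


class InMemoryStorage:
    """Hypothetical test double mirroring the documented method signatures."""

    def __init__(self) -> None:
        self._files: Dict[str, Dict[str, Any]] = {}

    def save_data(self, data: Dict[str, Any], filename: str) -> bool:
        # Store a copy so later mutation of `data` does not alter what was saved.
        self._files[filename] = copy.deepcopy(data)
        return True

    def load_data(self, filename: str) -> Dict[str, Any]:
        # Raises KeyError for unknown filenames (an assumption of this sketch).
        return copy.deepcopy(self._files[filename])

    def load_data_all(self) -> List[Dict[str, Any]]:
        # Returns every saved record, in insertion order.
        return [copy.deepcopy(record) for record in self._files.values()]


storage = InMemoryStorage()
record = {"prompt": "text", "preferred": "response1", "rejected": "response2"}
ok = storage.save_data(data=record, filename="my-data.json")
loaded = storage.load_data(filename="my-data.json")
everything = storage.load_data_all()
```

Because the stand-in takes the same arguments, code written against it can later be pointed at `HuggingFaceStorage` by changing only the constructor call.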
S3 Storage
Initialization
from openpo.providers.s3 import S3Storage
storage = S3Storage(
    region_name="us-west-2",  # Optional: AWS region
    aws_access_key_id="access_key",  # Optional: AWS access key
    aws_secret_access_key="secret_key",  # Optional: AWS secret key
    profile_name="default"  # Optional: AWS profile name
)
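Since every argument above is marked optional, credentials do not have to be hard-coded in source. A minimal sketch, assuming `S3Storage` falls back to sensible defaults for any argument that is omitted: the hypothetical helper `s3_kwargs_from_env` (not part of OpenPO) maps the standard AWS environment variable names onto the keyword arguments shown above and skips any that are unset.

```python
import os
from typing import Dict, Mapping, Optional


def s3_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: build S3Storage keyword arguments from AWS env vars."""
    env = os.environ if env is None else env
    mapping = {
        "region_name": "AWS_DEFAULT_REGION",
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "profile_name": "AWS_PROFILE",
    }
    # Keep only the arguments whose environment variable is set and non-empty.
    return {arg: env[var] for arg, var in mapping.items() if env.get(var)}


# A plain dict stands in for os.environ here so the example is deterministic.
kwargs = s3_kwargs_from_env({"AWS_DEFAULT_REGION": "us-west-2", "AWS_PROFILE": "default"})
# storage = S3Storage(**kwargs)  # assumes omitted arguments fall back to defaults
```

Calling `s3_kwargs_from_env()` with no argument reads the real process environment instead.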
Methods
- save_data(data: List[Dict[str, Any]], bucket: str, key: str = None)
  - Parameters:
    - data: List of dictionaries containing the data to save
    - bucket: Name of the S3 bucket
    - key: Object key (path) in the bucket
- load_data(bucket: str, key: str) -> List[Dict[str, Any]]
  - Parameters:
    - bucket: Name of the S3 bucket
    - key: Object key (path) in the bucket
- load_data_all(bucket: str, limit: int) -> List[Dict[str, Any]]
  - Parameters:
    - bucket: Name of the S3 bucket
    - limit: Maximum number of files to read
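The `key` argument of `save_data` defaults to `None`, and this page does not say what the provider does in that case. One way to avoid depending on that behaviour is to always pass an explicit key. The sketch below uses a hypothetical helper, `make_key` (not part of OpenPO), to build date-partitioned keys; the layout and the `.json` extension are assumptions chosen for illustration.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_key(prefix: str, now: Optional[datetime] = None, suffix: Optional[str] = None) -> str:
    """Hypothetical helper: build a key of the form 'prefix/YYYY/MM/DD/<id>.json'."""
    now = now or datetime.now(timezone.utc)
    # A random hex id keeps keys unique when several files are saved on one day.
    suffix = suffix or uuid.uuid4().hex
    return f"{prefix.strip('/')}/{now:%Y/%m/%d}/{suffix}.json"


# Fixed inputs make the result reproducible for this example.
key = make_key("preferences", now=datetime(2024, 1, 2, tzinfo=timezone.utc), suffix="abc123")
# key == "preferences/2024/01/02/abc123.json"
# storage.save_data(data=records, bucket="my-bucket", key=key)  # placeholder names
```

The same key can later be passed to `load_data(bucket, key)` to read that exact object back.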