osmo dataset#
usage: osmo dataset [-h]
{info,upload,delete,download,update,recollect,list,tag,label,metadata,rename,query,collect,inspect,checksum,migrate}
...
Positional Arguments#
- command#
Possible choices: info, upload, delete, download, update, recollect, list, tag, label, metadata, rename, query, collect, inspect, checksum, migrate
Sub-commands#
info#
Provide details of the dataset/collection
osmo dataset info [-h] [--all] [--count COUNT] [--order {asc,desc}]
[--format-type {json,text}]
name
Positional Arguments#
- name#
Dataset name. Specify bucket with [bucket/]DS.
Named Arguments#
- --all, -a#
Display all versions in any state.
Default:
False- --count, -c#
For Datasets. Display the given number of versions. Default 100.
Default:
100- --order, -o#
Possible choices: asc, desc
For Datasets. Display in the given order based on date created
Default:
'asc'- --format-type, -t#
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
Ex. osmo dataset info DS1 –format-type json
upload#
Upload a new Dataset/Collection
osmo dataset upload [-h] [--desc DESCRIPTION]
[--metadata METADATA [METADATA ...]]
[--labels LABELS [LABELS ...]] [--regex REGEX] [--resume]
[--processes PROCESSES] [--threads THREADS]
[--benchmark-out BENCHMARK_OUT]
name path [path ...]
Positional Arguments#
Named Arguments#
- --desc, -d#
Description of dataset.
Default:
''- --metadata, -m#
Yaml files of metadata to assign to dataset version
Default:
[]- --labels, -l#
Yaml files of labels to assign to dataset
Default:
[]- --regex, -x#
Regex to filter which types of files to upload
- --resume, -r#
Resume a canceled/failed upload. To resume, there must be atag.
Default:
False- --processes, -p#
Number of processes. Defaults to 4
Default:
4- --threads, -T#
Number of threads per process. Defaults to 20
Default:
20- --benchmark-out, -b#
Path to folder where benchmark data will be written to.
Ex. osmo dataset upload DS1:latest /path/to/file –desc “My description”
delete#
Marks a Dataset version(s) as PENDING_DELETE. If all versions are marked, prompts the user to delete the dataset from storage. Collection are deleted
osmo dataset delete [-h] [--all] [--force] [--format-type {json,text}] name
Positional Arguments#
- name#
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments#
Ex. osmo dataset delete DS1:latest –force –format-type json
download#
Download the dataset
osmo dataset download [-h] [--regex REGEX] [--resume] [--processes PROCESSES]
[--threads THREADS] [--benchmark-out BENCHMARK_OUT]
name path
Positional Arguments#
Named Arguments#
- --regex, -x#
Regex to filter which types of files to download
- --resume, -r#
Resume a canceled/failed download.
Default:
False- --processes, -p#
Number of processes. Defaults to 4
Default:
4- --threads, -T#
Number of threads per process. Defaults to 20
Default:
20- --benchmark-out, -b#
Path to folder where benchmark data will be written to.
Ex. osmo dataset download DS1:latest /path/to/folder
update#
Creates a new dataset version from an existing version by adding or removing files.
osmo dataset update [-h] [--add ADD [ADD ...]] [--remove REMOVE]
[--metadata METADATA [METADATA ...]]
[--labels LABELS [LABELS ...]] [--resume RESUME]
[--processes PROCESSES] [--threads THREADS]
[--benchmark-out BENCHMARK_OUT]
name
Positional Arguments#
- name#
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments#
- --add, -a#
Local paths/Remote URIs to append to the dataset. To specify path in the dataset where the files should be stored, use “:” to delineate local/path:remote/path. Files in the local path will be stored with the prefix of the remote path. If the path contains “:”, use “:” in the path.
Default:
[]- --remove, -r#
Regex to filter which types of files to remove.
- --metadata, -m#
Yaml files of metadata to assign to the newly created dataset version
Default:
[]- --labels, -l#
Yaml files of labels to assign to the dataset
Default:
[]- --resume#
Resume a canceled/failed update. To resume, specify the PENDING version to continue.
- --processes, -p#
Number of processes. Defaults to 4
Default:
4- --threads, -T#
Number of threads per process. Defaults to 20
Default:
20- --benchmark-out, -b#
Path to folder where benchmark data will be written to.
Ex. osmo dataset update DS1 –add relative/path:remote/path /other/local/path s3://path:remote/path Ex. osmo dataset update DS1 –remove “.*.(yaml|json)$”
recollect#
Add or remove datasets from a collection.
osmo dataset recollect [-h] [--add ADD [ADD ...]]
[--remove REMOVE [REMOVE ...]]
name
Positional Arguments#
- name#
Collection name. Specify bucket with [bucket/]Collection.
Named Arguments#
Ex. osmo dataset recollect C1 –remove DS1 –add DS2:4
list#
List all Datasets/Collections uploaded by the user
osmo dataset list [-h] [--name NAME] [--user USER [USER ...]]
[--bucket BUCKET [BUCKET ...]] [--all-users] [--count COUNT]
[--order {asc,desc}] [--format-type {json,text}]
Named Arguments#
- --name, -n#
Display datasets that have the given substring in their name
Default:
''- --user, -u#
Display all datasets where the user has uploaded to.
Default:
[]- --bucket, -b#
Display all datasets from the given buckets.
Default:
[]- --all-users, -a#
Display all datasets with no filtering on users
Default:
False- --count, -c#
Display the given number of datasets. Default 20. Max 1000.
Default:
20- --order, -o#
Possible choices: asc, desc
Display in the given order. asc means latest at the bottom. desc means latest at the top
Default:
'asc'- --format-type, -t#
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
Ex. osmo dataset list –all-users or osmo dataset list –user abc xyz
tag#
Update Dataset Version tags
osmo dataset tag [-h] [--set SET [SET ...]] [--delete DELETE [DELETE ...]]
name
Positional Arguments#
- name#
Dataset name to update. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments#
Ex. osmo dataset tag DS1 –set tag1 –delete tag2
label#
Update Dataset labels.
osmo dataset label [-h] [--file] [--set SET [SET ...]]
[--delete DELETE [DELETE ...]] [--format-type {json,text}]
name
Positional Arguments#
- name#
Dataset name to update. Specify bucket with [bucket/][DS].
Named Arguments#
- --file, -f#
If enabled, the inputs to set and delete must be files.
Default:
False- --set, -s#
Set label for dataset in the form “<key>:<type>:<value>” where type is string or numericor the file-path
Default:
[]- --delete, -d#
Delete labels from dataset in the form “<key>”or the file-path
Default:
[]- --format-type, -t#
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
Ex. osmo dataset label DS1 –set key1:string:value1 –delete key2
metadata#
Update Dataset Version metadata. A tag/version is required.
osmo dataset metadata [-h] [--file] [--set SET [SET ...]]
[--delete DELETE [DELETE ...]]
[--format-type {json,text}]
name
Positional Arguments#
- name#
Dataset name to update. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments#
- --file, -f#
If enabled, the inputs to set and delete must be files.
Default:
False- --set, -s#
Set metadata from dataset in the form “<key>:<type>:<value>” where type is string or numericor the file-path
Default:
[]- --delete, -d#
Delete metadata from dataset in the form “<key>”or the file-path
Default:
[]- --format-type, -t#
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
Ex. osmo dataset metadata DS1:latest –set key1:string:value1 –delete key2
rename#
Rename dataset/collection
osmo dataset rename [-h] original_name new_name
Positional Arguments#
Ex. osmo dataset rename original_name new_name
query#
Query datasets based on metadata
osmo dataset query [-h] [--bucket BUCKET] [--format-type {json,text}] file
Positional Arguments#
- file#
The Query file to submit
Named Arguments#
Ex. osmo dataset query file.yaml
collect#
Create a Collection
osmo dataset collect [-h] name datasets [datasets ...]
Positional Arguments#
Ex. osmo dataset collect CName C1 DS1 DS2 DS3:latest
inspect#
Display Dataset Directory
osmo dataset inspect [-h] [--format-type {text,tree,json}] [--regex REGEX]
[--count COUNT]
name
Positional Arguments#
- name#
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments#
- --format-type, -t#
Possible choices: text, tree, json
Type text is that files are just printed out. Type tree displays a better representation of the directory structure. Type json prints out the list of json objects with both URI and URL links.
Default:
'text'- --regex, -x#
Regex to filter which types of files to inspect
- --count, -c#
Number of files to print. Default 1,000.
Default:
1000
Ex. osmo dataset inspect DS1:latest –format-type json
checksum#
Calculate Directory Checksum
osmo dataset checksum [-h] path [path ...]
Positional Arguments#
- path#
Paths where the folder lies.
Ex. osmo dataset checksum /path/to/folder
migrate#
Migrate a legacy (non-manifest based) dataset to a new manifest based dataset.
osmo dataset migrate [-h] [--processes PROCESSES] [--threads THREADS]
[--benchmark-out BENCHMARK_OUT]
name
Positional Arguments#
- name#
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments#
Ex. osmo dataset migrate DS1:latest