osmo dataset
usage: osmo dataset [-h] {info,upload,delete,download,update,recollect,list,tag,label,metadata,rename,query,collect,inspect,checksum,migrate} ...
Positional Arguments
- command
Possible choices: info, upload, delete, download, update, recollect, list, tag, label, metadata, rename, query, collect, inspect, checksum, migrate
Sub-commands
info
Provide details of a dataset or collection.
osmo dataset info [-h] [--all] [--count COUNT] [--order {asc,desc}] [--format-type {json,text}] name
Positional Arguments
- name
Dataset name. Specify bucket with [bucket/]DS.
Named Arguments
- --all, -a
Display all versions in any state.
Default:
False
- --count, -c
Datasets only. Display the given number of versions. Default 100.
Default:
100
- --order, -o
Possible choices: asc, desc
Datasets only. Display versions in the given order, sorted by creation date.
Default:
'asc'
- --format-type, -t
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
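The [bucket/]DS[:tag/version] name syntax recurs across these sub-commands. As an illustration only, the sketch below splits such a string into its parts. The names parse_dataset_ref and DatasetRef are hypothetical, and the splitting rules (the first “:” starts the tag or version, a “/” before it separates the bucket) are assumptions rather than the CLI's own parser.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical container for the parts of a dataset reference."""
    bucket: Optional[str]
    name: str
    tag_or_version: Optional[str]


def parse_dataset_ref(ref: str) -> DatasetRef:
    """Split "[bucket/]DS[:tag/version]" into its parts (illustrative only).

    Assumption: the first ":" starts the optional tag or version, and a "/"
    before it separates the optional bucket from the dataset name.
    """
    head, colon, suffix = ref.partition(":")
    bucket, slash, name = head.partition("/")
    if not slash:
        # No "/" present, so the whole head is the dataset name.
        bucket, name = None, head
    return DatasetRef(bucket, name, suffix if colon else None)


print(parse_dataset_ref("my_bucket/my_dataset:latest"))
print(parse_dataset_ref("my_dataset:3"))
print(parse_dataset_ref("my_dataset"))
```

The bucket, dataset, and tag names above are placeholders.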
upload
Upload a new dataset/collection.
osmo dataset upload [-h] [--desc DESCRIPTION] [--metadata METADATA [METADATA ...]] [--labels LABELS [LABELS ...]] [--regex REGEX] [--resume]
[--processes PROCESSES] [--threads THREADS] [--benchmark-out BENCHMARK_OUT]
name path [path ...]
Positional Arguments
- name
Dataset name. Specify bucket and tag with [bucket/]DS[:tag]. When continuing an upload, the most recent PENDING version is chosen.
- path
Path(s) where the dataset lies.
Named Arguments
- --desc, -d
Description of the dataset.
Default:
''
- --metadata, -m
YAML files of metadata to assign to the dataset version.
Default:
[]
- --labels, -l
YAML files of labels to assign to the dataset.
Default:
[]
- --regex, -x
Regex to filter which types of files to upload.
- --resume, -r
Resume a canceled/failed upload. To resume, there must be a tag.
Default:
False
- --start-only
Default:
False
- --processes, -p
Number of processes. Defaults to 8.
Default:
8
- --threads, -T
Number of threads per process. Defaults to 20.
Default:
20
- --benchmark-out, -b
Path to a folder where benchmark data will be written.
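The --regex option narrows which files are transferred. The sketch below shows the general idea on a list of relative paths. It assumes Python re.search semantics (the pattern may match anywhere in the path); whether osmo matches against the full path or only the file name, or anchors the pattern, is not stated here, so treat filter_paths as a hypothetical model.

```python
import re
from typing import Iterable, List


def filter_paths(paths: Iterable[str], regex: str) -> List[str]:
    """Keep only paths matched by regex (hypothetical model of --regex).

    Assumption: a path is kept if the pattern matches anywhere in it.
    """
    pattern = re.compile(regex)
    return [p for p in paths if pattern.search(p)]


files = ["images/cat.jpg", "images/dog.png", "labels/cat.json", "README.md"]
print(filter_paths(files, r"\.(jpg|png)$"))
```

On the command line, quote the pattern so the shell does not expand it, for example osmo dataset upload my_dataset ./data --regex '\.(jpg|png)$' (dataset name and path are placeholders).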
delete
Marks one or more dataset versions as PENDING_DELETE. If all versions are marked, prompts the user to delete the dataset from storage. Collections are deleted.
osmo dataset delete [-h] [--all] [--force] [--format-type {json,text}] name
Positional Arguments
- name
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments
- --all, -a
Deletes all versions.
Default:
False
- --force, -f
Deletes without confirmation.
Default:
False
- --format-type, -t
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
download
Download a dataset.
osmo dataset download [-h] [--regex REGEX] [--resume] [--processes PROCESSES] [--threads THREADS] [--benchmark-out BENCHMARK_OUT] name path
Positional Arguments
- name
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
- path
Location where the dataset is downloaded to.
Named Arguments
- --regex, -x
Regex to filter which types of files to download.
- --resume, -r
Resume a canceled/failed download.
Default:
False
- --processes, -p
Number of processes. Defaults to 8.
Default:
8
- --threads, -T
Number of threads per process. Defaults to 20.
Default:
20
- --benchmark-out, -b
Path to a folder where benchmark data will be written.
update
Creates a new dataset version from an existing version by adding or removing files.
osmo dataset update [-h] [--add ADD [ADD ...]] [--remove REMOVE] [--metadata METADATA [METADATA ...]] [--labels LABELS [LABELS ...]] [--resume RESUME]
[--processes PROCESSES] [--threads THREADS] [--benchmark-out BENCHMARK_OUT]
name
Positional Arguments
- name
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments
- --add, -a
Local paths/remote URIs to append to the dataset. To choose where in the dataset the files are stored, separate the source and destination with “:”, as in local/path:remote/path. Files from the local path are stored with the remote path as their prefix. If the path contains “:”, use “:” in the path.
Default:
[]
- --remove, -r
Regex to filter which types of files to remove.
- --metadata, -m
YAML files of metadata to assign to the newly created dataset version.
Default:
[]
- --labels, -l
YAML files of labels to assign to the dataset.
Default:
[]
- --resume
Resume a canceled/failed update. To resume, specify the PENDING version to continue.
- --start-only
Default:
False
- --processes, -p
Number of processes. Defaults to 8.
Default:
8
- --threads, -T
Number of threads per process. Defaults to 20.
Default:
20
- --benchmark-out, -b
Path to a folder where benchmark data will be written.
recollect
Add or remove datasets from a collection.
osmo dataset recollect [-h] [--add ADD [ADD ...]] [--remove REMOVE [REMOVE ...]] name
Positional Arguments
- name
Collection name. Specify bucket with [bucket/]Collection.
Named Arguments
- --add, -a
Datasets to add to the collection.
Default:
[]
- --remove, -r
Datasets to remove from the collection. The remove operation happens before the add.
Default:
[]
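Because removals are applied before additions, a dataset named in both --remove and --add ends up in the collection. The sketch below models collection membership as a Python set of names. This is a simplification (a real collection may reference specific dataset versions), and recollect here is a hypothetical function, not osmo code.

```python
from typing import Iterable, Set


def recollect(members: Set[str], add: Iterable[str] = (),
              remove: Iterable[str] = ()) -> Set[str]:
    """Return the new membership: apply removals first, then additions."""
    remaining = set(members) - set(remove)  # --remove is applied first
    return remaining | set(add)             # then --add


current = {"ds_a", "ds_b"}
print(sorted(recollect(current, add=["ds_b", "ds_c"], remove=["ds_b"])))
```

Here ds_b is removed and then re-added, so the result is ['ds_a', 'ds_b', 'ds_c'].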
list
List all datasets/collections uploaded by the user.
osmo dataset list [-h] [--name NAME] [--user USER [USER ...]] [--bucket BUCKET [BUCKET ...]] [--all-users] [--count COUNT] [--order {asc,desc}]
[--format-type {json,text}]
Named Arguments
- --name, -n
Display datasets whose name contains the given substring.
Default:
''
- --user, -u
Display all datasets that the given users have uploaded to.
Default:
[]
- --bucket, -b
Display all datasets from the given buckets.
Default:
[]
- --all-users, -a
Display all datasets without filtering by user.
Default:
False
- --count, -c
Display the given number of datasets. Default 20. Max 1000.
Default:
20
- --order, -o
Possible choices: asc, desc
Display in the given order: asc puts the latest at the bottom, desc puts the latest at the top.
Default:
'asc'
- --format-type, -t
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
tag
Update dataset version tags.
osmo dataset tag [-h] [--set SET [SET ...]] [--delete DELETE [DELETE ...]] name
Positional Arguments
- name
Dataset name to update. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments
- --set, -s
Set tags on the dataset version.
Default:
[]
- --delete, -d
Delete tags from the dataset version.
Default:
[]
label
Update Dataset labels.
osmo dataset label [-h] [--file] [--set SET [SET ...]] [--delete DELETE [DELETE ...]] [--format-type {json,text}] name
Positional Arguments
- name
Dataset name to update. Specify bucket with [bucket/][DS].
Named Arguments
- --file, -f
If enabled, the inputs to set and delete must be files.
Default:
False
- --set, -s
Set labels on the dataset, each in the form “<key>:<type>:<value>” where type is string or numeric, or, with --file, a file path.
Default:
[]
- --delete, -d
Delete labels from the dataset, each given as “<key>”, or, with --file, a file path.
Default:
[]
- --format-type, -t
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
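The --set value for label (and likewise for metadata) uses the form “<key>:<type>:<value>”, where type is string or numeric. The sketch below parses that form to show how the three fields relate. The function name parse_label, splitting on only the first two colons, and converting numeric values to float are assumptions for illustration, not documented CLI behavior.

```python
from typing import Tuple, Union


def parse_label(spec: str) -> Tuple[str, Union[str, float]]:
    """Parse "<key>:<type>:<value>" into (key, value) (hypothetical helper).

    Assumptions: only the first two colons are separators, so a string value
    may itself contain ":"; numeric values are converted to float.
    """
    parts = spec.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <key>:<type>:<value>, got {spec!r}")
    key, kind, value = parts
    if kind == "numeric":
        return key, float(value)
    if kind == "string":
        return key, value
    raise ValueError(f"type must be 'string' or 'numeric', got {kind!r}")


print(parse_label("split:string:train"))
print(parse_label("num_frames:numeric:1200"))
```

A matching invocation, with placeholder names, would be osmo dataset label my_bucket/my_dataset --set split:string:train num_frames:numeric:1200. Assuming standard argparse parsing, placing the dataset name before --set keeps it from being consumed as another --set value.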
metadata
Update Dataset Version metadata. A tag/version is required.
osmo dataset metadata [-h] [--file] [--set SET [SET ...]] [--delete DELETE [DELETE ...]] [--format-type {json,text}] name
Positional Arguments
- name
Dataset name to update. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments
- --file, -f
If enabled, the inputs to set and delete must be files.
Default:
False
- --set, -s
Set metadata on the dataset version, each entry in the form “<key>:<type>:<value>” where type is string or numeric, or, with --file, a file path.
Default:
[]
- --delete, -d
Delete metadata from the dataset version, each entry given as “<key>”, or, with --file, a file path.
Default:
[]
- --format-type, -t
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
rename
Rename a dataset or collection.
osmo dataset rename [-h] original_name new_name
Positional Arguments
- original_name
Old dataset/collection name. Specify bucket with [bucket/][DS].
- new_name
New dataset/collection name.
query
Query datasets based on metadata.
osmo dataset query [-h] [--bucket BUCKET] [--format-type {json,text}] file
Positional Arguments
- file
The query file to submit.
Named Arguments
- --bucket, -b
Bucket to query.
- --format-type, -t
Possible choices: json, text
Specify the output format type (Default text).
Default:
'text'
collect
Create a collection.
osmo dataset collect [-h] name datasets [datasets ...]
Positional Arguments
- name
Collection name. Specify bucket with [bucket/][C]. All datasets and collections added to this collection are taken from this bucket.
- datasets
Datasets to add to the collection. To create a collection from another collection, pass that collection's name.
inspect
Display the dataset directory.
osmo dataset inspect [-h] [--format-type {text,tree,json}] [--regex REGEX] [--count COUNT] name
Positional Arguments
- name
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments
- --format-type, -t
Possible choices: text, tree, json
text prints the files as a plain list. tree displays the directory structure as a tree. json prints a list of JSON objects containing both URI and URL links.
Default:
'text'
- --regex, -x
Regex to filter which types of files to inspect.
- --count, -c
Number of files to print. Default 1,000.
Default:
1000
checksum
Calculate a directory checksum.
osmo dataset checksum [-h] path [path ...]
Positional Arguments
- path
Path(s) of the folder(s) to checksum.
migrate
Migrate a legacy (non-manifest-based) dataset to a new manifest-based dataset.
osmo dataset migrate [-h] [--processes PROCESSES] [--threads THREADS] [--benchmark-out BENCHMARK_OUT] name
Positional Arguments
- name
Dataset name. Specify bucket and tag/version with [bucket/]DS[:tag/version].
Named Arguments
- --processes, -p
Number of processes. Defaults to 8.
Default:
8
- --threads, -T
Number of threads per process. Defaults to 20.
Default:
20
- --benchmark-out, -b
Path to a folder where benchmark data will be written.