Remote
FTPRemoteResource
dataclass
Bases: RemoteResource
Source code in bionemo/llm/utils/remote.py
download_resource(overwrite=False)
Downloads the resource to its specified fully_qualified_dest name.
Returns: the fully qualified destination filename.
Source code in bionemo/llm/utils/remote.py
RemoteResource
dataclass
Responsible for downloading remote files, along with optional processing of downloaded files for downstream use cases.
Each object is invoked through either its constructor (setting up the destination and checksum), or through a pre-configured class method.
download_resource() contains the core functionality, which is to download the file at url to the fully qualified filename. Class methods can be used to further configure this process.
Receive: a file, its checksum, a destination directory, and a root directory.
Our dataclass then provides some useful things:
- fully qualified destination folder (property)
- fully qualified destination file (property)
- check_exists()
- download_resource()
Form the fully qualified destination folder. Create a fully qualified path for the file.
The rest all lives in the download routine: check that the fully qualified destination folder exists, otherwise create it. Download the file. Checksum the download. Done.
Postprocessing steps should be their own methods with their own configuration.
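The steps above can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code in bionemo/llm/utils/remote.py: the names `md5_of` and `fetch` are hypothetical, and MD5 is assumed as the checksum algorithm because the text does not name one.

```python
import hashlib
import os
import urllib.request


def md5_of(path):
    """Return the hex MD5 digest of the file at `path` (MD5 is an assumption)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, root_directory, dest_directory, dest_filename, checksum=None):
    """Hypothetical stand-in for the download routine described above."""
    # 1. Form the fully qualified destination folder and file path.
    folder = os.path.join(root_directory, dest_directory)
    target = os.path.join(folder, dest_filename)
    # 2. Make sure the folder exists, creating it if necessary.
    os.makedirs(folder, exist_ok=True)
    # 3. Download the file.
    urllib.request.urlretrieve(url, target)
    # 4. Checksum the download; skipped when no checksum was supplied.
    if checksum is not None and md5_of(target) != checksum:
        raise ValueError(f"checksum mismatch for {target}")
    return target
```

Keeping postprocessing out of this function mirrors the note above: anything done to the file after the checksum step would live in a separate method.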
Example usage
The following will download and preprocess the prepackaged resources.
GRCh38Ensembl99ResourcePreparer().prepare()
Hg38chromResourcePreparer().prepare()
GRCh38p13_ResourcePreparer().prepare()
Attributes:
Name | Type | Description |
---|---|---|
dest_directory | str | The directory to place the desired file upon completing the download. Should have the form {dest_directory}/{dest_filename} |
dest_filename | str | The desired name for the file upon completing the download. |
checksum | Optional[str] | Checksum associated with the file located at url. If set to None, check_exists only checks for the existence of fully_qualified_dest_filename. |
url | Optional[str] | URL of the file to download. |
root_directory | str \| PathLike | The bottom-level directory. The fully qualified path is formed by joining root_directory, dest_directory, and dest_filename. |
Source code in bionemo/llm/utils/remote.py
fully_qualified_dest_filename
property
Returns the fully qualified destination path of the file.
Example
/tmp/my_folder/file.tar.gz
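As a rough illustration of how such a path could be assembled from the three attributes, the sketch below joins them with `os.path.join`. The standalone function is hypothetical (the real member is a property on the dataclass), and the example values are chosen only to reproduce the path shown above.

```python
import os


def fully_qualified_dest_filename(root_directory, dest_directory, dest_filename):
    """Join root, destination directory, and file name into one path (sketch only)."""
    # os.fspath accepts both str and os.PathLike values for the root directory.
    return os.path.join(os.fspath(root_directory), dest_directory, dest_filename)


# Hypothetical values: root "/tmp", directory "my_folder", file "file.tar.gz".
print(fully_qualified_dest_filename("/tmp", "my_folder", "file.tar.gz"))
```

On a POSIX system this prints the example path above.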
check_exists()
Returns true if fully_qualified_dest_filename exists and the checksum matches self.checksum.
Source code in bionemo/llm/utils/remote.py
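A minimal sketch of this kind of check, written as a free function rather than the actual method. It assumes MD5 as the checksum algorithm (the text does not say which one is used) and treats a checksum of None as "existence only", as described in the attributes table.

```python
import hashlib
import os
from typing import Optional


def check_exists(path: str, checksum: Optional[str]) -> bool:
    """Hypothetical free-function version of the check described above."""
    if not os.path.exists(path):
        return False
    if checksum is None:
        # No checksum supplied: existence alone is enough.
        return True
    with open(path, "rb") as fh:
        # MD5 is an assumption made for this sketch.
        return hashlib.md5(fh.read()).hexdigest() == checksum
```

A caller could use the result to decide whether a download can be skipped.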
download_resource(overwrite=False)
Downloads the resource to its specified fully_qualified_dest name.
Returns: the fully qualified destination filename.
Source code in bionemo/llm/utils/remote.py
exists_or_create_destination_directory(exist_ok=True)
Checks that the fully_qualified_destination_directory exists. If it does not, the directory is created (or the call fails).
exist_ok: Tries to create fully_qualified_dest_folder if it doesn't already exist.
Source code in bionemo/llm/utils/remote.py
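One plausible way to get this behavior is to lean on `os.makedirs`, whose `exist_ok` flag has the same name as the parameter here. Whether the library does exactly this is an assumption; the function below is a hypothetical standalone sketch.

```python
import os


def exists_or_create(folder: str, exist_ok: bool = True) -> None:
    """Create `folder` (and any missing parents) if it is not already there."""
    # With exist_ok=False, os.makedirs raises FileExistsError when the folder exists.
    os.makedirs(folder, exist_ok=exist_ok)
```

With the default `exist_ok=True` the call is idempotent, so it is safe to run before every download.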
get_env_tmpdir()
staticmethod
Convenience method that exposes the environment TMPDIR variable.
Source code in bionemo/llm/utils/remote.py
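Reading TMPDIR from the environment could look like the sketch below. The fallback to "/tmp" when the variable is unset is an assumption made for illustration; the text above does not say what the method returns in that case.

```python
import os


def get_env_tmpdir(default: str = "/tmp") -> str:
    """Return the TMPDIR environment variable, or `default` if it is unset."""
    # The "/tmp" fallback is hypothetical, not taken from the documented method.
    return os.environ.get("TMPDIR", default)
```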