Single cell collection
FileNames
Bases: str, Enum
Names of files that are generated in SingleCellCollection.
Source code in bionemo/scdl/io/single_cell_collection.py
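FileNames mixes `str` into `Enum`, so its members behave as plain strings wherever a file name is expected. The sketch below shows that pattern. The class name `ExampleFileNames` and its members are hypothetical stand-ins, not the actual members of FileNames.

```python
from enum import Enum


class ExampleFileNames(str, Enum):
    """Hypothetical stand-in for FileNames; member names and values are illustrative only."""

    VERSION = "version.json"
    METADATA = "metadata.json"


# Because the enum mixes in str, members compare equal to plain strings.
matches = ExampleFileNames.VERSION == "version.json"

# .value is the unambiguous way to get the raw string when building paths.
path = f"/tmp/collection/{ExampleFileNames.VERSION.value}"
print(matches, path)
```

Lookup by value also works, e.g. `ExampleFileNames("metadata.json")` returns the `METADATA` member.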
SingleCellCollection
Bases: SingleCellRowDatasetCore
A collection of one or more SingleCellMemMapDatasets.
SingleCellCollection supports most of the functionality of the SingleCellDataSet API. A SingleCellCollection can be converted to a single SingleCellMemMapDataset. A SingleCellCollection enables the use of heterogeneous datasets, such as those composed of many AnnData files.
Attributes:
Name | Type | Description |
---|---|---|
_version | str | The version of the dataset. |
data_path | str | The directory where the collection of datasets is stored. |
_feature_index | RowFeatureIndex | The corresponding RowFeatureIndex where features are stored. |
fname_to_mmap | Dict[str, SingleCellMemMapDataset] | Dictionary holding each SingleCellMemMapDataset object. |
Raggedness has two states: False means not ragged (all SingleCellMemMapDataset objects have the same column dimension); True means ragged (column dimensions vary across the datasets).
__init__(data_path)
Instantiate the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_path | str | Directory where the collection will be stored. | required |
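The constructor takes only a storage location. As a minimal sketch, assume it records `data_path`, starts with an empty `fname_to_mmap` mapping, and creates the directory if it does not exist. `ToyCollection` is a hypothetical simplification, and creating the directory on construction is an assumption, not confirmed by this page.

```python
from pathlib import Path
from typing import Dict


class ToyCollection:
    """Hypothetical, simplified stand-in for SingleCellCollection.__init__."""

    def __init__(self, data_path: str) -> None:
        self.data_path: str = data_path
        # Maps a dataset's location to its dataset object (any object in this toy).
        self.fname_to_mmap: Dict[str, object] = {}
        # Assumption: the storage directory is created up front if missing.
        Path(self.data_path).mkdir(parents=True, exist_ok=True)
```

Using `exist_ok=True` makes re-opening an existing collection directory harmless in this sketch.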
flatten(output_path, destroy_on_copy=False)
Flattens the collection into a single SingleCellMemMapDataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_path | str | Location to store the new dataset. | required |
destroy_on_copy | bool | Whether to remove the current data_path after the copy is made. | False |
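Flattening amounts to concatenating every dataset's rows, in order, into one output and optionally deleting the source directory afterwards. The sketch below models datasets as lists of dense rows and writes JSON instead of the real memory-mapped format. The function `flatten_sketch` and the `rows.json` file name are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Dict, List


def flatten_sketch(
    datasets: Dict[str, List[List[float]]],
    data_path: str,
    output_path: str,
    destroy_on_copy: bool = False,
) -> List[List[float]]:
    """Concatenate every dataset's rows into one dataset (hypothetical sketch)."""
    merged: List[List[float]] = []
    for rows in datasets.values():  # dicts preserve insertion order
        merged.extend(rows)
    out = Path(output_path)
    out.mkdir(parents=True, exist_ok=True)
    # JSON stands in for the real memory-mapped storage format.
    (out / "rows.json").write_text(json.dumps(merged))
    if destroy_on_copy:
        # Delete the collection directory only after the copy has been written.
        shutil.rmtree(data_path)
    return merged
```

Deleting only after the write succeeds means a failure during the copy leaves the original collection intact.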
load_h5ad(h5ad_path)
Loads data from an existing AnnData archive.
This creates and saves a new backing data structure, then stores the location of the new dataset and the dataset itself in the collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
h5ad_path | str | Path to the AnnData archive. | required |
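Loading one archive can be pictured as "convert, then record where the result lives." In the sketch below, `convert` is a hypothetical callable standing in for the real h5ad-to-memmap conversion, and naming each dataset's folder after the file stem under `data_path` is an assumption made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict


def load_one_sketch(
    h5ad_path: str,
    data_path: str,
    fname_to_mmap: Dict[str, object],
    convert: Callable[[str, str], object],
) -> str:
    """Hypothetical sketch: convert one AnnData file and record its location."""
    # Assumption: each converted dataset gets its own folder named after the file stem.
    location = str(Path(data_path) / Path(h5ad_path).stem)
    # `convert` is a hypothetical stand-in for the real conversion step.
    fname_to_mmap[location] = convert(h5ad_path, location)
    return location
```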
load_h5ad_multi(directory_path, max_workers=5, use_processes=False)
Loads one or more AnnData files and adds them to the collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
directory_path | str | The path to the directory containing the AnnData files. | required |
max_workers | int | The maximum number of workers to use. | 5 |
use_processes | bool | If True, use ProcessPoolExecutor; otherwise, use ThreadPoolExecutor. | False |
Raises:
FileNotFoundError: If no h5ad files are found in the directory.
RuntimeError: If an error occurs while loading any of the h5ad files.
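The parameters above map directly onto Python's `concurrent.futures` executors. The sketch below finds `*.h5ad` files, loads them in parallel with a thread or process pool, raises FileNotFoundError when none exist, and wraps any per-file failure in a RuntimeError. `load_multi_sketch` and its `load_one` callable are hypothetical; they are not the library's implementation.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def load_multi_sketch(
    directory_path: str,
    load_one: Callable[[str], object],
    max_workers: int = 5,
    use_processes: bool = False,
) -> Dict[str, object]:
    """Hypothetical sketch: load every .h5ad file in a directory in parallel."""
    paths: List[Path] = sorted(Path(directory_path).glob("*.h5ad"))
    if not paths:
        raise FileNotFoundError(f"No .h5ad files found in {directory_path}")
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    results: Dict[str, object] = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(load_one, str(p)): p for p in paths}
        for future, path in futures.items():
            try:
                results[str(path)] = future.result()
            except Exception as exc:
                # Surface any per-file failure as a single RuntimeError.
                raise RuntimeError(f"Failed to load {path}") from exc
    return results
```

With `use_processes=True`, `load_one` must be picklable (for example, a module-level function), which is the usual trade-off for sidestepping the GIL on CPU-bound conversion.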
number_nonzero_values()
The total number of nonzero entries, summed across all datasets in the collection.
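A collection-level count is just the sum of per-dataset counts. The sketch below uses toy dense rows to count nonzeros. Both function names are hypothetical, and the real datasets store sparse data rather than dense lists.

```python
from typing import Dict, List


def count_nonzero(rows: List[List[float]]) -> int:
    """Number of nonzero entries in one toy (dense) dataset."""
    return sum(1 for row in rows for value in row if value != 0)


def number_nonzero_values_sketch(datasets: Dict[str, List[List[float]]]) -> int:
    """Collection total = sum of per-dataset nonzero counts (hypothetical sketch)."""
    return sum(count_nonzero(rows) for rows in datasets.values())


datasets = {"a": [[1.0, 0.0], [0.0, 0.0]], "b": [[2.0, 3.0, 0.0]]}
total = number_nonzero_values_sketch(datasets)  # 1 + 2 = 3
```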
number_of_rows()
The number of rows in the dataset.
Returns:
Type | Description |
---|---|
int | The number of rows in the dataset. |
Raises: ValueError: If the number of rows recorded in the feature index does not match the number of stored rows.
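The ValueError acts as a consistency check between two independent row counts. The sketch below compares the rows stored across datasets with the count a feature index reports. `number_of_rows_sketch` and its arguments are hypothetical simplifications of that check.

```python
from typing import Dict


def number_of_rows_sketch(rows_per_dataset: Dict[str, int], feature_index_rows: int) -> int:
    """Total row count, cross-checked against a (hypothetical) feature-index count."""
    stored = sum(rows_per_dataset.values())
    if stored != feature_index_rows:
        raise ValueError(
            f"Feature index reports {feature_index_rows} rows but {stored} are stored"
        )
    return stored
```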
number_of_values()
The total number of values, summed across all datasets in the collection.
number_of_variables()
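Returns the number of variables in the collection. If the collection is ragged, returns a list of the variable counts; if it is not ragged, returns a list with a single entry.
A ragged collection is one whose datasets have different numbers of variables (column dimensions).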
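The ragged/non-ragged rule can be expressed as "collapse to one entry when every dataset agrees." The sketch below works on a plain list of per-dataset variable counts. `number_of_variables_sketch` is hypothetical, and the real method may derive its counts from the feature index instead.

```python
from typing import List


def number_of_variables_sketch(var_counts: List[int]) -> List[int]:
    """One entry if all datasets agree, otherwise every dataset's count (sketch)."""
    if len(set(var_counts)) <= 1:
        # Not ragged: every dataset has the same number of variables (columns).
        return var_counts[:1]
    # Ragged: column dimensions differ, so report each one.
    return list(var_counts)
```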
shape()
Get the shape of the dataset.
This pairs the number of entries with the number of variables recorded for each entry in the feature index.
Returns:
Type | Description |
---|---|
int | The total number of elements across all datasets. |
List[int] | A list containing the number of variables for each entry in the RowFeatureIndex. |
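The two return values fit naturally into a tuple of a total and a per-entry list. The sketch below assumes "elements" means rows and that per-entry row and variable counts are already known. `shape_sketch` and both of its inputs are hypothetical.

```python
from typing import List, Tuple


def shape_sketch(rows_per_entry: List[int], vars_per_entry: List[int]) -> Tuple[int, List[int]]:
    """(total rows, variables per feature-index entry); a hypothetical sketch."""
    if len(rows_per_entry) != len(vars_per_entry):
        raise ValueError("each feature-index entry needs a row count and a variable count")
    return sum(rows_per_entry), list(vars_per_entry)


print(shape_sketch([100, 50], [2000, 1500]))  # (150, [2000, 1500])
```

Returning a list instead of a single column count keeps ragged collections representable without padding.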
version()
Returns a version number.
(following