
Preprocess

ResourcePreprocessor dataclass

Bases: ABC

Interface for resource preprocessors. Implementors promise to provide both a complete list of RemoteResource objects (through get_remote_resources) and a freeform prepare method. Because every implementor exposes the same methods, this interface can be used to define a workflow generically from a config file.

remote -> prepare -> prepared data.
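To illustrate what "define a workflow generically from a config file" can look like, here is a minimal sketch. It assumes a simplified stand-in for ResourcePreprocessor (no RemoteResource handling), and the `ExamplePreprocessor` class, the `REGISTRY` dict, and the `run_from_config` function are all hypothetical; they are not part of BioNeMo.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Type


@dataclass
class ResourcePreprocessor(ABC):
    """Simplified stand-in: keeps the two dataclass fields and the
    prepare() contract, but omits RemoteResource handling."""

    root_directory: str = "/tmp"
    dest_directory: str = "data"

    @abstractmethod
    def prepare(self) -> List:
        """Returns a list of prepared filenames."""
        raise NotImplementedError()


@dataclass
class ExamplePreprocessor(ResourcePreprocessor):
    """Hypothetical implementor; it only reports where its output would go."""

    def prepare(self) -> List:
        return [f"{self.root_directory}/{self.dest_directory}/example.txt"]


# Hypothetical registry mapping the "name" key of a config to an implementor.
REGISTRY: Dict[str, Type[ResourcePreprocessor]] = {"example": ExamplePreprocessor}


def run_from_config(config: Dict[str, Any]) -> List:
    """Instantiates the preprocessor named in the config and runs prepare()."""
    cls = REGISTRY[config["name"]]
    # Every remaining key is passed through as a dataclass field.
    kwargs = {key: value for key, value in config.items() if key != "name"}
    return cls(**kwargs).prepare()
```

For example, `run_from_config({"name": "example", "root_directory": "/scratch", "dest_directory": "geneformer"})` returns `["/scratch/geneformer/example.txt"]`. The driver never needs to know which concrete class it is running; it relies only on the shared interface.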
Source code in bionemo/geneformer/data/preprocess.py
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional

# RemoteResource is a BioNeMo class; its import is not shown in this excerpt.


@dataclass
class ResourcePreprocessor(ABC):
    """Interface defining a ResourcePreprocessor. Implementors promise to provide both a complete list of RemoteResources
    and a freeform prepare method. This interface can be used to generically define a workflow from a config file.

        remote -> prepare -> prepared data.
    """  # noqa: D205

    root_directory: Optional[str] = field(default_factory=RemoteResource.get_env_tmpdir)
    dest_directory: str = "data"

    def get_checksums(self) -> List[str]:
        """Returns the checksum of every remote resource, in get_remote_resources() order."""
        return [resource.checksum for resource in self.get_remote_resources()]

    def get_urls(self) -> List[str]:
        """Returns the URL of every remote resource, in get_remote_resources() order."""
        return [resource.url for resource in self.get_remote_resources()]

    @abstractmethod
    def get_remote_resources(self) -> List[RemoteResource]:
        """Gets the remote resources associated with this preprocessor."""
        raise NotImplementedError()

    @abstractmethod
    def prepare(self) -> List:
        """Returns a list of prepared filenames."""
        raise NotImplementedError()
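The sketch below shows a concrete implementor and how the inherited get_urls and get_checksums read from get_remote_resources. It is self-contained, so it re-declares the interface and uses a hypothetical stand-in for RemoteResource that models only the `url` and `checksum` attributes the interface reads; the real BioNeMo class has more fields. The `ToyPreprocessor` class and the example.com URLs and checksums are placeholders.

```python
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RemoteResource:
    """Hypothetical stand-in: only the url and checksum attributes."""

    url: str
    checksum: str

    @staticmethod
    def get_env_tmpdir() -> str:
        # Assumption: the real helper may consult environment variables;
        # this stand-in simply uses the system temp directory.
        return tempfile.gettempdir()


@dataclass
class ResourcePreprocessor(ABC):
    root_directory: Optional[str] = field(default_factory=RemoteResource.get_env_tmpdir)
    dest_directory: str = "data"

    def get_checksums(self) -> List[str]:
        return [resource.checksum for resource in self.get_remote_resources()]

    def get_urls(self) -> List[str]:
        return [resource.url for resource in self.get_remote_resources()]

    @abstractmethod
    def get_remote_resources(self) -> List[RemoteResource]:
        raise NotImplementedError()

    @abstractmethod
    def prepare(self) -> List:
        raise NotImplementedError()


@dataclass
class ToyPreprocessor(ResourcePreprocessor):
    """Hypothetical implementor declaring two remote archives."""

    def get_remote_resources(self) -> List[RemoteResource]:
        return [
            RemoteResource(url="https://example.com/a.tar.gz", checksum="abc123"),
            RemoteResource(url="https://example.com/b.tar.gz", checksum="def456"),
        ]

    def prepare(self) -> List:
        # A real implementor would download, verify and extract here.
        return []
```

Because get_urls and get_checksums are derived from the same list, they always stay aligned index by index: `ToyPreprocessor().get_urls()[0]` pairs with `ToyPreprocessor().get_checksums()[0]`.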

get_remote_resources() abstractmethod

Gets the remote resources associated with this preprocessor.

Source code in bionemo/geneformer/data/preprocess.py
@abstractmethod
def get_remote_resources(self) -> List[RemoteResource]:
    """Gets the remote resources associated with this preprocessor."""
    raise NotImplementedError()

prepare() abstractmethod

Returns a list of prepared filenames.

Source code in bionemo/geneformer/data/preprocess.py
@abstractmethod
def prepare(self) -> List:
    """Returns a list of prepared filenames."""
    raise NotImplementedError()
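As a rough sketch of what a prepare implementation might do, the example below writes its output under `<root_directory>/<dest_directory>` and returns the resulting paths. It re-declares a simplified stand-in of the interface, assumes `tempfile.gettempdir` in place of `RemoteResource.get_env_tmpdir`, and the `CsvPreprocessor` class and its file contents are hypothetical; a real implementor would derive its output from its RemoteResources.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourcePreprocessor(ABC):
    """Simplified stand-in: only the directory fields and prepare().
    Assumption: tempfile.gettempdir replaces RemoteResource.get_env_tmpdir."""

    root_directory: str = field(default_factory=tempfile.gettempdir)
    dest_directory: str = "data"

    @abstractmethod
    def prepare(self) -> List:
        raise NotImplementedError()


@dataclass
class CsvPreprocessor(ResourcePreprocessor):
    """Hypothetical implementor that writes one small CSV file."""

    def prepare(self) -> List[str]:
        # Place outputs under <root_directory>/<dest_directory>.
        out_dir = os.path.join(self.root_directory, self.dest_directory)
        os.makedirs(out_dir, exist_ok=True)
        out_path = os.path.join(out_dir, "cells.csv")
        # A real implementor would download, verify and transform its
        # RemoteResources here; this sketch writes fixed contents instead.
        with open(out_path, "w") as fh:
            fh.write("cell_id,gene,count\nc1,GATA1,3\n")
        return [out_path]
```

Returning full paths, rather than writing to a fixed location, lets downstream steps consume the prepared data without knowing the directory layout the implementor chose.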