FASTA to JSONL
Convert FASTA files to JSONL format for use with the inference --prompt-file option.
Each FASTA record becomes one JSONL line::

    {"id": "sequence_header", "prompt": "ATCGATCG..."}

Usage::

    bionemo_fasta_to_jsonl input.fasta output.jsonl
    bionemo_fasta_to_jsonl input.fa output.jsonl --upper

This module is used by multiple recipes via bionemo-recipeutils.
It must not import megatron-core, megatron-bridge, or NeMo.
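One way to guard the import restriction above is a small static check over the module source. The following is only a sketch and not part of bionemo-recipeutils: the helper name `find_forbidden_imports` is hypothetical, and the top-level import names `megatron` and `nemo` are assumptions about how megatron-core, megatron-bridge, and NeMo are imported.

```python
import ast

# Assumed top-level import names for megatron-core, megatron-bridge, and NeMo.
FORBIDDEN_ROOTS = ("megatron", "nemo")


def find_forbidden_imports(source: str, forbidden=FORBIDDEN_ROOTS) -> list[str]:
    """Return imported module names in `source` whose top-level package is forbidden."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as `from . import x` have module=None.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in forbidden:
                hits.append(name)
    return hits
```

A check like this could run in a unit test against the text of fasta_to_jsonl.py, failing if the returned list is non-empty.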
fasta_to_jsonl(input_path, output_path, *, uppercase=False)
Convert a FASTA file to JSONL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_path | Path | Path to input FASTA file (.fasta, .fa, .fna, etc.). | required |
| output_path | Path | Path to output JSONL file. | required |
| uppercase | bool | If True, convert sequences to uppercase. | False |
Returns:
| Type | Description |
|---|---|
| int | Number of records written. |
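To illustrate the documented behavior, the sketch below converts FASTA records to JSONL lines with the same parameters and return value. It is not the library's implementation: the names `iter_fasta_records` and `fasta_to_jsonl_sketch` are hypothetical, and it assumes that the "id" field holds the full header text after ">" and that multi-line sequences are concatenated.

```python
import json
from pathlib import Path


def iter_fasta_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_jsonl_sketch(input_path: Path, output_path: Path, *, uppercase: bool = False) -> int:
    """Write one {"id", "prompt"} object per record; return the number written."""
    count = 0
    with open(input_path) as fin, open(output_path, "w") as fout:
        for header, seq in iter_fasta_records(fin):
            if uppercase:
                seq = seq.upper()
            fout.write(json.dumps({"id": header, "prompt": seq}) + "\n")
            count += 1
    return count
```

Reading the file line by line keeps memory use proportional to the largest single record rather than the whole input.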
Source code in bionemo/recipeutils/io/fasta_to_jsonl.py
main()
CLI entry point.
Source code in bionemo/recipeutils/io/fasta_to_jsonl.py
parse_args()
Parse CLI arguments.
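A parser consistent with the usage lines shown earlier (two positional paths plus an optional --upper flag) might look like the following sketch. Only the argument order and the --upper flag come from the documented usage; the function name `parse_args_sketch`, the `argv` parameter, the attribute names `input` and `output`, and the help strings are assumptions.

```python
import argparse
from pathlib import Path


def parse_args_sketch(argv=None) -> argparse.Namespace:
    """Parse arguments in the shape shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="bionemo_fasta_to_jsonl",
        description="Convert a FASTA file to JSONL.",
    )
    parser.add_argument("input", type=Path, help="Path to input FASTA file.")
    parser.add_argument("output", type=Path, help="Path to output JSONL file.")
    parser.add_argument(
        "--upper",
        action="store_true",
        help="Convert sequences to uppercase.",
    )
    # argv=None makes argparse read sys.argv[1:], as a CLI entry point would.
    return parser.parse_args(argv)
```

Accepting an explicit `argv` list makes the parser easy to exercise in tests without touching `sys.argv`.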
Source code in bionemo/recipeutils/io/fasta_to_jsonl.py