Training Epoch 14 / 20 ETA: 4h 23m

Model 120B Fine-tune
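The ETA in the status line is consistent with a simple estimate of remaining epochs times a steady per-epoch duration. A minimal sketch; the per-epoch duration of 2630 seconds is a hypothetical value chosen so that the 6 remaining epochs match the dashboard's 4h 23m, not a figure reported by the dashboard itself.

```python
from datetime import timedelta

def eta(current_epoch, total_epochs, seconds_per_epoch):
    """Estimated time remaining, assuming every epoch takes the same time."""
    remaining = total_epochs - current_epoch
    return timedelta(seconds=remaining * seconds_per_epoch)

# 6 remaining epochs at a hypothetical 2630 s/epoch: 15780 s = 4h 23m.
print(eta(14, 20, 2630))  # 4:23:00
```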

Training Loss:    0.0234   (-42% per epoch)
Validation Loss:  0.0312   (+2% per epoch)
Learning Rate:    2.4e-5
GPU Memory:       76.2 GB / 80 GB
Throughput:       1,842 tokens/s
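The metrics above show training loss falling 42% per epoch while validation loss rises 2% per epoch, the classic divergence pattern that suggests overfitting. A minimal sketch of flagging that condition from the two per-epoch deltas; the function name and thresholds are illustrative, not part of the dashboard.

```python
def overfitting_signal(train_delta_pct, val_delta_pct):
    """Flag likely overfitting: training loss still improving (negative delta)
    while validation loss is degrading (positive delta)."""
    return train_delta_pct < 0 and val_delta_pct > 0

# Per-epoch changes taken from the metric cards above.
print(overfitting_signal(-42.0, 2.0))  # True
```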

Checkpoint Evaluations

Checkpoint  Epoch  Train Loss  Val Loss  Status
ckpt-014    14     0.0234      0.031     Current
ckpt-010    10     0.0298      0.029     Best
ckpt-005    5      0.0512      0.048     Saved
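The table labels ckpt-010 as Best, which matches selecting the checkpoint with the lowest validation loss. A minimal sketch of that selection rule, with the records mirrored from the table above; the dictionary layout is an assumption for illustration.

```python
# Checkpoint records mirrored from the evaluations table.
checkpoints = [
    {"name": "ckpt-014", "epoch": 14, "train_loss": 0.0234, "val_loss": 0.031},
    {"name": "ckpt-010", "epoch": 10, "train_loss": 0.0298, "val_loss": 0.029},
    {"name": "ckpt-005", "epoch": 5,  "train_loss": 0.0512, "val_loss": 0.048},
]

# "Best" here means lowest validation loss, consistent with the table's label.
best = min(checkpoints, key=lambda c: c["val_loss"])
print(best["name"])  # ckpt-010
```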