Training Epoch 14 / 20 ETA: 4h 23m

Model 120B Fine-tune
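The ETA in the status line is consistent with a simple estimate of remaining epochs times a steady per-epoch duration. A minimal sketch; the per-epoch duration of 2630 seconds is a hypothetical value chosen so that the 6 remaining epochs match the dashboard's 4h 23m, not a figure reported by the dashboard itself.

```python
from datetime import timedelta

def eta(current_epoch, total_epochs, seconds_per_epoch):
    """Estimated time remaining, assuming every epoch takes the same time."""
    remaining = total_epochs - current_epoch
    return timedelta(seconds=remaining * seconds_per_epoch)

# 6 remaining epochs at a hypothetical 2630 s/epoch: 15780 s = 4h 23m.
print(eta(14, 20, 2630))  # 4:23:00
```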

Training Loss:    0.0234   (-42% per epoch)
Validation Loss:  0.0312   (+2% per epoch)
Learning Rate:    2.4e-5
GPU Memory:       76.2 GB / 80 GB
Throughput:       1,842 tokens/s
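The metrics above show training loss falling 42% per epoch while validation loss rises 2% per epoch, the classic divergence pattern that suggests overfitting. A minimal sketch of flagging that condition from the two per-epoch deltas; the function name and thresholds are illustrative, not part of the dashboard.

```python
def overfitting_signal(train_delta_pct, val_delta_pct):
    """Flag likely overfitting: training loss still improving (negative delta)
    while validation loss is degrading (positive delta)."""
    return train_delta_pct < 0 and val_delta_pct > 0

# Per-epoch changes taken from the metric cards above.
print(overfitting_signal(-42.0, 2.0))  # True
```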

Checkpoint Evaluations

Checkpoint  Epoch  Train Loss  Val Loss  Status
ckpt-014    14     0.0234      0.031     Current
ckpt-010    10     0.0298      0.029     Best
ckpt-005    5      0.0512      0.048     Saved
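The table labels ckpt-010 as Best, which matches selecting the checkpoint with the lowest validation loss. A minimal sketch of that selection rule, with the records mirrored from the table above; the dictionary layout is an assumption for illustration.

```python
# Checkpoint records mirrored from the evaluations table.
checkpoints = [
    {"name": "ckpt-014", "epoch": 14, "train_loss": 0.0234, "val_loss": 0.031},
    {"name": "ckpt-010", "epoch": 10, "train_loss": 0.0298, "val_loss": 0.029},
    {"name": "ckpt-005", "epoch": 5,  "train_loss": 0.0512, "val_loss": 0.048},
]

# "Best" here means lowest validation loss, consistent with the table's label.
best = min(checkpoints, key=lambda c: c["val_loss"])
print(best["name"])  # ckpt-010
```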