validate_puzzle_with_multi_replacements

Validates puzzle solutions by applying layer replacements and evaluating model performance.

TODO: Consider moving this to a separate module dedicated to scoring.

Functions

can_realize_as_symlinks

load_puzzle_solutions

validate_puzzle_solutions

can_realize_as_symlinks(layer_replacements)

Return whether the given layer replacements can be realized as symlinks.

Parameters:

layer_replacements (list[dict])

Return type:

bool
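The behavior of this helper is not documented here beyond its signature. A minimal, purely illustrative sketch follows, under the assumption that a replacement can be realized as a symlink when it references an existing on-disk checkpoint file and requests no in-memory weight transformation; the `checkpoint_path` and `transform` keys are hypothetical, not part of the documented API.

```python
from pathlib import Path


def can_realize_as_symlinks_sketch(layer_replacements: list[dict]) -> bool:
    """Illustrative sketch only: assume a replacement is symlink-able when it
    points at an existing checkpoint file (hypothetical "checkpoint_path" key)
    and does not request a weight transformation (hypothetical "transform" key).
    """
    for replacement in layer_replacements:
        ckpt = replacement.get("checkpoint_path")
        if ckpt is None or not Path(ckpt).is_file():
            return False  # nothing on disk to link against
        if replacement.get("transform"):
            return False  # modified weights cannot be a plain symlink
    return True
```

The real implementation may use entirely different keys; the sketch only conveys the shape of the check.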

load_puzzle_solutions(solutions_path, sort_solutions_by, bigger_is_better)

Load puzzle solutions from a JSON file or a directory of solution files, optionally sorted by a JSON field path.
Parameters:
  • solutions_path (Path)

  • sort_solutions_by (str | None)

  • bigger_is_better (bool)

Return type:

list[dict]

validate_puzzle_solutions(args)

Validate puzzle solutions by applying layer replacements and evaluating model performance.

Parameters:

args (DictConfig) –

Configuration object containing the following attributes:

Puzzle Configuration (Required):

  • replacement_library_path (Path): Path to the replacement library JSON file.

  • solutions_path (Path): Path to a puzzle solutions JSON file or a directory containing solution files.

  • solutions_to_validate (list[int], optional): Indices of specific solutions to validate. Validates all solutions if None.

  • sort_solutions_by (str, optional): JSON field path to sort solutions by before validation.

  • bigger_is_better (bool): If True, sort solutions in descending order. Used with sort_solutions_by.

  • skip_validation (bool): If True, skip model validation and only save models if requested.

  • save_models (bool): If True, save realized model checkpoints for each solution.

Teacher/Tokenizer Configuration:

  • teacher_dir (Path, optional): Path to the teacher model directory. Auto-inferred if not provided.

  • tokenizer_name (str, optional): Tokenizer name/path. Uses teacher_dir if not specified.

Model Configuration (Required if skip_validation=False):

  • model_dtype (str or torch.dtype): Model data type (e.g., "torch.bfloat16", torch.float16).

  • autocast_dtype (str or torch.dtype): Autocast data type for mixed precision.

Dataset Configuration (Required if skip_validation=False):

  • dataset_path (str): Path to the validation dataset.

  • data_column (str): Column name in the dataset containing text data.

  • block_size (int): Maximum sequence length for tokenization.

  • eval_samples (int, optional): Number of samples to evaluate.

  • val_dataset_name (str): Name of the validation dataset split.

  • source_datasets_to_discard (list[str], optional): List of source datasets to exclude.

  • load_dataset_fn (callable, optional): Custom function to load the dataset.

Data Processing (Required if skip_validation=False):

  • micro_batch_size (int): Batch size for evaluation.

  • seed (int): Random seed for reproducibility.

  • shuffle_seed (int, optional): Seed for shuffling data.

  • varlen (bool): Enable variable-length sequences.

  • bos_rate (float): Rate of adding the BOS token.

  • fim_rate (float): Fill-in-the-middle rate for code completion tasks.

  • fim_spm_rate (float): SPM-based fill-in-the-middle rate.

Output Configuration:

  • output_dir (Path, optional): Directory to save validation results. Auto-generated from solutions_path if not provided.

Execution Options (Optional; used when skip_validation=False):

  • calc_losses_on_cpu (bool): Calculate losses on CPU to avoid OOM.

  • write_results (bool): Write validation results to file.

  • activations_log_dir (str, optional): Directory to log activation scores.

  • activation_hooks_kwargs (str or dict, optional): Arguments for activation hooks.

Returns:

None. Saves validation results and optionally model checkpoints to disk.

Return type:

None
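The attribute groups above can be assembled into a single configuration object. A sketch of a minimal configuration follows, written as a plain dict mirroring the documented field names; every path and value is a placeholder, and with omegaconf installed such a dict could be wrapped via OmegaConf.create before being passed as args.

```python
# Illustrative configuration for validate_puzzle_solutions; field names follow
# the parameter groups documented above, and all values are placeholders.
config = {
    # Puzzle configuration (required)
    "replacement_library_path": "artifacts/replacement_library.json",
    "solutions_path": "artifacts/solutions/",
    "solutions_to_validate": None,        # None => validate all solutions
    "sort_solutions_by": "metrics.val_loss",
    "bigger_is_better": False,            # lower loss is better
    "skip_validation": False,
    "save_models": True,
    # Model configuration (required because skip_validation=False)
    "model_dtype": "torch.bfloat16",
    "autocast_dtype": "torch.bfloat16",
    # Dataset configuration (required because skip_validation=False)
    "dataset_path": "data/val",
    "data_column": "text",
    "block_size": 4096,
    "eval_samples": 512,
    "val_dataset_name": "validation",
    # Data processing (required because skip_validation=False)
    "micro_batch_size": 4,
    "seed": 42,
    "varlen": False,
    "bos_rate": 1.0,
    "fim_rate": 0.0,
    "fim_spm_rate": 0.0,
}
```

Optional fields (teacher_dir, output_dir, calc_losses_on_cpu, and so on) are omitted here and fall back to the documented defaults.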