Rank Filter
- class nvidia_resiliency_ext.inprocess.rank_filter.MaxActiveWorldSize(max_active_world_size)
MaxActiveWorldSize ensures that the active world size is no greater than the specified max_active_world_size. Ranks with indices less than the active world size are active and call the wrapped function, while ranks outside this range are inactive (sleeping).
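The rule above can be sketched in plain Python without importing the library. The helper name `active_ranks_with_cap` is hypothetical, and capping with `min` is an assumption drawn from the description, not the library's implementation.

```python
from typing import List


def active_ranks_with_cap(world_size: int, max_active_world_size: int) -> List[int]:
    """Return the ranks that would be active under a cap (illustrative only)."""
    # Assumption: the active world size is the smaller of the two values.
    active_world_size = min(world_size, max_active_world_size)
    # Ranks with indices below the active world size are active.
    return list(range(active_world_size))


# 8 ranks with a cap of 6: ranks 0-5 are active, ranks 6 and 7 sleep.
print(active_ranks_with_cap(8, 6))
```

When the world size is already below the cap, every rank stays active under this model.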
- class nvidia_resiliency_ext.inprocess.rank_filter.RankFilter
RankFilter selects which ranks are active in the current restart iteration of inprocess.Wrapper. Active ranks call the provided wrapped function. Inactive ranks wait idle and can serve as a pool of static, preallocated, and preinitialized spare ranks. Spare ranks are activated in a subsequent restart iteration if previously active ranks were terminated or became unhealthy.
Multiple instances of RankFilter can be composed with inprocess.Compose to achieve the desired behavior.
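To illustrate why composing filters is useful, and why the order of composition can matter, here is a minimal self-contained model. Every name in it (`FilterState`, `cap_at`, `round_down_to_multiple`, `chain`) is hypothetical and stands in for the real classes; in particular, `chain` applies filters left to right, which is an assumption of this sketch and not a statement about how inprocess.Compose orders its arguments.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class FilterState:
    """Hypothetical stand-in for the state a rank filter transforms."""
    world_size: int
    active_world_size: int


Filter = Callable[[FilterState], FilterState]


def cap_at(limit: int) -> Filter:
    """Model of a filter that caps the active world size."""
    def apply(state: FilterState) -> FilterState:
        return replace(state, active_world_size=min(state.active_world_size, limit))
    return apply


def round_down_to_multiple(divisor: int) -> Filter:
    """Model of a filter that makes the active world size divisible by divisor."""
    def apply(state: FilterState) -> FilterState:
        adjusted = state.active_world_size // divisor * divisor
        return replace(state, active_world_size=adjusted)
    return apply


def chain(*filters: Filter) -> Filter:
    """Apply filters left to right (an assumption of this sketch)."""
    def apply(state: FilterState) -> FilterState:
        for f in filters:
            state = f(state)
        return state
    return apply


initial = FilterState(world_size=10, active_world_size=10)

# Cap first, then round down: 10 -> 7 -> 4.
capped_then_rounded = chain(cap_at(7), round_down_to_multiple(4))(initial)
# Round down first, then cap: 10 -> 8 -> 7, which is no longer divisible by 4.
rounded_then_capped = chain(round_down_to_multiple(4), cap_at(7))(initial)

print(capped_then_rounded.active_world_size)  # 4
print(rounded_then_capped.active_world_size)  # 7
```

Because the two orderings give different results in this model, it is worth checking the inprocess.Compose documentation for its actual application order before relying on a particular combination.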
- class nvidia_resiliency_ext.inprocess.rank_filter.WorldSizeDivisibleBy(divisor=1)
WorldSizeDivisibleBy ensures that the active world size is divisible by a given number. Ranks within the adjusted world size are marked as active and call the wrapped function, while ranks outside this range are marked as inactive (sleeping).
- Parameters:
divisor (int) – the divisor to adjust the active world size by
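The adjustment can be illustrated with a small standalone function. The name `divisible_active_world_size` is hypothetical, and rounding down to the largest multiple of `divisor` that fits is an assumption based on the description above, not a copy of the library code.

```python
def divisible_active_world_size(world_size: int, divisor: int = 1) -> int:
    """Largest multiple of divisor not exceeding world_size (illustrative only)."""
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    # Integer division drops the remainder, then we scale back up.
    return world_size // divisor * divisor


# 10 ranks with divisor 4: ranks 0-7 are active, ranks 8 and 9 are inactive.
active = divisible_active_world_size(10, divisor=4)
print(active, list(range(active, 10)))
```

With the default `divisor=1` the active world size is unchanged, so no rank becomes inactive under this model.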