Appendix

Entitled NVIDIA Driver Builds No Longer Supported

Introduction

Important

Entitled NVIDIA driver builds are deprecated and not supported starting with Red Hat OpenShift 4.10.

The Driver Toolkit (DTK) enables entitlement-free deployments of the GPU Operator. Entitled builds were used before the DTK was available, and for some OpenShift versions where the Driver Toolkit images were broken.

If you encounter the “broken driver toolkit detected” warning on OpenShift 4.10 or later, you should troubleshoot to find the root cause instead of falling back to entitled driver builds.

If you encounter the broken DTK warning on an older version of OpenShift, refer to the documentation for an older version of the NVIDIA GPU Operator to enable entitled builds. Keep in mind that older versions of OpenShift might no longer be supported.

Troubleshooting Broken Driver Toolkit Errors

The most likely reason for the broken DTK message is Node Feature Discovery (NFD) not working correctly. NFD might be disabled, failing, or not updating the kernel version label for other reasons. Another cause might be a missing or incomplete DTK image stream, e.g. because of broken mirroring.

Follow these steps for initial troubleshooting of Node Feature Discovery:

  1. Check Node Feature Discovery (NFD) status:

    $ oc get pods -n openshift-nfd
    

    Ensure NFD pods are running and healthy. If NFD is not deployed or is failing, this can cause DTK issues.

  2. Verify kernel version labels are present and correct:

    $ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{":\t"}{.metadata.labels.feature\.node\.kubernetes\.io/kernel-version\.full}{"\n"}{end}'
    

    Ensure that every node has a kernel version label and that the label matches the kernel shipped with the cluster's current OpenShift version.

  3. Check Driver Toolkit image stream:

    $ oc get -n openshift is/driver-toolkit
    

    Verify that the driver-toolkit image stream exists and has tags that correspond to the cluster's current OpenShift version.
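The three checks above can be combined into one pass that prints only the items that look wrong. The following is a minimal sketch, not an official tool: the helper names `pods_not_healthy` and `nodes_missing_kernel_label` are hypothetical, and it assumes the default column layout of `oc get pods --no-headers` (NAME, READY, STATUS, ...). A pod in the Running state can still have containers that are not ready, so treat an empty result as a first indication only.

```shell
#!/usr/bin/env bash
# Sketch only: the two helper functions are hypothetical names,
# not part of oc, NFD, or the GPU Operator.

# Reads `oc get pods --no-headers` output on stdin and prints the names of
# pods whose STATUS column is neither Running nor Completed.
pods_not_healthy() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Reads "<node>:<TAB><kernel-version>" lines on stdin (the output format of
# the jsonpath query in step 2) and prints nodes whose label value is empty.
nodes_missing_kernel_label() {
  awk -F':\t' '$2 == "" { print $1 }'
}

# Run the checks only when the oc client is installed.
if command -v oc >/dev/null 2>&1; then
  echo "NFD pods that are not Running or Completed:"
  oc get pods -n openshift-nfd --no-headers | pods_not_healthy

  echo "Nodes without a kernel version label:"
  oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{":\t"}{.metadata.labels.feature\.node\.kubernetes\.io/kernel-version\.full}{"\n"}{end}' \
    | nodes_missing_kernel_label

  echo "Driver Toolkit image stream:"
  oc get -n openshift is/driver-toolkit
fi
```

If the first two sections print nothing and the image stream is listed, the checks in steps 1 to 3 found no obvious problem.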

For additional troubleshooting resources: