Release Notes

This document contains release notes for the NCX Infra Controller (NICo) project.

Bare Metal Manager 0.2.0

This release of Bare Metal Manager is open-source software (OSS).

Improvements

  • The REST API now supports external identity providers (IdPs) for JWT authentication.
  • The new /carbide/instance/batch REST API endpoint supports creating instances in batches.
  • Instances can now be rebooted by passing an instance_id argument, in addition to the existing machine_id argument.
  • The State Controller is now split into two independent components:
    • The PeriodicEnqueuer periodically enqueues a state handling task for each resource/object managed by NICo, using the Enqueuer::enqueue_object API.
    • The StateProcessor continuously dequeues the state handling tasks for each object type and executes the state handler on them.
  • The state handler for objects is now scheduled again whenever the outcome of the state handler is Transition. This reduces the wait time for many state transitions by up to 30 seconds.
  • The state handler is now rescheduled for immediate execution if the DPU reports a different version from the previous check. This should reduce the time spent in wait states such as WaitingForNetworkConfig.
  • During the pre-ingestion phase, NICo now sets the time zone to UTC if it detects that the time is out of sync. This allows the system to correctly interpret NTP timestamps from the time server.
  • The Scout agent can now perform secure erase of NVMe devices asynchronously.
  • NVLink interfaces are now marked as Pending when an update request is being sent.
  • The update logic for NVLink Logical Partition inventory metadata has been improved.
  • The DpuExtensionService now supports name as an argument for the orderBy parameter.
  • NICo now supports bulk creation/update of ExpectedMachine objects.
  • The Go version has been updated to v1.25.4.
  • The nv-redfish package has been updated to v0.1.3.
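
The State Controller split and the reschedule-on-Transition behavior described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not NICo's implementation. Only the names PeriodicEnqueuer, StateProcessor, enqueue_object, the Transition outcome, and WaitingForNetworkConfig come from these notes. The in-memory TaskQueue, the WAIT outcome, the other state names, and run_demo are assumptions made for illustration.

```python
from collections import deque
from enum import Enum, auto


class Outcome(Enum):
    TRANSITION = auto()  # state changed, so the handler should run again soon
    WAIT = auto()        # hypothetical: nothing to do until the next periodic pass


class TaskQueue:
    """Hypothetical in-memory queue; at most one pending task per object."""

    def __init__(self):
        self._tasks = deque()
        self._pending = set()

    def enqueue_object(self, object_id):
        # Skip objects that already have a task waiting.
        if object_id in self._pending:
            return False
        self._pending.add(object_id)
        self._tasks.append(object_id)
        return True

    def dequeue(self):
        if not self._tasks:
            return None
        object_id = self._tasks.popleft()
        self._pending.discard(object_id)
        return object_id


class PeriodicEnqueuer:
    """Enqueues one state handling task per managed object on each tick."""

    def __init__(self, queue, list_objects):
        self.queue = queue
        self.list_objects = list_objects

    def tick(self):
        # Returns how many tasks were newly enqueued.
        return sum(self.queue.enqueue_object(o) for o in self.list_objects())


class StateProcessor:
    """Dequeues tasks and runs the state handler on each object."""

    def __init__(self, queue, handler):
        self.queue = queue
        self.handler = handler

    def drain(self):
        # Returns the number of handler executions until the queue is empty.
        runs = 0
        while (object_id := self.queue.dequeue()) is not None:
            runs += 1
            # Reschedule right away on Transition instead of waiting
            # for the next periodic tick.
            if self.handler(object_id) is Outcome.TRANSITION:
                self.queue.enqueue_object(object_id)
        return runs


def run_demo():
    # Illustrative state sequence; only WaitingForNetworkConfig is from the notes.
    states = ["Discovered", "WaitingForNetworkConfig", "Ready"]
    position = {"machine-a": 0, "machine-b": 2}

    def handler(object_id):
        if position[object_id] < len(states) - 1:
            position[object_id] += 1
            return Outcome.TRANSITION
        return Outcome.WAIT

    queue = TaskQueue()
    enqueued = PeriodicEnqueuer(queue, lambda: list(position)).tick()
    runs = StateProcessor(queue, handler).drain()
    return enqueued, runs, {k: states[v] for k, v in position.items()}
```

In this sketch, a single tick is enough for machine-a to move through both remaining transitions, because each Transition outcome puts the object straight back on the queue. Without that step, each transition would have to wait for the next periodic pass.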

Bug Fixes

  • The nv-redfish update to v0.1.3 (listed above) fixes a critical bug in the BMC cache that caused multiple cache miss errors and prevented the health monitor from rediscovering monitored entities.

Bare Metal Manager EA

What This Release Enables

  • Microservice: Our goal is to make NICo deployable and independent of NGC dependencies, enabling a "Disconnected NICo" deployment model.
  • GB200 Support: This release enables GB200 Node Ingestion and NVLink Partitioning, with the ability to provision both single and dual DPUs, ingest the GB200 compute trays, and validate the SKU. After ingestion, partners can create NVLink partitions, select instances, and configure the NVLink settings using the Admin CLI.
  • Deployment Flexibility: The release includes both the source code and instructions for building the NICo containers. Our goal is to make NICo deployable and independent of NGC dependencies, enabling a "Disconnected NICo" deployment model.

What You Can Test

The following key functionalities should be available for testing via the Admin CLI:

  • GB200 Node Ingestion: Partners should be able to:
    • Install NICo.
    • Provision the DPUs (Dual DPUs are also supported).
    • Ingest the expected machines (GB200 compute trays).
    • Validate the SKU.
    • Assign instance types (Note that this currently requires encoding the rack location for GB200).
  • NVLink Partitioning: Once the initial ingestion is complete, partners can do the following:
    • Create allocations and instances.
    • Create a partition.
    • Select an instance.
    • Set the NVLink configuration.
  • Disconnected NICo: This release allows for operation without any dependency on NGC.

Dependencies

| Category | Required Components | Description |
|---|---|---|
| Software | Vault, PostgreSQL, Kubernetes cluster, Certificate Management, Temporal | Partners are required to bring in NICo dependencies. |
| Hardware | Supported server and switch functionality (e.g. x86 nodes, specific NIC firmware, compatible BMCs, and switches that support BGP, EVPN, and RFC 5549 (unnumbered IPs)) | The code assumes predictable hardware attributes; unsupported SKUs may require custom configuration. |
| Network Topology | L2/L3 connectivity, DHCP/PXE servers, out-of-band management networks, specific switch-side port configurations | All modules (e.g. discovery, provisioning) require pre-configured subnets and routing policies, as well as delegation of IP prefixes, ASNs, and EVPN VNIs. |
| External Systems | DNS resolvers/recursers, NTP, Authentication (Azure OIDC, Keycloak), Observability Stack | NICo provides clients with DNS resolver and NTP server information in the DHCP response. An external authentication source that supports OIDC is required. NICo sends OpenTelemetry metrics and logs to an existing visualization/storage system. |

Supported Switches:

  • Optics compatibility with the B3220 BF-3
  • RFC 5549 BGP unnumbered routed ports
  • IPv4/IPv6 Unicast BGP address family
  • EVPN BGP address family
  • LLDP
  • BGP External AS
  • DHCP Relay that supports Option 82