Adding Support for New Hardware
This guide explains how to add or extend hardware support in the NICo stack when new BMC/server hardware arrives that does not work out of the box. The general process is: ingest the hardware, observe where it fails, and patch the appropriate layer based on which of the three scenarios applies.
Important: Changes for new hardware must not break support for existing hardware. Guard new behavior behind vendor/model/firmware checks rather than modifying shared code paths.
For background on how NICo uses Redfish end-to-end, see Redfish Workflow. For the list of currently supported hardware, see the Hardware Compatibility List.
Overview
NICo discovers and manages bare-metal hosts through their BMC (Baseboard Management Controller) via the DMTF Redfish standard. Two Rust Redfish client libraries handle this:
| Library | Role | Where Used |
|---|---|---|
| nv-redfish | Schema-driven, fast: site exploration reports, firmware inventory, sensor collection, health monitoring. Preferred for exploration. | Site Explorer exploration (crates/api/src/site_explorer/), Hardware Health (crates/health/src/) |
| libredfish | Stateful BMC interactions: boot config, BIOS setup, power control, account/credential management, lockdown | Site Explorer state controller operations (crates/api/src/site_explorer/) |
Site Explorer supports both libraries for generating EndpointExplorationReports, controlled by the explore_mode configuration setting (SiteExplorerExploreMode):
| Mode | Behavior |
|---|---|
nv-redfish | Use nv-redfish for exploration (preferred - significantly faster) |
libredfish | Use libredfish for exploration (legacy) |
compare-result | Run both and compare results (transition/validation) |
When new hardware arrives, failures can surface in either library. Exploration failures show up in whichever explore_mode is active (increasingly nv-redfish). State controller failures (boot order, BIOS setup, lockdown, credential rotation) show up in libredfish, which remains the library used for all write operations against BMCs. Both libraries may need changes to support a new platform.
Beyond the Redfish libraries, NICo itself has vendor-aware logic that also needs updating - see Changes in NICo.
The Three Scenarios
Scenario 1: Completely New BMC Vendor
The hardware uses a BMC firmware stack that does not map to any existing RedfishVendor variant.
What to do:
-
Add a
RedfishVendorvariant inlibredfish/src/model/service_root.rs. -
Extend vendor detection in
ServiceRoot::vendor()(same file). The vendor string comes fromGET /redfish/v1- theVendorfield, or failing that, the first key in theOemobject. If the vendor string alone is not enough to distinguish the BMC (e.g., the vendor is "Lenovo" but some models use an AMI-based BMC), use secondary signals likeself.has_ami_bmc()orself.product. -
Create a vendor module (or reuse an existing one). Each vendor has a file
libredfish/src/<vendor>.rscontaining aBmcstruct that implements theRedfishtrait. If the new vendor's BMC is very close to an existing one (e.g., LenovoAMI reusesami::Bmc), you can route to the existing implementation. -
Wire up
set_vendorinlibredfish/src/standard.rsto dispatch the new variant to the appropriateBmcimplementation. -
Implement the
Redfishtrait for the newBmc. Start by delegating toRedfishStandardand override methods as needed. The methods below are grouped by how they are used in the state machine; almost all need vendor-specific overrides.BIOS / machine setup - called during initial ingestion and instance creation to configure UEFI settings:
machine_setup()- applies BIOS attributes (names differ per vendor and model)machine_setup_status()- polls whether allmachine_setupchanges have taken effectis_bios_setup()- lightweight check used during instance creation (PollingBiosSetup) to confirm BIOS is ready before proceeding to boot order configuration
Lockdown - called to secure the BMC before tenant use and unlocked during instance termination or reconfiguration:
lockdown()- enable/disable BMC security lockdownlockdown_status()- polled by the state controller to confirm lockdown state; wrong results cause machines to get stucklockdown_bmc()- lower-level BMC-specific lockdown (e.g., iDRAC lockdown on Dell, distinct from BIOS lockdown)
Boot order - called during ingestion to set DPU-first boot and during DPU reprovisioning:
set_boot_order_dpu_first()- reorder boot options so the DPU boots first (platform-specific boot option discovery)boot_once()- one-time boot from a specific target (e.g.,UefiHttpfor DPU HTTP boot path)boot_first()- persistently change boot order to a given target
Serial console - SSH console access setup:
setup_serial_console()- configure BMC serial-over-LANserial_console_status()- polled to confirm setup; incorrect results stall provisioning
Credential management - called during initial ingestion to rotate factory defaults:
change_password()- rotate BMC user passwordchange_uefi_password()/clear_uefi_password()- UEFI password management (only tested on Dell, Lenovo, NVIDIA)set_machine_password_policy()- apply password-never-expires policy (vendor-specific)
Important: Pay careful attention to all status/polling methods (
is_bios_setup(),lockdown_status(),machine_setup_status(),serial_console_status(), etc.). The state controller polls these during provisioning, instance creation, instance termination, and reprovisioning to decide when to advance state. If they return incorrect results, machines will get stuck in polling states, fail to terminate properly, or skip required configuration steps. -
Add OEM model types if needed in
libredfish/src/model/oem/<vendor>.rs. -
Add unit tests for vendor detection and create a mockup directory for integration tests (see Testing).
-
Update nv-redfish - since nv-redfish is the preferred library for site exploration, it will likely need changes too. See nv-redfish Quirks.
-
Update NICo - add the vendor to
BMCVendor,HwType, and handle any state controller quirks. See Changes in NICo.
Scenario 2: New Server Model with Quirks
The hardware uses an already-supported BMC vendor but the specific model has quirks: different BIOS attribute names, unusual boot option paths, model-specific OEM extensions, etc.
What to do:
-
Identify the model string.
GET /redfish/v1/Systems/{id}returns aModelfield. The functionmodel_coerce()inlibredfish/src/lib.rsnormalizes this by replacing spaces with underscores. -
Use BIOS / OEM manager profiles for config-driven differences. NICo supports per-vendor, per-model BIOS settings via the
BiosProfileVendortype inlib.rs, letting you define model-specific attributes in config (TOML) without code changes. -
Add model-specific branches in the vendor module when profiles are not enough. Use the model/product string from
ComputerSystemto gate behavior. -
Handle missing or renamed attributes. Check the actual BIOS attributes via
GET /redfish/v1/Systems/{id}/Bioson the target hardware. If an attribute is missing, add a guard that logs and skips rather than failing.
Scenario 3: New Firmware for an Existing Model
A firmware update for an already-supported model introduces regressions: removed endpoints, changed response schemas, renamed attributes, etc.
What to do:
-
Compare old and new firmware Redfish responses. Use
curlorcarbide-admin-cli redfish browsetoGETendpoints on both versions and diff. -
Add defensive handling where endpoints may no longer exist - catch
404errors and fall through. -
Fix deserialization issues: null values in arrays (custom deserializers), new enum values, missing required fields (
Option<T>). -
Adjust OEM-specific paths if the firmware reorganizes its Redfish tree.
-
Guard behavioral changes behind firmware version checks if needed, using
ServiceRoot.redfish_versionor firmware inventory versions.
Changes in NICo
Beyond the Redfish libraries, NICo itself has vendor-aware logic that needs updating for new hardware.
BMCVendor enum (crates/bmc-vendor/src/lib.rs)
NICo has its own BMCVendor enum, distinct from libredfish's RedfishVendor. It is used throughout NICo for vendor-specific branching in the state controller, credential management, and exploration. When adding a new vendor:
- Add the variant to
BMCVendor. - Add the
From<RedfishVendor>mapping so libredfish's vendor detection flows into NICo's enum. - Add parsing in
From<&str>,from_udev_dmi(), andfrom_tls_issuer()as applicable.
HwType enum (crates/bmc-explorer/src/hw/mod.rs)
The bmc-explorer crate (used by the nv-redfish exploration path) classifies hardware into HwType variants. Each variant maps to a BMCVendor via bmc_vendor(). For a new hardware type, add a variant to HwType and implement the required methods. If the hardware type has unique exploration behavior, add a corresponding module under crates/bmc-explorer/src/hw/.
State controller vendor branches
The state controller (crates/api/src/state_controller/machine/handler.rs) has vendor-specific logic gated on BMCVendor for operations that cannot be handled generically in libredfish. Examples:
- Factory credential rotation: On first exploration, NICo changes the factory default BMC password. This is vendor-aware - ensure the new vendor's credential rotation path works correctly.
- UEFI password setting: Only tested on Dell, Lenovo, and NVIDIA - other vendors log a warning and skip.
- Power cycling: Lenovo SR650 V4s use IPMI chassis reset instead of Redfish
ForceRestartto avoid killing DPU power. Lenovo BMCs need an explicitbmc_reset()after firmware upgrades. - Lockdown: Dell requires BMC lockdown to be disabled separately before UEFI password changes.
Review handler.rs for bmc_vendor().is_*() calls and add branches for the new vendor where its behavior differs.
Testing with carbide-admin-cli redfish
The fastest way to validate libredfish changes against a real BMC is to compile carbide-admin-cli with a local checkout of libredfish and use the redfish subcommand to test specific operations directly, rather than waiting for Site Explorer or the state machine to exercise the code path.
Setup: Use a local libredfish checkout
Place your libredfish checkout inside the NICo workspace (or anywhere accessible), then override the dependency in the workspace Cargo.toml:
# Cargo.toml (workspace root)
[workspace.dependencies]
# Comment out the git version:
# libredfish = { git = "https://github.com/NVIDIA/libredfish.git", tag = "v0.43.5" }
# Point to your local checkout instead:
libredfish = { path = "libredfish" }
Then build the CLI from the crates/admin-cli directory:
cd crates/admin-cli
cargo build
Running commands against a real BMC
The redfish subcommand talks directly to a BMC - no NICo deployment needed:
# Check if vendor detection and basic connectivity work
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> get-power-state
# Read BIOS attributes to see what the BMC exposes
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> bios-attrs
# Test machine setup (the core provisioning step)
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> machine-setup
# Check if machine setup succeeded
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> machine-setup-status
# Test boot order (set DPU first)
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> set-boot-order-dpu-first --boot-interface-mac <dpu-mac>
# Test lockdown
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> lockdown-enable
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> lockdown-status
# Browse any Redfish endpoint directly
./target/debug/carbide-admin-cli redfish --address <bmc-ip> --username <user> --password <pass> browse --uri /redfish/v1
If all of these commands work correctly, there is a good chance the hardware will work end-to-end through Site Explorer and the state machine.
Code Structure Reference
libredfish/
├── src/
│ ├── lib.rs # Redfish trait, BiosProfile types, model_coerce()
│ ├── standard.rs # RedfishStandard: defaults + set_vendor() dispatch
│ ├── network.rs # create_client(): ServiceRoot → vendor → set_vendor
│ ├── ami.rs, dell.rs, hpe.rs, # Vendor-specific Redfish trait implementations
│ │ lenovo.rs, supermicro.rs, ...
│ └── model/
│ ├── service_root.rs # RedfishVendor enum, vendor detection
│ ├── oem/ # Vendor-specific OEM data models
│ └── testdata/ # JSON fixtures for unit tests
├── tests/
│ ├── integration_test.rs # Per-vendor integration tests
│ ├── mockups/<vendor>/ # Redfish JSON mockup trees
│ └── redfishMockupServer.py # Python server for mockups
nico/
├── crates/bmc-vendor/src/lib.rs # BMCVendor enum + From<RedfishVendor>
├── crates/bmc-explorer/src/hw/mod.rs # HwType enum (nv-redfish exploration)
├── crates/api/src/state_controller/ # Vendor-specific state machine logic
└── crates/admin-cli/src/redfish/ # carbide-admin-cli redfish subcommand
Adding nv-redfish Quirks for Exploration and Health Monitoring
nv-redfish is the preferred library for site exploration reports and is also used for health monitoring (carbide-hw-health). If the new hardware causes failures in either path, the fix goes into nv-redfish.
-
Add a
Platformvariant innv-redfish/redfish/src/bmc_quirks.rsif the quirk is platform-specific. -
Map the variant in
BmcQuirks::new()using the vendor string, redfish version, and product from the service root. -
Add quirk methods for each workaround. Common quirks:
bug_missing_root_nav_properties()- BMC omitsSystems/Chassis/Managersfrom service rootexpand_is_not_working_properly()-$expandquery parameter brokenwrong_resource_status_state()- non-standardStatus.Stateenum valuesfw_inventory_wrong_release_date()- invalid date formats
-
Add OEM feature support if needed. OEM extensions are gated behind Cargo features (
oem-ami,oem-dell,oem-hpe, etc.) innv-redfish/redfish/Cargo.toml.
Testing
Unit Tests
Add vendor detection tests in libredfish/src/model/service_root.rs. For complex detection (like LenovoAMI which checks the Oem field), use JSON test fixtures in src/model/testdata/.
Testing Against Real Hardware
Use carbide-admin-cli redfish with a local libredfish checkout (see above) to validate all key operations before deploying. Then test the full cycle through a NICo instance: discovery → ingestion → BIOS setup → boot order → lockdown → health monitoring.