Tutorial: SHA-2 Hash Function

This tutorial walks you through building a SHA-256 hash application using cuPQC-Hash. You'll learn how to use GPU-accelerated SHA-2 hashing to verify data integrity.

Step 1: Project Setup

Clone the cuPQC repository:

git clone https://github.com/NVIDIA/cuPQC.git
cd cuPQC/examples/hash

This downloads all examples, including the SHA-2 example. The Makefile in this directory builds every example, including example_sha2.

Step 2: Include Required Headers

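Start by including the standard library headers used by the example, followed by the cuPQC hash header:

#include <vector>
#include <iomanip>
#include <iostream>
#include <cstring>
#include <cstdint>  // uint8_t
#include <string>   // std::string, std::stoul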

#include <hash.hpp>

using namespace cupqc;

The hash.hpp header provides access to SHA-2 hash functions and other cryptographic hash primitives.

Step 3: Define Hash Function Type

Define the SHA-256 hash function type:

using SHA2_256_THREAD = decltype(SHA2_256() + Thread());

SHA2_256() chooses SHA-2 with 256-bit output.
Thread() specifies that the operator will execute independently for each thread.
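
The decltype(SHA2_256() + Thread()) idiom builds a type by combining small descriptor objects. The following is a minimal sketch of how such a composition can work in plain C++. The names Sha256Tag, ThreadTag, Description, and Sha256PerThread are invented for illustration, and the sketch makes no claim about how cuPQC implements its operators internally.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical descriptor tags (these are NOT cuPQC types).
struct Sha256Tag { static constexpr std::size_t digest_size = 32; };
struct ThreadTag {};

// Result of combining an algorithm tag with an execution tag.
template <class Algorithm, class Execution>
struct Description {
    static constexpr std::size_t digest_size = Algorithm::digest_size;
    using execution = Execution;
};

// operator+ is only named inside decltype, so it is never called at run time;
// its return type is what carries the configuration.
constexpr Description<Sha256Tag, ThreadTag> operator+(Sha256Tag, ThreadTag) { return {}; }

// Same shape as the alias in Step 3, built from the hypothetical tags above.
using Sha256PerThread = decltype(Sha256Tag() + ThreadTag());
```

Because everything is resolved at compile time, a misconfigured combination in a design like this would surface as a compile error rather than a run-time failure.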

Step 4: Create Hash Kernel

Create a CUDA kernel to compute the SHA-256 hash function:

__global__ void hash_sha2_kernel(
    uint8_t* digest, 
    const uint8_t* msg, 
    size_t inbuf_len)
{
    SHA2_256_THREAD hash {};

    if (threadIdx.x == 0) {
        hash.reset();
        hash.update(msg, inbuf_len);
        hash.finalize();
        hash.digest(digest, SHA2_256_THREAD::digest_size);
    }
}

hash.reset(): Initialize the hash state to begin a new hash computation
hash.update(msg, inbuf_len): Add input message data to be hashed. This can be called multiple times for streaming data
hash.finalize(): Complete the hash process and prepare for digest extraction
hash.digest(digest, SHA2_256_THREAD::digest_size): Extract the final hash value (digest) into the output buffer, using SHA2_256_THREAD::digest_size to specify the output read size

The hash computation follows a standard pattern: reset the state, update with input message data, finalize the computation, and then extract the digest.
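
To cross-check GPU results on the host, it can help to have a CPU reference that follows the same reset, update, finalize, digest order. The class below is a sketch under that assumption: CpuSha256 is a hypothetical name, it implements standard SHA-256 as specified in FIPS 180-4, and it is not part of cuPQC.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical CPU-only reference; mirrors the kernel's call order but is
// not part of cuPQC.
class CpuSha256 {
public:
    static constexpr std::size_t digest_size = 32;

    // Load the SHA-256 initial hash values and clear the buffered input.
    void reset() {
        static const std::uint32_t init[8] = {
            0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
            0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
        std::memcpy(state_, init, sizeof(state_));
        buffered_ = 0;
        total_bytes_ = 0;
    }

    // May be called repeatedly; incomplete 64-byte blocks are buffered.
    void update(const std::uint8_t* data, std::size_t len) {
        total_bytes_ += len;
        while (len > 0) {
            std::size_t take = std::min<std::size_t>(len, 64 - buffered_);
            std::memcpy(block_ + buffered_, data, take);
            buffered_ += take; data += take; len -= take;
            if (buffered_ == 64) { compress(block_); buffered_ = 0; }
        }
    }

    // Append the 0x80 marker, zero padding, and the 64-bit message bit length.
    void finalize() {
        const std::uint64_t bit_len = total_bytes_ * 8;
        std::uint8_t pad[72] = {0x80};
        const std::size_t pad_len = (buffered_ < 56) ? 56 - buffered_ : 120 - buffered_;
        for (int i = 0; i < 8; ++i)
            pad[pad_len + i] = static_cast<std::uint8_t>(bit_len >> (56 - 8 * i));
        update(pad, pad_len + 8);
    }

    // Write up to digest_size bytes of the big-endian state to out.
    void digest(std::uint8_t* out, std::size_t len) const {
        for (std::size_t i = 0; i < len && i < digest_size; ++i)
            out[i] = static_cast<std::uint8_t>(state_[i / 4] >> (24 - 8 * (i % 4)));
    }

private:
    static std::uint32_t rotr(std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

    // Process one 64-byte block (FIPS 180-4 compression function).
    void compress(const std::uint8_t* p) {
        static const std::uint32_t k[64] = {
            0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
            0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
            0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
            0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
            0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
            0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
            0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
            0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};
        std::uint32_t w[64], v[8];
        for (int i = 0; i < 16; ++i)
            w[i] = (std::uint32_t(p[4 * i]) << 24) | (std::uint32_t(p[4 * i + 1]) << 16) |
                   (std::uint32_t(p[4 * i + 2]) << 8) | std::uint32_t(p[4 * i + 3]);
        for (int i = 16; i < 64; ++i) {
            const std::uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
            const std::uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16] + s0 + w[i - 7] + s1;
        }
        std::memcpy(v, state_, sizeof(v));
        for (int i = 0; i < 64; ++i) {
            const std::uint32_t S1 = rotr(v[4], 6) ^ rotr(v[4], 11) ^ rotr(v[4], 25);
            const std::uint32_t ch = (v[4] & v[5]) ^ (~v[4] & v[6]);
            const std::uint32_t t1 = v[7] + S1 + ch + k[i] + w[i];
            const std::uint32_t S0 = rotr(v[0], 2) ^ rotr(v[0], 13) ^ rotr(v[0], 22);
            const std::uint32_t maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            v[7] = v[6]; v[6] = v[5]; v[5] = v[4]; v[4] = v[3] + t1;
            v[3] = v[2]; v[2] = v[1]; v[1] = v[0];
            v[0] = t1 + S0 + maj;
        }
        for (int i = 0; i < 8; ++i) state_[i] += v[i];
    }

    std::uint32_t state_[8];
    std::uint8_t block_[64];
    std::size_t buffered_;
    std::uint64_t total_bytes_;
};
```

A host program could run this class on the same input it sends to the GPU and compare the two 32-byte digests. Calling update once with the whole message or several times with consecutive pieces yields the same digest.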

Step 5: Implement Host Function

Create a host function to manage GPU memory and launch the kernel. The kernel is launched with one block of 32 threads, but only thread 0 computes the hash, so each launch hashes a single message:

void hash_sha2(std::vector<uint8_t>& digest, const std::vector<uint8_t>& msg)
{
    uint8_t* d_msg;
    uint8_t* d_digest;

    // Allocate device memory
    cudaMalloc(reinterpret_cast<void**>(&d_msg), msg.size());
    cudaMalloc(reinterpret_cast<void**>(&d_digest), digest.size());

    // Copy input message to device
    cudaMemcpy(d_msg, msg.data(), msg.size(), cudaMemcpyHostToDevice);

    // Launch kernel (1 block, 32 threads)
    hash_sha2_kernel<<<1, 32>>>(d_digest, d_msg, msg.size());

    // Copy result back to host
    cudaMemcpy(digest.data(), d_digest, digest.size(), cudaMemcpyDeviceToHost);

    // Cleanup
    cudaFree(d_msg);
    cudaFree(d_digest);
}

Step 6: Create the Main Application

Implement the main function with example usage. It will take a sample message, compute the SHA-2 hash, and compare it with the expected value:

int main(int argc, char* argv[]) {
    std::cout << "SHA-2 (SHA-256) Hash Function Example\n";
    std::cout << "=====================================\n\n";

    std::cout << "This example demonstrates SHA-256 hash using cuPQC SDK.\n";
    std::cout << "SHA-256 is a cryptographic hash function that produces a "
              << "256-bit (32-byte) hash value. It is widely used for "
              << "data integrity verification.\n\n";

    // Input message
    const char * msg_str = "The quick brown fox jumps over the lazy dog";
    std::vector<uint8_t> msg(
        reinterpret_cast<const uint8_t*>(msg_str), 
        reinterpret_cast<const uint8_t*>(msg_str) + strlen(msg_str)
    );

    std::cout << "Input Data:\n";
    std::cout << "  Text: \"" << msg_str << "\"\n";
    std::cout << "  Size: " << msg.size() << " bytes\n\n";

    // Known expected hash (standard SHA-256 test vector)
    const char* expected_hash_str = 
        "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592";
    std::vector<uint8_t> expected_digest(SHA2_256::digest_size, 0);

    // Convert expected hash string to bytes
    for (size_t i = 0; i < expected_digest.size(); i++) {
        std::string byte_str = std::string(1, expected_hash_str[i*2]) + 
                               expected_hash_str[i*2+1];
        expected_digest[i] = static_cast<uint8_t>(std::stoul(byte_str, nullptr, 16));
    }

    // Compute hash
    std::cout << "Computing SHA-256 Hash...\n";
    std::vector<uint8_t> digest(SHA2_256::digest_size, 0);
    hash_sha2(digest, msg);

    // Display computed hash
    std::cout << "\nComputed Hash:\n  ";
    for (uint8_t num : digest) {
        std::cout << std::hex << std::setw(2) << std::setfill('0') 
                  << static_cast<int>(num);
    }
    std::cout << std::dec << "\n\n";

    // Display expected hash
    std::cout << "Expected Hash:\n  ";
    for (uint8_t num : expected_digest) {
        std::cout << std::hex << std::setw(2) << std::setfill('0') 
                  << static_cast<int>(num);
    }
    std::cout << std::dec << "\n\n";

    // Verify
    bool match = true;
    for (size_t i = 0; i < digest.size(); i++) {
        if (digest[i] != expected_digest[i]) {
            match = false;
            break;
        }
    }

    if (match) {
        std::cout << "✓ Hash verification: VALID\n";
        std::cout << "  The computed hash matches the expected hash.\n";
    } else {
        std::cout << "✗ Hash verification: INVALID\n";
    }

    // Report a mismatch through the exit code so scripts can detect it
    return match ? 0 : 1;
}
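
The hex parsing and printing done inline in main can also be factored into small reusable helpers. The following is a sketch only: hex_nibble, hex_to_bytes, and bytes_to_hex are hypothetical names, not cuPQC functions, and unlike the inline loop they reject malformed input instead of assuming a well-formed string.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode a hex string such as "d7a8fb" into bytes.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hex string must have an even length");
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = hex_nibble(hex[i]);
        const int lo = hex_nibble(hex[i + 1]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("non-hex character in input");
        bytes.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }
    return bytes;
}

// Hypothetical helper: encode bytes as lowercase hex without separators.
std::string bytes_to_hex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}
```

With helpers like these, the verification step reduces to comparing digest with hex_to_bytes(expected_hash_str) using the vector equality operator, which also checks that both have the same length.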

Step 7: Build and Run

The Makefile builds all examples in the folder. Build them, then run the SHA-2 example:

make
./example_sha2

Step 8: Understanding the Output

Expected output:

SHA-2 (SHA-256) Hash Function Example
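=====================================

This example demonstrates SHA-256 hash using cuPQC SDK.
SHA-256 is a cryptographic hash function that produces a 256-bit (32-byte) hash value. It is widely used for data integrity verification.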

Input Data:
  Text: "The quick brown fox jumps over the lazy dog"
  Size: 43 bytes

Computing SHA-256 Hash...

Computed Hash:
  d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

Expected Hash:
  d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

✓ Hash verification: VALID
  The computed hash matches the expected hash.

Customization Tips

Hash Multiple Messages in Parallel

To hash multiple messages simultaneously, assign one message to each thread. This example assumes all messages have the same length. For messages of different sizes, pass an array of message lengths as an additional input:

__global__ void hash_multiple_sha2_kernel(
    uint8_t* digests, 
    const uint8_t* msgs, 
    size_t msg_len,
    size_t num_messages)
{
    size_t idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (idx < num_messages) {
        SHA2_256_THREAD hash {};
        hash.reset();
        hash.update(msgs + idx * msg_len, msg_len);
        hash.finalize();
        hash.digest(digests + idx * SHA2_256_THREAD::digest_size, 
                   SHA2_256_THREAD::digest_size);
    }
}
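
For the variable-length case mentioned above, the host has to lay the messages out in one buffer and record where each one starts. The following host-side sketch shows one possible layout; PackedMessages, pack_messages, and blocks_for are hypothetical names, and the choice to store offsets alongside lengths is an assumption rather than something the cuPQC example prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side layout for a batch of variable-length messages.
struct PackedMessages {
    std::vector<std::uint8_t> data;     // all messages stored back to back
    std::vector<std::size_t>  offsets;  // start of message i within data
    std::vector<std::size_t>  lengths;  // length of message i in bytes
};

// Concatenate the messages and record each one's offset and length.
PackedMessages pack_messages(const std::vector<std::string>& msgs) {
    PackedMessages p;
    for (const std::string& m : msgs) {
        p.offsets.push_back(p.data.size());
        p.lengths.push_back(m.size());
        p.data.insert(p.data.end(), m.begin(), m.end());
    }
    return p;
}

// Blocks needed so every message gets its own thread (ceiling division).
// threads_per_block must be nonzero.
std::size_t blocks_for(std::size_t num_messages, std::size_t threads_per_block) {
    return (num_messages + threads_per_block - 1) / threads_per_block;
}
```

With this layout, a variable-length version of the kernel could read message idx from data + offsets[idx] with length lengths[idx]. That kernel change is an assumption and is not part of the original example. For 1000 messages and 256 threads per block, blocks_for gives 4 blocks, and the idx < num_messages check in the kernel keeps the surplus threads idle.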

Use Different Hash Functions

The cuPQC SDK supports multiple hash functions. A few other examples:

// SHA-2 variants
using SHA2_384_THREAD = decltype(SHA2_384() + Thread());
using SHA2_512_THREAD = decltype(SHA2_512() + Thread());

// SHA-3 variants
using SHA3_256_THREAD = decltype(SHA3_256() + Thread());
using SHA3_512_THREAD = decltype(SHA3_512() + Thread());

A complete list of supported hash functions is available in the Hash Documentation.

Streaming Hash for Large Data

To hash large files or data streams, call update multiple times, once per chunk:

__global__ void hash_stream_kernel(
    uint8_t* digest,
    const uint8_t* msg,
    size_t total_len,
    size_t chunk_size)
{
    SHA2_256_THREAD hash {};
    hash.reset();

    // Process data in chunks (chunk_size must be nonzero, or the loop never advances)
    for (size_t offset = 0; offset < total_len; offset += chunk_size) {
        size_t len = min(chunk_size, total_len - offset);
        hash.update(msg + offset, len);
    }

    hash.finalize();
    hash.digest(digest, SHA2_256_THREAD::digest_size);
}
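
The chunk arithmetic in the loop above can be checked on the host without a GPU. The sketch below lists the (offset, length) pairs that the loop would pass to update; chunk_plan is a hypothetical helper, and the early return for a zero chunk size is an added guard that the kernel above does not have.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: the (offset, length) pairs visited by the chunk loop.
std::vector<std::pair<std::size_t, std::size_t>>
chunk_plan(std::size_t total_len, std::size_t chunk_size) {
    std::vector<std::pair<std::size_t, std::size_t>> plan;
    if (chunk_size == 0) return plan;  // a zero step would never advance the offset
    for (std::size_t offset = 0; offset < total_len; offset += chunk_size) {
        // The final chunk is shorter when total_len is not a multiple of chunk_size.
        plan.emplace_back(offset, std::min(chunk_size, total_len - offset));
    }
    return plan;
}
```

For a 10-byte message and a chunk size of 4, the plan is (0, 4), (4, 4), (8, 2): two full chunks followed by a 2-byte remainder.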

Learn More

Example Source Code - Complete SHA-2 hash example implementation
cuPQC-Hash User Guide - Usage guide with examples
cuPQC-Hash API Reference - Complete API documentation
cuPQC-Hash Features - Supported hash functions and capabilities