Inference Request

The main class used to describe requests to `GptManager` is `InferenceRequest`. It is structured as a map of tensors and a `uint64_t requestId`.

The mandatory tensors needed to create a valid `InferenceRequest` object are described below. Sampling config params are documented in the C++ GPT Runtime section; their detailed descriptions are therefore omitted from the tables.
| Name | Shape | Type | Description |
| --- | --- | --- | --- |
| `request_output_len` | [1, 1] | `int32_t` | Max number of output tokens |
| `input_ids` | [1, num_input_tokens] | `int32_t` | Tensor of input tokens |
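To make the structure concrete, here is a minimal, self-contained sketch of a request as a map of named tensors plus a `uint64_t requestId`. The `Tensor` and `InferenceRequestSketch` types and the `makeRequest` helper are illustrative stand-ins, not the real TensorRT-LLM classes; only the tensor names and shapes come from the table above.

```cpp
// Hypothetical sketch: model an inference request as a tensor map plus a
// uint64_t requestId. These types are illustrative, not the real API.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Tensor {
    std::vector<int64_t> shape;  // e.g. {1, num_input_tokens}
    std::vector<int32_t> data;   // flattened values
};

struct InferenceRequestSketch {
    uint64_t requestId;
    std::map<std::string, Tensor> tensors;

    // A request is valid once both mandatory tensors are present.
    bool hasMandatoryTensors() const {
        return tensors.count("request_output_len") > 0 &&
               tensors.count("input_ids") > 0;
    }
};

InferenceRequestSketch makeRequest(uint64_t id,
                                   const std::vector<int32_t>& inputTokens,
                                   int32_t maxOutputTokens) {
    InferenceRequestSketch req;
    req.requestId = id;
    // request_output_len has shape [1, 1]
    req.tensors["request_output_len"] = Tensor{{1, 1}, {maxOutputTokens}};
    // input_ids has shape [1, num_input_tokens]
    req.tensors["input_ids"] =
        Tensor{{1, static_cast<int64_t>(inputTokens.size())}, inputTokens};
    return req;
}
```

The key point is that every field of a request, mandatory or optional, travels as a named entry in one tensor map keyed by the names in these tables.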
Optional tensors that can be supplied to `InferenceRequest` are shown below. Default values, where applicable, are specified.
| Name | Shape | Type | Description |
| --- | --- | --- | --- |
| `streaming` | [1] | `bool` | (Default=`false`) When `true`, stream out tokens as they are generated |
| `beam_width` | [1] | `int32_t` | (Default=1) Beam width for this request; set to 1 for greedy sampling |
| `temperature` | [1] | `float` | Sampling Config param: `temperature` |
| `runtime_top_k` | [1] | `int32_t` | Sampling Config param: `topK` |
| `runtime_top_p` | [1] | `float` | Sampling Config param: `topP` |
| `len_penalty` | [1] | `float` | Sampling Config param: `lengthPenalty` |
| `early_stopping` | [1] | `int32_t` | Sampling Config param: `earlyStopping` |
| `repetition_penalty` | [1] | `float` | Sampling Config param: `repetitionPenalty` |
| `min_length` | [1] | `int32_t` | Sampling Config param: `minLength` |
| `presence_penalty` | [1] | `float` | Sampling Config param: `presencePenalty` |
| `frequency_penalty` | [1] | `float` | Sampling Config param: `frequencyPenalty` |
| `random_seed` | [1] | `uint64_t` | Sampling Config param: `randomSeed` |
| `end_id` | [1] | `int32_t` | End token Id. If not specified, defaults to -1 |
| `pad_id` | [1] | `int32_t` | Pad token Id |
| `embedding_bias` | [1] | `float` | Embedding bias |
| `bad_words_list` | [2, num_bad_words] | `int32_t` | Bad words list |
| `stop_words_list` | [2, num_stop_words] | `int32_t` | Stop words list |
| `prompt_embedding_table` | [1] | `float16` | P-tuning prompt embedding table |
| `prompt_vocab_size` | [1] | `int32_t` | P-tuning prompt vocab size |
| `lora_task_id` | [1] | `uint64_t` | Task ID for the given `lora_weights`. This ID is expected to be globally unique. To perform inference with a specific LoRA for the first time, `lora_task_id`, `lora_weights` and `lora_config` must all be given |
| `lora_weights` | [num_lora_modules_layers, D x Hi + Ho x D] | `float` | Weights for a LoRA adapter. Refer to Run gpt-2b + LoRA using GptManager / cpp runtime for more information |
| `lora_config` | [num_lora_modules_layers, 3] | `int32_t` | LoRA configuration tensor |
| `return_log_probs` | [1] | `bool` | When `true`, include log probs in the output |
| `return_context_logits` | [1] | `bool` | When `true`, include context logits in the output |
| `return_generation_logits` | [1] | `bool` | When `true`, include generation logits in the output |
| `draft_input_ids` | [num_draft_tokens] | `int32_t` | Draft tokens to be leveraged in generation phase to potentially generate multiple output tokens in one inflight batching iteration |
| `draft_logits` | [num_draft_tokens, vocab_size] | `float` | Draft logits associated with `draft_input_ids` |
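Since optional tensors may simply be absent from the map, the consumer has to fall back to the documented defaults (`streaming` → `false`, `beam_width` → 1, `end_id` → -1). A small sketch of that lookup, assuming the hypothetical map-of-scalars representation below rather than the real TensorRT-LLM types:

```cpp
// Hypothetical sketch of resolving the documented defaults for optional
// [1]-shaped tensors. OptTensor and getOrDefault are illustrative only.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct OptTensor {
    std::vector<int32_t> data;  // scalar tensors hold a single value
};

using TensorMap = std::map<std::string, OptTensor>;

// Return the scalar value of an optional [1]-shaped tensor, or the
// documented default when the tensor was not supplied in the request.
int32_t getOrDefault(const TensorMap& tensors, const std::string& name,
                     int32_t fallback) {
    auto it = tensors.find(name);
    return it == tensors.end() ? fallback : it->second.data.at(0);
}
```

For example, a request that only sets `beam_width` would resolve `streaming` to `false` (0) and `end_id` to -1 via the fallback path.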