nv_ingest_api.util.image_processing package#
Submodules#
nv_ingest_api.util.image_processing.clustering module#
- nv_ingest_api.util.image_processing.clustering.boxes_are_close_or_overlap(
- b1: List[int],
- b2: List[int],
- threshold: float = 10.0,
Determine if two bounding boxes either overlap or are within a certain distance threshold.
The function expands each bounding box by threshold in all directions and checks if the expanded regions overlap on both the x-axis and y-axis.
- Parameters:
b1 (tuple) – The first bounding box, in (xmin, ymin, xmax, ymax) format.
b2 (tuple) – The second bounding box, in (xmin, ymin, xmax, ymax) format.
threshold (float, optional) – The distance (in pixels or points) by which to expand each bounding box before checking for overlap. Defaults to 10.0.
- Returns:
True if the two bounding boxes overlap or are within the specified threshold distance of each other, False otherwise.
- Return type:
bool
Example
>>> box1 = (100, 100, 150, 150)
>>> box2 = (160, 110, 200, 140)
>>> boxes_are_close_or_overlap(box1, box2, threshold=10)
True  # Because box2 is within 10 pixels of box1 along the x-axis
- nv_ingest_api.util.image_processing.clustering.combine_groups_into_bboxes(
- boxes: List[List[int]],
- groups: List[List[int]],
- min_num_components: int = 1,
Merge bounding boxes based on grouped indices.
- Given:
A list of bounding boxes (boxes), each in the form (xmin, ymin, xmax, ymax).
A list of groups (groups), where each group is a list of indices referring to bounding boxes in boxes.
- For each group, this function:
Collects all bounding boxes in that group.
Computes a single bounding box that tightly encompasses all of those bounding boxes by taking the minimum of all xmins and ymins, and the maximum of all xmaxs and ymaxs.
If the group has fewer than min_num_components bounding boxes, it is skipped.
- Parameters:
boxes (list of tuple) – The original bounding boxes, each in (xmin, ymin, xmax, ymax) format.
groups (list of list of int) – A list of groups, where each group is a list of indices into boxes.
min_num_components (int, optional) – The minimum number of bounding boxes a group must have to produce a merged bounding box. Defaults to 1.
- Returns:
A list of merged bounding boxes, one for each group that meets or exceeds min_num_components. Each bounding box is in the format (xmin, ymin, xmax, ymax).
- Return type:
list of list of int
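Example (an illustrative sketch of the merge behavior described above; box values are hypothetical):
>>> boxes = [[0, 0, 10, 10], [8, 2, 20, 12], [50, 50, 60, 60]]
>>> groups = [[0, 1], [2]]
>>> combine_groups_into_bboxes(boxes, groups)
[[0, 0, 20, 12], [50, 50, 60, 60]]
>>> # With min_num_components=2, the single-box group is skipped
>>> combine_groups_into_bboxes(boxes, groups, min_num_components=2)
[[0, 0, 20, 12]]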
- nv_ingest_api.util.image_processing.clustering.group_bounding_boxes(
- boxes: List[List[int]],
- threshold: float = 10.0,
- max_num_boxes: int = 1000,
- max_depth: int | None = None,
Group bounding boxes that either overlap or lie within a given proximity threshold.
This function first checks whether the number of bounding boxes exceeds max_num_boxes, returning an empty list if it does (to avoid excessive computation). Then, it builds an adjacency list by comparing each pair of bounding boxes (using boxes_are_close_or_overlap). Any bounding boxes determined to be within threshold distance (or overlapping) are treated as connected.
Using a Depth-First Search (DFS), we traverse these connections to form groups (connected components). Each group is a list of indices referencing bounding boxes in the original boxes list.
- Parameters:
boxes (list of tuple) – A list of bounding boxes in the format (xmin, ymin, xmax, ymax).
threshold (float, optional) – The distance threshold used to determine if two boxes are considered “close enough” to be in the same group. Defaults to 10.0.
max_num_boxes (int, optional) – The maximum number of bounding boxes to process. If the length of boxes exceeds this, a warning is logged and the function returns an empty list. Defaults to 1,000.
max_depth (int, optional) – The maximum depth for the DFS. If None, there is no limit to how many layers deep the search may go when forming connected components. If set, bounding boxes beyond that depth in the adjacency graph will not be included in the group. Defaults to None.
- Returns:
Each element is a list (group) containing the indices of bounding boxes that are connected (overlapping or within threshold distance of each other).
- Return type:
list of list of int
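Example (an illustrative sketch with hypothetical boxes; the ordering of groups may vary):
>>> boxes = [[0, 0, 10, 10], [15, 0, 25, 10], [200, 200, 210, 210]]
>>> group_bounding_boxes(boxes, threshold=10)
[[0, 1], [2]]
>>> # A typical follow-up: merge each group into a single box
>>> combine_groups_into_bboxes(boxes, group_bounding_boxes(boxes, threshold=10))
[[0, 0, 25, 10], [200, 200, 210, 210]]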
- nv_ingest_api.util.image_processing.clustering.remove_superset_bboxes(
- bboxes: List[List[int]],
Remove any bounding box that strictly contains another bounding box.
Specifically, for each bounding box box_a, if it fully encloses another bounding box box_b in all dimensions (with at least one edge strictly larger rather than exactly equal), then box_a is excluded from the results.
- Parameters:
bboxes (List[List[int]]) – A list of bounding boxes, where each bounding box is a list or tuple of four integers in the format: [x_min, y_min, x_max, y_max].
- Returns:
A new list of bounding boxes, excluding those that are strict supersets of any other bounding box in bboxes.
- Return type:
List[List[int]]
Example
>>> bboxes = [
...     [0, 0, 5, 5],  # box A
...     [1, 1, 2, 2],  # box B
...     [3, 3, 4, 4],  # box C
... ]
>>> # Box A strictly encloses B and C, so it is removed
>>> remove_superset_bboxes(bboxes)
[[1, 1, 2, 2], [3, 3, 4, 4]]
nv_ingest_api.util.image_processing.processing module#
- nv_ingest_api.util.image_processing.processing.extract_tables_and_charts_from_image(
- annotation_dict,
- original_image,
- page_idx,
- tables_and_charts,
Extract and process table and chart regions from the provided image based on detection annotations.
- Parameters:
annotation_dict (dict) – A dictionary containing detected objects and their bounding boxes, e.g. keys “table” and “chart”.
original_image (np.ndarray) – The original image from which objects were detected.
page_idx (int) – The index of the current page being processed.
tables_and_charts (list of tuple) – A list to which extracted table/chart data will be appended. Each item is a tuple (page_idx, CroppedImageWithContent).
Notes
- This function iterates over the detected table and chart objects. For each detected object, it:
Crops the original image based on the bounding box.
Converts the cropped image to a base64 encoded string.
Wraps the encoded image along with its bounding box and the image dimensions in a standardized data structure.
Additional model inference or post-processing can be added where needed.
Examples
>>> annotation_dict = {"table": [[...], [...]], "chart": [[...], [...]]}
>>> original_image = np.random.rand(1536, 1536, 3)
>>> tables_and_charts = []
>>> extract_tables_and_charts_from_image(annotation_dict, original_image, 0, tables_and_charts)
- nv_ingest_api.util.image_processing.processing.extract_tables_and_charts_yolox(
- pages: List[Tuple[int, ndarray]],
- config: dict,
- trace_info: List | None = None,
Given a list of (page_index, image) tuples and a configuration dictionary, this function calls the YOLOX-based inference service to extract table and chart annotations from all pages.
- Parameters:
pages (List[Tuple[int, np.ndarray]]) – A list of tuples containing the page index and the corresponding image.
config (dict) –
- A dictionary containing configuration parameters such as:
'yolox_endpoints'
'auth_token'
'yolox_infer_protocol'
trace_info (Optional[List], optional) – Optional tracing information for logging/debugging purposes.
- Returns:
For each page, returns a tuple (page_index, joined_content) where joined_content is the result of combining annotations from the inference.
- Return type:
List[Tuple[int, object]]
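Example (a hedged sketch; the endpoint values and the shape of 'yolox_endpoints' shown here are illustrative assumptions, not a documented contract):
>>> pages = [(0, np.random.randint(0, 255, (1536, 1536, 3), dtype=np.uint8))]
>>> config = {
...     "yolox_endpoints": ("grpc://localhost:8001", "http://localhost:8000/v1/infer"),  # hypothetical endpoints
...     "auth_token": "",
...     "yolox_infer_protocol": "http",
... }
>>> results = extract_tables_and_charts_yolox(pages, config)
>>> # results is a list of (page_index, joined_content) tuples, one per page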
nv_ingest_api.util.image_processing.table_and_chart module#
- nv_ingest_api.util.image_processing.table_and_chart.assign_boxes(ocr_box, boxes, delta=2.0, min_overlap=0.25)[source]#
Assigns the closest bounding boxes to a reference ocr_box based on overlap.
- Parameters:
ocr_box (list or numpy.ndarray) – Reference bounding box [x_min, y_min, x_max, y_max].
boxes (numpy.ndarray) – Array of candidate bounding boxes with shape (N, 4).
delta (float, optional) – Tolerance factor for keeping matches relative to the best overlap (a larger delta keeps more candidates). Defaults to 2.0.
min_overlap (float, optional) – Minimum required overlap for a match. Defaults to 0.25.
- Returns:
- Indices of the matched boxes sorted by decreasing overlap.
Returns an empty list if no matches are found.
- Return type:
list
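Example (an illustrative sketch with hypothetical boxes; the exact indices returned depend on the overlap computation):
>>> ocr_box = [10, 10, 50, 50]
>>> boxes = np.array([[12, 12, 48, 48], [100, 100, 120, 120]])
>>> assign_boxes(ocr_box, boxes)  # e.g. [0]; only the first candidate overlaps the reference box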
- nv_ingest_api.util.image_processing.table_and_chart.build_markdown(df)[source]#
Convert a dataframe into a markdown table.
- Parameters:
df (pandas DataFrame) – The dataframe to convert.
- Returns:
A list of lists representing the markdown table.
- Return type:
list[list]
- nv_ingest_api.util.image_processing.table_and_chart.convert_ocr_response_to_psuedo_markdown(bboxes, texts)[source]#
- nv_ingest_api.util.image_processing.table_and_chart.display_markdown(
- data: list[list[str]],
- use_header: bool = False,
Convert a list of lists of strings into a markdown table.
- Parameters:
data (list[list[str]]) – The table data. The first sublist should contain headers.
use_header (bool, optional) – Whether to use the first sublist as headers. Defaults to False.
- Returns:
A markdown-formatted table as a string.
- Return type:
str
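Example (an illustrative sketch; the table data is hypothetical):
>>> data = [["Name", "Value"], ["foo", "1"], ["bar", "2"]]
>>> md = display_markdown(data, use_header=True)
>>> isinstance(md, str)  # a markdown-formatted table string
True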
- nv_ingest_api.util.image_processing.table_and_chart.join_yolox_graphic_elements_and_ocr_output(
- yolox_output,
- ocr_boxes,
- ocr_txts,
Match OCR text to YOLOX graphic-element detections. For each class and for each detection, we look for overlapping OCR text bounding boxes with IoU > max_iou / delta, where max_iou is the largest overlap found. Matched texts are added to the class representation and removed from the pool of texts still to be matched.
- nv_ingest_api.util.image_processing.table_and_chart.join_yolox_table_structure_and_ocr_output(
- yolox_cell_preds,
- ocr_boxes,
- ocr_txts,
- nv_ingest_api.util.image_processing.table_and_chart.match_bboxes(yolox_box, ocr_boxes, already_matched=None, delta=2.0)[source]#
Associates a yolox-graphic-elements box with PaddleOCR bounding boxes by taking overlapping boxes. The criterion is iou > max_iou / delta, where max_iou is the largest overlap found. Boxes are expected in the format (x0, y0, x1, y1).
- Parameters:
yolox_box (np array [4]) – The reference YOLOX bounding box.
ocr_boxes (np array [n x 4]) – The PaddleOCR boxes.
already_matched (list or None, optional) – Already matched ids to ignore.
delta (float, optional) – IoU delta for considering several boxes. Defaults to 2.0.
- Returns:
Indices of the matched bboxes.
- Return type:
np array or list
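Example (an illustrative sketch with hypothetical boxes; only the first OCR box overlaps the reference box):
>>> yolox_box = np.array([0, 0, 100, 40])
>>> ocr_boxes = np.array([[5, 5, 60, 35], [300, 300, 350, 330]])
>>> match_bboxes(yolox_box, ocr_boxes, delta=2.0)  # e.g. [0]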
- nv_ingest_api.util.image_processing.table_and_chart.merge_text_in_cell(df_cell)[source]#
Merges text from multiple rows into a single cell and recalculates its bounding box. Values are sorted by rounded (y, x) coordinates.
- Parameters:
df_cell (pandas.DataFrame) – DataFrame containing cells to merge.
- Returns:
Updated DataFrame with merged text and a single bounding box.
- Return type:
pandas.DataFrame
- nv_ingest_api.util.image_processing.table_and_chart.process_yolox_graphic_elements(yolox_text_dict)[source]#
Process the inference results from yolox-graphic-elements model.
- Parameters:
yolox_text_dict (dict) – The result from the yolox-graphic-elements model inference.
- Returns:
The concatenated and processed chart content as a string.
- Return type:
str
- nv_ingest_api.util.image_processing.table_and_chart.remove_empty_row(mat)[source]#
Remove empty rows from a matrix.
- Parameters:
mat (list[list]) – The matrix to remove empty rows from.
- Returns:
The matrix with empty rows removed.
- Return type:
list[list]
- nv_ingest_api.util.image_processing.table_and_chart.reorder_boxes(boxes, texts, confs, mode='top_left', dbscan_eps=10)[source]#
Reorders the boxes in reading order. If mode is “center”, the boxes are reordered using bbox center. If mode is “top_left”, the boxes are reordered using the top left corner. If dbscan_eps is not 0, the boxes are reordered using DBSCAN clustering.
- Parameters:
boxes (np array [n x 4 x 2]) – The bounding boxes of the OCR results.
texts (np array [n]) – The text of the OCR results.
confs (np array [n]) – The confidence scores of the OCR results.
mode (str, optional) – The mode used to reorder the boxes. Defaults to “top_left”.
dbscan_eps (float, optional) – The epsilon parameter for DBSCAN. Defaults to 10.
- Returns:
A tuple of (boxes, texts, confs) – the reordered bounding boxes, the reordered texts, and the reordered confidence scores.
- Return type:
Tuple[List[List[int]], List[str], List[float]]
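Example (an illustrative sketch; assumes standard top-to-bottom reading order and the quad box format noted above):
>>> boxes = np.array([
...     [[0, 30], [50, 30], [50, 45], [0, 45]],  # lower line
...     [[0, 0], [50, 0], [50, 15], [0, 15]],    # upper line
... ])
>>> texts = np.array(["world", "hello"])
>>> confs = np.array([0.98, 0.99])
>>> boxes, texts, confs = reorder_boxes(boxes, texts, confs)
>>> list(texts)
['hello', 'world']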
nv_ingest_api.util.image_processing.transforms module#
- nv_ingest_api.util.image_processing.transforms.base64_to_disk(base64_string: str, output_path: str) bool [source]#
Write base64-encoded image data directly to disk without conversion.
This function performs efficient base64 decoding and direct file writing, preserving the original image format without unnecessary decode/encode cycles. Used as the foundation for higher-level image saving operations.
- Parameters:
base64_string (str) – Base64-encoded image data. May include data URL prefix.
output_path (str) – Path where the image should be saved.
- Returns:
True if successful, False otherwise.
- Return type:
bool
Examples
>>> success = base64_to_disk(image_b64, "/path/to/output.jpeg")
>>> if success:
...     print("Image saved successfully")
- nv_ingest_api.util.image_processing.transforms.base64_to_numpy(base64_string: str) ndarray [source]#
Convert a base64-encoded image string to a NumPy array using OpenCV. Returns images in RGB format for consistency.
- Parameters:
base64_string (str) – Base64-encoded string representing an image.
- Returns:
NumPy array representation of the decoded image in RGB format (for color images). Grayscale images are returned as-is.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the base64 string is invalid or cannot be decoded into an image.
Examples
>>> base64_str = '/9j/4AAQSkZJRgABAQAAAQABAAD/2wBD...'
>>> img_array = base64_to_numpy(base64_str)
>>> # img_array is now in RGB format (for color images)
- nv_ingest_api.util.image_processing.transforms.check_numpy_image_size(
- image: ndarray,
- min_height: int,
- min_width: int,
Checks if the height and width of the image are larger than the specified minimum values.
- Parameters:
image (np.ndarray) – The image array (assumed to be in shape (H, W, C) or (H, W)).
min_height (int) – The minimum height required.
min_width (int) – The minimum width required.
- Returns:
True if the image dimensions are larger than or equal to the minimum size, False otherwise.
- Return type:
bool
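Example (an illustrative sketch following the behavior described above):
>>> image = np.zeros((600, 800, 3), dtype=np.uint8)
>>> check_numpy_image_size(image, min_height=1024, min_width=1024)
False
>>> check_numpy_image_size(image, min_height=256, min_width=256)
True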
- nv_ingest_api.util.image_processing.transforms.crop_image(
- array: array,
- bbox: Tuple[int, int, int, int],
- min_width: int = 1,
- min_height: int = 1,
Crops a NumPy array representing an image according to the specified bounding box.
- Parameters:
array (np.array) – The image as a NumPy array.
bbox (Tuple[int, int, int, int]) – The bounding box to crop the image to, given as (w1, h1, w2, h2).
min_width (int, optional) – The minimum allowable width for the cropped image. If the cropped width is smaller than this value, the function returns None. Default is 1.
min_height (int, optional) – The minimum allowable height for the cropped image. If the cropped height is smaller than this value, the function returns None. Default is 1.
- Returns:
The cropped image as a NumPy array, or None if the bounding box is invalid.
- Return type:
Optional[np.ndarray]
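Example (an illustrative sketch; the bounding box follows the (w1, h1, w2, h2) convention above):
>>> image = np.random.randint(0, 255, (1000, 1000, 3), dtype=np.uint8)
>>> cropped = crop_image(image, (100, 200, 400, 500))
>>> cropped.shape
(300, 300, 3)
>>> crop_image(image, (0, 0, 0, 0)) is None  # degenerate box falls below min_width/min_height
True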
- nv_ingest_api.util.image_processing.transforms.ensure_base64_format(
- base64_image: str,
- target_format: str = 'PNG',
- **kwargs,
Ensures the given base64-encoded image is in the specified format. Converts if necessary. Skips conversion if the image is already in the target format.
- Parameters:
base64_image (str) – Base64-encoded image string.
target_format (str, optional) – The target image format. Supported formats are “PNG”, “JPEG”/”JPG”. Defaults to “PNG”.
**kwargs – Additional keyword arguments passed to the format-specific encoding function. For JPEG: quality (int, default=100) - JPEG quality (1-100). For PNG: compression (int, default=3) - PNG compression level (0-9).
- Returns:
Base64-encoded image string in the specified format.
- Return type:
str
- Raises:
ValueError – If there is an error during format conversion or if an unsupported format is provided.
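Example (an illustrative sketch; numpy_to_base64, documented later in this module, is used only to produce an input string):
>>> png_b64 = numpy_to_base64(np.zeros((64, 64, 3), dtype=np.uint8), format="PNG")
>>> jpeg_b64 = ensure_base64_format(png_b64, target_format="JPEG", quality=90)
>>> same_b64 = ensure_base64_format(jpeg_b64, target_format="JPEG")  # already JPEG, so conversion is skipped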
- nv_ingest_api.util.image_processing.transforms.normalize_image(
- array: ndarray,
- r_mean: float = 0.485,
- g_mean: float = 0.456,
- b_mean: float = 0.406,
- r_std: float = 0.229,
- g_std: float = 0.224,
- b_std: float = 0.225,
Normalizes an RGB image by applying a mean and standard deviation to each channel.
- Parameters:
array (np.ndarray) – The input image array, which can be either grayscale or RGB. The image should have a shape of (height, width, 3) for RGB images, or (height, width) or (height, width, 1) for grayscale images. If a grayscale image is provided, it will be converted to RGB format by repeating the grayscale values across all three channels (R, G, B).
r_mean (float, optional) – The mean to be subtracted from the red channel (default is 0.485).
g_mean (float, optional) – The mean to be subtracted from the green channel (default is 0.456).
b_mean (float, optional) – The mean to be subtracted from the blue channel (default is 0.406).
r_std (float, optional) – The standard deviation to divide the red channel by (default is 0.229).
g_std (float, optional) – The standard deviation to divide the green channel by (default is 0.224).
b_std (float, optional) – The standard deviation to divide the blue channel by (default is 0.225).
- Returns:
A normalized image array with the same shape as the input, where the RGB channels have been normalized by the given means and standard deviations.
- Return type:
np.ndarray
Notes
The input pixel values should be in the range [0, 255], and the function scales these values to [0, 1] before applying normalization.
If the input image is grayscale, it is converted to an RGB image by duplicating the grayscale values across the three color channels.
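Example (an illustrative sketch; assumes the output keeps RGB channel order, so the first channel uses r_mean and r_std):
>>> image = np.full((2, 2, 3), 255, dtype=np.uint8)
>>> out = normalize_image(image)
>>> out.shape
(2, 2, 3)
>>> round(float(out[0, 0, 0]), 3)  # (255/255 - 0.485) / 0.229
2.249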
- nv_ingest_api.util.image_processing.transforms.numpy_to_base64(
- array: ndarray,
- format: str = 'PNG',
- **kwargs,
Converts a NumPy array representing an image to a base64-encoded string.
The function takes a NumPy array, preprocesses it, and then encodes the image in the specified format as a base64 string. The input array is expected to be in a format that can be converted to a valid image, such as having a shape of (H, W, C) where C is the number of channels (e.g., 3 for RGB).
- Parameters:
array (np.ndarray) – The input image as a NumPy array. Must have a shape compatible with image data.
format (str, optional) – The image format to use for encoding. Supported formats are “PNG” and “JPEG”. Defaults to “PNG”.
**kwargs – Additional keyword arguments passed to the format-specific encoding function. For JPEG: quality (int, default=100) - JPEG quality (1-100).
- Returns:
The base64-encoded string representation of the input NumPy array in the specified format.
- Return type:
str
- Raises:
ValueError – If the input array cannot be converted into a valid image format, or if an unsupported format is specified.
RuntimeError – If there is an issue during the image conversion or base64 encoding process.
Examples
>>> array = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
>>> encoded_str = numpy_to_base64(array, format="PNG")
>>> isinstance(encoded_str, str)
True
>>> encoded_str_jpeg = numpy_to_base64(array, format="JPEG", quality=90)
>>> isinstance(encoded_str_jpeg, str)
True
- nv_ingest_api.util.image_processing.transforms.numpy_to_base64_jpeg(array: ndarray, quality: int = 100) str [source]#
Converts a preprocessed NumPy array representing an image to a base64-encoded JPEG string using OpenCV.
- Parameters:
array (np.ndarray) – The preprocessed input image as a NumPy array. Must have a shape compatible with image data.
quality (int, optional) – JPEG quality (1-100), by default 100. Higher values mean better quality but larger file size.
- Returns:
The base64-encoded JPEG string representation of the input NumPy array.
- Return type:
str
- Raises:
RuntimeError – If there is an issue during the image conversion or base64 encoding process.
- nv_ingest_api.util.image_processing.transforms.numpy_to_base64_png(array: ndarray) str [source]#
Converts a preprocessed NumPy array representing an image to a base64-encoded PNG string using OpenCV.
- Parameters:
array (np.ndarray) – The preprocessed input image as a NumPy array. Must have a shape compatible with image data.
- Returns:
The base64-encoded PNG string representation of the input NumPy array.
- Return type:
str
- Raises:
RuntimeError – If there is an issue during the image conversion or base64 encoding process.
- nv_ingest_api.util.image_processing.transforms.pad_image(
- array: ~numpy.ndarray,
- target_width: int = 1024,
- target_height: int = 1280,
- background_color: int = 255,
- dtype=<class 'numpy.uint8'>,
- how: str = 'center',
Pads a NumPy array representing an image to the specified target dimensions.
If the target dimensions are smaller than the image dimensions, no padding will be applied in that dimension. If the target dimensions are larger, the image will be centered within the canvas of the specified target size, with the remaining space filled with white padding.
The padding can be done around the center (how=”center”), or to the bottom right (how=”bottom_right”).
- Parameters:
array (np.ndarray) – The input image as a NumPy array of shape (H, W, C).
target_width (int, optional) – The desired target width of the padded image. Defaults to DEFAULT_MAX_WIDTH.
target_height (int, optional) – The desired target height of the padded image. Defaults to DEFAULT_MAX_HEIGHT.
how (str, optional) – The method to pad the image. Defaults to “center”.
- Returns:
padded_array (np.ndarray) – The padded image as a NumPy array of shape (target_height, target_width, C).
padding_offsets (Tuple[int, int]) – A tuple containing the horizontal and vertical offsets (pad_width, pad_height) applied to center the image.
Notes
If the target dimensions are smaller than the current image dimensions, no padding will be applied in that dimension, and the image will retain its original size in that dimension.
Examples
>>> image = np.random.randint(0, 255, (600, 800, 3), dtype=np.uint8)
>>> padded_image, offsets = pad_image(image, target_width=1000, target_height=1000)
>>> padded_image.shape
(1000, 1000, 3)
>>> offsets
(100, 200)
- nv_ingest_api.util.image_processing.transforms.save_image_to_disk(
- base64_content: str,
- output_path: str,
- target_format: str = 'auto',
- **kwargs,
Save base64 image to disk with optional format conversion.
This function provides a high-level interface for saving images that combines format conversion capabilities with efficient disk writing. It automatically chooses between direct writing (when no conversion needed) and format conversion to optimize performance while maintaining flexibility.
- Parameters:
base64_content (str) – Base64-encoded image data.
output_path (str) – Path where the image should be saved.
target_format (str, optional) – Target format (“PNG”, “JPEG”, “auto”). Default is “auto” (preserve original). Use “auto” to preserve the original format for maximum speed.
**kwargs – Additional arguments passed to ensure_base64_format() for conversion. For JPEG: quality (int, default=100) - JPEG quality (1-100). For PNG: compression (int, default=3) - PNG compression level (0-9).
- Returns:
True if successful, False otherwise.
- Return type:
bool
Examples
>>> # Preserve original format (fastest)
>>> success = save_image_to_disk(image_b64, "/path/to/output.jpeg", "auto")
>>> # Convert to JPEG with specific quality
>>> success = save_image_to_disk(image_b64, "/path/to/output.jpeg", "JPEG", quality=85)
- nv_ingest_api.util.image_processing.transforms.scale_image_to_encoding_size(
- base64_image: str,
- max_base64_size: int = 180000,
- initial_reduction: float = 0.9,
- format: str = 'PNG',
- **kwargs,
Decodes a base64-encoded image, resizes it if needed, and re-encodes it as base64. Ensures the final image size is within the specified limit.
- Parameters:
base64_image (str) – Base64-encoded image string.
max_base64_size (int, optional) – Maximum allowable size for the base64-encoded image, by default 180,000 characters.
initial_reduction (float, optional) – Initial reduction step for resizing, by default 0.9.
format (str, optional) – The image format to use for encoding. Supported formats are “PNG” and “JPEG”. Defaults to “PNG”.
**kwargs – Additional keyword arguments passed to the format-specific encoding function. For JPEG: quality (int, default=100) - JPEG quality (1-100). For PNG: compression (int, default=3) - PNG compression level (0-9).
- Returns:
A tuple containing:
- Base64-encoded image string in the specified format, resized if necessary.
- The new size as a tuple (width, height).
- Return type:
Tuple[str, Tuple[int, int]]
- Raises:
Exception – If the image cannot be resized below the specified max_base64_size.
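Example (an illustrative sketch; numpy_to_base64, documented earlier in this module, is used only to produce an input string):
>>> big_image = np.random.randint(0, 255, (3000, 3000, 3), dtype=np.uint8)
>>> b64 = numpy_to_base64(big_image, format="PNG")
>>> resized_b64, (new_w, new_h) = scale_image_to_encoding_size(b64, max_base64_size=180000)
>>> len(resized_b64) <= 180000
True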
- nv_ingest_api.util.image_processing.transforms.scale_numpy_image(
- img_arr: ndarray,
- scale_tuple: Tuple[int, int] | None = None,
- interpolation=1,
Scales a NumPy image array using OpenCV with aspect ratio preservation.
This function provides OpenCV-based image scaling that mimics PIL’s thumbnail behavior by maintaining aspect ratio and scaling to fit within the specified dimensions.
- Parameters:
img_arr (np.ndarray) – The input image as a NumPy array.
scale_tuple (Optional[Tuple[int, int]], optional) – A tuple (width, height) to resize the image to. If provided, the image will be resized to fit within these dimensions while maintaining aspect ratio (similar to PIL’s thumbnail method). Defaults to None.
interpolation (int, optional) – OpenCV interpolation method. Defaults to cv2.INTER_LANCZOS4.
- Returns:
A NumPy array representing the scaled image data.
- Return type:
np.ndarray
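Example (an illustrative sketch; assumes the PIL-thumbnail-like behavior described above, so an 800 x 600 image scaled to fit (400, 400) becomes 400 x 300):
>>> import cv2
>>> image = np.random.randint(0, 255, (600, 800, 3), dtype=np.uint8)
>>> scaled = scale_numpy_image(image, scale_tuple=(400, 400), interpolation=cv2.INTER_LANCZOS4)
>>> scaled.shape
(300, 400, 3)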
Module contents#
- nv_ingest_api.util.image_processing.scale_image_to_encoding_size(
- base64_image: str,
- max_base64_size: int = 180000,
- initial_reduction: float = 0.9,
- format: str = 'PNG',
- **kwargs,
Decodes a base64-encoded image, resizes it if needed, and re-encodes it as base64. Ensures the final image size is within the specified limit.
- Parameters:
base64_image (str) – Base64-encoded image string.
max_base64_size (int, optional) – Maximum allowable size for the base64-encoded image, by default 180,000 characters.
initial_reduction (float, optional) – Initial reduction step for resizing, by default 0.9.
format (str, optional) – The image format to use for encoding. Supported formats are “PNG” and “JPEG”. Defaults to “PNG”.
**kwargs – Additional keyword arguments passed to the format-specific encoding function. For JPEG: quality (int, default=100) - JPEG quality (1-100). For PNG: compression (int, default=3) - PNG compression level (0-9).
- Returns:
A tuple containing:
- Base64-encoded image string in the specified format, resized if necessary.
- The new size as a tuple (width, height).
- Return type:
Tuple[str, Tuple[int, int]]
- Raises:
Exception – If the image cannot be resized below the specified max_base64_size.