Spark Rapids ML
Feature
Classification
| 
 | LogisticRegression is a machine learning model where the response y is modeled by the sigmoid (or softmax for more than 2 classes) function applied to a linear combination of the features in X. | 
| 
 | Model fitted by  | 
| 
 | RandomForestClassifier implements a Random Forest classifier model which fits multiple decision tree classifiers in an ensemble. | 
| 
 | Model fitted by  | 
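The LogisticRegression description above can be illustrated with a small pure-Python sketch of the prediction step: a linear combination of the features is passed through the sigmoid (two classes) or softmax (more than two classes). This is not the Spark Rapids ML API or its GPU implementation. The function names `linear_score`, `predict_binary`, and `predict_multiclass`, and the weights used in the example, are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in [0, 1]."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Map a list of scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_score(weights, bias, x):
    """Linear combination of the features in x (hypothetical helper)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_binary(weights, bias, x):
    """Probability of the positive class for a two-class model."""
    return sigmoid(linear_score(weights, bias, x))


def predict_multiclass(weight_rows, biases, x):
    """Class probabilities for a model with one weight row per class."""
    scores = [linear_score(w, b, x) for w, b in zip(weight_rows, biases)]
    return softmax(scores)


# Example with made-up weights: three classes, two features.
probs = predict_multiclass(
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0], [2.0, 1.0]
)
print(probs)
```

A fitted model would supply the weights and biases; here they are placeholders so the arithmetic can be followed by hand.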
Clustering
| 
 | The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a non-parametric data clustering algorithm based on data density. | 
| 
 | |
| 
 | The KMeans algorithm partitions data points into a fixed number of clusters, denoted k. | 
| 
 | KMeans GPU model for assigning input vectors to the k learned cluster centers. | 
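As a rough illustration of what a fitted KMeans model does at prediction time, the sketch below assigns each input vector to the nearest of k centers. It is plain Python, not the GPU implementation, and it assumes squared Euclidean distance. The helper names and the example centers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_to_centers(points, centers):
    """Return, for each point, the index of its nearest center."""
    labels = []
    for p in points:
        nearest = min(
            range(len(centers)),
            key=lambda j: squared_distance(p, centers[j]),
        )
        labels.append(nearest)
    return labels


# Example with k = 2 made-up centers.
centers = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, 1.0], [9.0, 8.0], [0.0, 2.0]]
print(assign_to_centers(points, centers))
```

Fitting the centers themselves (the iterative part of KMeans) is omitted here; only the assignment step is shown.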
Regression
| 
 | LinearRegression is a machine learning model where the response y is modeled by a linear combination of the predictors in X. | 
| 
 | Model fitted by  | 
| 
 | RandomForestRegressor implements a Random Forest regressor model which fits multiple decision tree regressors in an ensemble. | 
| 
 | Model fitted by  | 
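To make the LinearRegression description concrete, here is a minimal sketch that fits a single-feature linear model by ordinary least squares in closed form and then predicts with it. It assumes one predictor and no regularization, and it is not how Spark Rapids ML solves the problem. The function names `fit_simple_linear` and `predict_linear` are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Closed-form least squares for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_linear(slope, intercept, x):
    """Response modeled as a linear combination of the predictor."""
    return slope * x + intercept


# Example: points that lie exactly on a line.
slope, intercept = fit_simple_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept, predict_linear(slope, intercept, 4.0))
```

With several predictors the same idea applies, with the slope replaced by a weight vector.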
Nearest Neighbors
| 
 | ApproximateNearestNeighbors retrieves k approximate nearest neighbors (ANNs) in item vectors for each query vector. | 
| 
 | |
| 
 | NearestNeighbors retrieves the exact k nearest neighbors in item vectors for each query vector. | 
| 
 | 
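The exact search described for NearestNeighbors can be sketched as a brute-force scan: compute the distance from the query to every item vector and keep the k smallest. The sketch below assumes squared Euclidean distance and is plain Python for illustration only. It is not the library's API, and the name `k_nearest` is hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_nearest(items, query, k):
    """Return the indices of the k items closest to query, nearest first."""
    # sorted() is stable, so ties keep the lower item index first.
    order = sorted(
        range(len(items)),
        key=lambda i: squared_distance(items[i], query),
    )
    return order[:k]


# Example with made-up item vectors.
items = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [2.0, 2.0]]
print(k_nearest(items, [1.2, 1.2], 2))
```

An approximate method trades some of this exactness for speed by avoiding the full scan over all item vectors.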
Tuning
| 
 | K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds, which are used as separate training and test datasets. For example, with k=3 folds, K-fold cross validation generates 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing. |
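The fold construction described above can be sketched in a few lines of plain Python. The sketch shuffles row indices, deals them into k non-overlapping folds, and yields one (training, test) pair per fold. It illustrates the splitting idea only and is not the library's implementation. The function name `k_fold_indices` and the `seed` parameter are hypothetical.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Return k (train_indices, test_indices) pairs over range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # random partitioning
    # Deal the shuffled indices into k non-overlapping folds.
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, test))
    return pairs


# Example: 9 rows and k = 3 give 3 pairs, each with 6 training and 3 test rows.
for train, test in k_fold_indices(9, 3):
    print(len(train), len(test))
```

Model selection then trains a candidate on each training set, scores it on the matching test set, and averages the k scores.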