Spark Rapids ML#
Feature#
Classification#
|
LogisticRegression is a machine learning model where the response y is modeled by the sigmoid (or softmax for more than 2 classes) function applied to a linear combination of the features in X. |
|
Model fitted by |
|
RandomForestClassifier implements a Random Forest classifier model which fits multiple decision tree classifiers in an ensemble. |
|
Model fitted by |
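The modeling idea behind LogisticRegression can be sketched in a few lines of plain Python. This is only an illustration of the sigmoid and softmax functions applied to a linear combination of features; it is not the Spark Rapids ML API, and the names `linear_score`, `sigmoid`, `softmax` and the example weights are hypothetical.

```python
import math


def linear_score(weights, bias, x):
    """Linear combination of the features in x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sigmoid(z):
    """Map a real-valued score to a probability for the binary case."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Map one score per class to a probability distribution."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, 2.0]

# Binary case: one weight vector, sigmoid of the linear score.
p_binary = sigmoid(linear_score([0.5, -0.25], 0.0, x))

# Multiclass case: one weight vector per class, softmax over the scores.
class_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
p_multi = softmax([linear_score(w, 0.0, x) for w in class_weights])
predicted_class = p_multi.index(max(p_multi))
```

A fitted model stores the learned weights and bias; prediction then reduces to the computation above.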
Clustering#
|
The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a non-parametric data clustering algorithm based on data density. |
|
|
|
The KMeans algorithm partitions data points into a fixed number (denoted as k) of clusters. |
|
KMeans GPU model that assigns input vectors to the k learned cluster centers.
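As a rough illustration of how KMeans partitions points into k clusters, the following plain-Python sketch alternates between assigning each point to its nearest center and recomputing each center as the mean of its members (Lloyd's iteration). It assumes fixed initial centers and a fixed iteration count, and it does not reflect how the GPU implementation is written; all function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def update(points, labels, centers):
    """Recompute each center as the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centers.append(old)  # keep an empty cluster's center unchanged
            continue
        new_centers.append(
            [sum(p[d] for p in members) / len(members) for d in range(len(old))]
        )
    return new_centers


def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    return centers, assign(points, centers)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(points, centers=[[0.0, 0.0], [10.0, 10.0]])
```

The fitted model corresponds to the final `centers`; clustering new vectors is a single call to `assign`.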
Regression#
|
LinearRegression is a machine learning model where the response y is modeled by a linear combination of the predictors in X. |
|
Model fitted by |
|
RandomForestRegressor implements a Random Forest regressor model which fits multiple decision tree regressors in an ensemble. |
|
Model fitted by |
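To make the LinearRegression description concrete, here is a minimal sketch of an ordinary least-squares fit for a single predictor, written in plain Python. It assumes no regularization and one feature, and the helper name `fit_line` is hypothetical; the library's estimator handles many features and runs on GPUs.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)

# The response is modeled as a linear combination of the predictor.
prediction = slope * 4.0 + intercept
```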
Nearest Neighbors#
|
ApproximateNearestNeighbors retrieves the k approximate nearest neighbors (ANNs) among the item vectors for each query vector. |
|
|
|
NearestNeighbors retrieves the exact k nearest neighbors in item vectors for each query vector. |
|
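The exact nearest-neighbor query can be pictured as a brute-force scan: compute the distance from the query vector to every item vector and keep the k smallest. The sketch below does this in plain Python with squared Euclidean distance; the function name `k_nearest` and the choice of metric are assumptions for illustration and say nothing about how the library performs the search.

```python
import heapq


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_nearest(items, query, k):
    """Return the indices of the k items closest to the query, nearest first."""
    return heapq.nsmallest(
        k, range(len(items)), key=lambda i: squared_distance(items[i], query)
    )


items = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
neighbors = k_nearest(items, query=[0.9, 0.9], k=2)
```

An approximate search trades some of this exactness for speed by examining only a subset of the item vectors.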
Tuning#
|
K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds, which are used as separate training and test datasets. For example, with k=3 folds, K-fold cross validation generates 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing. |
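The fold construction described above can be sketched with the standard library alone. The helper `k_fold_splits` is hypothetical and works on row indices rather than Spark DataFrames; it only illustrates how k non-overlapping random folds yield k (training, test) pairs.

```python
import random


def k_fold_splits(n_rows, k, seed=0):
    """Return k (train_indices, test_indices) pairs over range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # random partitioning
    folds = [indices[i::k] for i in range(k)]  # k non-overlapping folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# With k=3 and 9 rows, each pair trains on 6 rows and tests on 3.
splits = k_fold_splits(n_rows=9, k=3)
```

Model selection then fits each candidate on every training set, scores it on the matching test set, and keeps the candidate with the best average score.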