API Reference¶

This page provides a detailed API reference for the main classes and functions in fast-select.

Feature Selection Algorithms¶

ReliefF¶

class fast_select.ReliefF.ReliefF(n_features_to_select: int | float = 0.2, discrete_limit: int = 10, n_neighbors: int = 3, backend: str = 'auto', verbose: bool = False, n_jobs: int = -1)[source]¶

Bases: TransformerMixin, BaseEstimator

GPU and CPU-accelerated feature selection using the ReliefF algorithm.

This estimator provides a unified API for running ReliefF on either a CPU (using Numba’s parallel JIT) or a GPU (using Numba CUDA).

Parameters:

n_features_to_select (int | float, default=0.2) – The number of top features to select. If a float, that fraction of the features is selected (0.2 means 20% of the features are returned from transform or fit_transform). If an int, exactly that number of features is returned.
discrete_limit (int, default=10) – The maximum number of unique values a feature may have and still be treated as discrete rather than continuous (affects distance calculation).
n_neighbors (int, default=3) – The number of nearest neighbors to use for score calculation.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend to use.
verbose (bool, default=False) – Controls whether progress updates are printed during the fit. Only available if backend=‘cpu’.
n_jobs (int, default=-1) – Controls the number of threads used by Numba when running on the CPU. -1 uses all available threads. Set to a lower number if the machine lags or becomes unresponsive while the script runs.
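
As an illustration of the n_features_to_select rule described above, the sketch below resolves an int or float setting into a feature count. The helper name resolve_n_features and the rounding choice (truncate, but keep at least one feature) are assumptions made for this example; the library's own rounding is not documented on this page.

```python
def resolve_n_features(n_features_to_select, n_features):
    """Hypothetical helper: turn an int or float setting into a feature count."""
    if isinstance(n_features_to_select, bool):
        raise TypeError("n_features_to_select must be an int or a float")
    if isinstance(n_features_to_select, float):
        if not 0.0 < n_features_to_select <= 1.0:
            raise ValueError("a float must lie in (0, 1]")
        # Assumption: truncate toward zero, but always keep at least one feature.
        return max(1, int(n_features_to_select * n_features))
    if not 1 <= n_features_to_select <= n_features:
        raise ValueError("an int must lie in [1, n_features]")
    return n_features_to_select


print(resolve_n_features(0.2, 50))  # a float is read as a fraction of 50 features
print(resolve_n_features(7, 50))    # an int is read as an exact count
```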

n_features_in_¶

The number of features seen during fit.

Type:: int

feature_importances_¶

The calculated importance scores for each feature.

Type:: ndarray of shape (n_features,)

effective_backend_¶

The backend that was actually used during fit (‘gpu’ or ‘cpu’).

Type:: str

fit(x: ndarray, y: ndarray)[source]¶: Calculates feature importances using the ReliefF algorithm on a GPU/CPU. … (docstring remains the same) …

fit_transform(x: ndarray, y: ndarray) → ndarray[source]¶

Fit to data, then transform it.

A convenience method that fits the model and applies the transformation to the same data.

Parameters:

x (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).

Returns:

x_new – The transformed input samples.

Return type:

ndarray of shape (n_samples, n_features_to_select)

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → ReliefF¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to fit.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → ReliefF¶

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to transform.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
Returns:: self – The updated object.
Return type:: object

transform(x: ndarray) → ndarray[source]¶

Reduces x to the selected features.

Parameters:: x (array-like of shape (n_samples, n_features)) – The input samples to transform.
Returns:: x_new – The input samples with only the selected features.
Return type:: ndarray of shape (n_samples, n_features_to_select)
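
The column selection that transform performs can be pictured with the sketch below: rank the features by importance score, keep the k best, and slice those columns out of the input. It uses plain Python lists so that it runs without the library. The function name select_top_features and the choice to return columns in descending order of importance are assumptions for illustration, not the library's documented behaviour.

```python
def select_top_features(rows, importances, k):
    """Hypothetical sketch: keep the k columns with the highest scores."""
    # Rank feature indices by score, highest first (ties keep their original order).
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    top = order[:k]
    # Slice the chosen columns out of every sample.
    reduced = [[row[j] for j in top] for row in rows]
    return top, reduced


rows = [[1, 2, 3], [4, 5, 6]]
scores = [0.1, 0.9, 0.5]  # stands in for feature_importances_
top, reduced = select_top_features(rows, scores, k=2)
print(top)      # indices of the two best-scoring features
print(reduced)  # the input restricted to those columns
```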

SURF¶

class fast_select.SURF.SURF(n_features_to_select: int | float = 0.2, backend: str = 'auto', use_star: bool = False, discrete_limit: int = 10, n_jobs: int = -1, verbose: bool = False)[source]¶

Bases: TransformerMixin, BaseEstimator

GPU and CPU-accelerated feature selection using the SURF algorithm.

This estimator provides a unified, scikit-learn compatible API for running SURF or SURF* on either a CPU or a GPU. The implementation is designed for performance and scalability, avoiding the memory bottlenecks of older implementations by calculating distances on-the-fly.
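
For orientation, SURF as described in the literature defines a sample's neighbors by a distance threshold (the mean distance over all sample pairs) rather than a fixed count, and SURF* also uses the samples beyond that threshold. The sketch below shows only that neighbor split, accumulating the mean as a running sum instead of storing a distance matrix. The Manhattan distance, the strict less-than comparison, and the function name surf_neighbors are assumptions for illustration; this is not the library's kernel.

```python
def manhattan(a, b):
    """Manhattan distance between two samples (an assumed metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def surf_neighbors(X, i):
    """Split the other samples into near and far sets relative to sample i."""
    n = len(X)
    # Mean pairwise distance, accumulated on the fly (no distance matrix kept).
    total, n_pairs = 0.0, 0
    for a in range(n):
        for b in range(a + 1, n):
            total += manhattan(X[a], X[b])
            n_pairs += 1
    threshold = total / n_pairs
    near = [j for j in range(n) if j != i and manhattan(X[i], X[j]) < threshold]
    far = [j for j in range(n) if j != i and manhattan(X[i], X[j]) >= threshold]
    return threshold, near, far


X = [[0.0], [1.0], [10.0]]
threshold, near, far = surf_neighbors(X, 0)
print(near, far)  # near samples drive SURF; far samples matter only for SURF*
```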

Parameters:

n_features_to_select (int or float, default=0.2) – The number of top features to select. If an int, the exact number of features to select. If a float in (0, 1], the fraction of features to select.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend to use. ‘auto’ will use a GPU if available.
use_star (bool, default=False) – If True, runs the SURF* algorithm, which includes updates from “far” neighbors. If False (default), runs the standard SURF algorithm.
discrete_limit (int, default=10) – Features with this many or fewer unique values are treated as discrete.
n_jobs (int, default=-1) – Number of CPU threads to use for the ‘cpu’ backend. -1 means all. This parameter is ignored for the ‘gpu’ backend.
verbose (bool, default=False) – Controls whether to print progress messages during fit.
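
The discrete_limit rule above can be sketched as a unique-value count per feature. The per-feature difference shown alongside it is the conventional one for Relief-family algorithms (0 or 1 for discrete values, a range-normalized absolute difference for continuous ones). Both function names are hypothetical, and whether the library computes the difference exactly this way is an assumption.

```python
def is_discrete(column, discrete_limit=10):
    """True when a feature has discrete_limit or fewer unique values."""
    return len(set(column)) <= discrete_limit


def feature_diff(a, b, discrete, value_range):
    """Conventional Relief-style difference between two values of one feature."""
    if discrete:
        # Discrete features either match or they do not.
        return 0.0 if a == b else 1.0
    # Continuous features: absolute difference scaled by the feature's range.
    return abs(a - b) / value_range if value_range else 0.0


genotype = [0, 1, 2, 1, 0, 2]
expression = [0.1 * i for i in range(25)]
print(is_discrete(genotype))    # 3 unique values
print(is_discrete(expression))  # 25 unique values
```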

n_features_in_¶

The number of features seen during fit.

Type:: int

feature_importances_¶

The calculated importance scores for each feature.

Type:: ndarray of shape (n_features,)

top_features_¶

The indices of the selected top features.

Type:: ndarray of shape (n_features_to_select,)

effective_backend_¶

The backend that was actually used during fit (‘gpu’ or ‘cpu’).

Type:: str

fit(X: ndarray, y: ndarray)[source]¶

Calculates feature importances using the SURF or SURF* algorithm.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples. NaN values are not supported.
y (array-like of shape (n_samples,)) – The target values (class labels). Must be numeric.

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X: ndarray, y: ndarray) → ndarray[source]¶

Fit to data, then transform it.

A convenience method that fits the model and applies the transformation to the same data.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).

Returns:

x_new – The transformed input samples.

Return type:

ndarray of shape (n_samples, n_features_to_select)

set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → SURF¶

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to transform.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
Returns:: self – The updated object.
Return type:: object

transform(x: ndarray) → ndarray[source]¶

Reduces x to the selected features.

Parameters:: x (array-like of shape (n_samples, n_features)) – The input samples to transform.
Returns:: x_new – The input samples with only the selected features.
Return type:: ndarray of shape (n_samples, n_features_to_select)

MultiSURF¶

class fast_select.MultiSURF.MultiSURF(n_features_to_select: int | float = 0.2, backend: str = 'auto', use_star: bool = False, discrete_limit: int = 10, n_jobs: int = -1, verbose: bool = False)[source]¶

Bases: TransformerMixin, BaseEstimator

GPU and CPU-accelerated feature selection using the MultiSURF algorithm.

This estimator provides a unified API for running MultiSURF on either a CPU (using Numba’s parallel JIT) or a GPU (using Numba CUDA).

Parameters:

n_features_to_select (int | float, default=0.2) – The number of top features to select.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend to use. - ‘auto’: Use ‘gpu’ if a compatible NVIDIA GPU is detected, otherwise fall back to ‘cpu’. - ‘gpu’: Force use of the GPU. Raises an error if not available. - ‘cpu’: Force use of the CPU.
use_star (bool, default=False) – Whether to run the MultiSURF* adaptation of the algorithm. By default, the standard MultiSURF algorithm is used.
discrete_limit (int, default=10) – The maximum number of unique values a feature may have and still be treated as discrete rather than continuous (affects distance calculation).
verbose (bool, default=False) – Controls whether progress updates are printed during the fit. Currently of limited benefit; will be expanded in future versions.
n_jobs (int, default=-1) – Controls the number of threads used by Numba when running on the CPU. -1 uses all available threads. Set to a lower number if the machine lags or becomes unresponsive while the script runs.
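
The backend options listed above amount to a small resolution rule, sketched here. GPU detection is passed in as a plain boolean so the example runs anywhere. The function name resolve_backend and the exception types are assumptions; this page only states that forcing ‘gpu’ raises an error when no GPU is available.

```python
def resolve_backend(backend, gpu_available):
    """Hypothetical sketch of how a requested backend maps to the one used."""
    if backend not in ("auto", "gpu", "cpu"):
        raise ValueError("backend must be 'auto', 'gpu', or 'cpu'")
    if backend == "auto":
        # Prefer the GPU when one is detected, otherwise fall back to the CPU.
        return "gpu" if gpu_available else "cpu"
    if backend == "gpu" and not gpu_available:
        # Assumption: the exact exception type is not documented here.
        raise RuntimeError("backend='gpu' requested but no compatible GPU was detected")
    return backend


print(resolve_backend("auto", gpu_available=False))  # falls back to the CPU
```

The returned value plays the role of effective_backend_, which records the backend actually used during fit.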

n_features_in_¶

The number of features seen during fit.

Type:: int

feature_importances_¶

The calculated importance scores for each feature.

Type:: ndarray of shape (n_features,)

effective_backend_¶

The backend that was actually used during fit (‘gpu’ or ‘cpu’).

Type:: str

fit(x: ndarray, y: ndarray)[source]¶

Calculates feature importances using the MultiSURF algorithm on a GPU/CPU.

Parameters:

x (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(x: ndarray, y: ndarray) → ndarray[source]¶

Fit to data, then transform it.

A convenience method that fits the model and applies the transformation to the same data.

Parameters:

x (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).

Returns:

x_new – The transformed input samples.

Return type:

ndarray of shape (n_samples, n_features_to_select)

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → MultiSURF¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to fit.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → MultiSURF¶

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to transform.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
Returns:: self – The updated object.
Return type:: object

transform(x: ndarray) → ndarray[source]¶

Reduces x to the selected features.

Parameters:: x (array-like of shape (n_samples, n_features)) – The input samples to transform.
Returns:: x_new – The input samples with only the selected features.
Return type:: ndarray of shape (n_samples, n_features_to_select)

TuRF¶

class fast_select.TuRF.TuRF(estimator, n_features_to_select: int = 10, pct_remove: float = 0.1, n_iterations: int | None = None, verbose: bool = False)[source]¶

Bases: TransformerMixin, BaseEstimator

A meta-estimator that implements the TuRF (Tuned ReliefF) algorithm, an iterative feature-elimination wrapper.

TuRF iteratively removes features with the lowest scores as determined by a base Relief-based estimator. This process is repeated until a desired number of features remains, which can improve robustness against noise.

This implementation is designed to wrap any scikit-learn compatible estimator that provides a feature_importances_ attribute after fitting, such as the ReliefF, SURF, or MultiSURF classes in this library.

Parameters:

estimator (estimator object) – The base estimator to use for scoring features at each iteration. This object is cloned and not modified.
n_features_to_select (int, default=10) – The final number of features to select.
pct_remove (float, default=0.1) – The fraction of the remaining features to remove at each iteration. Must be between 0 and 1.
n_iterations (int or None, default=None) – The number of iterations to run. If None, the process continues until the number of features is less than or equal to n_features_to_select.
verbose (bool, default=False) – Controls whether progress updates are printed during the fit. Currently of limited benefit; will be expanded in future versions.
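
To see how pct_remove, n_iterations, and n_features_to_select interact, the sketch below computes how many features remain after each elimination round. The function name turf_schedule is hypothetical, and two details are assumptions: at least one feature is dropped per round, and the count never falls below the target.

```python
def turf_schedule(n_features, n_features_to_select, pct_remove=0.1, n_iterations=None):
    """Hypothetical sketch: feature count remaining after each round."""
    if not 0.0 < pct_remove < 1.0:
        raise ValueError("pct_remove must be between 0 and 1")
    counts = []
    remaining = n_features
    while remaining > n_features_to_select:
        if n_iterations is not None and len(counts) >= n_iterations:
            break  # an explicit iteration cap stops the loop early
        # Assumption: drop at least one feature and never undershoot the target.
        n_drop = max(1, int(remaining * pct_remove))
        remaining = max(n_features_to_select, remaining - n_drop)
        counts.append(remaining)
    return counts


print(turf_schedule(100, 50, pct_remove=0.2))                  # [80, 64, 52, 50]
print(turf_schedule(100, 50, pct_remove=0.2, n_iterations=2))  # [80, 64]
```

In the real estimator each round would refit a clone of the base estimator on the surviving features; the schedule above only tracks the counts.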

n_features_in_¶

The number of features seen during fit.

Type:: int

feature_importances_¶

The feature importance scores calculated by the base estimator on the full, original feature set during the first iteration.

Type:: ndarray of shape (n_features_in_,)

top_features_¶

The indices of the selected top features, sorted by importance.

Type:: ndarray of shape (n_features_to_select,)

fit(X: ndarray, y: ndarray)[source]¶

Fits the TuRF model.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X: ndarray, y: ndarray) → ndarray[source]¶: Fit to data, then transform it.

transform(X: ndarray) → ndarray[source]¶: Reduces X to the selected features.

Chi2¶

fast_select.Chi2.chi2(X: ndarray, y: ndarray)[source]¶

Computes Chi-squared statistics between each feature and the target vector.

This function computes the Chi-squared statistic for the test of independence between each non-negative feature and the class labels (similar to scikit-learn’s chi2). It is suitable for features that represent frequencies or counts (e.g., word counts in text classification).

Parameters:

X (np.ndarray) – The input sample matrix of shape (n_samples, n_features). Must contain non-negative, count-based feature values.
y (np.ndarray) – The target vector of class labels, shape (n_samples,).

Returns:

A tuple containing:

chi2_stats: The Chi-squared statistics for each feature.
p_values: The p-values for each feature.

Return type:

tuple[np.ndarray, np.ndarray]
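
The statistic can be sketched in plain Python as follows, following the scikit-learn formulation: for each feature, the observed value is the sum of the feature over the samples of a class, and the expected value is the class frequency times the feature's total. This is a readable reference under those assumptions, not the library's accelerated code. The function name chi2_statistics is hypothetical, p-values are omitted, and skipping classes with zero expected count is a simplification.

```python
def chi2_statistics(X, y):
    """Hypothetical sketch: chi-squared statistic of each feature against y."""
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    stats = []
    for j in range(n_features):
        feature_total = sum(row[j] for row in X)
        stat = 0.0
        for c in classes:
            class_prob = sum(1 for label in y if label == c) / n_samples
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob * feature_total
            if expected > 0:  # simplification: skip empty expectations
                stat += (observed - expected) ** 2 / expected
        stats.append(stat)
    return stats


X = [[3, 1], [2, 1], [0, 1], [1, 1]]  # count-valued features
y = [1, 1, 0, 0]
stats = chi2_statistics(X, y)
print(stats)  # the first feature depends on the class, the second does not
```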

mRMR¶

class fast_select.mRMR.mRMR(n_features_to_select: int, method: str = 'MID', backend: str = 'cpu')[source]¶

Bases: BaseEstimator, TransformerMixin

A scikit-learn compatible feature selector based on the mRMR algorithm.

This implementation is designed for discrete data and uses Numba for high-performance computation of mutual information matrices.

Parameters:

n_features_to_select (int) – The number of top features to select.
method ({'MID', 'MIQ'}, default='MID') – The mRMR selection criterion to use. - ‘MID’ (Mutual Information Difference): f_score = I(f; y) - mean(I(f; S)) - ‘MIQ’ (Mutual Information Quotient): f_score = I(f; y) / mean(I(f; S))
backend ({'cpu', 'gpu'}, default='cpu') – The computational backend to use. ‘gpu’ requires a compatible NVIDIA GPU and Numba with CUDA support installed.
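
The MID and MIQ criteria above can be made concrete with a small pure-Python sketch for discrete data. The function names are hypothetical, mutual information is measured in nats, and two details are assumptions: a candidate is scored by relevance alone while nothing has been selected yet, and a small epsilon guards the MIQ division.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[v]))
        for (x, v), c in count_ab.items()
    )


def mrmr_score(candidate, target, selected, method="MID"):
    """Score one candidate feature against the already selected set S."""
    relevance = mutual_information(candidate, target)
    if not selected:
        return relevance  # assumption: the first pick uses relevance only
    redundancy = sum(mutual_information(candidate, s) for s in selected) / len(selected)
    if method == "MID":
        return relevance - redundancy
    if method == "MIQ":
        return relevance / (redundancy + 1e-12)  # epsilon is an assumption
    raise ValueError("method must be 'MID' or 'MIQ'")


y = [0, 0, 1, 1]
f = [0, 0, 1, 1]  # perfectly informative about y
g = [0, 1, 0, 1]  # independent of both y and f
print(mrmr_score(f, y, selected=[g]))  # no redundancy with g
print(mrmr_score(f, y, selected=[f]))  # fully redundant with itself
```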

fit(X: ndarray, y: ndarray)[source]¶

Fits the mRMR model to select the best features.

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples. Assumed to be discrete.
y (array-like of shape (n_samples,)) – The target values. Assumed to be discrete.

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X: ndarray, y: ndarray) → ndarray[source]¶: Fit to data, then transform it.

transform(X: ndarray) → ndarray[source]¶: Reduces X to the selected features.

CFS¶

class fast_select.CFS.CFS(n_bins=10, strategy='uniform', backend='auto', n_jobs=-1)[source]¶

Bases: BaseEstimator, SelectorMixin

GPU and CPU-accelerated Correlation-based Feature Selection (CFS).

This selector evaluates feature subsets on the hypothesis that a good subset contains features highly correlated with the class, yet uncorrelated with each other. Symmetrical Uncertainty is used as the correlation measure.

The algorithm performs a greedy “best-first” search to find the best subset. It supports both CPU and GPU backends for the computationally intensive correlation matrix calculation.
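
The merit that the search maximizes can be sketched with the standard CFS formula, merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, both measured here by Symmetrical Uncertainty. The code below is a plain-Python illustration on already discretized data. The function names are hypothetical, and the library's binning and search are not reproduced.

```python
from collections import Counter
from math import log2, sqrt


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), in [0, 1]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    mutual_info = h_a + h_b - entropy(list(zip(a, b)))
    return 2.0 * mutual_info / (h_a + h_b)


def cfs_merit(features, target):
    """CFS merit of a feature subset (a list of discrete columns)."""
    k = len(features)
    r_cf = sum(symmetrical_uncertainty(f, target) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (
        sum(symmetrical_uncertainty(features[i], features[j]) for i, j in pairs) / len(pairs)
        if pairs else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


target = [0, 0, 1, 1]
relevant = [0, 0, 1, 1]
irrelevant = [0, 1, 0, 1]
print(cfs_merit([relevant], target))
print(cfs_merit([relevant, irrelevant], target))  # adding noise lowers the merit
```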

Parameters:

n_bins (int, default=10) – Number of bins for discretizing continuous features.
strategy ({'uniform', 'quantile', 'kmeans'}, default='uniform') – Strategy for binning continuous features.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend. ‘auto’ uses GPU if available.
n_jobs (int, default=-1) – Number of CPU threads to use. Ignored for the ‘gpu’ backend.

n_features_in_¶

Number of features seen during fit.

Type:: int

feature_names_in_¶

Names of features seen during fit.

Type:: ndarray of shape (n_features_in_,)

selected_indices_¶

Indices of the selected features.

Type:: ndarray of shape (n_selected_features,)

support_mask_¶

A boolean mask of the selected features.

Type:: ndarray of shape (n_features_in_,)

merit_¶

The CFS merit score of the selected feature subset.

Type:: float

fit(X, y)[source]¶

Fits the CFS model to find the best feature subset by evaluating feature correlation with the target and inter-feature correlation.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data. Can be continuous, discrete, or mixed.
y (array-like of shape (n_samples,)) – Target values. Must be discrete (Classification).

Returns:

self – Returns the instance itself.

Return type:

object

transform(X)[source]¶

Reduces X to the selected features.

Parameters:: X (array-like of shape (n_samples, n_features)) – The input samples to transform.
Returns:: X_new – The input samples with only the selected features.
Return type:: ndarray of shape (n_samples, n_selected_features)

MDR¶

class fast_select.MDR.MDR(k: int = 2, cv: int = 10, backend: str = 'auto', verbose: bool = False)[source]¶

Bases: BaseEstimator, ClassifierMixin

Multifactor Dimensionality Reduction with GPU or CPU backend. This implementation targets the canonical use-case of MDR: SNP genotypes coded 0, 1, 2. All features must take exactly three discrete values (0/1/2); other data types should be encoded or discretised accordingly before calling fit.

Parameters:

k (int, default=2) – Interaction order to search (e.g. k=2 searches pairwise interactions). The maximum is 6, which is only feasible with powerful hardware and plenty of memory, or with very small datasets.
cv (int, default=10) – Number of stratified K-fold splits used for model selection.
backend ({'auto', 'CPU', 'GPU'}, default='auto') – Execution backend preference.
verbose (bool, default=False) – Print progress information during training.
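
For readers new to MDR, the core idea can be sketched as follows: for a given combination of k features, each genotype cell is labelled high-risk when its case-to-control ratio exceeds the ratio in the whole dataset, and the combination whose labelling classifies best is kept. The sketch assumes binary labels (1 = case, 0 = control) and ranks combinations by training accuracy for brevity, whereas the estimator uses stratified cross-validation. The function names, tie handling, and the default for unseen cells are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def mdr_rule(X, y, features):
    """Label each observed genotype cell as high-risk (1) or low-risk (0)."""
    n_cases = sum(y)
    n_controls = len(y) - n_cases
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, label in zip(X, y):
        counts[tuple(row[j] for j in features)][label] += 1
    # High-risk when cases/controls in the cell exceeds the overall ratio
    # (cross-multiplied to avoid dividing by zero; ties count as low-risk).
    return {
        cell: int(cases * n_controls > controls * n_cases)
        for cell, (controls, cases) in counts.items()
    }


def mdr_predict(rule, X, features):
    """Predict with a cell rule; unseen cells default to low-risk (assumption)."""
    return [rule.get(tuple(row[j] for j in features), 0) for row in X]


def best_combination(X, y, k=2):
    """Pick the k-feature combination with the best training accuracy."""
    def accuracy(features):
        predicted = mdr_predict(mdr_rule(X, y, features), X, features)
        return sum(p == t for p, t in zip(predicted, y)) / len(y)
    return max(combinations(range(len(X[0])), k), key=accuracy)


# Toy genotypes coded 0/1/2; the label is an XOR-like interaction of features 0 and 1.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 2], [1, 1, 1],
     [0, 0, 2], [0, 1, 1], [1, 0, 0], [1, 1, 2]]
y = [0, 1, 1, 0, 0, 1, 1, 0]
print(best_combination(X, y, k=2))  # the interacting pair
```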

fit(X, y)[source]¶

Fits the MDR model by searching feature combinations of interaction order k, using stratified cross-validation for model selection.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data. Every feature must be coded as 0, 1, or 2.
y (array-like of shape (n_samples,)) – Target values. Must be discrete (Classification).

Returns:

self – Returns the instance itself.

Return type:

object

predict(X)[source]¶

predict_proba(X)[source]¶

Not implemented.

MDR is fundamentally a hard classifier; this implementation does not attempt to derive calibrated probabilities. If you need risk probabilities, consider wrapping MDR in scikit-learn’s CalibratedClassifierCV or implementing cell-frequency posteriors.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MDR¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to score.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

transform(X)[source]¶