API Reference¶
This page provides a detailed API reference for the main classes and functions in fast-select.
Feature Selection Algorithms¶
ReliefF¶
- class fast_select.ReliefF.ReliefF(n_features_to_select: int | float = 0.2, discrete_limit: int = 10, n_neighbors: int = 3, backend: str = 'auto', verbose: bool = False, n_jobs: int = -1)[source]¶
Bases: TransformerMixin, BaseEstimator
GPU and CPU-accelerated feature selection using the ReliefF algorithm.
This estimator provides a unified API for running ReliefF on either a CPU (using Numba’s parallel JIT) or a GPU (using Numba CUDA).
- Parameters:
n_features_to_select (int | float, default=0.2) – The number of top features to select. If a float, that fraction of the features is selected (0.2 means 20% of the features are returned by transform or fit_transform). If an int, exactly that many features are returned.
discrete_limit (int, default=10) – The threshold on the number of unique values used to decide whether a feature is treated as discrete or continuous (affects the distance calculation).
n_neighbors (int, default=3) – The number of nearest neighbors to use for score calculation.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend to use.
verbose (bool, default=False) – Controls whether progress updates are printed during the fit. Only available if backend='cpu'.
n_jobs (int, default=-1) – Controls the number of threads Numba uses when running on the CPU. -1 (the default) uses all available threads. Set a lower number if the script makes the machine lag or become unresponsive.
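The int-or-float behaviour of n_features_to_select can be pictured with a small helper. This is only a sketch: resolve_n_features is a hypothetical name, and the (0, 1] range check and the round-down rule are assumptions, not documented behaviour of fast-select.

```python
def resolve_n_features(n_features_to_select, n_features):
    """Hypothetical helper: turn an int-or-float setting into a feature count."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_features_to_select, bool):
        raise TypeError("n_features_to_select must be an int or a float")
    if isinstance(n_features_to_select, float):
        if not 0.0 < n_features_to_select <= 1.0:
            raise ValueError("a float must lie in (0, 1]")
        # Assumption: round down, but always keep at least one feature.
        return max(1, int(n_features_to_select * n_features))
    if isinstance(n_features_to_select, int):
        if not 1 <= n_features_to_select <= n_features:
            raise ValueError("an int must lie in [1, n_features]")
        return n_features_to_select
    raise TypeError("n_features_to_select must be an int or a float")


print(resolve_n_features(0.2, 50))  # a fraction of 50 features
print(resolve_n_features(7, 50))    # an exact count
```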
- n_features_in_¶
The number of features seen during fit.
- Type:
int
- feature_importances_¶
The calculated importance scores for each feature.
- Type:
ndarray of shape (n_features,)
- effective_backend_¶
The backend that was actually used during fit (‘gpu’ or ‘cpu’).
- Type:
str
- fit(x: ndarray, y: ndarray)[source]¶
Calculates feature importances using the ReliefF algorithm on a GPU/CPU. … (docstring remains the same) …
- fit_transform(x: ndarray, y: ndarray) → ndarray[source]¶
Fit to data, then transform it.
A convenience method that fits the model and applies the transformation to the same data.
- Parameters:
x (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).
- Returns:
x_new – The transformed input samples.
- Return type:
ndarray of shape (n_samples, n_features_to_select)
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → ReliefF¶
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → ReliefF¶
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the x parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(x: ndarray) → ndarray[source]¶
Reduces x to the selected features.
- Parameters:
x (array-like of shape (n_samples, n_features)) – The input samples to transform.
- Returns:
x_new – The input samples with only the selected features.
- Return type:
ndarray of shape (n_samples, n_features_to_select)
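For orientation, the scoring idea behind ReliefF can be sketched in plain Python. This is an illustrative version of the textbook algorithm, not the Numba CPU or CUDA kernels used by fast-select: it assumes continuous features only, a range-normalised Manhattan distance, and a hypothetical function name relieff_scores.

```python
from collections import Counter


def relieff_scores(X, y, n_neighbors=3):
    """Textbook ReliefF weights for continuous features (illustrative sketch)."""
    n_samples, n_features = len(X), len(X[0])
    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(n_features))

    class_counts = Counter(y)
    prior = {c: cnt / n_samples for c, cnt in class_counts.items()}
    weights = [0.0] * n_features

    for i in range(n_samples):
        # Collect every other sample with its distance to sample i, by class.
        by_class = {c: [] for c in class_counts}
        for other in range(n_samples):
            if other != i:
                by_class[y[other]].append((distance(i, other), other))
        for c, candidates in by_class.items():
            nearest = [idx for _, idx in sorted(candidates)[:n_neighbors]]
            if not nearest:
                continue
            if c == y[i]:
                scale = -1.0  # hits: differing on a feature lowers its score
            else:
                # misses: weighted by the class prior, as in standard ReliefF
                scale = prior[c] / (1.0 - prior[y[i]])
            for j in range(n_features):
                mean_diff = sum(diff(j, i, n) for n in nearest) / len(nearest)
                weights[j] += scale * mean_diff / n_samples
    return weights


# Feature 0 separates the classes; feature 1 is noise.
X_toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y_toy = [0, 0, 0, 1, 1, 1]
print(relieff_scores(X_toy, y_toy, n_neighbors=2))
```

A feature scores highly when nearby samples of other classes differ on it while nearby samples of the same class do not, which is why the informative feature in the toy data receives the larger weight.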
SURF¶
- class fast_select.SURF.SURF(n_features_to_select: int | float = 0.2, backend: str = 'auto', use_star: bool = False, discrete_limit: int = 10, n_jobs: int = -1, verbose: bool = False)[source]¶
Bases: TransformerMixin, BaseEstimator
GPU and CPU-accelerated feature selection using the SURF algorithm.
This estimator provides a unified, scikit-learn compatible API for running SURF or SURF* on either a CPU or a GPU. The implementation is designed for performance and scalability, avoiding the memory bottlenecks of older implementations by calculating distances on-the-fly.
- Parameters:
n_features_to_select (int or float, default=0.2) – The number of top features to select. If an int, the exact number of features to select. If a float in (0, 1], the fraction of features to select.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend to use. ‘auto’ will use a GPU if available.
use_star (bool, default=False) – If True, runs the SURF* algorithm, which includes updates from “far” neighbors. If False (default), runs the standard SURF algorithm.
discrete_limit (int, default=10) – Features with this many or fewer unique values are treated as discrete.
n_jobs (int, default=-1) – Number of CPU threads to use for the ‘cpu’ backend. -1 means all. This parameter is ignored for the ‘gpu’ backend.
verbose (bool, default=False) – Controls whether to print progress messages during fit.
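The discrete_limit rule can be illustrated as follows. The helper names are hypothetical, and the per-feature difference shown (mismatch indicator for discrete features, range-normalised gap for continuous ones) is the usual Relief-style convention, assumed here and not taken from the fast-select source.

```python
def is_discrete(column, discrete_limit=10):
    """Treat a feature as discrete when it has at most `discrete_limit` unique values."""
    return len(set(column)) <= discrete_limit


def feature_diff(a, b, discrete, span):
    """Per-feature difference in [0, 1] under the assumed Relief-style convention."""
    if discrete:
        return 0.0 if a == b else 1.0  # mismatch indicator
    return abs(a - b) / span           # gap scaled by the feature's value range


genotype = [0, 1, 2, 1, 0, 2, 2, 1]          # 3 unique values
expression = [0.1 * i for i in range(25)]    # 25 unique values

print(is_discrete(genotype), is_discrete(expression))
```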
- n_features_in_¶
The number of features seen during fit.
- Type:
int
- feature_importances_¶
The calculated importance scores for each feature.
- Type:
ndarray of shape (n_features,)
- top_features_¶
The indices of the selected top features.
- Type:
ndarray of shape (n_features_to_select,)
- effective_backend_¶
The backend that was actually used during fit (‘gpu’ or ‘cpu’).
- Type:
str
- fit(X: ndarray, y: ndarray)[source]¶
Calculates feature importances using the SURF or SURF* algorithm.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples. NaN values are not supported.
y (array-like of shape (n_samples,)) – The target values (class labels). Must be numeric.
- Returns:
self – Returns the instance itself.
- Return type:
object
- fit_transform(X: ndarray, y: ndarray) → ndarray[source]¶
Fit to data, then transform it.
A convenience method that fits the model and applies the transformation to the same data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).
- Returns:
x_new – The transformed input samples.
- Return type:
ndarray of shape (n_samples, n_features_to_select)
- set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → SURF¶
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the x parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(x: ndarray) → ndarray[source]¶
Reduces x to the selected features.
- Parameters:
x (array-like of shape (n_samples, n_features)) – The input samples to transform.
- Returns:
x_new – The input samples with only the selected features.
- Return type:
ndarray of shape (n_samples, n_features_to_select)
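As the SURF family is usually described, an instance's neighbours are all instances closer than the mean pairwise distance, and SURF* additionally scores the "far" instances beyond that threshold with the sign of the update reversed. The sketch below shows only the near/far partition. It assumes a Manhattan distance and uses the hypothetical name surf_neighbors, and it stores all pairwise distances for clarity, whereas the class above is documented as computing them on the fly.

```python
from itertools import combinations


def surf_neighbors(X, use_star=False):
    """Split each instance's peers into near and far sets around the mean distance."""
    n = len(X)

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(X[a], X[b]))

    pair_dist = {(a, b): distance(a, b) for a, b in combinations(range(n), 2)}
    threshold = sum(pair_dist.values()) / len(pair_dist)

    near = {i: [] for i in range(n)}
    far = {i: [] for i in range(n)}
    for (a, b), d in pair_dist.items():
        if d < threshold:
            near[a].append(b)
            near[b].append(a)
        elif d > threshold:
            far[a].append(b)
            far[b].append(a)
    # Far neighbours only matter for the SURF* variant.
    return threshold, near, (far if use_star else None)


X_toy = [[0.0], [0.1], [0.2], [1.0]]
threshold, near, far = surf_neighbors(X_toy, use_star=True)
print(threshold, near, far)
```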
MultiSURF¶
- class fast_select.MultiSURF.MultiSURF(n_features_to_select: int | float = 0.2, backend: str = 'auto', use_star: bool = False, discrete_limit: int = 10, n_jobs: int = -1, verbose: bool = False)[source]¶
Bases: TransformerMixin, BaseEstimator
GPU and CPU-accelerated feature selection using the MultiSURF algorithm.
This estimator provides a unified API for running MultiSURF on either a CPU (using Numba’s parallel JIT) or a GPU (using Numba CUDA).
- Parameters:
n_features_to_select (int | float, default=0.2) – The number of top features to select.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend to use.
- ‘auto’: Use ‘gpu’ if a compatible NVIDIA GPU is detected, otherwise fall back to ‘cpu’.
- ‘gpu’: Force use of the GPU. Raises an error if not available.
- ‘cpu’: Force use of the CPU.
use_star (bool, default=False) – Whether to run the MultiSURF* adaptation of the algorithm. By default, the standard MultiSURF algorithm is used.
discrete_limit (int, default=10) – The threshold on the number of unique values used to decide whether a given feature is discrete or continuous (affects the distance calculation).
verbose (bool, default=False) – Controls whether progress updates are printed during the fit. Currently of limited benefit; to be expanded in future versions.
n_jobs (int, default=-1) – Controls the number of threads Numba uses when running on the CPU. -1 (the default) uses all available threads. Set a lower number if the script makes the machine lag or become unresponsive.
- n_features_in_¶
The number of features seen during fit.
- Type:
int
- feature_importances_¶
The calculated importance scores for each feature.
- Type:
ndarray of shape (n_features,)
- effective_backend_¶
The backend that was actually used during fit (‘gpu’ or ‘cpu’).
- Type:
str
- fit(x: ndarray, y: ndarray)[source]¶
Calculates feature importances using the MultiSURF algorithm on a GPU/CPU.
- Parameters:
x (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).
- Returns:
self – Returns the instance itself.
- Return type:
object
- fit_transform(x: ndarray, y: ndarray) → ndarray[source]¶
Fit to data, then transform it.
A convenience method that fits the model and applies the transformation to the same data.
- Parameters:
x (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like of shape (n_samples,)) – The target values (class labels).
- Returns:
x_new – The transformed input samples.
- Return type:
ndarray of shape (n_samples, n_features_to_select)
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → MultiSURF¶
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_transform_request(*, x: bool | None | str = '$UNCHANGED$') → MultiSURF¶
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the x parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(x: ndarray) → ndarray[source]¶
Reduces x to the selected features.
- Parameters:
x (array-like of shape (n_samples, n_features)) – The input samples to transform.
- Returns:
x_new – The input samples with only the selected features.
- Return type:
ndarray of shape (n_samples, n_features_to_select)
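In the MultiSURF literature, each instance gets its own neighbourhood threshold: the mean of its distances to all other instances, narrowed by a "dead band" of half the standard deviation of those distances. The sketch below illustrates that partition under stated assumptions (Manhattan distance, population standard deviation, hypothetical name multisurf_neighbors). How fast-select computes it internally is not shown in this reference, and the far set would presumably only be used by the MultiSURF* variant.

```python
from statistics import mean, pstdev


def multisurf_neighbors(X, i):
    """Indices near to and far from instance i under a MultiSURF-style dead band."""

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(X[a], X[b]))

    others = [j for j in range(len(X)) if j != i]
    dists = [distance(i, j) for j in others]
    centre = mean(dists)                # instance-specific threshold
    dead_band = pstdev(dists) / 2.0     # assumption: population std, halved
    near = [j for j, d in zip(others, dists) if d < centre - dead_band]
    far = [j for j, d in zip(others, dists) if d > centre + dead_band]
    return near, far


X_toy = [[0.0], [0.1], [0.2], [1.0], [1.1]]
print(multisurf_neighbors(X_toy, 0))
```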
TuRF¶
- class fast_select.TuRF.TuRF(estimator, n_features_to_select: int = 10, pct_remove: float = 0.1, n_iterations: int | None = None, verbose: bool = False)[source]¶
Bases: TransformerMixin, BaseEstimator
A meta-estimator that implements the Iterative Relief (TuRF) algorithm.
TuRF iteratively removes features with the lowest scores as determined by a base Relief-based estimator. This process is repeated until a desired number of features remains, which can improve robustness against noise.
This implementation is designed to wrap any scikit-learn compatible estimator that provides a feature_importances_ attribute after fitting, such as the ReliefF, SURF, or MultiSURF classes in this library.
- Parameters:
estimator (estimator object) – The base estimator to use for scoring features at each iteration. This object is cloned and not modified.
n_features_to_select (int, default=10) – The final number of features to select.
pct_remove (float, default=0.1) – The fraction of the remaining features to remove at each iteration. Must be between 0 and 1.
n_iterations (int or None, default=None) – The number of iterations to run. If None, the process continues until the number of features is less than or equal to n_features_to_select.
verbose (bool, default=False) – Controls whether progress updates are printed during the fit. Currently of limited benefit; to be expanded in future versions.
- n_features_in_¶
The number of features seen during fit.
- Type:
int
- feature_importances_¶
The feature importance scores calculated by the base estimator on the full, original feature set during the first iteration.
- Type:
ndarray of shape (n_features_in_,)
- top_features_¶
The indices of the selected top features, sorted by importance.
- Type:
ndarray of shape (n_features_to_select,)
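The iterative removal described above can be sketched without any estimator machinery. In this illustration, score_fn stands in for refitting the base estimator on the surviving features; turf_select and toy_scores are hypothetical names, the rule of removing at least one feature per round is an assumption, and n_iterations is not modelled.

```python
import math


def turf_select(score_fn, n_features, n_features_to_select=10, pct_remove=0.1):
    """Repeatedly drop the lowest-scoring share of the remaining features."""
    remaining = list(range(n_features))
    while len(remaining) > n_features_to_select:
        scores = score_fn(remaining)  # one score per remaining feature
        # Assumption: remove at least one feature, never overshoot the target.
        n_drop = max(1, math.floor(pct_remove * len(remaining)))
        n_drop = min(n_drop, len(remaining) - n_features_to_select)
        ranked = sorted(zip(scores, remaining), reverse=True)
        survivors = ranked[: len(ranked) - n_drop]
        remaining = sorted(idx for _, idx in survivors)
    return remaining


def toy_scores(indices):
    """Stand-in scorer: pretend higher-numbered features are more important."""
    return [float(i) for i in indices]


print(turf_select(toy_scores, n_features=30, n_features_to_select=5))
```

Because scores are recomputed on the reduced feature set each round, noisy features removed early no longer distort the distance calculations in later rounds.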
Chi2¶
- fast_select.Chi2.chi2(X: ndarray, y: ndarray)[source]¶
Computes Chi-squared statistics between each feature and the target vector.
This function calculates the Chi-squared test for independence between each non-negative feature and the class labels (similar to scikit-learn). It is suitable for features that represent frequencies or counts (e.g., word counts in text classification).
- Parameters:
X (np.ndarray) – The input sample matrix of shape (n_samples, n_features). Must contain non-negative, count-based feature values.
y (np.ndarray) – The target vector of class labels, shape (n_samples,).
- Returns:
A tuple containing:
- chi2_stats: The Chi-squared statistics for each feature.
- p_values: The p-values for each feature.
- Return type:
tuple[np.ndarray, np.ndarray]
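To make the statistic concrete, here is a plain-Python sketch for a single count feature, following the scikit-learn-style definition: the observed value per class is the sum of the feature over that class, and the expected value is the feature total times the class frequency. The name chi2_stat is hypothetical, the feature total is assumed to be non-zero, and p-values are omitted because they need a chi-squared survival function (for example scipy.stats.chi2.sf with n_classes - 1 degrees of freedom).

```python
def chi2_stat(feature, y):
    """Chi-squared statistic between one non-negative count feature and class labels."""
    n = len(y)
    total = sum(feature)  # assumed non-zero
    stat = 0.0
    for c in sorted(set(y)):
        observed = sum(v for v, label in zip(feature, y) if label == c)
        class_freq = sum(1 for label in y if label == c) / n
        expected = total * class_freq
        stat += (observed - expected) ** 2 / expected
    return stat


labels = [1, 1, 1, 0, 0, 0]
informative = [3, 4, 5, 0, 1, 0]   # counts concentrated in class 1
flat = [2, 2, 2, 2, 2, 2]          # identical counts in both classes

print(chi2_stat(informative, labels), chi2_stat(flat, labels))
```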
mRMR¶
- class fast_select.mRMR.mRMR(n_features_to_select: int, method: str = 'MID', backend: str = 'cpu')[source]¶
Bases: BaseEstimator, TransformerMixin
A scikit-learn compatible feature selector based on the mRMR algorithm.
This implementation is designed for discrete data and uses Numba for high-performance computation of mutual information matrices.
- Parameters:
n_features_to_select (int) – The number of top features to select.
method ({'MID', 'MIQ'}, default='MID') – The mRMR selection criterion to use, where S is the set of already-selected features.
- ‘MID’ (Mutual Information Difference): f_score = I(f; y) - mean(I(f; S))
- ‘MIQ’ (Mutual Information Quotient): f_score = I(f; y) / mean(I(f; S))
backend ({'cpu', 'gpu'}, default='cpu') – The computational backend to use. ‘gpu’ requires a compatible NVIDIA GPU and Numba with CUDA support installed.
- fit(X: ndarray, y: ndarray)[source]¶
Fits the mRMR model to select the best features.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples. Assumed to be discrete.
y (array-like of shape (n_samples,)) – The target values. Assumed to be discrete.
- Returns:
self – Returns the instance itself.
- Return type:
object
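The MID and MIQ criteria can be worked through on tiny discrete vectors. The sketch below is a plain-Python illustration, not the library's Numba implementation: mutual_information and mrmr_score are hypothetical names, mutual information is measured in nats, and MIQ assumes the mean redundancy is non-zero.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[v])) for (x, v), c in pab.items())


def mrmr_score(candidate, target, selected, method="MID"):
    """Score one candidate feature against the target and the selected set S."""
    relevance = mutual_information(candidate, target)
    if not selected:
        return relevance  # first pick: relevance only
    redundancy = sum(mutual_information(candidate, s) for s in selected) / len(selected)
    if method == "MID":
        return relevance - redundancy
    if method == "MIQ":
        return relevance / redundancy  # assumes redundancy > 0
    raise ValueError("method must be 'MID' or 'MIQ'")


target = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]       # perfectly informative
f2 = [0, 1, 0, 1]       # independent of the target
f3 = [1, 1, 0, 0]       # informative, but redundant with f1

print(mrmr_score(f1, target, []))
print(mrmr_score(f3, target, [f1], "MID"), mrmr_score(f3, target, [f1], "MIQ"))
```

The redundant feature f3 is as relevant as f1 on its own, yet its MID score collapses once f1 is in the selected set, which is the behaviour the redundancy term is meant to produce.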
CFS¶
- class fast_select.CFS.CFS(n_bins=10, strategy='uniform', backend='auto', n_jobs=-1)[source]¶
Bases: BaseEstimator, SelectorMixin
GPU and CPU-accelerated Correlation-based Feature Selection (CFS).
This selector evaluates feature subsets on the hypothesis that a good subset contains features highly correlated with the class, yet uncorrelated with each other. Symmetrical Uncertainty is used as the correlation measure.
The algorithm performs a greedy “best-first” search to find the best subset. It supports both CPU and GPU backends for the computationally intensive correlation matrix calculation.
- Parameters:
n_bins (int, default=10) – Number of bins for discretizing continuous features.
strategy ({'uniform', 'quantile', 'kmeans'}, default='uniform') – Strategy for binning continuous features.
backend ({'auto', 'gpu', 'cpu'}, default='auto') – The compute backend. ‘auto’ uses GPU if available.
n_jobs (int, default=-1) – Number of CPU threads to use. Ignored for the ‘gpu’ backend.
- n_features_in_¶
Number of features seen during fit.
- Type:
int
- feature_names_in_¶
Names of features seen during fit.
- Type:
ndarray of shape (n_features_in_,)
- selected_indices_¶
Indices of the selected features.
- Type:
ndarray of shape (n_selected_features,)
- support_mask_¶
A boolean mask of the selected features.
- Type:
ndarray of shape (n_features_in_,)
- merit_¶
The CFS merit score of the selected feature subset.
- Type:
float
- fit(X, y)[source]¶
Fits the CFS model to find the best feature subset by evaluating feature correlation with the target and inter-feature correlation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data. Can be continuous, discrete, or mixed.
y (array-like of shape (n_samples,)) – Target values. Must be discrete (Classification).
- Returns:
self – Returns the instance itself.
- Return type:
object
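The subset hypothesis behind CFS is commonly expressed through Hall's merit heuristic, merit = k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff)), where r_cf are feature-class correlations and r_ff are feature-feature correlations. The sketch below evaluates that formula for precomputed correlation values. Whether merit_ is computed in exactly this form is an assumption, and cfs_merit is a hypothetical name.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Hall's CFS merit for a subset, given its correlation values.

    feature_class_corr: list of k feature-class correlations.
    feature_feature_corr: k x k symmetric matrix of feature-feature correlations.
    """
    k = len(feature_class_corr)
    mean_fc = sum(feature_class_corr) / k
    if k == 1:
        mean_ff = 0.0  # no feature pairs to average over
    else:
        pairs = [feature_feature_corr[i][j] for i in range(k) for j in range(i + 1, k)]
        mean_ff = sum(pairs) / len(pairs)
    return k * mean_fc / sqrt(k + k * (k - 1) * mean_ff)


single = cfs_merit([0.5], [[1.0]])
independent_pair = cfs_merit([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
redundant_pair = cfs_merit([0.5, 0.5], [[1.0, 1.0], [1.0, 1.0]])
print(single, independent_pair, redundant_pair)
```

Adding a second feature raises the merit only when it is not redundant with the first: the independent pair scores higher than the single feature, while the fully redundant pair scores the same.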
MDR¶
- class fast_select.MDR.MDR(k: int = 2, cv: int = 10, backend: str = 'auto', verbose: bool = False)[source]¶
Bases: BaseEstimator, ClassifierMixin
Multifactor Dimensionality Reduction with GPU or CPU backend.
This implementation targets the canonical use case of MDR: SNP genotypes coded 0, 1, 2. All features must take exactly three discrete values (0/1/2); other data types should be encoded or discretised accordingly before calling fit.
- Parameters:
k (int, default=2) – Interaction order to search (e.g. k=2 for pairwise interactions). The maximum is 6, which is feasible only with powerful hardware (and plenty of memory) or with very small datasets.
cv (int, default=10) – Number of stratified K-fold splits used for model selection.
backend ({'auto', 'CPU', 'GPU'}, default='auto') – Execution backend preference.
verbose (bool, default=False) – Print progress information during training.
- fit(X, y)[source]¶
Fits the MDR model by searching k-way feature combinations and selecting the best one using cross-validation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data. All features must be coded as the discrete values 0, 1, 2.
y (array-like of shape (n_samples,)) – Target values. Must be discrete (Classification).
- Returns:
self – Returns the instance itself.
- Return type:
object
- predict_proba(X)[source]¶
Not implemented.
MDR is fundamentally a hard classifier; this implementation does not attempt to derive calibrated probabilities. If you need risk probabilities, consider wrapping MDR in scikit-learn’s CalibratedClassifierCV or implementing cell-frequency posteriors.
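A cell-frequency posterior of the kind mentioned above could look roughly like this. The sketch assumes binary labels (0 = control, 1 = case), genotypes coded 0/1/2, and at least one control in the data. mdr_cells is a hypothetical helper and not part of fast-select.

```python
from collections import defaultdict


def mdr_cells(X, y, features):
    """Per-genotype-cell case fraction and high-risk label for a feature combination."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, label in zip(X, y):
        cell = tuple(row[j] for j in features)
        counts[cell][label] += 1

    n_cases = sum(y)
    n_controls = len(y) - n_cases
    overall_ratio = n_cases / n_controls  # assumes at least one control

    table = {}
    for cell, (controls, cases) in counts.items():
        posterior = cases / (cases + controls)      # cell-frequency estimate of P(case)
        # High risk when the cell's case:control ratio exceeds the overall ratio;
        # written as a product to avoid dividing by zero controls.
        high_risk = cases > overall_ratio * controls
        table[cell] = (posterior, high_risk)
    return table


# Toy pairwise (k=2) pattern: cases occur when the two SNPs differ.
X_toy = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 1], [0, 1], [1, 0], [1, 0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
print(mdr_cells(X_toy, y_toy, features=(0, 1)))
```

Cells unseen during training have no entry in the table, so a caller would need a fallback (for example the overall case fraction) before using these values as probabilities.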
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MDR¶
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object