mbapy.stats.cluster
KMeans
Description
KMeans clustering algorithm implementation.
Attributes
- space (list): A list of lists representing the search space for Bayesian optimization.
- centers (np.ndarray): The final cluster centers.
Methods
- reset: Reset the KMeans instance to its initial state.
- loss_fn: Calculate the loss function for the given data and centers.
- fit: Fit the KMeans model to the given data.
- fit_times: Fit the model to the data multiple times and predict the cluster labels, return the best one.
- fit_predict: Fit the model to the data and predict the cluster labels.
- predict: Predict the cluster labels for the given data.
Notes
- KMeans is suitable for smaller datasets as it iterates through all data points to minimize the variance within clusters.
Example
# Initialize KMeans model
kmeans = KMeans(n_clusters=3)
# Fit the model to the data
kmeans.fit(data)
# Predict the cluster labels for new data
labels = kmeans.predict(new_data)
KBayesian
Description
KBayesian is a subclass of KMeans that implements the Bayesian version of the K-means clustering algorithm. It extends the KMeans class and adds additional functionality for Bayesian optimization.
Attributes
- space (list): A list of lists representing the search space for Bayesian optimization.
- centers (np.ndarray): The final cluster centers.
Methods
- reset: Reset the KBayesian instance to its initial state.
- _init_space: Initialize the search space for Bayesian optimization.
- _loss_fn: Calculate the loss function for Bayesian optimization.
- _objective: Define the objective function for Bayesian optimization.
- fit: Fit the KBayesian model to the data using Bayesian optimization.
- predict: Predict the cluster labels for the given data.
- fit_predict: Fit the model to the data and predict the cluster labels.
Notes
- KBayesian uses Bayesian optimization to move cluster centers.
Example
# Initialize KBayesian model
kbayesian = KBayesian(n_clusters=3)
# Fit the model to the data using Bayesian optimization
kbayesian.fit(data)
# Predict the cluster labels for new data
labels = kbayesian.predict(new_data)
KOptim
Description
KOptim is a subclass of KMeans that implements the gradient optimization version of the K-means clustering algorithm.
Attributes
- centers (np.ndarray): The final cluster centers.
- loss (np.ndarray): The loss value.
Methods
- fit: Fit the model to the given data.
- fit_times: Fit the model to the given data for a specified number of times.
Notes
- KOptim uses gradient optimization to determine cluster centers.
Example
# Initialize KOptim model
koptim = KOptim(n_clusters=3)
# Fit the model to the data
koptim.fit(data)
cluster
Description
Clusters data using various clustering methods.
Parameters
- data (array-like): The input data to be clustered.
- n_clusters (int): The number of clusters to create.
- method (str): The clustering method to use, one of ['DBSCAN', 'Birch', 'KMeans', 'MiniBatchKMeans', 'MeanShift', 'GaussianMixture', 'AgglomerativeClustering', 'AffinityPropagation', 'BAKMeans', 'KBayesian', 'KOptim'].
- norm (str, optional): The normalization method to use. Defaults to None.
- norm_dim (int, optional): The dimension to normalize over. Defaults to None.
- copy_norm (bool, optional): Whether to copy the data before normalizing. Defaults to True.
- **kwargs: Additional keyword arguments specific to each clustering method.
Returns
- labels (np.ndarray): The cluster labels.
- centers (np.ndarray or None): The cluster centers. if is not supported, None will be returned.
- loss (float): The loss value. if is not supported, -1 will be returned.
Notes
- This function provides a unified interface for clustering data using different clustering methods.
Example
# Cluster the data using KMeans
labels, centers, loss = cluster(data, n_clusters=3, method='KMeans')