mbapy.stats.cluster

KMeans

Description

KMeans clustering algorithm implementation.

Attributes

space (list): A list of lists representing the search space for Bayesian optimization.
centers (np.ndarray): The final cluster centers.

Methods

reset: Reset the KMeans instance to its initial state.
loss_fn: Calculate the loss function for the given data and centers.
fit: Fit the KMeans model to the given data.
fit_times: Fit the model to the data multiple times and return the cluster labels from the best run.
fit_predict: Fit the model to the data and predict the cluster labels.
predict: Predict the cluster labels for the given data.

Notes

KMeans is best suited to smaller datasets, because each iteration passes over every data point to reduce the within-cluster variance.
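To make the cost of each pass concrete, here is a minimal NumPy sketch of the classic Lloyd iteration. It is an illustration only and is not taken from mbapy: the function name lloyd_kmeans, its parameters, and the random initialization are assumptions, and the library's own KMeans may initialize and stop differently.

```python
import numpy as np

def lloyd_kmeans(data, n_clusters, max_iter=100, tol=1e-6, seed=0):
    """Plain Lloyd iteration (hypothetical helper, not mbapy's implementation)."""
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct points drawn from the data.
    idx = rng.choice(len(data), size=n_clusters, replace=False)
    centers = data[idx].astype(float)
    for _ in range(max_iter):
        # Assignment step: distance from every point to every center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = data[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # Final assignment and loss: sum of squared distances to the assigned center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    loss = float((dists[np.arange(len(data)), labels] ** 2).sum())
    return labels, centers, loss
```

Each iteration builds a full points-by-centers distance matrix, which is why the running time and memory grow with the size of the dataset.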

Example

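# Import the class from the module documented on this page
from mbapy.stats.cluster import KMeans
# Initialize KMeans model
kmeans = KMeans(n_clusters=3)
# Fit the model to the data (data is an array of samples supplied by the caller)
kmeans.fit(data)
# Predict the cluster labels for new data
labels = kmeans.predict(new_data)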

KBayesian

Description

KBayesian is a subclass of KMeans that implements a Bayesian-optimization variant of the K-means clustering algorithm. On top of the KMeans interface it adds the search space and objective function that Bayesian optimization needs.

Attributes

space (list): A list of lists representing the search space for Bayesian optimization.
centers (np.ndarray): The final cluster centers.

Methods

reset: Reset the KBayesian instance to its initial state.
_init_space: Initialize the search space for Bayesian optimization.
_loss_fn: Calculate the loss function for Bayesian optimization.
_objective: Define the objective function for Bayesian optimization.
fit: Fit the KBayesian model to the data using Bayesian optimization.
predict: Predict the cluster labels for the given data.
fit_predict: Fit the model to the data and predict the cluster labels.

Notes

KBayesian uses Bayesian optimization to move cluster centers.
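One way to picture this is to treat every coordinate of every center as a bounded search parameter and the clustering loss as the objective. The sketch below is an assumption about how such a setup could look, not mbapy's code: init_space and objective are hypothetical names, the bounds are simply the per-feature minimum and maximum of the data, and the optimizer itself is left out.

```python
import numpy as np

def init_space(data, n_clusters):
    """One [low, high] pair per coordinate of every center (hypothetical layout)."""
    lows, highs = data.min(axis=0), data.max(axis=0)
    return [[float(lo), float(hi)]
            for _ in range(n_clusters)
            for lo, hi in zip(lows, highs)]

def objective(flat_centers, data, n_clusters):
    """Sum of squared distances from each point to its nearest proposed center."""
    centers = np.asarray(flat_centers, dtype=float).reshape(n_clusters, data.shape[1])
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())

# Four points forming two obvious groups along the first axis.
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
space = init_space(data, n_clusters=2)            # four [low, high] pairs
good = objective([0, 1, 10, 1], data, n_clusters=2)  # centers inside each group
bad = objective([5, 1, 5, 1], data, n_clusters=2)    # both centers in the middle
```

A Bayesian optimizer would repeatedly propose points inside space and keep the proposal with the lowest objective value; here good is smaller than bad, which is the signal such an optimizer would follow.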

Example

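# Import the class from the module documented on this page
from mbapy.stats.cluster import KBayesian
# Initialize KBayesian model
kbayesian = KBayesian(n_clusters=3)
# Fit the model to the data using Bayesian optimization (data is supplied by the caller)
kbayesian.fit(data)
# Predict the cluster labels for new data
labels = kbayesian.predict(new_data)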

KOptim

Description

KOptim is a subclass of KMeans that implements a gradient-based optimization variant of the K-means clustering algorithm.

Attributes

centers (np.ndarray): The final cluster centers.
loss (np.ndarray): The loss value.

Methods

fit: Fit the model to the given data.
fit_times: Fit the model to the given data a specified number of times.

Notes

KOptim uses gradient-based optimization to determine the cluster centers.
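The idea of finding centers by following a gradient can be sketched with plain NumPy. This is a hedged illustration, not KOptim's implementation: the function gradient_kmeans, the mean-squared-distance loss, the learning rate, and the fixed step count are all assumptions made for the example.

```python
import numpy as np

def gradient_kmeans(data, n_clusters, lr=0.5, n_steps=200, seed=0):
    """Gradient descent on the mean squared distance to the nearest center."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=n_clusters, replace=False)
    centers = data[idx].astype(float)
    n = len(data)
    for _ in range(n_steps):
        # Assign each point to its nearest center under the current estimate.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        grad = np.zeros_like(centers)
        for k in range(n_clusters):
            members = data[labels == k]
            # d(loss)/d(c_k) = (2 / n) * sum over members of (c_k - x_i)
            grad[k] = 2.0 * (len(members) * centers[k] - members.sum(axis=0)) / n
        centers -= lr * grad
    # Final assignment and loss with the converged centers.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    loss = float((dists.min(axis=1) ** 2).mean())
    return labels, centers, loss
```

Unlike the mean update of standard K-means, each step here moves a center only part of the way toward the mean of its members, with the step size controlled by lr.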

Example

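# Import the class from the module documented on this page
from mbapy.stats.cluster import KOptim
# Initialize KOptim model
koptim = KOptim(n_clusters=3)
# Fit the model to the data (data is an array of samples supplied by the caller)
koptim.fit(data)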

cluster

Description

Clusters data using various clustering methods.

Parameters

data (array-like): The input data to be clustered.
n_clusters (int): The number of clusters to create.
method (str): The clustering method to use, one of ['DBSCAN', 'Birch', 'KMeans', 'MiniBatchKMeans', 'MeanShift', 'GaussianMixture', 'AgglomerativeClustering', 'AffinityPropagation', 'BAKMeans', 'KBayesian', 'KOptim'].
norm (str, optional): The normalization method to use. Defaults to None.
norm_dim (int, optional): The dimension to normalize over. Defaults to None.
copy_norm (bool, optional): Whether to copy the data before normalizing. Defaults to True.
**kwargs: Additional keyword arguments specific to each clustering method.

Returns

labels (np.ndarray): The cluster labels.
centers (np.ndarray or None): The cluster centers, or None if the chosen method does not provide them.
loss (float): The loss value, or -1 if the chosen method does not provide one.

Notes

This function provides a unified interface for clustering data using different clustering methods.
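Because some methods return no centers or no loss, calling code usually has to check for the documented fallback values before using them. The helper below is a small sketch of that check; describe_result is a hypothetical name and is not part of mbapy.

```python
import numpy as np

def describe_result(labels, centers, loss):
    """Summarize a (labels, centers, loss) triple, honoring the None / -1 fallbacks."""
    labels = np.asarray(labels)
    return {
        "n_labels": int(len(np.unique(labels))),   # number of distinct label values
        "has_centers": centers is not None,        # None means centers are unsupported
        "loss": None if loss == -1 else float(loss),  # -1 means loss is unsupported
    }

# A method that reports neither centers nor a loss value.
no_extras = describe_result([0, 1, 1, 0], None, -1)
# A method that reports both.
with_extras = describe_result([0, 0, 1], np.zeros((2, 2)), 3.5)
```

The same function can be applied directly to the tuple returned by cluster, for example describe_result(*cluster(data, n_clusters=3, method='KMeans')).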

Example

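# Import the function from the module documented on this page
from mbapy.stats.cluster import cluster
# Cluster the data using KMeans (data is an array of samples supplied by the caller)
labels, centers, loss = cluster(data, n_clusters=3, method='KMeans')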