Lab4

Hierarchical clustering

The dataset consists of two feature columns.

Load the data according to your variant. Construct a graphical representation of the experimental data.
Compute the distances between objects using the following measures: Euclidean distance, standardized Euclidean distance, and Minkowski distance with p = 4.
Perform cluster analysis of the original data using hierarchical clustering methods: single linkage clustering, complete linkage clustering, median linkage clustering.
Perform clustering quality analysis by calculating the cophenetic correlation coefficient. Fill in the table for the cophenetic correlation coefficient.
Determine the most and least effective hierarchical clustering methods for analyzing the original dataset (maximum and minimum coefficients and their corresponding clustering methods). For the most effective hierarchical clustering method, construct a dendrogram of the clustering analysis results.
Determine the number of reliable clusters. To identify significant clusters, use a threshold value calculated using distance metrics or by specifying a fixed number of clusters.
Calculate the centroids and intra-cluster dispersion of the obtained clusters, geometric distances from the elements to the cluster centers, and distances between the cluster centers. Graphically display the identified clusters and their centroids (use a scatter plot in color).
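The distance, linkage, and cophenetic correlation steps above can be sketched with scipy as follows. This is a minimal illustration, not the notebook's code: the synthetic array `X` stands in for the real two-column data, and the names `distances`, `results`, and `coph_table` are hypothetical. Note that scipy documents median linkage as correctly defined only for Euclidean distances, so the median column for the other two measures should be read with caution.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for the two-column dataset (an assumption, not the lab data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(25, 2))
    for center in [(0, 0), (5, 0), (0, 5), (5, 5)]
])

# Condensed pairwise-distance vectors for the three measures named in the task.
distances = {
    "euclidean": pdist(X, metric="euclidean"),
    "seuclidean": pdist(X, metric="seuclidean"),
    "minkowski (p=4)": pdist(X, metric="minkowski", p=4),
}
methods = ["single", "complete", "median"]

# Rows are distance measures, columns are linkage methods.
coph_table = pd.DataFrame(index=list(distances), columns=methods, dtype=float)
results = {}
for dist_name, d in distances.items():
    for method in methods:
        Z = linkage(d, method=method)   # hierarchical clustering on precomputed distances
        c, _ = cophenet(Z, d)           # cophenetic correlation coefficient
        coph_table.loc[dist_name, method] = c
        results[(dist_name, method)] = c

# Most and least effective (distance, linkage) combinations by coefficient.
best = max(results, key=results.get)
worst = min(results, key=results.get)

print(coph_table.round(3))
print("most effective:", best)
print("least effective:", worst)
```

The linkage matrix of the best combination can then be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.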

Docs

The data was loaded from a txt file and converted into a Pandas DataFrame.
A scatter plot of all data points was constructed, with color gradients indicating possible clusters.
The elbow method was used to determine the number of clusters, which in this case is 4.
A table was created to store the cophenetic correlation coefficients.
Hierarchical clustering was performed with the scipy library using three linkage methods. For each linkage method, the cophenetic correlation coefficient was calculated and entered into the table.
For the best clustering method (standardized Euclidean distance with median linkage), a dendrogram was constructed showing the clustering process.
The cluster centers and dispersion within each cluster were calculated, along with descriptive statistics for each cluster.
A jointplot was constructed to visualize the distribution of objects across clusters.
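The cluster extraction and summary steps described above might look like the sketch below. It assumes a synthetic array `X` in place of the real data, cuts the tree into at most 4 clusters with `fcluster`, and takes "dispersion" to mean the mean squared distance of points to their cluster centroid. That definition and all variable names are assumptions made for illustration; the notebook may compute these quantities differently.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist, pdist

# Synthetic stand-in for the two-column dataset (an assumption, not the lab data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(25, 2))
    for center in [(0, 0), (5, 0), (0, 5), (5, 5)]
])

# Standardized Euclidean distances with median linkage, as named in the report.
d = pdist(X, metric="seuclidean")
Z = linkage(d, method="median")

# Cut the tree into at most 4 flat clusters (the number suggested by the elbow method).
labels = fcluster(Z, t=4, criterion="maxclust")
cluster_ids = np.unique(labels)

# Centroid of each cluster in the original feature space.
centroids = np.array([X[labels == k].mean(axis=0) for k in cluster_ids])

# Intra-cluster dispersion, assumed here to be the mean squared distance to the centroid.
dispersion = np.array([
    ((X[labels == k] - centroids[i]) ** 2).sum(axis=1).mean()
    for i, k in enumerate(cluster_ids)
])

# Euclidean distance from every point to the centroid of its own cluster.
centroid_index = np.searchsorted(cluster_ids, labels)
point_to_center = np.linalg.norm(X - centroids[centroid_index], axis=1)

# Pairwise Euclidean distances between cluster centroids.
center_distances = cdist(centroids, centroids)

print("cluster sizes:", {int(k): int((labels == k).sum()) for k in cluster_ids})
print("centroids:\n", centroids.round(3))
print("dispersion:", dispersion.round(3))
print("distances between centroids:\n", center_distances.round(3))
```

For the colored scatter plot, the labels can be passed directly as colors, for example `plt.scatter(X[:, 0], X[:, 1], c=labels)` with matplotlib, with the centroids drawn on top using a distinct marker.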