Building a tree of clusters, also known as a dendrogram.
Hierarchical Clustering is another powerful unsupervised learning algorithm that creates a hierarchy of clusters. Unlike K-Means, it doesn't require you to specify the number of clusters beforehand. Instead, it produces a tree-based representation of the data, called a dendrogram, which can be used to decide on the number of clusters. There are two main types of hierarchical clustering: agglomerative and divisive.

Agglomerative (bottom-up) clustering is the more common approach. It starts by treating each data point as its own cluster, then at each step merges the two closest clusters until a single cluster remains. The result is a tree-like structure where the root is the single cluster containing all the data and the leaves are the individual data points. Divisive (top-down) clustering works in the opposite direction: it starts with all data points in one cluster and recursively splits them into smaller clusters.

A key component of the algorithm is the linkage criterion, which defines the distance between two clusters. Common criteria include 'ward' (merges the pair of clusters that minimizes the increase in within-cluster variance), 'average' (uses the average distance between all pairs of points across the two clusters), and 'complete' (uses the maximum distance between any pair of points across the two clusters).

By looking at the dendrogram, a data scientist can choose a distance threshold at which to cut the tree; the number of branches the cut crosses determines the final number of clusters.
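To make the agglomerative approach concrete, here is a minimal sketch using scikit-learn's AgglomerativeClustering. The dataset and parameter values are illustrative assumptions, not recommendations:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs (illustrative parameters).
X, _ = make_blobs(n_samples=60, centers=3, random_state=42)

# Agglomerative clustering with 'ward' linkage: each merge is chosen
# to minimize the increase in within-cluster variance.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(labels[:10])

# Swapping the linkage criterion changes how inter-cluster distance
# is measured, e.g. linkage="average" or linkage="complete".
```

Note that 'ward' linkage works with Euclidean distances, which is why it pairs naturally with scikit-learn's default metric here.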
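The dendrogram itself is easiest to build and cut with SciPy's hierarchy module. The sketch below assumes the same toy data as above, and the threshold of 10.0 is an assumption chosen for that data, not a general rule:

```python
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=42)

# Build the full merge tree with 'ward' linkage. Each row of Z records
# one merge: the two cluster indices, the merge distance, and the size
# of the newly formed cluster.
Z = linkage(X, method="ward")

# Cut the tree at a distance threshold to get flat cluster labels;
# merges above the threshold are undone, leaving separate clusters.
labels = fcluster(Z, t=10.0, criterion="distance")

# Plot the dendrogram to choose the threshold visually:
# import matplotlib.pyplot as plt
# dendrogram(Z); plt.axhline(y=10.0); plt.show()
```

Recent versions of scikit-learn support a similar threshold-based cut directly, via AgglomerativeClustering with n_clusters=None and a distance_threshold argument.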