Brain Dump

Pairwise Constrained K-Means Clustering

Tags
adaptive-intelligence

A variant of K-Means clustering which uses semi-supervised learning to produce more predictable and reliable clustering models through [see page 9, constraints] on the learning process.

The final algorithm here is the same as the k-means algorithm, except we've adjusted the error function to include our constraints.

Assumptions

This approach generally [see page 5, assumes]:

Assumption          Description
Smoothness          Points that are close in the input space should lead to similar outputs (for example, the same class).
Cluster assumption  If points are in the same cluster then they're likely to be of the same class.

Because of these assumptions, the decision boundary for a clustering should lie in a low-density region, where there aren't many data points clumped together. The data clusters should be relatively well separated, meaning the boundary ends up far away from the prototypes.

Formulation

Constraints

The constraints we use in our model are defined as relationships between pairs of points in our training set:

Constraint   Description
Must-link    These two points must occur in the same cluster.
Can't-link   These two points cannot occur in the same cluster.

Observe that in reality these are really a single declaration: we partition a subset of our training data into classes and then assert that all the points from the same class are clustered into the same cluster (must-link) and away from any points in a different class (can't-link).

For example, consider a data-set like [see page 6, so] where we [see page 8, know] the classes of some points in the set. Points 2 and 3 are both yellow, so if we cluster them into separate clusters then we should incur a large error.
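As a quick sketch of that bookkeeping, the snippet below turns a partial labelling like the one above into must-link and can't-link pairs. The function name derive_constraints and the use of plain index pairs are just illustrative choices, not part of the formulation.

```python
from itertools import combinations

def derive_constraints(partial_labels):
    """Turn a partial labelling {point_index: class} into constraint pairs.

    Points sharing a class give must-link pairs; points with different
    classes give can't-link pairs.
    """
    must_link, cannot_link = set(), set()
    for (i, ci), (j, cj) in combinations(partial_labels.items(), 2):
        if ci == cj:
            must_link.add((i, j))
        else:
            cannot_link.add((i, j))
    return must_link, cannot_link

# Example: points 2 and 3 are known to be "yellow", point 7 is "blue".
ml, cl = derive_constraints({2: "yellow", 3: "yellow", 7: "blue"})
# ml == {(2, 3)}, cl == {(2, 7), (3, 7)}
```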

Updated Error Function

We expand our previous definition of the [see page 10, objective function] to be:

\begin{align} \label{eq:obj-fun}
E_{PCKM} = \sum_{k=1}^{K} \sum_{\mu \in C_k} \Big[ ({x}^{\mu} - {m}_{k})^2
  + \sum_{\mu' \in ML(\mu)} ({x}^{\mu} - {x}^{\mu'})^2 \, \big(1 - \delta({C}^{\mu}, {C}^{\mu'})\big)
  + \sum_{\mu' \in CL(\mu)} \big(F^2 - ({x}^{\mu} - {x}^{\mu'})^2\big) \, \delta({C}^{\mu}, {C}^{\mu'}) \Big]
\end{align}

Where:

  • The [see page 11, \nth{1}] term matches our previous objective function.
  • The [see page 12, \nth{2}] term penalises any clustering that violates a known must-link relationship by adding on the squared distance between the two points. This only applies when two points must be in the same cluster but currently aren't.
  • The [see page 13, \nth{3}] term penalises putting two points in the same cluster when we know they can't-link together. \( F^2 \) is the error penalty cap, where \( F = \max_{\mu, \mu'} |{x}^{\mu} - {x}^{\mu'}| \) is the largest separation between two points in our data-set; the closer the two can't-link points are, the larger the penalty.

Note: The delta function is a Kronecker delta.
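To make the objective concrete, here's a minimal sketch that evaluates \( E_{PCKM} \) for a fixed assignment with NumPy. It assumes squared Euclidean distances and reads \( F^2 \) as the largest squared separation in the data-set, so the can't-link penalty can't go negative.

```python
import numpy as np

def pckm_error(X, assign, centres, must_link, cannot_link):
    """Evaluate E_PCKM for one clustering (a sketch of the objective above).

    X           -- (n, d) data matrix
    assign      -- length-n integer array; assign[mu] is the cluster of x^mu
    centres     -- (K, d) matrix of cluster centres m_k
    must_link   -- iterable of index pairs that must share a cluster
    cannot_link -- iterable of index pairs that must not share a cluster
    """
    n = len(X)

    # First term: the ordinary k-means distortion.
    error = np.sum((X - centres[assign]) ** 2)

    # F^2: largest squared separation in the data-set (the penalty cap).
    F2 = max(np.sum((X[i] - X[j]) ** 2) for i in range(n) for j in range(n))

    # Second term: must-link pairs split across clusters (1 - delta = 1).
    for i, j in must_link:
        if assign[i] != assign[j]:
            error += np.sum((X[i] - X[j]) ** 2)

    # Third term: can't-link pairs placed in the same cluster (delta = 1).
    for i, j in cannot_link:
        if assign[i] == assign[j]:
            error += F2 - np.sum((X[i] - X[j]) ** 2)

    return error
```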

Observe that both the \nth{2} and \nth{3} terms in \eqref{eq:obj-fun} don't depend on the centres \( {m}_k \), so how we update the centres won't change. All that's changed is how we [see page 16, assign] points to a cluster, by taking account of the constraints.
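Putting this together, only the assignment step changes: the sketch below does a greedy point-by-point assignment that minimises each point's contribution to \( E_{PCKM} \), while the centre update stays the plain k-means mean. The greedy pass order and the penalty bookkeeping are assumptions of the sketch rather than a definitive implementation.

```python
import numpy as np

def assign_points(X, centres, must_link, cannot_link, assign):
    """Constrained assignment pass: each point goes to the cluster that
    minimises its own contribution to E_PCKM, given where the other points
    currently sit. `assign` is a length-n NumPy integer array."""
    n, K = len(X), len(centres)
    # Penalty cap F^2: largest squared separation in the data-set.
    F2 = max(np.sum((X[i] - X[j]) ** 2) for i in range(n) for j in range(n))
    new_assign = assign.copy()
    for mu in range(n):
        # Distance term: squared distance from x^mu to every centre m_k.
        costs = np.sum((centres - X[mu]) ** 2, axis=1)
        # Must-link partners placed in a different cluster add a penalty.
        for (i, j) in must_link:
            if mu in (i, j):
                other = j if mu == i else i
                d2 = np.sum((X[i] - X[j]) ** 2)
                costs[np.arange(K) != new_assign[other]] += d2
        # Can't-link partners placed in the same cluster add F^2 - d^2.
        for (i, j) in cannot_link:
            if mu in (i, j):
                other = j if mu == i else i
                d2 = np.sum((X[i] - X[j]) ** 2)
                costs[new_assign[other]] += F2 - d2
        new_assign[mu] = int(np.argmin(costs))
    return new_assign

def update_centres(X, assign, K):
    """Centre update is unchanged from plain k-means: the mean of each
    cluster (empty clusters are not handled in this sketch)."""
    return np.array([X[assign == k].mean(axis=0) for k in range(K)])
```

Iterating assign_points and update_centres until the assignments stop changing gives the full constrained algorithm, just as with ordinary k-means.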