Metric Pairwise K-Means Clustering

Tags: adaptive-intelligence

A variant of pairwise K-means clustering which works by [see page 3, warping] the data-space to minimise the separation within clusters and maximise the separation of clusters.

Formulation

\begin{align} |{x}^{\mu} - {m}_{k}|^2_w = ({x}^{\mu} - {m}_{k})^{T} W ({x}^{\mu} - {m}_{k}) \end{align}

Where \(W\) is a matrix which shrinks or expands each dimension of the input data to clearly separate clusters.

With this we define our new [see page 5, objective error function], with added constraints, as:

\begin{align} \label{eq:obj-fun} {E}_{MPCKM} &= \sum_^{K} \sum_{\mu \in C_k}^{n_k} \sum_{l} [ {W}_{ll} ({x}^{\mu} - {m}_{k})^2 \\

          &= - \log{{W}\_{ll}} \\\\
          &+ \sum\_{\mu' \in \text{ML}(\mu)}{{W}\_{ll} ({x}^{\mu} - {x}^{\mu'})^2 (1 - \delta({C}^{\mu}, {C}^{\mu'}))} \\\\
          &+ \sum\_{\mu' \in \text{CL}(\mu)}{{W}\_{ll} ({F}^{2} - ({x}^{\mu} - {x}^{\mu'})^2) \delta({C}^{\mu}, {C}^{\mu'})} ]

\end{align}

Note: The \( - \text{log}(W_{ll}) \) term is there to prevent the weights of our network from becoming negative (and flipping the axis).

Like with pairwise clustering neither of the adjusted terms in eq:obj-fun depend on the cluster centers so their affect on our update equation is negligible. However we need to [see page 11, minimise] the error with respect to both \(m_k\) and \(W_{ll}\), using the latter to update the weight matrix \( W \) after updating the cluster centers in the K-means algorithm.

Therefore the delta-equation with respect to \( {W}_{jj} \) is: TODO: Copy in derivation [see page 13, math].

\begin{align} \frac{\delta {{E}_{MPCKM}}}{\delta {W}_{jj}} = - \frac{n}{{W}_{jj}} + \sum_^{K} \sum_{i \in C_k}^{n_k} [ ({x}_{ij} - {m}_{kj})^2 + \text{penalties} ] \end{align}