Metric Pairwise K-Means Clustering
A variant of pairwise K-means clustering which works by [see page 3, warping] the data-space to minimise the separation within clusters and maximise the separation of clusters.
Formulation
\begin{align} |{x}^{\mu} - {m}_{k}|^2_w = ({x}^{\mu} - {m}_{k})^{T} W ({x}^{\mu} - {m}_{k}) \end{align}
Where \(W\) is a matrix which shrinks or expands each dimension of the input data to clearly separate clusters.
With this we define our new [see page 5, objective error function], with added constraints, as:
\begin{align} \label{eq:obj-fun} {E}_{MPCKM} &= \sum_^{K} \sum_{\mu \in C_k}^{n_k} \sum_{l} [ {W}_{ll} ({x}^{\mu} - {m}_{k})^2 \\
&= - \log{{W}\_{ll}} \\\\
&+ \sum\_{\mu' \in \text{ML}(\mu)}{{W}\_{ll} ({x}^{\mu} - {x}^{\mu'})^2 (1 - \delta({C}^{\mu}, {C}^{\mu'}))} \\\\
&+ \sum\_{\mu' \in \text{CL}(\mu)}{{W}\_{ll} ({F}^{2} - ({x}^{\mu} - {x}^{\mu'})^2) \delta({C}^{\mu}, {C}^{\mu'})} ]
\end{align}
Note: The \( - \text{log}(W_{ll}) \) term is there to prevent the weights of our network from becoming negative (and flipping the axis).
Like with pairwise clustering neither of the adjusted terms in eq:obj-fun depend on the cluster centers so their affect on our update equation is negligible. However we need to [see page 11, minimise] the error with respect to both \(m_k\) and \(W_{ll}\), using the latter to update the weight matrix \( W \) after updating the cluster centers in the K-means algorithm.
Therefore the delta-equation with respect to \( {W}_{jj} \) is: TODO: Copy in derivation [see page 13, math].
\begin{align} \frac{\delta {{E}_{MPCKM}}}{\delta {W}_{jj}} = - \frac{n}{{W}_{jj}} + \sum_^{K} \sum_{i \in C_k}^{n_k} [ ({x}_{ij} - {m}_{kj})^2 + \text{penalties} ] \end{align}