Fused Gromov-Wasserstein Correspondences

Overview

Fused Gromov-Wasserstein (FGW) [1] is a variant of the Gromov-Wasserstein distance that, in the context of GWProt, allows biochemical data to be incorporated into structural alignments.

In FGW, each residue is described not only by its spatial coordinates but also by additional data, such as hydrophobicity, isoelectric point, or a substitution score (e.g., from the BLOSUM62 matrix). FGW adds a penalty term for aligning biochemically dissimilar residues, enabling more biologically meaningful alignments.
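As an illustration, a pairwise biochemical penalty could be built from a single per-residue scalar such as hydrophobicity. The sketch below is not GWProt's API: the function name `penalty_matrix`, the use of a small subset of the Kyte-Doolittle hydropathy scale, and the choice of absolute difference as the penalty are all assumptions made for the example.

```python
import numpy as np

# Subset of the Kyte-Doolittle hydropathy scale (illustrative choice of data).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "D": -3.5, "G": -0.4,
    "I": 4.5, "L": 3.8, "K": -3.9, "V": 4.2,
}

def penalty_matrix(seq_x, seq_y, scale=KYTE_DOOLITTLE):
    """Hypothetical helper: entry [i, k] is the absolute difference in
    hydrophobicity between residue i of seq_x and residue k of seq_y."""
    hx = np.array([scale[r] for r in seq_x])
    hy = np.array([scale[r] for r in seq_y])
    # Broadcasting yields a len(seq_x) x len(seq_y) matrix of penalties.
    return np.abs(hx[:, None] - hy[None, :])

delta = penalty_matrix("AIK", "LD")
```

Residues with similar hydrophobicity receive a small penalty, so aligning them costs little; a substitution matrix such as BLOSUM62 would need to be converted from a similarity score into a dissimilarity before being used the same way.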

Mathematical Formulation

Let \(\delta(x, y)\) denote the difference in the biochemical data associated with residues \(x\) and \(y\). The fused Gromov-Wasserstein distance between proteins \(X\) and \(Y\) is defined as:

\[FGW(X, Y) = \min_T \frac{1}{2} \left( \sum_{i, j, k, l} \left[ \alpha \cdot |d_X(x_i, x_j) - d_Y(y_k, y_l)|^2 + (1 - \alpha) \cdot \delta(x_i, y_k) \right] T_{i, k} T_{j, l} \right)^{1/2}\]

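where \(\alpha \in [0, 1]\) determines the weight of the geometric cost relative to the biochemical penalty, \(d_X\) and \(d_Y\) are the pairwise distances between residues within each protein, and \(T\) ranges over the couplings (transport plans) between the residues of \(X\) and \(Y\). Setting \(\alpha = 1\) recovers the purely geometric Gromov-Wasserstein cost, while \(\alpha = 0\) uses only the biochemical penalty. Optimal values of \(\alpha\) may vary depending on the type of biochemical data used.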
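To make the formula concrete, the following minimal sketch evaluates the quantity inside the minimum for one fixed coupling. It does not search for the optimal coupling, which requires an optimal transport solver, and it is not GWProt's implementation. The function name `fgw_objective` and its argument names are hypothetical.

```python
import numpy as np

def fgw_objective(dist_x, dist_y, delta, coupling, alpha):
    """Evaluate the FGW objective for a fixed coupling (hypothetical helper).

    dist_x:   (n, n) intra-protein distances d_X(x_i, x_j)
    dist_y:   (m, m) intra-protein distances d_Y(y_k, y_l)
    delta:    (n, m) biochemical penalties delta(x_i, y_k)
    coupling: (n, m) transport plan T
    alpha:    weight of the geometric term, in [0, 1]
    """
    # Squared distortion |d_X(x_i, x_j) - d_Y(y_k, y_l)|^2, indexed [i, j, k, l].
    distortion = (dist_x[:, :, None, None] - dist_y[None, None, :, :]) ** 2
    geometric = np.einsum("ijkl,ik,jl->", distortion, coupling, coupling)
    # delta(x_i, y_k) does not depend on j or l, so the sum over T_{j,l} factors out.
    biochemical = np.sum(delta * coupling) * np.sum(coupling)
    return 0.5 * np.sqrt(alpha * geometric + (1 - alpha) * biochemical)

# Two-residue toy example with identical geometry in both proteins.
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
delta = np.array([[0.0, 2.0], [2.0, 0.0]])
identity_plan = np.eye(2) / 2          # matches residue i to residue i
uniform_plan = np.full((2, 2), 0.25)   # spreads mass over all pairs

cost_identity = fgw_objective(dist, dist, delta, identity_plan, alpha=0.5)
cost_uniform = fgw_objective(dist, dist, delta, uniform_plan, alpha=0.5)
```

In this toy case the identity plan incurs no geometric distortion and no biochemical penalty, so its cost is zero, while the uniform plan pays for both terms. The four-index array grows as n²m², so this direct evaluation is only practical for very small inputs.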

References

[1] Vayer, T., Chapel, L., Flamary, R., Tavenard, R., & Courty, N. (2019). Fused Gromov-Wasserstein distance for structured objects. Advances in Neural Information Processing Systems, 32.