Multi-modal Machine Learning

Definitions

modality: the way in which something happens or is experienced, typically associated with a sensory channel (e.g. vision, hearing, touch).
multi-modal models: models that can process and relate information from multiple modalities (e.g. speech and vision).

representation: learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of multiple modalities (see Multi-modal Representation)
translation: how to map data from one modality to another (see Multi-modal Translation)
alignment: identifying the direct relationships between sub-elements of two or more different modalities (see Multi-modal Alignment)
fusion: joining information from two or more modalities to perform a prediction, possibly with data missing from some modalities. Different modalities may have varying predictive power and noise topology (see Multi-modal Fusion)
co-learning: transferring knowledge between modalities, their representations, and their predictive models (see Co-learning)
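
To make the fusion definition more concrete, here is a minimal sketch of one possible scheme: late fusion by weighted averaging of per-modality class probabilities, which skips any modality whose data is missing. The function name `late_fusion`, the weights, and the example probabilities are all hypothetical and chosen for illustration only. This is just one of many fusion strategies, not a reference implementation.

```python
from typing import List, Optional, Sequence


def late_fusion(
    predictions: Sequence[Optional[Sequence[float]]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of per-modality class probabilities.

    A modality whose prediction is None is treated as missing and skipped;
    the remaining weights are renormalised so the result still sums to 1.
    """
    available = [(p, w) for p, w in zip(predictions, weights) if p is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    total_weight = sum(w for _, w in available)
    n_classes = len(available[0][0])
    return [
        sum(w * p[k] for p, w in available) / total_weight
        for k in range(n_classes)
    ]


# Hypothetical class probabilities for one sample with two classes.
audio = [0.6, 0.4]
video = [0.2, 0.8]

# The weights stand in for differing predictive power (assumed values).
fused = late_fusion([audio, video], weights=[1.0, 3.0])

# Same sample with the video modality missing: only audio contributes.
fused_audio_only = late_fusion([audio, None], weights=[1.0, 3.0])
```

With both modalities present, the more heavily weighted video prediction dominates the result; when video is missing, the output falls back to the audio prediction alone. In practice the weights would be learned or tuned on validation data rather than set by hand.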

<biblio.bib>