Multi-modal Machine Learning
Definitions
- modality
- the way in which something happens or is experienced. Typically associated with sensory modalities.
- multi-modal models
- models that can process and relate information from multiple modalities (e.g. speech and vision).
Key Challenges
- representation
- learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of multiple modalities (see Multi-modal Representation)
- translation
- how to map data from one modality to another (see Multi-modal Translation)
- alignment
- identifying the direct relationships between sub-elements of two or more different modalities (see Multi-modal Alignment)
- fusion
- Join information from two ore more modalities to perform a prediction (with possibly missing data from modalities). Different modalities may have varying predictive power and noise topology. (see Multi-modal Fusion)
- co-learning
- Transferring knowledge between modalities, their representation, and predictive models. (see Co-learning)
<biblio.bib>