Jethro's Braindump

Multi-modal Machine Learning


the way in which something happens or is experienced. Typically associated with sensory modalities.
multi-modal models
models that can process and relate information from multiple modalities (e.g. speech and vision).

Key Challenges

representation
learning how to represent and summarize multi-modal data in a way that exploits the complementarity and redundancy of multiple modalities (see Multi-modal Representation)
translation
how to map data from one modality to another (see Multi-modal Translation)
alignment
identifying the direct relationships between sub-elements of two or more different modalities (see Multi-modal Alignment)
fusion
joining information from two or more modalities to perform a prediction, possibly with data missing from some modalities. Different modalities may have varying predictive power and noise topology (see Multi-modal Fusion)
co-learning
transferring knowledge between modalities, their representations, and their predictive models (see Co-learning)
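To make the alignment challenge concrete, here is a minimal sketch of dynamic time warping (DTW), one classic way to find correspondences between two sequences recorded at different rates. It could be used, for example, to match audio frames to video frames of the same event. The helper name `dtw_align`, the 1-D scalar features, and the absolute-difference local distance are illustrative assumptions. Real systems usually compare feature vectors, often in a learned shared space.

```python
from typing import List, Sequence, Tuple


def dtw_align(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two 1-D feature sequences with dynamic time warping (hypothetical helper).

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs, ordered from start to end.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local distance: absolute difference
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


a = [0.0, 1.0, 2.0, 3.0]            # hypothetical per-frame feature from one modality
b = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]  # the same event, sampled more slowly
dist, path = dtw_align(a, b)
print(dist)  # 0.0
print(path)  # [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]
```

The path pairs `a[0]` with both `b[0]` and `b[1]`, and `a[2]` with both `b[3]` and `b[4]`. This is the kind of sub-element correspondence the alignment challenge refers to. DTW only enforces monotonic, contiguous matches. Learned alternatives such as attention relax that constraint.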
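The fusion challenge, including missing modalities and unequal predictive power, can be illustrated with late (decision-level) fusion. This is only one of several fusion strategies. In this sketch, each modality is assumed to have its own classifier that outputs class probabilities. The outputs are combined by a weighted average, and the weights of whichever modalities are present are renormalized. The helper `late_fusion`, the modality names, and the weight values are hypothetical.

```python
from typing import Dict, List, Optional


def late_fusion(
    probs: Dict[str, Optional[List[float]]],
    weights: Dict[str, float],
) -> List[float]:
    """Fuse per-modality class probabilities by a weighted average (hypothetical helper).

    Modalities whose prediction is None (missing) are skipped, and the
    weights of the remaining modalities are renormalized to sum to 1.
    """
    available = {m: p for m, p in probs.items() if p is not None}
    if not available:
        raise ValueError("at least one modality must be present")

    total_weight = sum(weights[m] for m in available)
    n_classes = len(next(iter(available.values())))
    fused = [0.0] * n_classes
    for modality, p in available.items():
        if len(p) != n_classes:
            raise ValueError(f"{modality}: expected {n_classes} classes, got {len(p)}")
        w = weights[modality] / total_weight  # renormalized weight for this modality
        for k, pk in enumerate(p):
            fused[k] += w * pk
    return fused


# Hypothetical weights: video is assumed to be the more reliable modality.
weights = {"audio": 1.0, "video": 3.0}

both = late_fusion({"audio": [0.6, 0.4], "video": [0.2, 0.8]}, weights)
audio_only = late_fusion({"audio": [0.6, 0.4], "video": None}, weights)

print([round(x, 3) for x in both])        # [0.3, 0.7]
print([round(x, 3) for x in audio_only])  # [0.6, 0.4]
```

Late fusion degrades gracefully when a modality is absent, because each classifier runs independently. However, it cannot model interactions between modalities at the feature level. Early fusion, which concatenates features before a single model, makes the opposite trade-off.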

