Audiovisual Correlations
When creating an engaging, dynamic and aesthetically pleasing visual representation of an audio signal, one of the fundamental aims is to ensure that specific aspects of the signal can be recognized in, and related to, what is being seen. One way to achieve this is to select single attributes of each perception and methodically link them, so that a change in one auditory attribute produces a similar change in a corresponding visual attribute, and vice versa.
But which auditory and visual attributes are best correlated with each other? This question has given rise to numerous studies over the years, with the most direct approach taken by S. Lipscomb and E. Kim, whose research explores possible correlations between four auditory and four visual attributes. The auditory attributes were pitch, loudness, timbre and duration, and the visual attributes were colour, location, shape and size. Participants were asked to rate the degree of match between what was seen and what was heard on a scale from 0 to 100. The results showed correlations between certain auditory and visual stimuli: the pairings that received higher ratings were pitch with location, loudness with size and timbre with shape, whilst duration and colour showed no obvious correlation with any other attribute.

The correlation between loudness and visual size lies in the ability of each attribute to accentuate specific parts of its own perceptual stream. In the auditory domain, the psychoacoustic phenomenon of loudness draws our focus to the sound with the most energy. The same principle holds in the visual domain, as objects that occupy more visual space are noticed before smaller objects. This shared tendency for larger-scale events to attract more attention provides a reasonable argument for mapping perceived loudness to visual size.
Timbre and shape is a correlation often used in audiovisual composition, and the reason could be that each attribute describes the complexity of its overall entity. Timbre is generally described as the quality of a sound, related to the number and type of component frequencies (harmonics) that constitute a given tone. This definition is similar to the one given for a complex shape, which can be described as the combination of two or more simple shapes. The focus on the number of component parts and the additive nature of each perceptual attribute creates a viable connection between timbre and shape, as a sound with more harmonics could be visualized as a shape with more complexity. Fritz Wilhelm Winckel, arguably the first person to synthesize images directly from an audio signal, gave this description of the pairing of timbre and visual complexity: ‘The fuller the timbre, the more overtones are contained in the sound, and the more complex, therefore, is the corresponding pattern’.
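The loudness-to-size and timbre-to-shape pairings described above can be sketched as two small mapping functions. This is only an illustrative sketch, not a method taken from any of the studies cited: RMS level is used as a crude stand-in for perceived loudness, a polygon's vertex count stands in for shape complexity, and all function names, thresholds and ranges are assumptions chosen for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def size_from_loudness(samples, min_radius=10.0, max_radius=200.0):
    """Map RMS level (an assumed proxy for loudness) linearly to a radius."""
    level = min(rms(samples), 1.0)  # clamp so the radius stays in range
    return min_radius + level * (max_radius - min_radius)


def vertices_from_harmonics(harmonic_amplitudes, threshold=0.05, base_vertices=3):
    """Map the number of significant harmonics to a polygon vertex count.

    A tone with at most one significant partial gives the simplest shape
    (a triangle); each further partial above the threshold adds one vertex.
    """
    significant = sum(1 for a in harmonic_amplitudes if abs(a) >= threshold)
    return base_vertices + max(significant - 1, 0)


if __name__ == "__main__":
    quiet_block = [0.05, -0.05, 0.05, -0.05]
    loud_block = [0.8, -0.8, 0.8, -0.8]
    print(size_from_loudness(quiet_block), size_from_loudness(loud_block))
    print(vertices_from_harmonics([1.0]), vertices_from_harmonics([1.0, 0.6, 0.3, 0.2]))
```

A louder block yields a larger radius, and a tone with more significant harmonics yields a polygon with more vertices. In practice the harmonic amplitudes would come from a spectral analysis step that is outside the scope of this sketch.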

The relationship between colour and sound has been researched by Datteri and Howard, who investigated possible correlations between the frequency of colour and the frequency of sound waves. Participants were exposed to a series of pure sine tones and coloured boxes, and asked to determine which colour ‘fitted best’ with each tone. A pattern emerged showing that low-frequency colours corresponded to higher-frequency musical pitches, and that high-frequency colours corresponded to lower-frequency notes. This inverse relationship suggests that different colours can be linked to various musical pitches and timbres.
The relationship between sound and colour has been investigated further by Hubbard, who researched the correlation between visual lightness and pitch and concluded that lighter visual stimuli were matched best with higher pitches and darker visual stimuli with lower pitches. Hubbard also tested the correlation between visual lightness and musical intervals, with results suggesting that lighter visual stimuli corresponded with ascending intervals and darker stimuli with descending intervals. The size of the musical interval had a significant effect on the choice of lightness, as larger intervals prompted subjects to choose more extreme lighter or darker stimuli. Similar research conducted by Marks revealed a correlation between the brightness of a visual object and auditory loudness, with subjects associating brighter visual objects with louder auditory stimuli.
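The direction of the lightness findings described above (higher pitch with lighter stimuli, ascending intervals with lighter stimuli, and larger intervals with more extreme choices) can be illustrated with a short sketch. The logarithmic pitch scale, the frequency range, the one-octave interval limit and the function names are all assumptions made for the example; none of the cited studies prescribes these particular curves.

```python
import math


def lightness_from_pitch(freq_hz, low_hz=55.0, high_hz=1760.0):
    """Map a frequency to a lightness in [0, 1] on a logarithmic scale.

    Higher pitches give lighter values. The range limits are assumed;
    frequencies outside the range are clamped to it.
    """
    freq = min(max(freq_hz, low_hz), high_hz)
    return math.log2(freq / low_hz) / math.log2(high_hz / low_hz)


def lightness_shift_from_interval(semitones, max_semitones=12, max_shift=0.5):
    """Map a melodic interval to a signed change in lightness.

    Ascending intervals (positive) lighten, descending intervals (negative)
    darken, and larger intervals give a larger change up to max_shift.
    """
    clamped = max(-max_semitones, min(max_semitones, semitones))
    return max_shift * clamped / max_semitones


if __name__ == "__main__":
    for freq in (110.0, 440.0, 1760.0):
        print(freq, round(lightness_from_pitch(freq), 3))
    for interval in (-7, -2, 2, 7):
        print(interval, round(lightness_shift_from_interval(interval), 3))
```

The lightness value could then be used as the lightness component of an HSL colour. A linear mapping in semitones is only one possible choice, and the shape of the curve would need tuning against real listeners.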
Correlations can also be formed between auditory and visual phenomena when the movement of a sound is similar to the visual movement it represents. Chion describes this as ‘isomorphism’: auditory and visual events become connected because the same implied motion can be recognized within each perception. A related idea appears in Gestalt psychology as the principle of common fate, which states that ‘elements that move in the same way tend to be grouped together’. This relationship can be realized by correlating louder, more active sections of auditory phenomena with faster sections of visual phenomena.
There are many more correlations that can be formed between auditory and visual events, most of which involve the aesthetic, emotional and artistic connection brought about when sound and imagery are combined. The various studies outlined above give an insight into how we perceive and link attributes from differing senses, and provide scientific evidence for the pairing of individual auditory and visual stimuli.
References:
- Lipscomb, S. D. and Kim, E. M. (2004) Perceived Match Between Visual Parameters and Auditory Correlates: An Experimental Multimedia Investigation
- Winckel, F. (1930) Technik und Aufgaben des Fernsehens
- Datteri, D. L. and Howard, J. N. (2004) The Sound of Colour
- Hubbard, T. L. (1996) Synesthesia-like Mappings of Lightness, Pitch and Melodic Interval
- Marks, L. E. (1974) On Associations of Light and Sound: The Mediation of Brightness, Pitch, and Loudness
- Chion, M. (1994) Audio-Vision: Sound on Screen