Basics of Audio Compression
Advances in digital audio technology are fueled by two sources: hardware developments and new signal processing techniques. When processors dissipated tens of watts of power and memory densities were on the order of kilobits per square inch, portable playback devices such as MP3 players were not possible. Now, however, power dissipation, memory densities, and processor speeds have improved by several orders of magnitude.
Advancements in signal processing are exemplified by Internet broadcast applications: if an Internet broadcast used 16-bit PCM encoding at 44.1 kHz to achieve the desired sound quality, it would require a channel of about 1.4 Mbps (2 channels × 16 bits × 44,100 samples per second) for a stereo signal. Fortunately, new bit rate reduction techniques for audio of this quality are constantly being released.
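The bit rate arithmetic above can be checked with a short calculation. The following is a minimal sketch, not part of the original paper: the function name pcm_bit_rate and the 128 kbps target used for the ratio are illustrative assumptions.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the bit rate, in bits per second, of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


# Stereo, 16-bit PCM sampled at 44.1 kHz, as in the text.
uncompressed_bps = pcm_bit_rate(44_100, 16, 2)
print(f"Uncompressed: {uncompressed_bps / 1e6:.2f} Mbps")

# Assumed target bit rate for a compressed stream (illustrative only).
target_bps = 128_000
print(f"Required compression ratio: {uncompressed_bps / target_bps:.1f}:1")
```

Under these assumptions, the encoder would have to discard roughly ten out of every eleven bits, which is why simply removing statistical redundancy is not enough and perceptual techniques are needed.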
Increasing hardware efficiency and an expanding array of digital audio representation formats are giving rise to a wide variety of new digital audio applications. These applications include portable music playback devices, digital surround sound for cinema, high-quality digital radio and television broadcast, Digital Versatile Disc (DVD), and many others.
This paper introduces digital audio signal compression, a technique essential to the implementation of many digital audio applications. Digital audio signal compression is the removal of redundant or otherwise irrelevant information from a digital audio signal, a process that is useful for conserving both transmission bandwidth and storage space. We begin by defining some useful terminology. We then present a typical encoder (as compression algorithms are often called) and explain how it functions. Finally, we consider some standards that employ digital audio signal compression and discuss the future of the field.
Psychoacoustics is the study of the subjective human perception of sound. Psychoacoustic modeling has long been an integral part of audio compression. It exploits properties of the human auditory system to remove components of an audio signal that the human ear cannot perceive. More powerful signals at certain frequencies 'mask' less powerful signals at nearby frequencies by de-sensitizing the human ear's basilar membrane (which is responsible for resolving the frequency components of a signal).
The entire MP3 phenomenon is made possible by the confluence of several distinct but interrelated elements: a few simple insights into the nature of human psychoacoustics, a whole lot of number crunching, and conformance to a tightly specified format for encoding and decoding audio into compact bitstreams.
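The masking effect described above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the function is_masked, the 10 dB offset, and the 25 dB-per-octave slope are hypothetical values chosen for illustration and are not taken from any standard. Practical psychoacoustic models are considerably more elaborate than this symmetric, single-masker rule.

```python
import math


def is_masked(masker_freq_hz, masker_db, probe_freq_hz, probe_db,
              offset_db=10.0, slope_db_per_octave=25.0):
    """Toy check of whether a strong tone (masker) hides a weaker tone (probe).

    The masking threshold starts offset_db below the masker level and falls
    off linearly with the distance from the masker, measured in octaves.
    Both parameters are illustrative assumptions, not measured values.
    """
    octaves_apart = abs(math.log2(probe_freq_hz / masker_freq_hz))
    threshold_db = masker_db - offset_db - slope_db_per_octave * octaves_apart
    return probe_db < threshold_db


# A 70 dB tone at 1 kHz and a 40 dB tone close to it at 1.1 kHz:
print(is_masked(1000, 70, 1100, 40))  # True: the nearby weak tone is hidden

# The same 40 dB tone two octaves away at 4 kHz:
print(is_masked(1000, 70, 4000, 40))  # False: too far away to be masked
```

An encoder can spend few or no bits on components that such a test reports as masked, which is the basic idea behind perceptual bit rate reduction.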