Portable audio engine


As part of Emagic’s flagship product, Logic Audio, I developed a portable audio engine. This engine (nicknamed PAD) powered Logic Audio from 1996 to 2000 and was also used in VMR and WaveBurner. The design pursued the following key objectives:

  • portability, code re-usability
  • performance
  • layered design
  • low latency
  • realtime DSP

At the time of design, audio engines were usually tailored to a specific hardware and operating system combination. As Emagic was a small company, an effort to increase the amount of code shared across platforms was more than welcome. The main bottleneck of an audio engine was disk access. This was solved by intensive caching with intelligent read-ahead schemes. Realtime DSP was in its infancy and consisted mainly of mixing the tracks to the outputs. Memory access was already an issue, but because DSP processing was limited, the focus was on implementing a scatter-gather scheme throughout the entire engine to avoid superfluous copying of data. Logic Audio was a Win16 application at that time, programmed with Watcom’s 32-bit extender. Windows 95, especially in conjunction with 16-bit applications, had unpredictable real-time behavior. PAD reduced latency to about 200 ms, which was an excellent value in 1996.
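The scatter-gather idea can be illustrated with a small sketch. This is not PAD code: the names `Segment` and `mixGather` are hypothetical, and the sketch assumes 16-bit mono samples. It shows a consumer walking a list of (pointer, length) segments, for example blocks inside a disk cache, and mixing directly from them instead of first copying everything into one contiguous buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, not PAD's actual data structures.
// A segment describes one contiguous run of samples that lives elsewhere
// (for example inside a disk-cache block). It does not own the data.
struct Segment {
    const std::int16_t* data;
    std::size_t         frames;
};

// Mix the samples described by a segment list into 'out', a 32-bit
// accumulator buffer. Each segment is read in place, so no intermediate
// contiguous copy is made. Returns the number of frames mixed.
std::size_t mixGather(const std::vector<Segment>& segments,
                      std::vector<std::int32_t>& out)
{
    std::size_t pos = 0;
    for (const Segment& seg : segments) {
        assert(seg.data != nullptr || seg.frames == 0);
        for (std::size_t i = 0; i < seg.frames && pos < out.size(); ++i) {
            out[pos++] += seg.data[i];
        }
    }
    return pos;
}
```

A caller would build the segment list from whatever blocks hold the requested sample range and pass it down unchanged, so only the final mix touches the sample data.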

[Figure: PAD class hierarchy]


The PAD audio engine implements the following features:

  • portability to various operating systems and programming models
    • successful ports to Win32, Mac OS 9, and BeOS
    • thunk layer to interface with Win16 applications
    • more than 90% hardware-independent code
    • clear object-oriented design
    • no use of assembly language
  • high performance
    • up to 30 tracks of audio (44.1 kHz, 16-bit) from disk to Windows Multimedia on a 100 MHz Pentium with Windows 95 (in 1996)
    • multi-threaded design
    • use of asynchronous disk I/O
  • broad hardware support
    • adapted to Windows Multimedia, Mac OS Sound Manager, Audiowerk, EASI, ASIO
  • specific low-level features for digital audio workstations
    • sample position interpolation by digital PLL
    • zero-latency start to support several hardware devices running simultaneously
    • level metering
    • input monitoring
  • support for tracks
    • every track can be routed and panned to two audio outputs
    • mixing of tracks to outputs by host CPU or dedicated hardware
  • support for regions
    • every region has a sample position and a length within the song
    • every region is associated with an audio file and an offset inside that file
  • support for realtime DSP
    • plugins may be inserted into tracks, buses, outputs
    • support for a proprietary plugin format and VST
  • disk cache
    • read-ahead to reduce seeks
    • support for WAV and AIFF audio files
    • support for mono and stereo material
  • sophisticated debug features
    • RIP hook to recover after crashes
    • debug log feature to ease beta-test phase
    • built-in performance tester to gather data during beta-test phase
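The region model in the list above (a position and length within the song, plus an audio file and an offset inside it) can be sketched as a small data structure. The names `Region`, `FilePosition`, and `locate` are hypothetical and not taken from PAD; positions are assumed to be counted in sample frames.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration of the region concept, not PAD's classes.
struct Region {
    std::int64_t songPos;     // first frame of the region on the song timeline
    std::int64_t length;      // region length in frames
    std::string  file;        // audio file the region plays from
    std::int64_t fileOffset;  // frame inside the file where the region starts
};

struct FilePosition {
    std::string  file;
    std::int64_t frame;
};

// Translate a song position into a position inside an audio file.
// Returns false if no region of the track covers 'songFrame' (silence).
bool locate(const std::vector<Region>& track, std::int64_t songFrame,
            FilePosition& result)
{
    for (const Region& r : track) {
        assert(r.length >= 0);
        if (songFrame >= r.songPos && songFrame < r.songPos + r.length) {
            result.file  = r.file;
            result.frame = r.fileOffset + (songFrame - r.songPos);
            return true;
        }
    }
    return false;
}
```

A disk cache could use a lookup of this kind to decide which file ranges to read ahead for the upcoming song positions.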


Several things have changed since PAD was designed. With the increasing power of host CPUs, the focus shifted more and more to host-based DSP processing. This also shifted the bottleneck of the audio engine to memory bandwidth. Another lesson was that class hierarchies with many abstraction layers are not as maintenance-friendly as believed, and that restructuring such a hierarchy is a major pain.

[Figure: PAD-delta class hierarchy]

PAD-delta was a redesign of PAD, carried out from 1999 to 2000, that attempted to overcome these issues with the following approach:

  • single-precision floating point arithmetic
    • better throughput than integer on Pentium and PowerPC CPUs
    • more dynamic range
    • better filter stability, easier coefficient calculation
  • reduced set of classes, fewer abstraction layers, flat hierarchy
    • data source: disk cache or audio input
    • data sink: playback device or disk cache
    • DSP engine: the mechanics to perform DSP tasks
  • DSP engine executes linear list of tasks
    • tasks specified by input buffer, output buffer, processing function
    • concepts like tracks, buses, mixers all abstracted to tasks
    • tasks can be added or removed dynamically
    • task order optimized to promote in-place processing
    • small processing buffer size (typically 64 audio frames) to fit data into CPU caches
  • advanced disk cache
    • consistent use of asynchronous I/O
    • even more intelligent read-ahead to reduce disk seeks
    • cache includes format conversion to floating point
    • float data cached
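The task-list idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PAD-delta's code: the names `Block`, `Task`, `gain`, `mixInto`, and `runTasks` are hypothetical, each task carries one extra float parameter for simplicity, and buffers hold mono single-precision samples in blocks of 64 frames.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; a sketch of the concept only.
constexpr std::size_t kBlockFrames = 64;  // small block so buffers fit in CPU caches
using Block = std::array<float, kBlockFrames>;

// A processing function reads 'in' and writes 'out'.
// Both may refer to the same block, which gives in-place processing.
using ProcessFn = void (*)(const Block& in, Block& out, float param);

// One task: input buffer, output buffer, processing function.
struct Task {
    const Block* in;
    Block*       out;
    ProcessFn    process;
    float        param;
};

// Overwrite 'out' with 'in' scaled by g (safe when in and out alias).
void gain(const Block& in, Block& out, float g)
{
    for (std::size_t i = 0; i < kBlockFrames; ++i) out[i] = in[i] * g;
}

// Accumulate 'in' scaled by g into 'out', as a mixer or bus would.
void mixInto(const Block& in, Block& out, float g)
{
    for (std::size_t i = 0; i < kBlockFrames; ++i) out[i] += in[i] * g;
}

// The engine simply executes the linear task list once per block.
void runTasks(const std::vector<Task>& tasks)
{
    for (const Task& t : tasks) {
        assert(t.in && t.out && t.process);
        t.process(*t.in, *t.out, t.param);
    }
}
```

In this sketch a track fader is a `gain` task whose input and output are the same block, and a bus is a series of `mixInto` tasks that share an output block. Adding or removing a plugin then amounts to inserting or erasing an entry in the vector between two block cycles.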

Although PAD-delta remained the best-performing audio engine in the music industry, it was replaced in 2000 by an engine nicknamed MD, for reasons of internal company politics and licensing.

© Felix Bertram 2002-2015. Last update September 2016.