Dance: Visualizers

June 25, 2022

In the last post of this series, we discussed Dance’s plugin system. In this post, we’ll implement our first plugin, a traditional, FFT-based music visualizer. As this is the culmination of this project, I’d definitely recommend reading the preceding posts first!

Windows Audio

The original idea for this project was to create a seamless visualizer for music playing on the desktop. Now that we’ve built up our window managers and plugin architecture, we can finally get into the audio system.

Listening In

The Windows Multimedia API allows us to hook into system audio devices and mirror their output into our own buffer. We can do this by creating an IAudioClient, initializing it with AUDCLNT_SHAREMODE_SHARED and AUDCLNT_STREAMFLAGS_LOOPBACK, and requesting its IAudioCaptureClient service. Once we’ve enabled the IAudioCaptureClient, we can continuously query it for new packets.
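In rough strokes, and with error handling elided, that setup looks something like this. The variable names here are illustrative rather than Dance’s actual members, and `device` is assumed to be an IMMDevice for the render endpoint we want to mirror:

ComPtr<IAudioClient> audioClient;
device->Activate(
    __uuidof(IAudioClient),
    CLSCTX_ALL,
    nullptr,
    reinterpret_cast<void**>(audioClient.GetAddressOf()));

// The mix format describes what's actually coming out of the endpoint.
// (Free it with ::CoTaskMemFree when done.)
WAVEFORMATEX* format = nullptr;
audioClient->GetMixFormat(&format);

// AUDCLNT_STREAMFLAGS_LOOPBACK is what turns a render endpoint into a
// capture source; a buffer duration of 0 lets the engine pick a default.
audioClient->Initialize(
    AUDCLNT_SHAREMODE_SHARED,
    AUDCLNT_STREAMFLAGS_LOOPBACK,
    0,
    0,
    format,
    nullptr);

ComPtr<IAudioCaptureClient> captureClient;
audioClient->GetService(IID_PPV_ARGS(&captureClient));
audioClient->Start();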

The AudioListener is our base class for this subsystem. You can check out the header and source if you’re interested in how exactly it works.

class AudioListener
{
public:
    // A bunch of stuff is omitted for conciseness
    AudioListener(ComPtr<IMMDevice> device, REFERENCE_TIME duration);
    virtual HRESULT Enable();
    virtual HRESULT Disable();

    /// Iterates through any newly available audio packets and invokes the handler on each one.
    /// 
    /// @exception ComError if any capture client operations fail.
    /// @returns whether any new packets were received by the listener and passed to handle.
    /// @seealso https://docs.microsoft.com/en-us/windows/win32/api/audioclient/nf-audioclient-iaudiocaptureclient-getbuffer
    virtual bool Listen();
    
protected:
    /// Virtual handler for new audio packets from the bound audio device
    /// captured by the audio client. This method is invoked by 
    /// AudioListener::Listen.
    /// 
    /// @param data is a pointer to the available audio capture packet.
    /// @param count is the number of frames in the packet.
    /// @param flags contains information about discontinuities, silence, etc.
    /// @see AudioListener::Listen
    virtual void Handle(const void* data, size_t count, DWORD flags) = 0;
};

Between construction and destruction, the AudioListener’s primary entry point is AudioListener::Listen, which iterates through any new audio packets from the capture client and passes them to AudioListener::Handle. This handler is implemented further down the class hierarchy by our AudioAnalyzer.
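Stripped of error handling, Listen is essentially the standard WASAPI capture loop; the member names here are approximations of what’s in the actual header:

bool AudioListener::Listen()
{
    UINT32 packetLength = 0;
    bool received = false;
    this->audioCaptureClient->GetNextPacketSize(&packetLength);
    while (packetLength > 0)
    {
        BYTE* data = nullptr;
        UINT32 count = 0;
        DWORD flags = 0;

        // GetBuffer hands us the next packet; ReleaseBuffer returns it
        // to the capture client once we've handled it.
        this->audioCaptureClient->GetBuffer(&data, &count, &flags, nullptr, nullptr);
        this->Handle(data, count, flags);
        this->audioCaptureClient->ReleaseBuffer(count);

        received = true;
        this->audioCaptureClient->GetNextPacketSize(&packetLength);
    }
    return received;
}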

Note that to use the AudioListener, you have to enable COM, which is done via ::CoInitializeEx; Dance uses the COINITBASE_MULTITHREADED threading model. I’m not 100% sure how all of that works, I just nod and go along with it.
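For completeness, here’s the hypothetical shape of that bootstrapping, from COM initialization through grabbing the default render endpoint. The individual calls are standard Windows APIs, but the surrounding structure is illustrative:

::CoInitializeEx(nullptr, COINITBASE_MULTITHREADED);

ComPtr<IMMDeviceEnumerator> enumerator;
::CoCreateInstance(
    __uuidof(MMDeviceEnumerator),
    nullptr,
    CLSCTX_ALL,
    IID_PPV_ARGS(&enumerator));

// The default render (output) device is the one whose audio we mirror.
ComPtr<IMMDevice> device;
enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);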

Transformation

Our goal is to produce a stylized bar graph where the x-axis corresponds to frequency bins and the y-axis corresponds to intensity. This is a pretty common visualization for music published with visual media, e.g. on YouTube; here’s an example. However, all we have to work with is digital audio data, which generally takes the form of a discrete series of amplitude samples of a sound wave.

In order to go from audio packets to frequency-intensity bins, we’re gonna need a Fourier transform. I’m neither qualified nor confident enough to explain the theory behind the Fourier transform, but if you’d like to learn more, I’d recommend this 3blue1brown video.

I ended up using fftw3’s fftwf_plan_dft_r2c_1d, which computes a one-dimensional, real-to-complex discrete Fourier transform. In order to keep the bars in sync with the music, we apply this transform to sequential windows of audio data in real time. This is handled in the AudioAnalyzer, which maintains a ring buffer for the incoming data and executes the same fftwf_plan on it each time, writing the output to a statically sized std::vector<FFTWFComplex> spectrum. You can see how this works in the header and source, but it’s easiest to just look at the two most important methods:

void AudioAnalyzer::Handle(const void* data, size_t count, DWORD flags)
{
    // https://stackoverflow.com/questions/64158704/wasapi-captured-packets-do-not-align
    if (flags & AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY)
    {
        this->buffer.Reset();
        TRACE("discontinuity!");
    }
    if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
    {
        TRACE("silent!");
    }

    // Write to the ring buffer
    this->adapter->Write(this->buffer, data, count);
}

void AudioAnalyzer::Analyze()
{
    // Execute the FFT, which is already targeted at the contents of our ring buffer
    this->fft.Execute();
}
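For reference, creating such a plan with fftw3’s single-precision API looks roughly like this. The window size and buffer names are illustrative, not Dance’s actual members:

#include <fftw3.h>

const int N = 1024;  // analysis window size (illustrative)

// Allocate with fftw's helpers for proper SIMD alignment. Note that
// FFTW_MEASURE may overwrite the buffers while planning, so plan before
// filling them with real data.
float* samples = fftwf_alloc_real(N);
fftwf_complex* spectrum = fftwf_alloc_complex(N / 2 + 1);  // r2c yields N/2+1 bins
fftwf_plan plan = fftwf_plan_dft_r2c_1d(N, samples, spectrum, FFTW_MEASURE);

// Each time we want fresh bins: copy the ring buffer into samples, then
fftwf_execute(plan);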

To use the analyzer, we continuously invoke AudioAnalyzer::Listen() to keep our audio data buffer in sync with whatever’s playing. Then, whenever we want the latest audio spectrum, we can run AudioAnalyzer::Analyze() and grab a handle to the bins via AudioAnalyzer::Spectrum().

Ah, but hold on! Before we integrate this into our BarsVisualizer, we’re gonna take a quick detour over to even more program architecture.

Encapsulation

After writing a prototype of what would eventually become the BarsVisualizer, I decided to encapsulate a couple of the more reusable aspects of the code into their own base classes. In addition to cleaning up the source, this would help speed up development of other plugins down the line.

Take Two

The first order of business was the 2D rendering pipeline, which was fairly general-purpose; that became TwoVisualizer. The header and source live in the Libraries directory of the repository.

class TwoVisualizer : public virtual Dance::API::Visualizer
{
public:
    TwoVisualizer(const Dependencies& dependencies);

    virtual HRESULT Unsize();
    virtual HRESULT Resize(const RECT& size);

protected:
    ComPtr<IDXGISwapChain1> dxgiSwapChain;
    ComPtr<ID2D1Device1> d2dDevice;
    ComPtr<IDXGISurface2> dxgiSurface;
    ComPtr<ID2D1Bitmap1> d2dBitmap;
    ComPtr<ID2D1DeviceContext> d2dDeviceContext;

    HRESULT CreateSurface();
    HRESULT ReleaseSurface();
    HRESULT CreateBitmap();
    HRESULT ReleaseBitmap();
};

The TwoVisualizer is remarkably straightforward; its sole purpose is to provide an ID2D1Bitmap1 and an ID2D1DeviceContext the plugin can use to render two-dimensional shapes, text, etc. Besides resource allocation and initialization, the majority of its code revolves around efficiently resizing the bitmap when the window is resized.
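As a sketch of the central piece, CreateBitmap more or less wraps the swap chain’s surface in a Direct2D bitmap and points the device context at it. The exact property flags here are typical choices for rendering over a swap chain, not guaranteed to match the source:

HRESULT TwoVisualizer::CreateBitmap()
{
    D2D1_BITMAP_PROPERTIES1 properties = D2D1::BitmapProperties1(
        D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
        D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED));

    // Wrap the swap chain's DXGI surface so Direct2D can draw into it.
    HRESULT hr = this->d2dDeviceContext->CreateBitmapFromDxgiSurface(
        this->dxgiSurface.Get(), &properties, &this->d2dBitmap);
    if (FAILED(hr)) return hr;

    // Aim all subsequent 2D drawing at the back buffer.
    this->d2dDeviceContext->SetTarget(this->d2dBitmap.Get());
    return S_OK;
}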

Of note is the use of virtual inheritance, which was new to me. Its purpose is to avoid duplicating grandparent class members when inheriting from multiple sibling classes: TwoVisualizer, ThreeVisualizer, and AudioVisualizer all inherit from and use member variables of Visualizer. Since a given visualizer is expected to inherit from at least two of those classes, we use virtual inheritance to keep the final class structurally concise and to ensure the base classes all share a single Visualizer subobject rather than each operating on its own copy.
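Reduced to its essentials, the hierarchy looks like this; without virtual, a BarsVisualizer would contain two separate Visualizer subobjects:

struct Visualizer { /* shared interface and state */ };

struct TwoVisualizer : public virtual Visualizer { /* 2D pipeline */ };
struct AudioVisualizer : public virtual Visualizer { /* audio analysis */ };

// Thanks to virtual inheritance, this contains exactly one Visualizer
// subobject, shared by both base classes.
struct BarsVisualizer : public TwoVisualizer, public AudioVisualizer {};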

Take Three

Since we already needed an ID3D11Device to set up our transparent visualizer window, I went ahead and put together a ThreeVisualizer as well. The nuances of the TwoVisualizer are mirrored here; the ID2D1Bitmap1 becomes an ID3D11RenderTargetView, and we tack on an ID3D11DepthStencilView and an ID3D11Texture2D for proper depth testing. The header and source are in another Libraries project.

class ThreeVisualizer : public virtual Dance::API::Visualizer
{
public:
    ThreeVisualizer(const Dependencies& dependencies);

    virtual HRESULT Unsize();
    virtual HRESULT Resize(const RECT& size);

protected:
    ComPtr<IDXGISwapChain1> dxgiSwapChain;
    ComPtr<ID3D11Device> d3dDevice;
    ComPtr<ID3D11DeviceContext> d3dDeviceContext;
    ComPtr<ID3D11RenderTargetView> d3dBackBufferView;
    ComPtr<ID3D11DepthStencilView> d3dDepthStencilView;
    ComPtr<ID3D11Texture2D> d3dDepthTexture;
    ComPtr<ID3D11SamplerState> d3dSamplerState;

    HRESULT CreateRenderTarget();
    HRESULT ReleaseRenderTarget();
    HRESULT CreateDepthStencil(const RECT& size);
    HRESULT ReleaseDepthStencil();
};
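To give a flavor of the bookkeeping, here’s roughly what CreateDepthStencil has to do. The descriptor fields are typical choices for a depth buffer, not necessarily the exact configuration in the source:

HRESULT ThreeVisualizer::CreateDepthStencil(const RECT& size)
{
    D3D11_TEXTURE2D_DESC description{};
    description.Width = size.right - size.left;
    description.Height = size.bottom - size.top;
    description.MipLevels = 1;
    description.ArraySize = 1;
    description.Format = DXGI_FORMAT_D24_UNORM_S8_UINT;
    description.SampleDesc.Count = 1;
    description.Usage = D3D11_USAGE_DEFAULT;
    description.BindFlags = D3D11_BIND_DEPTH_STENCIL;

    // The texture backs the view; both are recreated on every resize.
    HRESULT hr = this->d3dDevice->CreateTexture2D(
        &description, nullptr, &this->d3dDepthTexture);
    if (FAILED(hr)) return hr;

    return this->d3dDevice->CreateDepthStencilView(
        this->d3dDepthTexture.Get(), nullptr, &this->d3dDepthStencilView);
}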

Admittedly, this corner of the project is still a work in progress. The 3D example I adapted this from was super minimal, requiring only rudimentary linear algebra, cameras, and GPU buffers. As a result, it’s probably rather anemic compared to the TwoVisualizer in ways I just haven’t discovered yet via dogfooding. Do check out the buffer code, though; I thought it was pretty creative.

Take Five

Now that we’ve laid the groundwork for our visualizer base classes, we can jump back to our audio analyzer. The AudioVisualizer serves as a bridge from Visualizer to AudioAnalyzer, and it’s pretty straightforward: whenever the visualizer receives a main-thread Update(double delta) call, we run this->analyzer.Listen(), and if there were new packets, we also run this->analyzer.Analyze(). Further subclasses may then access this->analyzer.Spectrum() at any time.
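Reconstructed from that description, the bridge is about as small as it sounds:

void AudioVisualizer::Update(double delta)
{
    // Drain any new packets into the analyzer's ring buffer...
    if (this->analyzer.Listen())
    {
        // ...and only recompute the spectrum when something new arrived.
        this->analyzer.Analyze();
    }
}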

Tying it All Together

With the hard work out of the way, we can finally implement our BarsVisualizer. For everyone playing along at home, I’d recommend looking at the header and source directly. It’s a bit long and mathy, but the key points are as follows:

  • BarsVisualizer::Unsize and BarsVisualizer::Resize mirror the parent methods; the latter also chooses an appropriate number and size for the bars on the graph.
  • v and rgb are convenience functions that convert HSL to RGB, so we can get cool rainbow colors parametrized on a scalar (a generic version of this conversion is sketched after this list).
  • BarsVisualizer::Render clears the canvas, does a bunch of math for plotting the bar graph based on this->analyzer.Spectrum(), draws the bars, then presents the frame. There’s also a bit of logic in there to do smoothing between frames so the movement isn’t as jarring.
  • Finally, BarsVisualizer::Update calls the parent AudioVisualizer::Update so that the frequency spectrum can be recomputed.

I tried to name variables clearly and keep control flow tricks to a minimum to make the visualizer code as readable as possible. While it might be a little tricky to piece together what’s going on in BarsVisualizer::Render, I’m planning on going back and adding documentation in the future.