Explicit Vector Wave Digital Filter Modeling of Circuits with a Single Bipolar Junction Transistor
The recently developed extension of Wave Digital Filters based on vector wave variables has broadened the class of circuits with linear two-port elements that can be modeled in a modular and explicit fashion in the Wave Digital (WD) domain. In this paper, we apply the vector definition of wave variables to nonlinear two-port elements. In particular, we present two vector WD models of a Bipolar Junction Transistor (BJT) using characteristic equations derived from an extended Ebers-Moll model. One, implicit, is based on a modified Newton-Raphson method; the other, explicit, is based on a neural network trained in the WD domain and is shown to allow fully explicit implementation of circuits with a single BJT, which can be executed in real time.
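For readers unfamiliar with the underlying device equations, the following minimal sketch solves a common-emitter bias point using the classic (non-extended) Ebers-Moll transport model with a damped Newton-Raphson iteration in the Kirchhoff domain. It is not the paper's vector WD formulation, and all component values are illustrative.

```python
import numpy as np

# Classic Ebers-Moll transport equations; the paper uses an extended
# variant, so treat this as a generic reference implementation.
I_S, BETA_F, BETA_R, V_T = 1e-14, 100.0, 1.0, 25.85e-3

def bjt_currents(v_be, v_bc):
    """Collector and base currents of the transport Ebers-Moll model."""
    e_be = np.exp(v_be / V_T)
    e_bc = np.exp(v_bc / V_T)
    i_c = I_S * (e_be - e_bc) - (I_S / BETA_R) * (e_bc - 1.0)
    i_b = (I_S / BETA_F) * (e_be - 1.0) + (I_S / BETA_R) * (e_bc - 1.0)
    return i_c, i_b

def solve_bias(v_cc=9.0, r_b=470e3, r_c=4.7e3, iters=50):
    """Damped Newton-Raphson for a common-emitter bias point (illustrative)."""
    x = np.array([0.6, -3.0])        # initial guess for (v_be, v_bc)
    for _ in range(iters):
        v_be, v_bc = x
        i_c, i_b = bjt_currents(v_be, v_bc)
        v_ce = v_be - v_bc           # emitter grounded
        # KCL residuals at the base and collector nodes
        f = np.array([
            (v_cc - v_be) / r_b - i_b,
            (v_cc - v_ce) / r_c - i_c,
        ])
        # Finite-difference Jacobian keeps the sketch short
        J = np.zeros((2, 2))
        h = 1e-7
        for j in range(2):
            xp = x.copy(); xp[j] += h
            i_c2, i_b2 = bjt_currents(*xp)
            f2 = np.array([
                (v_cc - xp[0]) / r_b - i_b2,
                (v_cc - (xp[0] - xp[1])) / r_c - i_c2,
            ])
            J[:, j] = (f2 - f) / h
        step = np.linalg.solve(J, f)
        x -= np.clip(step, -0.1, 0.1)  # damping guards the exponentials
    return x

# Usage: v_be, v_bc = solve_bias()
```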
Antialiasing Piecewise Polynomial Waveshapers
Memoryless waveshapers are commonly used in audio signal processing. In discrete time, they suffer from well-known aliasing artifacts. We present a method for applying antiderivative antialiasing (ADAA), which mitigates aliasing, to any waveshaping function that can be represented as a piecewise polynomial. Specifically, we treat the special case of a piecewise linear waveshaper. Furthermore, we introduce a method for replacing the sharp corners and jump discontinuities in any piecewise linear waveshaper with smoothed polynomial approximations whose derivatives match the adjacent line segments up to a specified order. The resulting piecewise polynomial can again be antialiased as a special case of the general piecewise polynomial. These techniques are effective at reducing aliasing, especially when combined with light oversampling, and the proposed method for rounding corners in piecewise linear waveshapers can also create more “realistic” analog-style waveshapers than standard piecewise linear functions.
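For context, first-order ADAA replaces the waveshaper f with the secant of its antiderivative F1, falling back to a midpoint evaluation when consecutive samples nearly coincide. The sketch below applies it to a hard clipper, whose antiderivative is piecewise polynomial; the threshold and test signal are illustrative, and the paper's corner-rounding construction is not reproduced here.

```python
import numpy as np

def hard_clip(x):
    return np.clip(x, -1.0, 1.0)

def hard_clip_ad1(x):
    """First antiderivative of the hard clipper (piecewise polynomial)."""
    return np.where(np.abs(x) <= 1.0, 0.5 * x * x, np.abs(x) - 0.5)

def adaa1(x, f, F1, eps=1e-6):
    """First-order antiderivative antialiasing of a memoryless waveshaper."""
    y = np.empty_like(x)
    x1 = x[0]                      # previous input sample
    for n, xn in enumerate(x):
        if abs(xn - x1) < eps:     # ill-conditioned difference: fall back
            y[n] = f(0.5 * (xn + x1))
        else:                      # secant of the antiderivative
            y[n] = (F1(xn) - F1(x1)) / (xn - x1)
        x1 = xn
    return y

# Usage: antialias a hard-clipped, overdriven sine
fs = 48000
t = np.arange(fs) / fs
x = 4.0 * np.sin(2 * np.pi * 1244.5 * t)
y = adaa1(x, hard_clip, hard_clip_ad1)
```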
A General Use Circuit for Audio Signal Distortion Exploiting Any Non-Linear Electron Device
In this paper, we propose the use of the transimpedance amplifier configuration as a simple generic circuit for electron-device-based audio distortion. The goal is to take advantage of the non-linearities in the transfer curve of any device, such as a diode, JFET, or MOSFET, and to control the level and type of harmonic distortion only through bias voltages and signal amplitude. An nMOSFET is taken as a case study, revealing a rich dependence of the generated harmonics on the region of operation (linear to saturation) and on the inversion level (weak to strong). A continuous and analytical Lambert-W based model was used for simulations of harmonic distortion, which were verified through measurements.
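As a rough illustration of how bias shapes the harmonic mix, the sketch below drives a textbook square-law nMOSFET model (not the continuous Lambert-W based model used in the paper) with a sinusoidal gate voltage and reads harmonic magnitudes off an FFT; all device parameters and operating points are placeholders.

```python
import numpy as np

def nmos_id(v_gs, v_ds, k=2e-3, v_th=0.7):
    """Textbook square-law nMOSFET drain current (triode/saturation)."""
    v_ov = np.maximum(v_gs - v_th, 0.0)            # overdrive; 0 in cutoff
    triode = k * (v_ov * v_ds - 0.5 * v_ds ** 2)
    sat = 0.5 * k * v_ov ** 2
    return np.where(v_ds < v_ov, triode, sat)

def harmonic_levels(v_bias, amp, n_harm=5, fs=96000, f0=1000.0):
    """Harmonic magnitudes of the drain current for a sinusoidal gate drive."""
    n = fs  # one second of signal -> 1 Hz bins, so f0 falls on a bin
    t = np.arange(n) / fs
    v_gs = v_bias + amp * np.sin(2 * np.pi * f0 * t)
    i_d = nmos_id(v_gs, v_ds=2.5)  # fixed drain voltage for simplicity
    spec = np.abs(np.fft.rfft(i_d * np.hanning(n))) / n
    bins = [int(f0 * (h + 1)) for h in range(n_harm)]
    return spec[bins]

# Sweeping the bias from near-cutoff to strong inversion changes the mix
for v_b in (0.6, 0.8, 1.2):
    print(v_b, harmonic_levels(v_b, amp=0.2))
```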
Modulation Extraction for LFO-driven Audio Effects
Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measurement from the audio signal is nontrivial, hindering the modeling process. To address this, we propose a framework capable of extracting arbitrary LFO signals from processed audio across multiple digital audio effects, parameter settings, and instrument configurations. Since our system imposes no restrictions on the LFO signal shape, we demonstrate its ability to extract quasiperiodic, combined, and distorted modulation signals that are relevant to effect modeling. Furthermore, we show how coupling the extraction model with a simple processing network enables training of end-to-end black-box models of unseen analog or digital LFO-driven audio effects using only dry and wet audio pairs, overcoming the need to access the audio effect or internal LFO signal. We make our code available and provide the trained audio effect models in a real-time VST plugin.
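To make the role of the LFO concrete, here is a minimal feedforward flanger whose delay line is swept by a sine LFO; the `lfo` array is exactly the kind of hidden modulation signal the proposed framework aims to recover. This is an illustrative effect implementation, not the extraction model itself, and all parameter values are arbitrary.

```python
import numpy as np

def flanger(x, fs, rate_hz=0.25, depth_ms=2.0, center_ms=3.0, mix=0.7):
    """Feedforward flanger: a comb filter whose delay is swept by a sine LFO."""
    n = np.arange(len(x))
    lfo = np.sin(2 * np.pi * rate_hz * n / fs)          # the hidden modulation
    delay = (center_ms + depth_ms * lfo) * fs / 1000.0  # delay in samples
    y = np.copy(x).astype(float)
    for i in n:
        d = delay[i]
        j = i - int(np.floor(d))
        frac = d - np.floor(d)
        if j - 1 >= 0:  # linear-interpolated fractional delay tap
            y[i] += mix * ((1 - frac) * x[j] + frac * x[j - 1])
    return y
```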
Dynamic Pitch Warping for Expressive Vocal Retuning
This work introduces the use of the Dynamic Pitch Warping (DPW) method for automatic pitch correction of singing voice audio signals. DPW is designed to dynamically tune any pitch trajectory to a predefined scale while preserving its expressive ornamentation. DPW has three degrees of freedom to modify the fundamental frequency (f0) signal: detection interval, critical time, and transition time. Together, these parameters allow us to define a pitch velocity condition that triggers an adaptive correction of the pitch trajectory (pitch warping). We compared our approach to Antares Autotune (the most commonly used software brand, abbreviated as ATA in this article). The pitch correction in ATA has two degrees of freedom: a triggering threshold (flextune) and the transition time (retune speed). The pitch trajectories that we compare were extracted from audio signals autotuned in ATA and from the DPW algorithm applied to the f0 of the input audio tracks. We specifically studied pitch correction for three typical types of f0 curves: staircase, vibrato, and free path. We measured the proximity of the corrected pitch trajectories to the original ones for each case, finding that DPW better preserves vibrato while keeping the free path of the f0. In contrast, ATA is more effective at generating staircase curves, but fails for all but small vibratos and for free-path curves. We have also implemented an offline automatic pitch tuner using DPW.
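For orientation, a naive retuner with a single smoothing parameter (roughly the "retune speed" behavior that DPW is designed to improve on) can be sketched as follows. The scale choice, the one-pole glide, and the octave handling are simplifications, and none of DPW's three parameters are modeled here.

```python
import numpy as np

A4 = 440.0

def nearest_scale_hz(f0, scale=(0, 2, 4, 5, 7, 9, 11)):
    """Snap a frequency to the nearest note of an equal-tempered scale
    (C major by default; octave wrap-around ignored for brevity)."""
    midi = 69.0 + 12.0 * np.log2(f0 / A4)
    octave, pc = divmod(midi, 12.0)
    target_pc = min(scale, key=lambda s: abs(s - pc))
    return A4 * 2.0 ** ((octave * 12.0 + target_pc - 69.0) / 12.0)

def retune(f0_track, strength=0.1):
    """One-pole glide toward the nearest scale tone, applied per frame."""
    out = np.empty_like(f0_track)
    cur = f0_track[0]
    for i, f in enumerate(f0_track):
        cur += strength * (nearest_scale_hz(f) - cur)
        out[i] = cur
    return out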
Towards High Sampling Rate Sound Synthesis on FPGA
This “Late Breaking Results” paper presents an ongoing project aiming at providing an accessible and easy-to-use platform for high-sampling-rate real-time audio Digital Signal Processing (DSP). The current version can operate in the megahertz range, and we aim to achieve sampling rates as high as 20 MHz in the near future. It relies on the Syfala compiler, which can be used to program Field Programmable Gate Array (FPGA) platforms at a high level using the FAUST programming language. In our system, the audio DAC is directly implemented on the FPGA chip, providing exceptional performance in terms of audio latency as well. After giving an overview of the state of the art in this field, we describe how this tool works and present ongoing and future developments.
P-RAVE: Improving RAVE through pitch conditioning and more with application to singing voice conversion
In this paper, we introduce means of improving the fidelity and controllability of the RAVE generative audio model by factorizing pitch and other features. We accomplish this primarily by creating a multi-band excitation signal capturing pitch and/or loudness information, and by using it to FiLM-condition the RAVE generator. To further improve fidelity in the singing voice application explored here, we also consider concatenating a supervised phonetic encoding to its latent representation. An ablation analysis highlights the improved performance of our incremental improvements relative to the baseline RAVE model. As our primary enhancement involves adding a stable pitch conditioning mechanism into the RAVE model, we simply call our method P-RAVE.
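FiLM conditioning itself is straightforward: a conditioning vector is projected to per-channel scales and offsets that modulate the generator's feature maps. Below is a generic PyTorch sketch; the dimensions are illustrative and the paper's exact conditioning network is not specified in the abstract.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift feature maps
    channel-wise from a conditioning signal (e.g., a pitch embedding)."""
    def __init__(self, cond_dim, n_channels):
        super().__init__()
        self.proj = nn.Linear(cond_dim, 2 * n_channels)

    def forward(self, x, cond):
        # x: (batch, channels, time); cond: (batch, cond_dim)
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)

# Usage: modulate a conv feature map with a 16-dim conditioning vector
film = FiLM(cond_dim=16, n_channels=64)
h = film(torch.randn(2, 64, 1024), torch.randn(2, 16))
```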
Designing a Library for Generative Audio in Unity
This paper overviews URALi, a library designed to add generative sound synthesis capabilities to Unity. The project is directed, in particular, towards audiovisual artists who are keen on working with algorithmic systems in Unity but cannot find native solutions for procedural sound synthesis to pair with its visual and control counterparts. After overviewing the options available in Unity concerning audio, this paper reports on the functioning and architecture of the library, which is an ongoing project.
Interpretable timbre synthesis using variational autoencoders regularized on timbre descriptors
Controllable timbre synthesis has been a subject of research for several decades, and deep neural networks have been the most successful in this area. Deep generative models such as Variational Autoencoders (VAEs) have the ability to generate a high-level representation of audio while providing a structured latent space. Despite their advantages, the interpretability of these latent spaces in terms of human perception is often limited. To address this limitation and enhance the control over timbre generation, we propose a regularized VAE-based latent space that incorporates timbre descriptors. Moreover, we suggest a more concise representation of sound by utilizing its harmonic content, in order to minimize the dimensionality of the latent space.
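One simple way to realize such a regularization, sketched below under the assumption of a standard ELBO objective, is to add a penalty tying a chosen latent coordinate to a normalized timbre descriptor such as the spectral centroid; the paper's exact regularizer may differ.

```python
import torch
import torch.nn.functional as F

def regularized_vae_loss(x, x_hat, mu, logvar, descriptor,
                         dim=0, gamma=1.0, beta=1.0):
    """ELBO plus a term tying one latent dimension to a timbre descriptor.
    `descriptor` is a normalized per-example value, e.g. spectral centroid."""
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # One of several attribute regularizers in the literature: make the
    # chosen latent coordinate predict the descriptor directly.
    reg = F.mse_loss(mu[:, dim], descriptor)
    return recon + beta * kl + gamma * reg
```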
What you hear is what you see: Audio quality from Image Quality Metrics
In this study, we investigate the feasibility of utilizing state-of-the-art perceptual image metrics for evaluating audio signals by representing them as spectrograms. The encouraging outcome of the proposed approach is based on the similarity between the neural mechanisms in the auditory and visual pathways. Furthermore, we customise one of the metrics, which has a psychoacoustically plausible architecture, to account for the peculiarities of sound signals. We evaluate the effectiveness of our proposed metric and several baseline metrics using a music dataset, with promising results in terms of the correlation between the metrics and the perceived quality of audio as rated by human evaluators.
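The basic recipe can be sketched as follows: render both signals as log-magnitude spectrograms and score them with an off-the-shelf image metric, with SSIM used here as a stand-in; the paper's customised psychoacoustic metric is not reproduced.

```python
import numpy as np
from scipy.signal import stft
from skimage.metrics import structural_similarity

def spec_db(x, fs, n_fft=1024):
    """Log-magnitude spectrogram as a 2-D 'image' of the signal."""
    _, _, Z = stft(x, fs=fs, nperseg=n_fft)
    return 20.0 * np.log10(np.abs(Z) + 1e-8)

def spectrogram_ssim(ref, test, fs):
    """Score a degraded signal against a reference with an image metric."""
    a, b = spec_db(ref, fs), spec_db(test, fs)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    return structural_similarity(a, b, data_range=hi - lo)
```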