Optimization techniques for a physical model of human vocalisation
We present an unsupervised approach to optimizing and evaluating the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals, specifically yawns. We selected and optimized the control parameters of the synthesizer to minimize the difference between real and generated audio. We validated the most common optimization techniques reported in the literature and a specifically designed neural network. We evaluated several popular quality metrics as error functions. These include both objective quality metrics and subjective-equivalent metrics. We compared the results in terms of total error and computational demand. Results show that genetic and swarm optimizers outperform least squares algorithms at the cost of slower execution, and that specific combinations of optimizers and audio representations offer significantly different results. The proposed methodology could be used in benchmarking other physical models and audio types.
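The parameter-fitting loop described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_synth` is a hypothetical two-parameter sine generator standing in for the Pink Trombone, the error function is a plain time-domain mean squared error rather than any of the metrics evaluated in the paper, and the optimizer is a simple elitist evolutionary search, not one of the optimizers the authors benchmarked.

```python
import math
import random

FS = 8000   # sample rate in Hz (assumed for this toy example)
N = 256     # number of samples compared


def toy_synth(params):
    """Hypothetical synthesizer: a sine with the given frequency and amplitude."""
    freq, amp = params
    return [amp * math.sin(2 * math.pi * freq * n / FS) for n in range(N)]


def error(params, target):
    """Mean squared error between generated and target audio."""
    generated = toy_synth(params)
    return sum((g - t) ** 2 for g, t in zip(generated, target)) / N


def evolve(target, start, generations=200, offspring=8, seed=0):
    """(1 + lambda) search: mutate the parent, keep the best of parent and mutants."""
    rng = random.Random(seed)
    best, best_err = list(start), error(start, target)
    for _ in range(generations):
        # All mutants of one generation are drawn from the same parent.
        candidates = [
            [best[0] + rng.gauss(0, 5.0), best[1] + rng.gauss(0, 0.05)]
            for _ in range(offspring)
        ]
        for cand in candidates:
            err = error(cand, target)
            if err < best_err:
                best, best_err = cand, err
    return best, best_err


# "Real" audio is simulated here by the synthesizer itself with known parameters.
target = toy_synth([220.0, 0.8])
start = [200.0, 0.5]
initial_err = error(start, target)
best, best_err = evolve(target, start)
```

Because the parent is only replaced by a strictly better mutant, the final error can never exceed the starting error. Swapping in a different `error` function is where a choice of audio representation or quality metric would enter.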
Upcycling Android Phones into Embedded Audio Platforms
There are millions of sophisticated Android phones in the world that get disposed of at a very high rate due to consumerism. Their computational power and built-in features, instead of being wasted when discarded, could be repurposed for creative applications such as musical instruments and interactive audio installations. However, audio programming on Android is complicated and comes with restrictions that heavily impact performance. To address this issue, we present LDSP, an open-source environment that can be used to easily upcycle Android phones into embedded platforms optimized for audio synthesis and processing. We conducted a benchmark study to compare the number of oscillators that can be run in parallel on LDSP with an equivalent audio app designed according to modern Android standards. Our study tested six phones released between 2014 and 2018 and running different Android versions. The results consistently demonstrate that LDSP provides a significant boost in performance, in some cases more than doubling it, making even very old phones suitable for fairly advanced audio applications.
Efficient finite-difference room acoustics simulation incorporating extended-reacting elements
A method is proposed that allows finite-difference (FD) simulation of room acoustics to incorporate extended-reacting porous elements without adding major computational cost. The porous elements are described by a rigid-frame equivalent fluid model and are incorporated into the time-domain formulation through auxiliary differential equations. By using a local staggered grid scheme for the boundaries of the porous elements, the method allows an efficient second-order scalar approach to be used for the uniform air and porous element interior regions that make up the majority of the computational domain. Both the scalar and staggered schemes are based on a face-centered cubic grid to minimize numerical dispersion. A software implementation running on GPU shows the accuracy of the method compared to a theoretical reference, and demonstrates the method’s computational efficiency through a benchmark example.
Physically inspired signal model for harmonium sound synthesis
The hand harmonium is arguably the most popular instrument for vocal accompaniment in Hindustani music today. However, it lacks microtonality and the ability to produce controlled pitch glides, which are both important in Hindustani music. A harmonium sound synthesis model with a source-filter structure was previously presented by the authors, in which the harmonium reed sound is synthesized using a physical model and the effect of the wooden enclosure is applied by a filter estimated from a recorded note. In this paper, we propose a simplified and perceptually informed signal model capable of real-time synthesis with timbre control. In the signal model, the source is constructed as a band-limited waveform matching the spectral characteristics of the source signal in the physical model. Simplifications are suggested to parametrize the filter on the basis of prominent peaks in the filter frequency response. The signal model is implemented as a Pure Data [1] patch for live performance using a standard MIDI keyboard.
The Threshold of Perceptual Significance for TV Soundtracks
Hearing loss affects 1.5 billion people worldwide [1] and impacts many aspects of life, including the ability to hear the television. Simply increasing the volume may restore audibility of the quietest elements, but at the cost of making other elements undesirably loud. Therefore, at the very least, dynamic range compression could also be useful, fitted to an individual’s frequency-dependent hearing loss. However, it is not clear whether the audibility of the quietest parts of TV audio needs to be preserved. This experiment aims to measure which elements of the audio are important by presenting normal-hearing listeners with binary masked versions of TV audio presented at 60 dB(A), muting audio below a given sensation level. It was hypothesised that spectro-temporal regions with the most power density would dominate perception, such that the less active regions may not be missed. To find this threshold of perceptual significance, a two-alternative forced choice signal detection experiment was designed in which excerpts from BBC television shows were binary masked and presented to the participants, with the task of identifying which clips sounded more processed. The results suggest that discarding audio below 10 phons would rarely be noticed by most listeners.
Power-Balanced Dynamic Modeling of Vactrols: Application to a VTL5C3/2
Vactrols, which consist of a photoresistor and a light-emitting element that are optically coupled, are key components in optical dynamic compressors. Indeed, the photoresistor’s program-dependent dynamic characteristics make it advantageous for automatic gain control in audio applications. Vactrols are becoming more and more difficult to find, while the interest in optical compression in the audio community does not diminish. They are thus good candidates for virtual analog modeling. In this paper, a model of vactrols that is entirely physical, passive, and exhibits program-dependent dynamic behavior is proposed. The model is based on first principles that govern semiconductors, as well as the port-Hamiltonian systems formalism, which allows the modeling of nonlinear, multiphysical behaviors. The proposed model is identified with a real vactrol, then connected to other components in order to simulate a simple optical compressor.
Neural Modeling of Magnetic Tape Recorders
The sound of magnetic recording media, such as open-reel and cassette tape recorders, is still sought after by today’s sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices with applications in music production and audio antiquing tasks.
Neural Grey-Box Guitar Amplifier Modelling with Limited Data
This paper combines recurrent neural networks (RNNs) with the discretised Kirchhoff nodal analysis (DK-method) to create a grey-box guitar amplifier model. Both the objective and subjective results suggest that the proposed model is able to outperform a baseline black-box RNN model in the task of modelling a guitar amplifier, including realistically recreating the behaviour of the amplifier equaliser circuit, whilst requiring significantly less training data. Furthermore, we adapt the linear part of the DK-method in a deep learning scenario to derive multiple state-space filters simultaneously. We frequency sample the filter transfer functions in parallel and perform frequency domain filtering to considerably reduce the required training times compared to recursive state-space filtering. This study shows that it is a powerful idea to separately model the linear and nonlinear parts of a guitar amplifier using supervised learning.
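The idea of frequency sampling a state-space filter and filtering in the frequency domain, instead of running the recursion sample by sample, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is a single scalar (`a`, `b`, `c`, `d` are hypothetical coefficients), the transform is a naive O(N²) DFT written out for self-containment, and the input is zero-padded so that circular convolution closely matches the recursive output.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2)), enough for a short example."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def recursive_filter(u, a, b, c, d):
    """State-space recursion: x[n+1] = a*x[n] + b*u[n], y[n] = c*x[n] + d*u[n]."""
    x, y = 0.0, []
    for u_n in u:
        y.append(c * x + d * u_n)
        x = a * x + b * u_n
    return y


def frequency_sampled_filter(u, a, b, c, d):
    """Sample H(z) = c*b/(z - a) + d on the unit circle and multiply spectra."""
    n = len(u)
    H = [c * b / (cmath.exp(2j * math.pi * k / n) - a) + d for k in range(n)]
    U = dft(u)
    Y = [h * v for h, v in zip(H, U)]
    return [v.real for v in dft(Y, inverse=True)]


# Short burst followed by zero padding, so the decaying impulse response
# has died out before it can wrap around the circular buffer.
u = [1.0] * 16 + [0.0] * 48
y_rec = recursive_filter(u, a=0.5, b=1.0, c=1.0, d=0.0)
y_freq = frequency_sampled_filter(u, a=0.5, b=1.0, c=1.0, d=0.0)
max_diff = max(abs(p - q) for p, q in zip(y_rec, y_freq))
```

The frequency-domain path has no sample-to-sample dependency, which is what makes it attractive for parallel training. The two outputs agree only up to the time aliasing of the impulse response, so the amount of padding must suit the filter's decay time.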
Antialiased State Trajectory Neural Networks for Virtual Analog Modeling
In recent years, virtual analog modeling with neural networks has experienced an increase in interest and popularity. Many different modeling approaches have been developed and successfully applied. In this paper we do not propose a novel model architecture, but rather address the problem of aliasing distortion introduced by the nonlinearities of the modeled analog circuit. In particular, we propose to apply the general idea of antiderivative antialiasing to a state-trajectory network (STN). Applying antiderivative antialiasing to a stateful system in general leads to an integral of a multivariate function that can only be solved numerically, which is too costly for real-time application. However, an adapted STN can be trained to approximate the solution while being computationally efficient. It is shown that this approach can decrease aliasing distortion in the audio band significantly while only moderately oversampling the network in training and inference.
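For readers unfamiliar with antiderivative antialiasing, the general idea is easiest to see in the memoryless case, which is much simpler than the stateful STN setting addressed in the paper. The sketch below applies first-order antiderivative antialiasing to a `tanh` nonlinearity: instead of evaluating `tanh` at each sample, it divides the difference of the antiderivative `log(cosh(x))` by the difference of consecutive inputs. The function names and the `eps` threshold are illustrative assumptions.

```python
import math


def log_cosh(x):
    """Antiderivative of tanh, written in a form that does not overflow."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def tanh_adaa(x, eps=1e-6):
    """First-order antiderivative antialiasing of y = tanh(x)."""
    y = []
    x_prev = 0.0
    for x_n in x:
        dx = x_n - x_prev
        if abs(dx) < eps:
            # Nearly equal inputs: the quotient is ill-conditioned, so fall
            # back to the nonlinearity evaluated at the midpoint.
            y.append(math.tanh(0.5 * (x_n + x_prev)))
        else:
            y.append((log_cosh(x_n) - log_cosh(x_prev)) / dx)
        x_prev = x_n
    return y


# A loud sine at 5 kHz, sampled at 44.1 kHz, driven hard into the nonlinearity.
fs = 44100
signal = [4.0 * math.sin(2 * math.pi * 5000 * n / fs) for n in range(512)]
output = tanh_adaa(signal)
```

Each output sample is the average of `tanh` over the straight line between two consecutive inputs, which acts as a mild lowpass on the distortion products before they are sampled. For a stateful system the corresponding integral is multivariate, which is the difficulty the abstract describes.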
How Smooth Do You Think I Am: An Analysis on the Frequency-Dependent Temporal Roughness of Velvet Noise
Velvet noise is a sparse pseudo-random signal, with applications in late reverberation modeling, decorrelation, speech generation, and extending signals. The temporal roughness of broadband velvet noise has been studied earlier. However, the frequency dependency of the temporal roughness has received little attention. This paper explores which combinations of qualities, such as pulse density, filter type, and filter shape, contribute to frequency-dependent temporal roughness. An adaptive perceptual test was conducted to find minimal densities of smooth noise at octave bands as well as corresponding lowpass bands. The results showed that the cutoff frequency of a lowpass filter, as well as the center frequency of an octave filter, is correlated with the perceived minimal density of smooth noise. With the lowest lowpass cutoff frequency, 125 Hz, the filtered velvet noise sounded smooth at an average of 725 pulses/s, while octave-filtered noise at a center frequency of 125 Hz sounded smooth at an average of 401 pulses/s. For the broadband velvet noise, the minimal density of smoothness was found to be at an average of 1554 pulses/s. The results of this paper are applicable in designing velvet-noise-based artificial reverberation with minimal pulse density.
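To make the notion of pulse density concrete, here is a minimal sketch of a velvet noise generator. It assumes the commonly used construction in which time is divided into equal grid periods and one pulse of random sign is placed at a random position inside each period; the abstract does not state which generator the authors used, and the sample rate, seed, and function name are illustrative assumptions.

```python
import random


def velvet_noise(duration_s, density, fs=44100, seed=0):
    """Return samples in {-1, 0, +1} with `density` pulses per second."""
    rng = random.Random(seed)
    grid = fs / density                  # average pulse spacing in samples
    n_samples = int(duration_s * fs)
    n_pulses = int(duration_s * density)
    signal = [0.0] * n_samples
    for m in range(n_pulses):
        # One pulse per grid period, at a random offset within that period.
        pos = int(m * grid + rng.random() * (grid - 1))
        if pos < n_samples:
            signal[pos] = 1.0 if rng.random() < 0.5 else -1.0
    return signal


# Broadband density reported in the abstract as the average smoothness threshold.
noise = velvet_noise(1.0, density=1554)
pulse_count = sum(1 for v in noise if v != 0.0)
```

At 1554 pulses/s and a 44.1 kHz sample rate, roughly one sample in 28 is non-zero, which is why convolution with velvet noise is cheap. Filtering this signal with octave or lowpass filters would be the next step toward the band-limited stimuli described in the abstract.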