This is a test audio file
Real-time Factor test
Due to lack of space in the paper, we left out a comparison of the real-time factor between our DDX7 model (400k parameters) and the HpN Baseline (4.5 M parameters). We execute on Pytorch the inference of audio excerpts of different length (to accomodate for different latencies) for both our model and the baseline on a laptop CPU ( Intel i7-6700HQ ). We render the audio excerpts a hundred times and extract the Real-time Factor according to the following formula, extracting the mean and standard deviation of the runs.
rt_factor = time_to_compute / length_of_audio_generated
An algorithm that can operate on real-time has to have a real time factor smaller than 1. The results shown in Table 1 indicate that DDX7 can run with as little as 32 ms of latency in real time on a laptop CPU, but the HpN Baseline needs at least 128 ms. These metrics can be improved further for both models if a different framework is used (for instance, TorchScript).
|Real Time Factor|
|Latency (ms)||DDX7||HpN Baseline|
|256||0.079 (0.005)||0.231 (0.0124)|
|128||0.158 (0.011)||0.466 (0.0229)|
|64||0.343 (0.039)||1.04 (0.192)|
|32||0.637 (0.042)||1.88 (0.111)|
|16||1.31 (0.169)||3.71 (0.188)|
|8||2.51 (0.161)||7.39 (0.32)|
|4||5.01 (0.215)||15.2 (1.19)|
Table 1: Mean and std (in italics) of the Real-time Factor for DDX7 and the HpN Baseline.
Minimum feasible latencies are shown in bold for both models.