DDX7
Real-time Factor test
Due to lack of space in the paper, we left out a comparison of the Real-time Factor between our DDX7 model (400k parameters) and the HpN Baseline (4.5M parameters). We run inference in PyTorch on audio excerpts of different lengths (to account for different latencies) for both our model and the baseline on a laptop CPU (Intel i7-6700HQ). We render each audio excerpt one hundred times and compute the Real-time Factor with the following formula, reporting the mean and standard deviation over the runs.
rt_factor = time_to_compute / length_of_audio_generated
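The measurement procedure described above can be sketched roughly as follows. This is only an illustration, not the benchmarking script used for the results: `real_time_factor` and `dummy_render` are hypothetical names, the dummy workload stands in for a model's forward pass, and the use of the sample standard deviation is an assumption.

```python
import statistics
import time


def real_time_factor(render, audio_seconds, runs=100):
    """Return (mean, std) of time_to_compute / length_of_audio_generated."""
    factors = []
    for _ in range(runs):
        start = time.perf_counter()
        render()  # synthesize one excerpt lasting `audio_seconds` seconds
        elapsed = time.perf_counter() - start
        factors.append(elapsed / audio_seconds)
    # Sample standard deviation (assumption; the text does not say which one).
    return statistics.mean(factors), statistics.stdev(factors)


def dummy_render():
    # Hypothetical placeholder workload standing in for model inference.
    sum(i * i for i in range(10_000))


# 0.032 s corresponds to a 32 ms excerpt.
mean_rtf, std_rtf = real_time_factor(dummy_render, audio_seconds=0.032)
print(f"RTF: {mean_rtf:.3f} ({std_rtf:.3f})")
```

In an actual benchmark, `render` would wrap the model's inference call for an excerpt of the chosen length.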
An algorithm that operates in real time must have a Real-time Factor smaller than 1. The results in Table 1 indicate that DDX7 can run in real time on a laptop CPU with as little as 32 ms of latency, whereas the HpN Baseline needs at least 128 ms. These figures could be improved further for both models if a different framework is used (for instance, TorchScript).
Latency (ms) | DDX7 Real-time Factor | HpN Baseline Real-time Factor |
---|---|---|
256 | 0.079 (0.005) | 0.231 (0.0124) |
128 | 0.158 (0.011) | 0.466 (0.0229) |
64 | 0.343 (0.039) | 1.04 (0.192) |
32 | 0.637 (0.042) | 1.88 (0.111) |
16 | 1.31 (0.169) | 3.71 (0.188) |
8 | 2.51 (0.161) | 7.39 (0.32) |
4 | 5.01 (0.215) | 15.2 (1.19) |
Table 1: Mean and standard deviation (in parentheses) of the Real-time Factor for DDX7 and the HpN Baseline.
The minimum feasible latencies are 32 ms for DDX7 and 128 ms for the HpN Baseline.
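As a quick check of how the minimum feasible latencies follow from the table, the sketch below picks, for each model, the smallest latency whose mean Real-time Factor is below 1. The mean values are copied from Table 1; the helper name `min_feasible_latency` is hypothetical, and the per-buffer compute time assumes that the excerpt length equals the latency.

```python
# Mean Real-time Factor per latency (ms), copied from Table 1.
RTF = {
    "DDX7": {256: 0.079, 128: 0.158, 64: 0.343, 32: 0.637,
             16: 1.31, 8: 2.51, 4: 5.01},
    "HpN Baseline": {256: 0.231, 128: 0.466, 64: 1.04, 32: 1.88,
                     16: 3.71, 8: 7.39, 4: 15.2},
}


def min_feasible_latency(rtf_by_latency):
    """Smallest latency (ms) with a Real-time Factor below 1, or None."""
    feasible = [lat for lat, rtf in rtf_by_latency.items() if rtf < 1.0]
    return min(feasible) if feasible else None


for model, table in RTF.items():
    latency = min_feasible_latency(table)
    # time_to_compute = rt_factor * length_of_audio_generated
    compute_ms = table[latency] * latency
    print(f"{model}: {latency} ms latency, ~{compute_ms:.1f} ms to compute")
# prints:
# DDX7: 32 ms latency, ~20.4 ms to compute
# HpN Baseline: 128 ms latency, ~59.6 ms to compute
```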