Nanowarp
Studio-grade audio time stretching algorithm.
Reference implementation is going to be in Go, and a possible C implementation will share
this repo with a Go version.
Includes a modified version of github.com/youpy/go-wav (ISC license) with added 32-bit
float WAV support export. © 2013–2025 youpy.
Current state: algorithm done, working on streaming. No user-facing API exists yet.
Installation and usage
- Install Go
- Open terminal/shell/command prompt.
- Install Nanowarp
go install github.com/neputevshina/nanowarp/cmd/nanowarp@latest
- Use it
nanowarp -i inputfile.wav -t <stretch> [-o outputfile.wav]
or
nanowarp -i inputfile.wav -from <bpm> -to <bpm> -st <semitones> [-o outputfile.wav]
If your system can't find nanowarp executable, you have probably changed PATH variable in your system.
Probably the simplest way to bring it back if you are under Windows is by reinstalling the Go.
On Linux, you should probably know what to do.
Consult
nanowarp -help
to get the list of available options.
Implementation
Nanowarp is a phase gradient heap integration (PGHI) phase vocoder (aka PVDR)[1] where partial derivatives
of phase are obtained through time-frequency reassignment[2]. This way accurate phase-time
advance can be obtained using only one windowed grain instead of two.
Like in original implementation of PVDR, FFT is oversampled by factor of 2 with zero-padding.
Stereo coherence is obtained through stretching mono and adding complex phase difference of
respective side channels after stretching back[4].
A phase ramp for the entire output signal is generated. Onsets are detected using rectified
complex-domain novelty function[3]. If onset is detected, phase ramp will have a derivative of
1 in a region around detected onset. Starting points of these sample regions are scaled by the
stretch size, and points between regions are linearly interpolated.
Then the large-grained (nfft=4096) PVDR is applied, using phase ramp for the input
sample indexes. If the derivative of the signal is 1, samples are passed through to the output
unmodified.
Demos
Listen here. Obsolete.
Notes
- There exists a “beat-emphasis onset detection function”.
- Or just make an informed guess of
-poolms based on estimated uniform BPM. Much simpler and faster, probably as much effective.
- In the latter case, the novelty curve “max pool” (per time bin) detection (vs. dilation as now) is probably more preferable.
- Even simpler: use
-from as a source of truth.
- Or keep a cumulative average, and select peaks only from above it.
- Or do live classification of the material and make the bin size smaller where no voice is found. Phase interruptions on instruments are much less noticeable.
Or use the large BPM-independent peak selection window (≥300 ms) and force the reset when enough correlation with the original is obtained.
Very cheap, we always have both spectra for each grain, xcorr is one conjugate and multiplication away.
Might require doing it per-band and using some psychoacoustics to do it both regularly and unnoticeable.
- Some more onset detectors:
- SELEBI exists (preprint): https://arxiv.org/abs/2602.16421
PGHI, being a “brute-force sinusoidal modeling”, probably can be abused as a tonality measure for ruling out erroneous onset detections. It can't, but it's still a cool concept to keep in mind.
- Non-causal PGHI is ineffective because PGHI integrates the phase locally, ignoring overlap,
so it is impossible to obtain globally coherent phase with phase resets using this method. We need a some way to use the phase of up to overlap number of frames.
- Resamplers: https://codeberg.org/BillyDM/awesome-audio-dsp/src/branch/main/content/deip.pdf
- Formant shifting must be implemented after streaming.
- We can probably reset the phases not for the whole frame, but only for a most prominent region. Either:
- define several “phase reset bands”. Just return the per-band sums of the novelty function; or
use the total sum (like now), but find a prominent bin range and reset the phase only in it.
- We can probably never reset the bass. Probably.
- It is enough to split the signal to four bands probably, crossovers are at 250-820-2500 Hz
And then drop the band if cross-correlation is low or if a band's response in a softmaxed vector is lower than 0.25.
- Bass activation triggers everything
- There is pre- and post-echo of size hop×stretch for EVERY vertical motion in spectrum.
- This is the property of PGHI (that's window side lobes), mini-pvdr does the same.
- Apparent only on extreme stretches
- And on high frequencies it's even echo in frequency, not only time.
- gonum has vectorized complex and float operations. USE THIS.
Testing strategy
- Various impulse train signals
- LFO FM Sine
- Vocals under hard saturation
- Drum loops
- Full tracks: pop, electronica, acoustica, black metal
- Braid remastered soundtrack, phase resets WILL break the sound.
- “Frederic — oddloop” brings the algorithm to the knees: phasiness and misplaced phase resets are obvious.
Streaming implementation plan
1. Switch time ramp and coefficient handling method from signal buffers to breakpoints. Tried, it broke the algorithm.
2. Define stretch signal producer and sound producer (goroutines).
for {
readnext(i) // blocking
stretch(i, o)
writenext(o)
}
- Define Push and Pull which communicate with producers.
- Use in cmd/nanowarp.
Known issues
- No pitch modification. Requires a good resampler library, e.g. r8brain.
Either port it or use through cgo.
- No streaming support. All processing is in-memory with obvious RAM costs.
- Slow.
- Triples the sound on extreme (>4x) time stretches. The bane of all PVDR-based algorithms.
- Modifies the tonal balance of the material. Elastiqué doesn't do that.
References
- Průša, Z., & Holighaus, N. (2017). Phase vocoder done right.
- Flandrin, P. et al. (2002). Time-frequency reassignment: from principles to algorithms.
- Duxbury, C., Bello, J. P., Davies, M., & Sandler, M. (2003, September). Complex domain onset detection for musical signals. In Proc. Digital Audio Effects Workshop (DAFx) (Vol. 1, pp. 6-9). London: Queen Mary University.
- Altoè, A. (2012). A transient-preserving audio time-stretching algorithm and a real-time realization for a commercial music product.