nanowarp

package module
v0.0.0-...-dad73e5 Latest
Published: Jun 12, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

README

Nanowarp

Studio-grade audio time stretching algorithm.

The reference implementation will be in Go. A possible C implementation would share this repo with the Go version.

Includes a modified version of github.com/youpy/go-wav (ISC license) with added support for 32-bit float WAV export. © 2013–2025 youpy.

Current state: algorithm done, working on streaming. No user-facing API exists yet.

Installation and usage

  1. Install Go
  2. Open terminal/shell/command prompt.
  3. Install Nanowarp
go install github.com/neputevshina/nanowarp/cmd/nanowarp@latest
  4. Use it
nanowarp -i inputfile.wav -t <stretch> [-o outputfile.wav]

or

nanowarp -i inputfile.wav -from <bpm> -to <bpm> -st <semitones> [-o outputfile.wav]

If your system can't find the nanowarp executable, your PATH variable has probably been changed and no longer includes the directory where Go installs binaries (by default the bin directory under GOPATH, i.e. $HOME/go/bin). On Windows, the simplest way to restore it is probably to reinstall Go. On Linux, you probably know what to do.

Consult

nanowarp -help

to get the list of available options.

Implementation

Nanowarp is a phase gradient heap integration (PGHI) phase vocoder (aka PVDR)[1] in which the partial derivatives of phase are obtained through time-frequency reassignment[2]. This way, an accurate phase-time advance can be obtained from a single windowed grain instead of two.

As in the original implementation of PVDR, the FFT is oversampled by a factor of 2 with zero-padding. Stereo coherence is obtained by stretching the mono signal and, after stretching, adding back the complex phase difference of the respective side channels[4].

A phase ramp for the entire output signal is generated. Onsets are detected using a rectified complex-domain novelty function[3]. Where an onset is detected, the phase ramp has a derivative of 1 in a region around it. The starting points of these regions are scaled by the stretch factor, and the ramp is linearly interpolated between regions.

Then the large-grained (nfft=4096) PVDR is applied, using the phase ramp as the input sample indices. Where the derivative of the ramp is 1, samples are passed through to the output unmodified.
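The ramp construction described above can be sketched as a piecewise-linear map from output sample index to input position. The sketch assumes sorted onsets, a fixed region length in samples, regions that do not overlap after scaling, and a stretch factor of at least 1. The names breakpoint and buildRamp are hypothetical, not taken from Nanowarp.

```go
package main

import "fmt"

// breakpoint ties an output sample position to an input sample position.
type breakpoint struct{ out, in float64 }

// buildRamp maps every output sample to a (fractional) input position.
// Each onset region starts at onset*stretch in the output and advances
// with slope 1; everything in between is linearly interpolated.
func buildRamp(inLen int, onsets []int, region int, stretch float64) []float64 {
	pts := []breakpoint{{0, 0}}
	for _, s := range onsets {
		start := float64(s) * stretch
		pts = append(pts,
			breakpoint{start, float64(s)},
			breakpoint{start + float64(region), float64(s + region)})
	}
	pts = append(pts, breakpoint{float64(inLen) * stretch, float64(inLen)})

	ramp := make([]float64, int(float64(inLen)*stretch))
	seg := 0
	for j := range ramp {
		x := float64(j)
		// Advance to the segment that contains output position x.
		for seg < len(pts)-2 && x >= pts[seg+1].out {
			seg++
		}
		a, b := pts[seg], pts[seg+1]
		if b.out == a.out { // zero-width segment: nothing to interpolate
			ramp[j] = a.in
			continue
		}
		t := (x - a.out) / (b.out - a.out)
		ramp[j] = a.in + t*(b.in-a.in)
	}
	return ramp
}

func main() {
	ramp := buildRamp(100, []int{20}, 10, 2)
	fmt.Println(len(ramp), ramp[40], ramp[45], ramp[50])
}
```

Inside an onset region consecutive ramp values differ by exactly one input sample, which is the condition under which the text says samples are passed through unmodified.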

Demos

Listen here. Obsolete.

Notes

  • There exists a “beat-emphasis onset detection function”.
    • Or just make an informed guess of -poolms based on an estimated uniform BPM. Much simpler and faster, and probably just as effective.
    • In the latter case, “max pool” detection on the novelty curve (per time bin), rather than dilation as now, is probably preferable.
    • Even simpler: use -from as a source of truth.
    • Or keep a cumulative average, and select peaks only from above it.
    • Or do live classification of the material and make the bin size smaller where no voice is found. Phase interruptions on instruments are much less noticeable.
    • Or use a large BPM-independent peak selection window (≥300 ms) and force the reset when enough correlation with the original is obtained. Very cheap: we always have both spectra for each grain, so xcorr is one conjugate and one multiplication away. Might require doing it per band and using some psychoacoustics to do it both regularly and unnoticeably.
  • Some more onset detectors:
  • SELEBI exists (preprint): https://arxiv.org/abs/2602.16421
  • PGHI, being a “brute-force sinusoidal modeling”, probably can be abused as a tonality measure for ruling out erroneous onset detections. It can't, but it's still a cool concept to keep in mind.
  • Non-causal PGHI is ineffective because PGHI integrates the phase locally, ignoring overlap, so it is impossible to obtain a globally coherent phase with phase resets using this method. We need some way to use the phase of up to the overlap number of frames.
  • Resamplers: https://codeberg.org/BillyDM/awesome-audio-dsp/src/branch/main/content/deip.pdf
  • Formant shifting must be implemented after streaming.
  • We can probably reset the phases not for the whole frame, but only for a most prominent region. Either:
    • define several “phase reset bands”. Just return the per-band sums of the novelty function; or
    • use the total sum (like now), but find a prominent bin range and reset the phase only in it.
    • We can probably never reset the bass. Probably.
    • Splitting the signal into four bands is probably enough, with crossovers at 250, 820 and 2500 Hz.
    • And then drop the band if cross-correlation is low or if a band's response in a softmaxed vector is lower than 0.25.
    • Bass activation triggers everything
  • There is pre- and post-echo of size hop×stretch for EVERY vertical motion in spectrum.
    • This is the property of PGHI (that's window side lobes), mini-pvdr does the same.
    • Apparent only on extreme stretches
    • And on high frequencies it's even echo in frequency, not only time.
  • gonum has vectorized complex and float operations. USE THIS.
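The per-band phase reset idea from the notes above can be sketched as follows. The crossover frequencies and the 0.25 softmax threshold come from the notes; everything else is an assumption made for illustration: the function names, the use of a per-bin novelty vector, feeding raw band sums to the softmax without scaling, and choosing "bass activation triggers everything" over "never reset the bass". The cross-correlation criterion is left out.

```go
package main

import (
	"fmt"
	"math"
)

// Crossover frequencies in Hz, as listed in the notes.
var crossovers = []float64{250, 820, 2500}

// bandOf returns the band index (0 = bass) that an FFT bin falls into.
func bandOf(bin, nfft int, fs float64) int {
	f := float64(bin) * fs / float64(nfft)
	b := 0
	for b < len(crossovers) && f >= crossovers[b] {
		b++
	}
	return b
}

// bandSums adds up a per-bin novelty vector within each band.
func bandSums(novelty []float64, nfft int, fs float64) []float64 {
	sums := make([]float64, len(crossovers)+1)
	for bin, v := range novelty {
		sums[bandOf(bin, nfft, fs)] += v
	}
	return sums
}

// softmax normalizes x into positive weights that sum to 1.
func softmax(x []float64) []float64 {
	peak := x[0]
	for _, v := range x {
		peak = math.Max(peak, v)
	}
	out := make([]float64, len(x))
	var total float64
	for i, v := range x {
		out[i] = math.Exp(v - peak) // subtract the peak for stability
		total += out[i]
	}
	for i := range out {
		out[i] /= total
	}
	return out
}

// resetMask reports which bands would get a phase reset.
func resetMask(sums []float64, threshold float64) []bool {
	p := softmax(sums)
	mask := make([]bool, len(p))
	for i := range p {
		mask[i] = p[i] >= threshold
	}
	if mask[0] { // assumption: bass activation triggers every band
		for i := range mask {
			mask[i] = true
		}
	}
	return mask
}

func main() {
	novelty := make([]float64, 2049) // bins 0..nfft/2 for nfft=4096
	novelty[100] = 5                 // activity around 1.2 kHz at 48 kHz
	sums := bandSums(novelty, 4096, 48000)
	fmt.Println(sums, resetMask(sums, 0.25))
}
```

Because softmax is sensitive to the scale of its input, any real use would need the band sums normalized in some consistent way first.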

Testing strategy

  • Various impulse train signals
  • LFO FM Sine
  • Vocals under hard saturation
  • Drum loops
  • Full tracks: pop, electronica, acoustica, black metal
  • Braid remastered soundtrack, phase resets WILL break the sound.
  • “Frederic — oddloop” brings the algorithm to its knees: phasiness and misplaced phase resets are obvious.

Streaming implementation plan

  1. Switch time ramp and coefficient handling method from signal buffers to breakpoints. Tried, it broke the algorithm.
  2. Define stretch signal producer and sound producer (goroutines).

for {
  readnext(i) // blocking
  stretch(i, o)
  writenext(o)
}
  3. Define Push and Pull which communicate with producers.
  4. Use in cmd/nanowarp.
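One possible shape for the Push and Pull pair from this plan is sketched below, with a single worker goroutine running the read/stretch/write loop over unbuffered channels. Since no user-facing API exists yet, every name here (stream, newStream, Push, Pull, Close) is hypothetical, and repeat2x is only a placeholder standing in for the real stretch step.

```go
package main

import "fmt"

// stream connects a caller to a worker goroutine. Hypothetical type.
type stream struct {
	in  chan []float64
	out chan []float64
}

// newStream starts the worker loop: read a block, stretch it, write it.
func newStream(stretch func(in []float64) []float64) *stream {
	s := &stream{in: make(chan []float64), out: make(chan []float64)}
	go func() {
		defer close(s.out)
		for block := range s.in { // readnext: blocks until Push
			s.out <- stretch(block) // writenext: blocks until Pull
		}
	}()
	return s
}

// Push hands one input block to the worker.
func (s *stream) Push(block []float64) { s.in <- block }

// Close signals that no more input will arrive.
func (s *stream) Close() { close(s.in) }

// Pull returns the next output block, or ok=false once the stream is done.
func (s *stream) Pull() (block []float64, ok bool) {
	block, ok = <-s.out
	return block, ok
}

// repeat2x is a placeholder for the real algorithm: it repeats each sample.
func repeat2x(in []float64) []float64 {
	out := make([]float64, 0, 2*len(in))
	for _, v := range in {
		out = append(out, v, v)
	}
	return out
}

func main() {
	s := newStream(repeat2x)
	go func() {
		s.Push([]float64{1, 2})
		s.Push([]float64{3})
		s.Close()
	}()
	for {
		block, ok := s.Pull()
		if !ok {
			break
		}
		fmt.Println(block)
	}
}
```

With unbuffered channels the producer and consumer pace each other, so pushing and pulling have to happen on different goroutines. A real implementation would also need to carry overlap state between blocks, which this sketch ignores.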

Known issues

  • No pitch modification. Requires a good resampler library, e.g. r8brain. Either port it or use it through cgo.
  • No streaming support. All processing is in-memory with obvious RAM costs.
  • Slow.
  • Triples the sound on extreme (>4x) time stretches. The bane of all PVDR-based algorithms.
  • Modifies the tonal balance of the material. Elastiqué doesn't do that.

References

  1. Průša, Z., & Holighaus, N. (2017). Phase vocoder done right.
  2. Flandrin, P. et al. (2002). Time-frequency reassignment: from principles to algorithms.
  3. Duxbury, C., Bello, J. P., Davies, M., & Sandler, M. (2003, September). Complex domain onset detection for musical signals. In Proc. Digital Audio Effects Workshop (DAFx) (Vol. 1, pp. 6-9). London: Queen Mary University.
  4. Altoè, A. (2012). A transient-preserving audio time-stretching algorithm and a real-time realization for a commercial music product.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func DetectorNew

func DetectorNew(nfft, fs int, maxTransient, onsetevery int) (n *detector)

Types

type Nanowarp

type Nanowarp struct {
	// contains filtered or unexported fields
}

func New

func New(samplerate int, opts Options) (n *Nanowarp)

func (Nanowarp) DilatePeakSelectProcess

func (n Nanowarp) DilatePeakSelectProcess(ar dspio.SignalReader, aw dspio.SignalWriter, stretch float64, ons chan onset) (err error)

func (Nanowarp) NoveltyCurveProcess

func (n Nanowarp) NoveltyCurveProcess(ar dspio.SignalReader, aw dspio.SignalWriter) (err error)

func (*Nanowarp) Process

func (n *Nanowarp) Process(lin, rin, lout, rout []float64, stretch float64)

type Options

type Options struct {
	// Output scaled onsets only.
	Onsets bool

	// Set algorithm quality.
	//  -1: Don't perform transient separation, output raw PVDR without phase resets.
	//      4x overlap. Fastest and currently the smoothest with OK transient preservation
	//	and excellent tonal quality.
	//  0:  Extract transients and reset the phase on them. 4x overlap. Slow.
	//  1:  Same as 0, but with 8x overlap. Slowest with diminishing returns.
	Quality int

	// Time for which signal will be bypassed at any detected transient.
	//
	// If zero will be set to 30.
	TransientMs int

	// The size of the transient picking filter in milliseconds.
	//
	// If zero will be set to 250.
	PickingMs int

	// Measure the pooling size in output time, not in input time.
	// I.e. scale the pooling size with the stretch coefficient.
	ScalePool bool
}

Directories

Path Synopsis
cmd
nanowarp command
Please, don't use this package in your projects.
wav module
Package waveform provides a way to dump the pseudographic waveform view of an audio buffer.
