nanowarp

Version: v0.0.0-...-dad73e5 (latest) | Published: Jun 12, 2026 | License: Apache-2.0 | Imports: 16 | Imported by: 0

Repository

github.com/neputevshina/nanowarp

README ¶

Nanowarp

Studio-grade audio time stretching algorithm.

The reference implementation will be in Go. A possible C implementation would share this repository with the Go version.

Current state: algorithm done, working on streaming. No user-facing API exists yet.

Installation and usage

1. Install Go.
2. Open a terminal, shell, or command prompt.
3. Install Nanowarp:

go install github.com/neputevshina/nanowarp/cmd/nanowarp@latest

4. Use it:

nanowarp -i inputfile.wav -t <stretch> [-o outputfile.wav]

or

nanowarp -i inputfile.wav -from <bpm> -to <bpm> -st <semitones> [-o outputfile.wav]

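If your system can't find the nanowarp executable, the directory where go install puts binaries (GOBIN, or the bin directory inside GOPATH, by default $HOME/go/bin) is probably missing from your PATH, most likely because PATH was changed. On Windows, the simplest way to bring it back is probably to reinstall Go. On Linux, you probably know what to do: add that directory to PATH.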

Consult

nanowarp -help

to get the list of available options.

Implementation

Nanowarp is a phase gradient heap integration (PGHI) phase vocoder (also known as PVDR)[1] in which the partial derivatives of phase are obtained through time-frequency reassignment[2]. This way, an accurate phase-time advance can be obtained from one windowed grain instead of two.

As in the original implementation of PVDR, the FFT is oversampled by a factor of 2 with zero-padding. Stereo coherence is obtained by stretching the mono signal and, after stretching, adding back the complex phase difference of the respective side channels[4].

A phase ramp for the entire output signal is generated. Onsets are detected using the rectified complex-domain novelty function[3]. Where an onset is detected, the phase ramp has a derivative of 1 in a region around it. The starting points of these regions are scaled by the stretch factor, and the points between regions are linearly interpolated.

Then the large-grained (nfft=4096) PVDR is applied, using the phase ramp as the input sample indexes. Where the derivative of the ramp is 1, samples are passed through to the output unmodified.

Demos

~~Listen here~~. Obsolete.

Notes

There exists a “beat-emphasis onset detection function”.
- Or just make an informed guess of -poolms based on an estimated uniform BPM. Much simpler and faster, and probably just as effective.
- In the latter case, “max pool” detection on the novelty curve (per time bin), as opposed to the dilation used now, is probably preferable.
- Even simpler: use -from as a source of truth.
- Or keep a cumulative average, and select peaks only from above it.
- Or do live classification of the material and make the bin size smaller where no voice is found. Phase interruptions on instruments are much less noticeable.
- Or use a large BPM-independent peak selection window (≥300 ms) and force the reset when enough correlation with the original is obtained. Very cheap: we always have both spectra for each grain, so xcorr is one conjugate and one multiplication away. Might require doing it per band and using some psychoacoustics to make the resets both regular and unnoticeable.
Some more onset detectors:
- https://www.cp.jku.at/research/papers/Boeck_Widmer_DAFx_2013.pdf
- https://www.dlsi.ua.es/~pertusa/pub/pdf/ciarp05.pdf
- Expecting a regular beat might be bad for some types of music.
SELEBI exists (preprint): https://arxiv.org/abs/2602.16421
~~PGHI, being a “brute-force sinusoidal modeling”, probably can be abused as a tonality measure for ruling out erroneous onset detections.~~ It can't, but it's still a cool concept to keep in mind.
Non-causal PGHI is ineffective because PGHI integrates the phase locally, ignoring overlap, so it is impossible to obtain a globally coherent phase with phase resets using this method. We need some way to use the phase of up to as many frames as the overlap factor.
Resamplers: https://codeberg.org/BillyDM/awesome-audio-dsp/src/branch/main/content/deip.pdf
Formant shifting must be implemented after streaming.
We can probably reset the phases not for the whole frame, but only for a most prominent region. Either:
- define several “phase reset bands”. Just return the per-band sums of the novelty function; or
- ~~use the total sum (like now), but find a prominent bin range and reset the phase only in it.~~
- We can probably never reset the bass. Probably.
- Splitting the signal into four bands is probably enough, with crossovers at 250, 820 and 2500 Hz.
- ~~And then drop the band if cross-correlation is low or if a band's response in a softmaxed vector is lower than 0.25.~~
- Bass activation triggers everything
There is a pre- and post-echo of size hop×stretch for EVERY vertical motion in the spectrum.
- This is a property of PGHI (caused by window side lobes); mini-pvdr does the same.
- Apparent only on extreme stretches.
- At high frequencies the echo even appears in frequency, not only in time.
gonum has vectorized complex and float operations. USE THIS.
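
One of the ideas above, returning per-band sums of the novelty function with crossovers at 250, 820 and 2500 Hz, could be sketched like this. The names bandOf and bandSums, and the use of one novelty value per FFT bin, are assumptions for illustration; nothing like this exists in the package yet.

```go
package main

import "fmt"

// crossovers are the band edges in Hz proposed in the notes.
var crossovers = []float64{250, 820, 2500}

// bandOf returns the index (0..3) of the band containing the given FFT
// bin, for an FFT of size nfft at sample rate fs.
func bandOf(bin, nfft int, fs float64) int {
	f := float64(bin) * fs / float64(nfft)
	b := 0
	for b < len(crossovers) && f >= crossovers[b] {
		b++
	}
	return b
}

// bandSums adds up per-bin novelty values within each of the four bands.
func bandSums(novelty []float64, nfft int, fs float64) [4]float64 {
	var sums [4]float64
	for k, v := range novelty {
		sums[bandOf(k, nfft, fs)] += v
	}
	return sums
}

func main() {
	// One novelty value per bin of a 4096-point FFT at 48 kHz.
	novelty := make([]float64, 4096/2+1)
	for k := range novelty {
		novelty[k] = 1
	}
	fmt.Println(bandSums(novelty, 4096, 48000))
}
```

With uniform novelty the sums simply count the bins in each band, which makes it easy to check where the crossovers fall.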

Testing strategy

Various impulse train signals
LFO FM Sine
Vocals under hard saturation
Drum loops
Full tracks: pop, electronica, acoustica, black metal
Braid remastered soundtrack, phase resets WILL break the sound.
“Frederic — oddloop” brings the algorithm to its knees: phasiness and misplaced phase resets are obvious.

Streaming implementation plan

~~1. Switch time ramp and coefficient handling method from signal buffers to breakpoints.~~ Tried, it broke the algorithm.
2. Define stretch signal producer and sound producer (goroutines).

for {
  readnext(i) // blocking
  stretch(i, o)
  writenext(o)
}

3. Define Push and Pull, which communicate with the producers.
4. Use them in cmd/nanowarp.
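
One possible shape for this plan, sketched with channels. Everything here is hypothetical: the Stream type, the method set, and especially the placeholder stretch function, which just repeats samples and is not a phase vocoder. The worker goroutine is the loop above, and Push and Pull are thin wrappers around its input and output queues.

```go
package main

import "fmt"

// Stream is a hypothetical streaming wrapper: one goroutine reads input
// blocks, stretches them, and writes the results.
type Stream struct {
	in  chan []float64
	out chan []float64
}

// NewStream starts the worker goroutine. stretch stands in for the real
// algorithm; depth is the number of blocks each queue can hold.
func NewStream(stretch func([]float64) []float64, depth int) *Stream {
	s := &Stream{
		in:  make(chan []float64, depth),
		out: make(chan []float64, depth),
	}
	go func() {
		defer close(s.out)
		for block := range s.in { // readnext (blocking)
			s.out <- stretch(block) // stretch, then writenext
		}
	}()
	return s
}

// Push queues one input block; it blocks when the input queue is full.
func (s *Stream) Push(block []float64) { s.in <- block }

// Close signals that no more input will be pushed.
func (s *Stream) Close() { close(s.in) }

// Pull returns the next output block, or false once the stream is drained.
func (s *Stream) Pull() ([]float64, bool) {
	block, ok := <-s.out
	return block, ok
}

// repeat2 is a placeholder "stretch" that doubles every sample.
// It is NOT a phase vocoder; it only makes the plumbing observable.
func repeat2(in []float64) []float64 {
	out := make([]float64, 0, 2*len(in))
	for _, v := range in {
		out = append(out, v, v)
	}
	return out
}

func main() {
	s := NewStream(repeat2, 4)
	s.Push([]float64{1, 2})
	s.Push([]float64{3})
	s.Close()
	for block, ok := s.Pull(); ok; block, ok = s.Pull() {
		fmt.Println(block)
	}
}
```

A real implementation would also have to carry overlap and phase state between blocks, and a caller that pushes more than the queue depth without pulling will block.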

Known issues

No pitch modification. Requires a good resampler library, e.g. r8brain. Either port it or use it through cgo.
No streaming support. All processing is in-memory with obvious RAM costs.
Slow.
Triples the sound on extreme (>4x) time stretches. The bane of all PVDR-based algorithms.
Modifies the tonal balance of the material. Elastiqué doesn't do that.

References

Documentation ¶

Index ¶

func DetectorNew(nfft, fs int, maxTransient, onsetevery int) (n *detector)
type Nanowarp
- func New(samplerate int, opts Options) (n *Nanowarp)
type Options

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func DetectorNew ¶

func DetectorNew(nfft, fs int, maxTransient, onsetevery int) (n *detector)

Types ¶

type Nanowarp ¶

type Nanowarp struct {
	// contains filtered or unexported fields
}

func New ¶

func New(samplerate int, opts Options) (n *Nanowarp)

func (Nanowarp) DilatePeakSelectProcess ¶

func (n Nanowarp) DilatePeakSelectProcess(ar dspio.SignalReader, aw dspio.SignalWriter, stretch float64, ons chan onset) (err error)

func (Nanowarp) NoveltyCurveProcess ¶

func (n Nanowarp) NoveltyCurveProcess(ar dspio.SignalReader, aw dspio.SignalWriter) (err error)

func (*Nanowarp) Process ¶

func (n *Nanowarp) Process(lin, rin, lout, rout []float64, stretch float64)

type Options ¶

type Options struct {
	// Output scaled onsets only.
	Onsets bool

	// Set algorithm quality.
	//  -1: Don't perform transient separation, output raw PVDR without phase resets.
	//      4x overlap. Fastest and currently the smoothest with OK transient preservation
	//      and excellent tonal quality.
	//  0:  Extract transients and reset the phase on them. 4x overlap. Slow.
	//  1:  Same as 0, but with 8x overlap. Slowest with diminishing returns.
	Quality int

	// Time for which signal will be bypassed at any detected transient.
	//
	// If zero will be set to 30.
	TransientMs int

	// The size of the transient picking filter in milliseconds.
	//
	// If zero will be set to 250.
	PickingMs int

	// Measure the pooling size in output time, not in input time.
	// I.e. scale the pooling size with the stretch coefficient.
	ScalePool bool
}

Directories ¶

Path	Synopsis
cmd/nanowarp (command)	
dspio	Please, don't use this package in your projects.
oscope	
wav (module)	
waveform	Package waveform provides a way to dump the pseudographic waveform view of an audio buffer.
