Short-time Fourier transform

From Wikipedia, the free encyclopedia

The short-time Fourier transform (STFT), or alternatively short-term Fourier transform, is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.

Simply described, in the continuous-time case, the function to be transformed is multiplied by a window function which is nonzero for only a short period of time. The Fourier transform (a one-dimensional function) of the resulting signal is taken as the window is slid along the time axis, resulting in a two-dimensional representation of the signal. Mathematically, this is written as:

\mathbf{STFT} \left \{ x(\cdot) \right \} \equiv X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j \omega t} \, dt

where w(t) is the window function, commonly a Hann window or Gaussian "hill" centered around zero, and x(t) is the signal to be transformed. X(τ,ω) is essentially the Fourier transform of x(t)w(t-τ), a complex function representing the phase and magnitude of the signal over time and frequency. Often phase unwrapping is employed along the time axis τ, the frequency axis ω, or both, to suppress any jump discontinuities in the phase result of the STFT. The time index τ is normally considered to be "slow" time and is usually not expressed in as high a resolution as time t.

In the discrete-time case, the data to be transformed is broken up into chunks or frames (which usually overlap each other). Each chunk is Fourier transformed, and the complex result is added to a matrix, which records the magnitude and phase for each point in time and frequency. This can be expressed as:

\mathbf{STFT} \left \{ x[\cdot] \right \} \equiv X(m,\omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n-m]\, e^{-j \omega n}

likewise, with signal x[n] and window w[n]. In this case, m is discrete and ω is continuous, but in most typical applications the STFT is performed on a computer using the fast Fourier transform (FFT), so both variables are discrete and quantized. Again, the discrete-time index m is normally considered to be "slow" time and is usually not expressed in as high a resolution as time n.

The magnitude squared of the STFT yields the spectrogram of the function:

\mathrm{spectrogram} \left \{ x(\cdot) \right \} \equiv \left| X(\tau, \omega) \right|^2
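
To make the discrete-time formulas concrete, the following sketch frames a signal, applies a window, takes the FFT of each frame, and forms the spectrogram as the squared magnitude of the result. It is a minimal illustration in Python/NumPy; the Hann window, the 256-sample frame length, the 128-sample hop, and the 100 Hz test tone are arbitrary choices made here, not part of the definition.

  import numpy as np

  def stft(x, frame_len=256, hop=128):
      """Column m of the result holds the one-sided DFT of x[n] w[n - m*hop] (sketch)."""
      w = np.hanning(frame_len)                     # window w[n]
      n_frames = 1 + (len(x) - frame_len) // hop    # number of window positions m
      X = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
      for m in range(n_frames):
          frame = x[m * hop : m * hop + frame_len] * w
          X[:, m] = np.fft.rfft(frame)
      return X

  # Example: a 100 Hz tone sampled at 1 kHz
  fs = 1000.0
  t = np.arange(0.0, 1.0, 1.0 / fs)
  x = np.cos(2 * np.pi * 100.0 * t)

  X = stft(x)
  spectrogram = np.abs(X) ** 2     # |X(m, omega)|^2
  phase = np.angle(X)              # np.unwrap(phase, axis=1) removes jumps along the time axis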


Inverse STFT

The STFT is invertible; that is, the original signal can be recovered from the transform by the inverse STFT.

Continuous-time STFT

Given the width and definition of the window function w(t), we initially require the height of the window function to be scaled so that

\int_{-\infty}^{\infty} w(\tau) \, d\tau  = 1.

It easily follows that

\int_{-\infty}^{\infty} w(t-\tau) \, d\tau  = 1 \quad \forall \ t

and

x(t) = x(t) \int_{-\infty}^{\infty} w(t-\tau) \, d\tau  = \int_{-\infty}^{\infty} x(t) w(t-\tau) \, d\tau.

The continuous Fourier transform is

X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j \omega t} \, dt.

Substituting x(t) from above:

X(\omega) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x(t) w(t-\tau) \, d\tau \right] \, e^{-j \omega t} \, dt
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x(t) w(t-\tau) \, e^{-j \omega t} \, d\tau \, dt.

Swapping order of integration:

X(\omega) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x(t) w(t-\tau) \, e^{-j \omega t} \, dt \, d\tau
= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x(t) w(t-\tau) \, e^{-j \omega t} \, dt \right] \, d\tau
= \int_{-\infty}^{\infty} X(\tau, \omega) \, d\tau.

So the Fourier transform can be seen as a sort of phase-coherent sum of all of the STFTs of x(t). Since the inverse Fourier transform is

x(t)  = \frac{1}{2 \pi} \int_{-\infty}^{\infty} X(\omega) e^{+j \omega t} \, d\omega,

then x(t) can be recovered from X(τ,ω) as

x(t)  = \frac{1}{2 \pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X(\tau, \omega) e^{+j \omega t} \, d\tau \, d\omega.

or

x(t)  = \int_{-\infty}^{\infty} \left[ \frac{1}{2 \pi} \int_{-\infty}^{\infty} X(\tau, \omega) e^{+j \omega t} \, d\omega \right] \, d\tau.

Comparing with the above, it can be seen that the windowed "grain" or "wavelet" of x(t) is

x(t) w(t-\tau)  = \frac{1}{2 \pi} \int_{-\infty}^{\infty} X(\tau, \omega) e^{+j \omega t} \, d\omega,

which is the inverse Fourier transform of X(τ,ω) with τ held fixed.

Discrete-time STFT
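
In the sampled case, a common way to realize the inversion is to inverse-transform each frame back into its grain x[n]w[n - m*hop] and overlap-add the grains, dividing out the accumulated window, which plays the same role as the normalization of w(t) in the continuous-time derivation above. The following is a minimal sketch of that idea, assuming a Hann window with 50% overlap; it is an illustration rather than a prescribed algorithm.

  import numpy as np

  def stft(x, w, hop):
      """Forward STFT: column m is the one-sided DFT of x[n] w[n - m*hop]."""
      n_frames = 1 + (len(x) - len(w)) // hop
      return np.stack([np.fft.rfft(x[m * hop : m * hop + len(w)] * w)
                       for m in range(n_frames)], axis=1)

  def istft(X, w, hop):
      """Inverse STFT: overlap-add the grains x[n] w[n - m*hop], then divide
      out the accumulated window (the discrete analogue of normalizing w)."""
      frame_len, n_frames = len(w), X.shape[1]
      x_rec = np.zeros(frame_len + (n_frames - 1) * hop)
      w_sum = np.zeros_like(x_rec)
      for m in range(n_frames):
          grain = np.fft.irfft(X[:, m], n=frame_len)    # equals x[n] w[n - m*hop]
          x_rec[m * hop : m * hop + frame_len] += grain
          w_sum[m * hop : m * hop + frame_len] += w
      good = w_sum > 1e-8                               # avoid dividing by ~0 near the ends
      x_rec[good] /= w_sum[good]
      return x_rec

  # Round trip on a test signal (assumed parameters: 256-sample Hann window, 50% overlap)
  fs = 1000.0
  t = np.arange(0.0, 1.0, 1.0 / fs)
  x = np.cos(2 * np.pi * 50.0 * t) + 0.5 * np.sin(2 * np.pi * 120.0 * t)
  w, hop = np.hanning(256), 128

  x_hat = istft(stft(x, w, hop), w, hop)
  err = np.abs(x_hat - x[:len(x_hat)])
  print(err[256:-256].max())                            # essentially zero away from the edges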

Resolution issues

One of the drawbacks of the STFT is that it has a fixed resolution. The width of the windowing function determines how the signal is represented: it determines whether there is good frequency resolution (frequency components close together can be separated) or good time resolution (the time at which frequencies change can be pinpointed). A wide window gives better frequency resolution but poor time resolution. A narrower window gives good time resolution but poor frequency resolution. These are called narrowband and wideband transforms, respectively.

Comparison of STFT resolution. Left has better time resolution, and right has better frequency resolution.

This is one of the reasons for the creation of the wavelet transform (or multiresolution analysis in general), which can give good time resolution for high-frequency events, and good frequency resolution for low-frequency events, which is the type of analysis best suited for many real signals.

This property is related to the Heisenberg uncertainty principle, but it is not a direct relationship: the product of the standard deviations in time and frequency is bounded from below. The boundary of the uncertainty principle (best simultaneous resolution of both) is reached with a Gaussian window function.
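
A rough numerical check of this statement is to compute the time and frequency standard deviations of a window from its energy distributions |w(t)|^2 and |W(f)|^2: their product comes out essentially at the bound 1/(4 pi) for a Gaussian window and somewhat above it for other windows such as the Hann. The sketch below does this for arbitrarily chosen window widths and sampling rate.

  import numpy as np

  fs = 1000.0                             # sampling rate (Hz); arbitrary for this check
  t = np.arange(-2.0, 2.0, 1.0 / fs)      # 4 s grid centred on zero

  def tf_spread(win):
      """Time and frequency standard deviations of a window, from |w(t)|^2 and |W(f)|^2."""
      p_t = np.abs(win) ** 2
      p_t = p_t / p_t.sum()
      sigma_t = np.sqrt(np.sum(p_t * (t - np.sum(p_t * t)) ** 2))
      W = np.fft.fftshift(np.fft.fft(win))
      f = np.fft.fftshift(np.fft.fftfreq(len(win), d=1.0 / fs))
      p_f = np.abs(W) ** 2
      p_f = p_f / p_f.sum()
      sigma_f = np.sqrt(np.sum(p_f * (f - np.sum(p_f * f)) ** 2))
      return sigma_t, sigma_f

  gauss = np.exp(-t ** 2 / (2 * 0.05 ** 2))                                  # Gaussian, 50 ms spread
  hann = np.where(np.abs(t) <= 0.5, 0.5 * (1 + np.cos(2 * np.pi * t)), 0.0)  # 1 s Hann window

  for name, win in [("Gaussian", gauss), ("Hann", hann)]:
      st, sf = tf_spread(win)
      print(f"{name:8s} sigma_t * sigma_f = {st * sf:.4f}  (bound 1/(4 pi) = {1 / (4 * np.pi):.4f})")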

Example

Consider the following sample signal, which contains four frequencies (10, 25, 50, and 100 Hz) that are never present at the same time:

{x(t) = \cos (2\pi 10t)}\, for 0 \le t < 0.1s
{x(t) = \cos (2\pi 25t)}\, for 0.1s \le t < 0.2s
{x(t) = \cos (2\pi 50t)}\, for 0.2s \le t < 0.3s
{x(t) = \cos (2\pi 100t)}\, for 0.3s \le t < 0.4s

The following spectrograms were produced:

Spectrograms of the signal computed with 25 ms, 125 ms, 375 ms, and 1000 ms windows.


The 25 ms window allows the exact time at which the signals change to be identified, but the frequencies are difficult to identify. At the other end of the scale, the 1000 ms window allows the frequencies to be seen precisely, but the times at which the frequencies change are blurred.
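
The example can be reproduced with the sketch below (Python with NumPy and Matplotlib). The 4 kHz sampling rate, the Hann window, and the 50% overlap are choices made here for illustration, and only the three shorter windows are computed because the test signal itself is only 400 ms long.

  import numpy as np
  import matplotlib.pyplot as plt

  fs = 4000                                     # assumed sampling rate (Hz)
  t = np.arange(0.0, 0.4, 1.0 / fs)
  x = np.zeros_like(t)
  for i, f in enumerate([10, 25, 50, 100]):     # 10, 25, 50, 100 Hz, each lasting 0.1 s
      seg = (t >= 0.1 * i) & (t < 0.1 * (i + 1))
      x[seg] = np.cos(2 * np.pi * f * t[seg])

  fig, axes = plt.subplots(1, 3, sharey=True, figsize=(12, 3))
  for ax, win_ms in zip(axes, [25, 125, 375]):
      nfft = int(fs * win_ms / 1000)            # window length in samples
      ax.specgram(x, NFFT=nfft, Fs=fs, noverlap=nfft // 2, window=np.hanning(nfft))
      ax.set_title(f"{win_ms} ms window")
      ax.set_xlabel("time (s)")
      ax.set_ylim(0, 150)
  axes[0].set_ylabel("frequency (Hz)")
  plt.tight_layout()
  plt.show()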

Explanation

The trade-off can also be explained with reference to the sampling rate and the Nyquist frequency.

Take a window of N samples from an arbitrary real-valued signal at sampling rate fs. Taking the Fourier transform produces N coefficients. Of these coefficients, only the first N/2 are useful (the others are just a mirror image, since the signal is real-valued).

These N/2 coefficients represent the frequencies 0 to fs/2 (Nyquist), meaning that two consecutive coefficients are spaced apart by

\frac{f_\mathrm{s}/2}{(N/2) - 1} \ \mathrm{Hz}

For large N, this approximates to \frac{f_\mathrm{s}}{N} Hz.

To increase the frequency resolution of the window, the frequency spacing of the coefficients needs to be reduced. There are only two variables: decreasing fs (while keeping N constant) causes the window duration to increase, since there are now fewer samples per unit time. The other alternative is to increase N, which again causes the window duration to increase. So any attempt to increase the frequency resolution results in a larger window size and therefore a reduction in time resolution, and vice versa.
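
As a concrete illustration of this trade-off, the short calculation below evaluates the coefficient spacing and the corresponding window duration for a few window lengths, at an assumed sampling rate of 8 kHz (the numbers are illustrative only):

  fs = 8000                                   # sampling rate in Hz (assumed for illustration)
  for N in [256, 1024, 4096]:                 # window length in samples
      spacing = (fs / 2) / (N / 2 - 1)        # spacing of the N/2 useful coefficients
      approx = fs / N                         # large-N approximation
      duration_ms = 1000 * N / fs             # window duration
      print(f"N={N:5d}: spacing {spacing:6.2f} Hz (approx. fs/N = {approx:6.2f} Hz), "
            f"window {duration_ms:6.1f} ms")

Halving the coefficient spacing requires doubling N, and hence doubling the window duration.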

Application

An STFT being used to analyze a song across time.

STFTs, as well as standard Fourier transforms and other tools, are frequently used to analyze music. The image shows frequency on the horizontal axis, with the lowest frequencies at the left and the highest at the right. The height of each bar (augmented by color) represents the amplitude of the frequencies within that band, and the depth dimension is time, where each new bar is a separate, distinct transform. Audio engineers use this kind of visualization to gain information about an audio sample, for example to locate the frequencies of specific noises (especially when used with greater frequency resolution) or to find frequencies that may be more or less resonant in the space where the signal was recorded. This information can be used for equalization or for tuning other audio effects.

New Directions

A new, improved implementation of the STFT has recently been devised by Fitz et al. and has found numerous applications in speech signal processing and signal roughness analysis (e.g., Vassilakis and Fitz, 2007). The algorithm, based on reassigned bandwidth-enhanced modeling (Fitz & Haken, 2002; Fitz et al., 2003; Fulop & Fitz, 2006a,b, 2007), incorporates an automatic spectral peak-picking process to determine which frequency analysis bands correspond to spectral components of the analyzed signal.

Frequency reassignment works differently from the traditional FFT and has more in common with phase-vocoder methods.

For example, as in the traditional STFT, a frequency resolution of 10 Hz will not be able to resolve frequency components less than 10 Hz apart. But, unlike the traditional STFT, the precision of the frequency values returned is not limited by this 10 Hz "bandwidth," since the frequency band boundaries are floating rather than fixed. This a) fine-tunes the frequencies reported and b) practically eliminates spectral smearing, since the method ensures that the assumption of all energy being located at the high-frequency end of an analysis band can be fulfilled. Similarly, as in the traditional STFT, a given analysis window length determines the length of the shortest signals that can be reliably analyzed. But, unlike the traditional STFT, the temporal resolution of a signal's spectral time-profiles is not limited by this window length, since the frequency and amplitude estimates are not time-window averages but are instantaneous at the time-window's center. This a) pinpoints time with much higher precision than implied by the window length and b) practically eliminates temporal smearing, since the spectra estimated through overlapping time windows do not involve averaging over the entire analysis window (Fulop & Fitz, 2006a,b, 2007).

In practical terms, spectral analysis results are fine-tuned through the incorporation of a dual STFT process: frequency values reported correspond to the time derivative of the argument (phase) of the complex analytic signal representing a given frequency bin, while time values reported correspond to the frequency derivative of the STFT phase, defining the local group delay and applying a time correction that pinpoints the precise excitation time.

Therefore, the Reassigned Bandwidth-Enhanced Additive Model shares with traditional sinusoidal methods the notion of temporally connected parameter estimates of spectral components. By contrast, however, reassigned estimates are non-uniformly distributed in both time and frequency, yielding greater temporal and frequency precision than is possible via conventional additive techniques. Parameter envelopes of spectral components are obtained by following ridges on a time-frequency surface, using the reassignment method (Auger & Flandrin, 1995) to improve the time and frequency estimates for the envelope breakpoints.

In addition, bandwidth enhancement expands the notion of a spectral component, permitting the representation of both sinusoidal and noise energy with a single component type. Reassigned bandwidth-enhanced components are defined by a trio of synchronized breakpoint envelopes, specifying the time-varying amplitude, center frequency, and noise content (or bandwidth) for each component. The amount of noise energy represented by each reassigned bandwidth-enhanced spectral component is determined through bandwidth association, a process of constructing the components' bandwidth envelopes.
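
The first of these two derivatives can be illustrated with a small sketch that estimates a tone's frequency from the phase advance between two DFT frames taken one sample apart, in the spirit of phase-vocoder analysis. This is only an illustration of the principle, not the reassigned bandwidth-enhanced algorithm itself; the sampling rate, window length, and test frequency are assumptions made here. With a 100-sample window at 1 kHz (10 Hz bin spacing), a 103.7 Hz tone is located far more precisely than the bin spacing alone would suggest.

  import numpy as np

  fs = 1000.0                    # sampling rate (Hz); illustrative
  f0 = 103.7                     # true tone frequency, deliberately off the 10 Hz bin grid
  N = 100                        # window length -> 10 Hz bin spacing
  n = np.arange(1000)
  x = np.cos(2 * np.pi * f0 * n / fs)

  w = np.hanning(N)
  n0 = 400                                       # analysis instant (in samples)
  X1 = np.fft.rfft(x[n0 : n0 + N] * w)           # frame at time n0
  X2 = np.fft.rfft(x[n0 + 1 : n0 + 1 + N] * w)   # frame one sample later

  k = int(np.argmax(np.abs(X1)))                 # dominant bin: only ~10 Hz precision
  dphi = np.angle(X2[k]) - np.angle(X1[k])       # phase advance over one sample
  dphi = (dphi + np.pi) % (2 * np.pi) - np.pi    # wrap to [-pi, pi)
  f_est = dphi * fs / (2 * np.pi)                # time derivative of the phase, in Hz

  print(f"peak-bin frequency:      {k * fs / N:.1f} Hz")   # 100.0 Hz
  print(f"phase-derivative result: {f_est:.2f} Hz")        # close to 103.7 Hz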

Sources (partial list)

  • Auger, F. and Flandrin, P. (1995). "Improving the readability of time frequency and time scale representations by the reassignment method," IEEE Transactions on Signal Processing 43: 1068-1089.
  • Fitz, K. and Haken, L. (2002). "On the use of time-frequency reassignment in additive sound modeling," Journal of the Audio Engineering Society 50(11): 879-893.
  • Fitz, K., Haken, L., Lefvert, S., Champion, C., and O'Donnell, M. (2003). "Cell-utes and flutter-tongued cats: Sound morphing using Loris and the Reassigned Bandwidth-Enhanced Model," Computer Music Journal 27(4): 44-65.
  • Fulop, S.A. and Fitz, K. (2006a). "Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications," J. Acoust. Soc. Am. 119(1): 360-371.
  • Fulop, S.A. and Fitz, K. (2006b). "A spectrogram for the twenty-first century," Acoustics Today 2(3): 26-33.
  • Fulop, S.A. and Fitz, K. (2007). "Separation of components from impulses in reassigned spectrograms," J. Acoust. Soc. Am. 119(1): 1510-1518.
  • Vassilakis, P.N. and Fitz, K. (2007). SRA: A Web-based Research Tool for Spectral and Roughness Analysis of Sound Signals. Supported by a Northwest Academic Computing Consortium grant to J. Middleton, Eastern Washington University.

See also

Other time-frequency transforms
