Talk:Discrete cosine transform

From Wikipedia, the free encyclopedia

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: A-BB+ Class Low Priority  Field: Applied mathematics
One of the 500 most frequently viewed mathematics articles.

Contents

[edit] Change in notation

I have started converting the notation in this article to match the changes that have been made to Discrete Fourier transform . The four changes are:

1. n becomes N: Change all instances of the symbol n to the new symbol N to represent the length of the sequences in both the time and frequency domains.

2. k becomes n: Change all instances of the symbol k to the new symbol n to represent the index variable for the time domain sequence.

3. j becomes k: Change all instances of the symbol j to the new symbol k to represent the index variable for the frequency domain sequence.

4. f becomes X: Change all instances of the symbol f to the new symobl X to represent the transformation of the data sequence in the frequency domain.

For more information, see Talk:Discrete Fourier transform.

-- Metacomet 16:09, 6 January 2006 (UTC)


I just tried typing in the formula for the DCT 2 and 3 into my compiler to see what would happen. It wasn't inverting properly. It looks like it should be $-\frac{1}{2} x_0$ instead of positive. Adding in the negative sign made it work for me.

Nope, you must have mistyped them, or have a bug in your program. (To see that your proposed minus sign isn't correct, consider the simple case of a transform of all 1's.) —Steven G. Johnson 22:15, 1 September 2006 (UTC)

[edit] less algebra, and more explanation please

This article is pretty poor IMHO. How about explaining what it is without resorting to algebra, and also explaining its applications... --Rebroad 10:21, 19 November 2006 (UTC)

Sorry, if you don't even know algebra you have no hope of understanding a DCT. Don't blame the article for your own limitations. —Steven G. Johnson 20:18, 19 November 2006 (UTC)
Actually I do not entirely agree. I am not a physicist yet I understand what a nuclear weapon is (without understanding its mechanics) and Dr. Hawking has managed to simplify General Relatively for the masses without resorting to advanced mathematics.
It would be nice if some qualified person would provide a better lay summary of DCT to explain what it does without needing to understand necessarily how it specifically does it. For example: DCT is used in lossy image and audio compression by converting bitmaps into frequency.... and so on.
This is not Mathpedia, it is an encyclopedia which is by definition supposed to contain content for the lay person, not serve as an academic textbook. I liken this to a dictionary that skips the definition to go right to usage and philology. Tolstoy143 - "Quos vult perdere dementat" (talk) 21:38, 7 March 2008 (UTC)

[edit] DCT for non-uniform distributed data points

Is it possible to compute a DCT if the data points are non-uniformly distributed? i.e. if I only know the function x(t) values at t=0.0, 0.1, 0.4, 0.5, 0.55, 1.0 or so? I don't want to interpolate to artificial data points to get uniform distributed data points. --89.49.215.47 20:09, 26 November 2006 (UTC) (de:Benutzer:RokerHRO)

Excellent question. First, the answer is yes, you can perform a DCT on data that is non-uniformly distributed. Basically, replace n / N with the actual grid point. But the more important question is, can we do this in a fast transform? There was groundbreaking research in the 1990s that said "yes" (for the case of the non-equispaced Fourier Transform), and hence probably "yes" for the cosine case.
For information on the non-equispaced FFT, google "NFFT". There are now several fast algorithms. Lavaka (talk) 00:10, 25 January 2008 (UTC)

[edit] Shifted variants of DCT

What is impact on resulting DCT if instead of calculating

X_k = \sum_{n=0}^{N-1} x_n \cos \left[\frac{\pi}{N} n k \right]

we calculate X_k = \sum_{n=0}^{N-1} x_n \cos \left[\frac{\pi}{N} (n + \frac{1}{N}) k \right]?
Am I calculating DCT I + \frac{1}{N}?
Is it possible to emphasize importance of DCT I-VIII variants over other shifted DCT?
—The preceding unsigned comment was added by Arkadi kagan (talkcontribs) 13:17, 30 January 2007 (UTC).

Shifting by 1/N does not result in a DCT according to any of the standard definitions, and is not sensible because only shifts by integers or half-integers preserve the sampling points under mirror flips. —Steven G. Johnson 14:19, 30 January 2007 (UTC)
I had to do some homework to get your point:)
e^{j 2 \pi \frac{m n}{N}} is orthogonal set of functions as well as e^{j 2 \pi \frac{(m+\frac{1}{2}) n}{N}}, but not e^{j 2 \pi \frac{(m+\frac{1}{N}) n}{N}} and not e^{j 2 \pi \frac{(m+\frac{1}{4}) n}{N}}
Thanks. Arkadi.
Having an orthogonal basis is not really the critical issue; there are lots of non-orthogonal bases out there that are very important. The reason that a DCT is a useful basis is that it corresponds to a DFT + even symmetry. This even symmetry is broken by the "grid" of sampling points if you don't place your origin at either a sample or halfway between two samples. —Steven G. Johnson 00:46, 1 February 2007 (UTC)

[edit] Sorry. I didn`t get you.

As far as I know orthogonality is the sole property that gives us Inverce DCT. If you know something not based on orthogonality, please point me on.
Relation to DFT gives us to calculate some properties easily. But once it is calculated, it have own uses. For example, signal processing usually works with real numbers.

This even symmetry is broken by the "grid" of sampling points if you don't place your origin at either a sample or halfway between two samples.

Even symetry is one property of DCT. But if some other transform have or have not even symetry what is the influence on DCT discussion? Notice that I am refering to your own definition:

Formally, the discrete cosine transform is a linear, invertible function F : RN -> RN (where R denotes the set of real numbers), or equivalently an N × N square matrix.

Thanks. Arkadi.

A square matrix can be invertible without being orthogonal; the inverse may just be more complicated to compute. And there are infinitely many invertible real matrices, so the fact that it is real is hardly the defining property either. The things that make the DCT valuable to signal processing and partial differential equations and Chebyshev approximation and numerous other applications (its energy compaction property, its diagonalization of convolutions with certain boundary conditions, its properties under differentiation, its relationship to a Fourier cosine series) all stem directly from its relationship to a DFT of even symmetry.

(Indeed, for the normalization choices shown in the article, the DCTs are not orthogonal, although they are related to orthogonal matrices via multiplication by diagonal matrices.)

—Steven G. Johnson 16:58, 1 February 2007 (UTC)

If so, may be I don`t know what is exactly Discrete Cosine Transform.
A square matrix can be invertible without being orthogonal; the inverse may just be more complicated to compute.
In case you have something simple like 2 * I you can know its inverse easily - it`s \frac{1}{2} * I. Is it Discrete Cosine Transform?
Thanks. Arkadi.
No. "Discrete cosine transforms" are specific invertible linear transformations corresponding to DFTs of real-even data. They are not just any one of the infinitely many other possible invertible real matrices, nor are they any one of the infinitely many possible real orthogonal matrices. I don't understand how you could have read what I wrote above to conclude that 2I is a DCT. —Steven G. Johnson 20:57, 3 February 2007 (UTC)

I will not try to prove that 2I is a DCT :)
I want to locate precise boundary: what transform can be called DCT and what transform can`t. What I am seeking for is a precise definition of DCT. When I have such a definition, I can (or can`t) see all the mentioned above properties - either obvious or complex. Arkadi.

The article already has a long discussion of how the 8 types of DCT arise from a DFT of real-symmetric data. —Steven G. Johnson 17:27, 4 February 2007 (UTC)

I think I have recognized the source of my personal ignorance in all this topic. I see that none of DCT I-VIII do not have orthogonal basis function:
cos \left( \frac{2 \pi}{N} n k \right) is indeed orthogonal basis.
However, cos \left( \frac{\pi}{N} n k \right) is not.
Despite this, I do see some superior properties of basis cos \left( \frac{\pi}{N} n k \right). For example, spectrum calculated by DCT I (boundary effect is neglectable here) gives cleaner picture than the same spectrum, calculated by "orthogonal basis" DCT.

Please, help me to understand calculation of Inverse DCT given non-orthogonal basis like cos \left( \frac{\pi}{N} n k \right).
Thanks. Arkadi.

You're incorrect to state that they do not have an orthogonal basis. All of the DCTs I-VIII, appropriately normalized, are orthogonal matrices (i.e. form an orthogonal basis). (The relationship between DCT and DFT is analogous to the relationship between the cosine series and the Fourier series. The factor of 2 in the cosine argument is due to the fact that you can cut the integration interval in half by the even symmetry.) There are many fine textbooks on discrete cosine transforms (e.g. the Rao and Yip book) that you can read to help clear up any misunderstandings—Wikipedia talk pages are not someplace where you can expect to get extensive tutoring. —Steven G. Johnson 19:09, 6 February 2007 (UTC)
After reading this article from a beginners point of view, and despite I have used and implemented various DCT variants, that were used in proiduction priduct form image and audio processing (since 1984!), I must admit that this article is a real nightmare, that just mixes opinions about what are considered the current strategies, and what should be first a simple but solid definition.
This article fails in all areas, is extremely badly structured. It should first present the simple DCT, present how it is computed, what it gives by an example, and then present how it can be inverted. Then the properties should be studied, and later the link to the more general theory of Discret Fourier transforms may be added, as well as variants of the algorithm.
I have never heard the terminology of "even" and "odd" ends used everywhere here without any definition ; I never needed that terminologogy to study DCT algorithms since 20 years, and I bet this comes from an abestract taken from an advanced paper.
Most people won't need that terminology.
And I am completely opposed to your opinion here: Wikipedia should be the place for tutorial. Sending people to other resources first is really not the role of Wikipedia, which is to introduce any subject with the simplest approach, and then add some development in other more detailed articles, and then send them to other external advanced resources.
This articles fails everywhere, it is not made for Wikipedians, it's just a badly written abstract of some external paper!
If I had some better English skills (this is not my native language, although I can understand it without difficulty), I would delete it and restart it completely from scratch, with a much better structure, and much less assumption about readers knowledge ! It is unnecessarily complicated, despite a DCT can be very simple to understand. Thanks, when I learned it, there was still no Internet, no Wikipedia. But academi resources were much better written and much more accessible than this very poor article. 86.221.100.174 21:57, 16 February 2007 (UTC)

Perhaps, I can not agree about orthogonality of base function cos \left( \frac{\pi}{N} n k \right).
We can check this for case N = 2 (or N = 4 if you want).
However, inverse transform can be easily computed given this trick:
In case xn = x2Nn

\sum_{n=0}^{2N-1} \left[ x_n cos \left( \frac{nk}{2N} 2\pi \right) \right] = 2 \sum_{n=0}^{N-1} \left[ x_n cos \left( \frac{nk}{N} \pi \right) \right].

I suppose, similar conversions are possible for the rest of boundary assumptions.

You have the incorrect limits on the second sum (which should go from 0 to N, for an even extension), and there is no factor of 2 on the n=0 and n=N cases. However, the general idea is in the right direction: all 8 types of DCT can be converted to larger DFTs of real-even data with appropriate symmetries (you can put a sine term into the first sum to get a DFT, because the odd sine terms will sum to zero for even data). ...and because the DFT is unitary (with appropriate normalization), all of the DCT types are orthogonal (with appropriate scaling). Again, however, it seems like you need more extensive tutoring than you can reasonably expect on a Talk page (or from an encyclopedia article). —Steven G. Johnson 17:22, 11 February 2007 (UTC)

[edit] Formal definition

Under the formal definition, it is not stated what k represents. Or have I missed something? --Sam Pablo Kuper 23:41, 4 April 2007 (UTC)

Fixed. —Steven G. Johnson 20:15, 5 April 2007 (UTC)

[edit] Would be nice if someone added an example

Would be nice if someone added an example of how the DCT saves memory, by showing an 8x8 color block and going thru the math. Thanks, Daniel.Cardenas 18:38, 5 April 2007 (UTC)

See JPEG. —The preceding unsigned comment was added by Stevenj (talkcontribs) 20:12, 5 April 2007 (UTC).

[edit] n=1?

Great page, really helped me out. The summation given in the formula for DCT-III starts at n=1, should this be n=0?

No, the n=0 term is handled by a separate term since its normalization is different. —Steven G. Johnson 17:44, 11 April 2007 (UTC)

[edit] Misleading example

The example of the DFT and DCT of the image of the dandelion clock is highly misleading, because the original image was a JPEG-compressed image which will have thrown away information from the high-frequency DCT coefficients - so of course the DCT will have very little energy there! I think a new example should be created based on a genuine, non-lossy-compressed original. Rhebus 11:35, 23 May 2007 (UTC)

it is not exactly correct: JPEG compression removes higher frequencies in each 8x8 block, while the picture shows the transform of the whole image; then JPEG causes the blocking artifact that introduces sharp changes in intensity on the edge of blocks, thus introducing other higher frequency. In any case, I was thinking about re-making the picture using Image:Lichtenstein img processing test.png, since I have used the same picture for other examples (see the description of the picture). You might join the discussion about standard test images on the Electronics wikiproject Alessio Damato 14:25, 23 May 2007 (UTC)
just to be sure I made a test: I have seen the DCT and FFT of a lossless image and of the same image heavily JPEG compressed (10% quality). I have noticed that the compressed picture had a wider spectrum than the original (I think because of the blocking effect). You can try to do it as well with any picture: you can find the Matlab code in the description page of the picture. Alessio Damato 14:33, 23 May 2007 (UTC)
Ah yes, of course you're right that JPEG only locally removes higher frequencies. However I still think that the example does not show what it claims to show, and I'd definitely like to see another. Rhebus 16:21, 24 May 2007 (UTC)
any suggestion?! what's wrong with that, then? :-) Alessio Damato 20:55, 24 May 2007 (UTC)
Well it's my understanding that the image is trying to illustrate that because the DCT is good at compressing the image energy into one small corner of the transform-space, it is useful for image compression. But, in order to show this, you need to demonstrate that DCT does this on an uncompressed image. Otherwise all you demonstrate is that DCT is effective at compressing the energy in images which have already been JPEG-encoded. Rhebus 14:53, 29 May 2007 (UTC)
I also note that the Liechtenstein test image is also JPEG-based. I accept that it isn't a terrible test image, and you have taken steps to mitigate JPEG artifacts, but surely the best test image deals with real, unprocessed data? Rhebus 14:58, 29 May 2007 (UTC)
well, in general I agree that real, unprocessed data have the highest quality, but I'm not sure we really need it for our purpose. I mean: a test image has to have some features (sharp edges, flat surfaces, details), like Liechtenstein, moreover it should have power in any frequency, and I think Liechtenstein has. If you hadn't known the picture was generated from a JPEG, would you have been able to realize it was originally stored as a JPEG?
I could go out for a walk and take a picture with my camera, storing it with TIFF to avoid any artifact, but I don't think it would be a featured picture... do you think we should try to take a picture by ourself to be used as test?
(I think this discussion should move to the wikiproject about electronics, since it's a topic about a more general subject than DCT) Alessio Damato 21:46, 29 May 2007 (UTC)

[edit] Orthogonal

This page seems to mention changing scale factors to make the transform orthogonal. I am fairly sure the transform is orthogonal without the scaling factors, and that the scaling is done to make the transform Orthonormal. —Preceding unsigned comment added by 208.215.227.190 (talk) 18:48, 7 September 2007 (UTC)

Whilst the transform is orthogonal with any scale factor, the matrix is only orthogonal (by the definition given at Orthogonal matrix) if each of its columns is of unit magnitude. Oli Filth 19:17, 7 September 2007 (UTC)

[edit] figure doesn't match text

unless I'm missing something, the figure that lists DCT I, II, III, and IV doesn't match with the text. The text describes the DCT-II as "...where the even-indexed elements are zero. That is, it is half of the DFT of the 4N inputs y_n, where y_{2n}=0 ... ". This matches DCT-III in the figure. Which is correct? BTW, I like the figure a lot, very intuitive. Lavaka (talk) 03:14, 18 November 2007 (UTC)

The part that says "y_{2n}=0" refers to the input that would have to be provided to do an equivalent DFT, whereas the diagram shows the way that the DCT's input is effectively interpreted. Also, note that y_{2n}=0 means "every even element is zero", rather than "the element past the end of the sequence is zero", which is what can be see in the diagram of DCT-III's input (I am assuming that this was your interpretation). My best guess is that the zeros are added in order to allow the mid-point to fall at an integer index. This is required because the boundaries are fractional (n=-1/2 and n=N-1/2). Note that the diagram for DCT-III clearly shows that the boundaries are not fractional - they are the first value of the sequence (n=0), and the value past the end of the sequence (n=N) which is set to zero. --82.18.14.143 (talk) 12:50, 3 June 2008 (UTC)

[edit] SA-DCT

SA-DCT: Shape adaptive DCT. A new section inside DCT or a new article about SA-DCT should be created. Diving hawk (talk)


[edit] Default DCT Page

DCT should go to Discrete Cosine Transform and then mention up top DCT can also refer to DCT File Format. —Preceding unsigned comment added by 203.129.33.32 (talkcontribs) 00:21, 15 April 2008

There are approximately 15 entries on the DCT disambiguation page. There's no reason to suggest that this article is the most prominent or relevant. Oli Filth(talk) 23:26, 14 April 2008 (UTC)