Wikipedia:Reference desk/Archives/Mathematics/2008 January 27

January 27

Inverse Trigonometric Function or Arc...

Hello. In the math world, can arcsine, arccosine, arctangent, etc. or sin^{-1}, cos^{-1}, tan^{-1}, etc. be officially called inverse trigonometric functions? My teacher says that inverse means one over something (e.g. x^{-2} = \frac{1}{x^2}). Can inverse sine be confused for \frac{1}{\sin A} when arcsine A is meant? Thanks in advance. --Mayfare (talk) 00:51, 27 January 2008 (UTC)

sin^{-1} can be confused for \frac{1}{\sin A} by someone who is unfamiliar with the notation. Your teacher is talking about the multiplicative inverse, also called the reciprocal, whereas sin^{-1} is an inverse function, a different meaning of inverse. --Evan Seeds (talk)(contrib.) 00:55, 27 January 2008 (UTC)
Yes, you're correct. But one thing I want to explain is the difference between Cosine and cosine, that is, between the capitalized and the non-capitalized form. The capitalized Cosine stands for the inverse of cosine, i.e. arc cosine. Daniel5127 (talk) 06:48, 27 January 2008 (UTC)
I have never heard of this distinction being used to differentiate between cosx and arccosx; in fact, the only reference I can find for this assertion is that next to no one uses this convention. Indeed, it would make title-casing in trigonometry textbooks, literature, etc. almost impossible. There are, however, non-standard symbolic distinctions between arccos vs. Arccos (in terms of principal values vs. multivalues) that are discussed in the first paragraph here. --Kinu t/c 08:28, 27 January 2008 (UTC)
When learning trig, I was taught that sin^{-1} A always meant arcsin(A) as stated above. There is never any doubt that arcsin is meant since \frac{1}{\sin A} can always be written as csc(A) (cosecant function) and similarly, the reciprocal of any other trig function has a specifically named function also. —Preceding unsigned comment added by Spinningspark (talk · contribs) 13:26, 27 January 2008 (UTC)
You're correct -- but back in my first year of college I lost points on a test because the grader insisted on interpreting "sin^{-1} x" as "1/(sin x)", despite the fact that the context made it clear that I meant arcsin x (in particular, the next line was only correct if arcsin x was meant). I found it remarkably difficult to convince the grader to give me the points; he argued that if I wasn't to lose points for the correctness of my answer, then I would lose points for misuse of notation. Fortunately I haven't met such an attitude since. Tesseran (talk) 05:31, 30 January 2008 (UTC)
Perhaps your grader was unfamiliar with the notation and was confusing "sin^{-1} x" with "(sin x)^{-1}", the first being the inverse function and the second the multiplicative inverse. dbfirs 19:50, 30 January 2008 (UTC)
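To make the two meanings of "inverse" in this thread concrete, here is a minimal Python sketch. The sample value x = 0.5 is an arbitrary choice for illustration; math.asin is the inverse function (arcsine), while 1 / math.sin(x) is the reciprocal (cosecant).

```python
import math

x = 0.5  # arbitrary sample value, chosen only for illustration

arcsine = math.asin(x)        # inverse function: the angle whose sine is x
reciprocal = 1 / math.sin(x)  # multiplicative inverse of sin(x), i.e. csc(x)

# The inverse function undoes sin; the reciprocal multiplies with sin to give 1.
print(arcsine, math.sin(arcsine))
print(reciprocal, reciprocal * math.sin(x))
```

The two numbers differ, which is why the notation needs to be read carefully.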

Famous mathematician

Do you know of any famous mathematician whose surname starts with a Q or an X? Thanks. Randomblue (talk) 01:56, 27 January 2008 (UTC)

I personally can't think of any, but maybe this can help. -- Xedi (talk) 02:10, 27 January 2008 (UTC)
(edit conflict) There's Daniel Quillen, for a start. Michael Slone (talk) 02:11, 27 January 2008 (UTC)
Not really a surname, but Xenocrates starts with an X. —Keenan Pepper 08:29, 27 January 2008 (UTC)

Thanks! Randomblue (talk) 11:51, 27 January 2008 (UTC)

The page List of mathematicians has mathematicians by letter. kfgauss (talk) 19:37, 27 January 2008 (UTC)
Does Willard Van Orman Quine not count as a mathematician? —Tamfang (talk) 23:34, 28 January 2008 (UTC)

Finding a PRNG's formula, according to numbers

Is it possible to find a PRNG sequence's formula from 100 (or more) numbers? Here is an example of what I would like to do: 30,24,10,21,3,25,17,16,34,0,31,26,32,20,3,32,16,9,8,33,20,12,19,20,22. These numbers are in the range 0-36 and were created by a random number generator in a C++ program. I would like to find out how it creates the numbers (what is this PRNG's formula?). Is it possible? Can you share your ideas and formulas, please? Thank you, best regards... Adam McCansey —Preceding unsigned comment added by 81.215.240.54 (talk) 12:08, 27 January 2008 (UTC)

If the PRNG is any good, you shouldn't be able to find the formula based on a few numbers (or even many numbers). Of course, this still leaves the possibility that the PRNG isn't good and then it just might be possible, but probably not easy. -- Meni Rosenfeld (talk) 12:15, 27 January 2008 (UTC)
Whether this will be easy depends on the intended use of the PRNG. A PRNG I used for many years to test data transmission lines repeated the pattern after only 1023 bits. Obviously, this would be very easy to crack and would be useless in cryptography. However, it was fine for my purposes; all I really needed was a pattern other than 00000, 11111 or 10101 that throws in an isolated 1 or 0 occasionally. It might help you to solve this if you have an understanding of how the generator is physically implemented. Mine used a shift register whose outputs were combined through some simple logic gates and then fed back into the shift register input. A knowledge of the length of the shift register and the boolean function of the gates will yield the PR pattern. Working back the other way, you would need to trial various logic gates to see if they yielded the measured pattern. If you know the repeat length, taking log_2 of it (rounded up) will yield the shift register size, assuming a maximal-length sequence; for example, a repeat length of 1023 = 2^{10} - 1 points to a 10-bit register. Apologies to all mathematicians for drifting slightly off-topic. SpinningSpark 13:48, 27 January 2008 (UTC)
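The shift-register generator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register length (10 bits), the feedback taps (positions 10 and 7) and the seed are hypothetical choices that happen to give a 1023-bit repeat length; the actual hardware mentioned in the comment is not specified.

```python
def lfsr_period(taps=(10, 7), nbits=10, seed=1):
    """Count the steps until a Fibonacci LFSR returns to its seed state.

    taps, nbits and seed are illustrative assumptions, not taken from
    the equipment described in the thread.
    """
    mask = (1 << nbits) - 1
    state = seed
    period = 0
    while True:
        # XOR the tapped bits (1-indexed from the least significant bit).
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        # Shift left by one and feed the XOR result back into bit 1.
        state = ((state << 1) | feedback) & mask
        period += 1
        if state == seed:
            return period

period = lfsr_period()
# Recover the register length from the repeat length: log2(period + 1).
register_bits = (period + 1).bit_length() - 1
print(period, register_bits)
```

Going the other way, from an observed bit pattern to the taps, amounts to trying candidate tap sets of the inferred length until one reproduces the pattern.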

I think I need a useful formula or math ideas instead of words - Adam McCansey —Preceding unsigned comment added by 81.215.240.54 (talk) 14:21, 27 January 2008 (UTC)

Okay, here's a formula that will give you the nth number in the series:
( − 10 + n)( − 241323281764463133291184128000 + n(868244504509004696474075136000 + n( − 1370756191502036313362211102720 + n(1281909032514630191333278319616 + n( − 805175857771887357454954678272 + n(364141711805257776591924793920 + n( − 123898012122969521170531722048 + n(32674974094141905624047308208 + n( − 6821375650049760342894729832 + n(1144492871812646302268468160 + n( − 156005076501944117023679678 + n(17405019675576595538537548 + n( − 1596622335191236964517897 + n(120662309969455735293275 + n( − 7507882647468999780553 + n(383338269076487883573 + n( − 15956063761534659122 + n(535762457490591790 + n( − 14281899815963568 + n(295083237887798 + n( − 4553594496877 + n(49370706855 + n( − 335341433 + 1073257n))))))))))))))))))))))) / 25852016738884976640000
-- Meni Rosenfeld (talk) 14:43, 27 January 2008 (UTC)
I remember way back in school, someone wrote a program to print a dot on a dot matrix printer according to successive values of a PRNG. The cyclic nature of the PRNG soon became visually apparent. So that would be the first check: what is the period of the sequence? For modern algorithms this won't work, as the period will be very big. You may want to decompile the code, which may be easier than trying to find the sequence. --Salix alba (talk) 14:39, 27 January 2008 (UTC)
The idea behind a PRNG is that it produces a sequence of numbers that look like randomly chosen numbers. If you find the formula, then the numbers become predictable, unlike randomly chosen numbers. So you are not expected to crack the code. From any finite sequence of numbers you can create many algorithms that reproduce this sequence, for example by starting with this sequence and proceeding with the result of any PRNG of your choice, or simply with zeroes. This works, even if it is cheating. Bo Jacoby (talk) 14:48, 27 January 2008 (UTC).

does anyone have an idea about Meni Rosenfeld's formula ? —Preceding unsigned comment added by 81.215.240.54 (talk) 14:57, 27 January 2008 (UTC)

It was sarcastic. It does give the correct values for those elements in the sequence you have provided (n from 1 to 25), but not for the subsequent values. It was inspired by my annoyance at your ignoring our advice that for a good general-purpose PRNG this is impossible at worst, and depends on additional information about the program and its implementation at best, and at your insisting on a simple "cookie-cutter" formula. -- Meni Rosenfeld (talk) 15:06, 27 January 2008 (UTC)
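A polynomial of this kind can be produced for any finite list of numbers, which is why it says nothing about the generator. The following Python sketch uses Lagrange interpolation with exact fractions; it is an illustration of the general technique and not necessarily how the formula above was derived. The names SEQ and interpolate are hypothetical.

```python
from fractions import Fraction

# The 25 values quoted in the question, taken as terms n = 1..25.
SEQ = [30, 24, 10, 21, 3, 25, 17, 16, 34, 0, 31, 26, 32,
       20, 3, 32, 16, 9, 8, 33, 20, 12, 19, 20, 22]

def interpolate(values, n):
    """Evaluate at n the unique polynomial of degree < len(values)
    passing through (1, values[0]), (2, values[1]), ..."""
    total = Fraction(0)
    for i, y in enumerate(values, start=1):
        term = Fraction(y)
        for j in range(1, len(values) + 1):
            if j != i:
                term *= Fraction(n - j, i - j)
        total += term
    return total

fitted = [interpolate(SEQ, n) for n in range(1, 26)]
print(fitted == SEQ)         # checks that every given term is reproduced
print(interpolate(SEQ, 26))  # the polynomial's value at the next index
```

Appending any 26th value and interpolating again gives a different polynomial that also fits the first 25 terms, so the fit carries no predictive information.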

It is a polynomial in n, the position of the term in your sequence, whose values at n = 1 to 25 are your original sequence. I think Meni might be having a gentle joke with you. SpinningSpark 15:07, 27 January 2008 (UTC) Oops, sorry, I had not noticed Meni had already replied. SpinningSpark —Preceding comment was added at 15:08, 27 January 2008 (UTC)

I don't think you didn't notice it, rather that I have posted it while you were typing. You didn't get an ec because you added a blank line. -- Meni Rosenfeld (talk) 15:28, 27 January 2008 (UTC)
Basically, if you are lucky they might be using a linear congruential generator with formula: X_{n+1} = \left( a X_n + c \right) \bmod m; if you are unlucky they are using a cryptographically secure pseudorandom number generator, which is designed so that you can't predict the next term from the previous ones. --Salix alba (talk) 15:23, 27 January 2008 (UTC)
It can't be that simple, since 20 occurs thrice, each time with a different successor.  --Lambiam 01:33, 28 January 2008 (UTC)
Well spotted. There is a complication in that the m above may be much bigger than 36, typically 2^{32}. So the LCG generates numbers between 0 and 2^{32}; the next stage is then to reduce this to the desired range, Y_n = X_n \bmod 36, discarding some values to ensure uniformity (see for example [1]). So it is possible for 20 to appear a number of times, but at some point the sequence 20,3,32,16,9,8,33,20 will reappear. --Salix alba (talk) 09:53, 28 January 2008 (UTC)
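The two-stage scheme described here (a large-modulus LCG followed by a reduction to a small range) can be sketched as follows. Everything specific is an assumption for illustration: the multiplier and increment are commonly quoted textbook constants, the seed is arbitrary, and the reduction uses 37 because a range of 0-36 contains 37 values. The discarding step for uniformity is left out.

```python
M = 2 ** 32

def lcg(seed, a=1664525, c=1013904223, m=M):
    """Yield X_1, X_2, ... with X_{n+1} = (a * X_n + c) mod m.

    The constants a and c are illustrative, not those of the unknown program.
    """
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def reduced(seed, k=37, count=25):
    """Return the first `count` outputs Y_n = X_n mod k (no rejection step)."""
    gen = lcg(seed)
    return [next(gen) % k for _ in range(count)]

ys = reduced(seed=2008, count=2000)

# Record which outputs follow each output value.
successors = {}
for cur, nxt in zip(ys, ys[1:]):
    successors.setdefault(cur, set()).add(nxt)

print(ys[:25])
print(max(len(s) for s in successors.values()))
```

Because the hidden state X_n is much larger than the visible output Y_n, the same output can be followed by different values, which matches the observation about 20 above.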

Just to say it explicitly: if your random numbers start to repeat over 36 values, then you can find the formula using those 36 values (as shown above); if the formula repeats over more than 36 values and you only have 36 values, then you can't get the formula. Obviously this applies for values other than 36. So if you want to find the formula, write a program to search for the point at which the random numbers repeat, assuming this ever happens (it's not necessary that they will, but I'd guess that most common (fast) methods will). 77.86.108.68 (talk) 18:49, 27 January 2008 (UTC)

If the PRNG does not use an external source of randomness (in which case it wouldn't be a PRNG) and uses only a bounded amount of memory registers, say N bits, then the sequence is guaranteed to repeat eventually in a cycle of length ≤ 2^N. For finding this cycle, see Cycle detection.  --Lambiam 12:21, 28 January 2008 (UTC)
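One standard cycle-detection method is Floyd's tortoise-and-hare algorithm, sketched below in Python. It works on the generator's full internal state, so it assumes you can call the state-update function f yourself; the small example generator is hypothetical and chosen only so that the result is easy to check.

```python
def floyd(f, x0):
    """Return (mu, lam) for the sequence x0, f(x0), f(f(x0)), ...

    mu is the index where the cycle starts, lam is the cycle length.
    """
    # Phase 1: move the hare twice as fast until both meet inside the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise; they now meet at the cycle's first element.
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    # Phase 3: walk once around the cycle to measure its length.
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam

# Hypothetical 4-bit generator: at most 2**4 = 16 states, so lam <= 16.
print(floyd(lambda x: (5 * x + 3) % 16, 0))
```

If only reduced outputs are visible (for example values in 0-36), the hidden state is not directly observable and this sketch does not apply as is.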

empty words only... no formula and no idea... —Preceding unsigned comment added by 78.171.57.143 (talk) 13:59, 28 January 2008 (UTC)

Something here is definitely empty. It's not the words, though. -- Meni Rosenfeld (talk) 15:43, 28 January 2008 (UTC)

Meni Rosenfeld, do you always have to talk emptily? And do you always have to answer all questions? You seem like an empty glass. I think no one needs your empty words and ideas... please just close your PC.. —Preceding unsigned comment added by 78.171.57.143 (talk) 13:12, 29 January 2008 (UTC)

You got it all wrong. My PC is already shut down. I am editing Wikipedia telepathically. -- Meni Rosenfeld (talk) 13:16, 29 January 2008 (UTC)
In all fairness, finding the algorithm would be pretty difficult. The easiest way would be to examine the code that produces it. 87.102.67.145 (talk) 13:32, 29 January 2008 (UTC)

Utility of money

What type of formula for total utility of income or net worth is most popular or most theoretically sound? An nth root? A logarithm? A bounded function? Or something else? NeonMerlin 13:33, 27 January 2008 (UTC)

I haven't seen many treatments of this, and I don't recall any particular function being proposed as a candidate. I'd say the function should be bounded. If you have an arbitrarily large amount of usable money, at best this means that you can decide what every person on earth will do - and this has a finite utility. -- Meni Rosenfeld (talk) 14:02, 27 January 2008 (UTC)
As the actual amount of money is finite, the utility function does not have to be theoretically bounded for an infinite amount of money. A reasonable model is f(x)=x^a, where a<1 when money is less valuable to the rich than to the poor. Bo Jacoby (talk) 14:09, 27 January 2008 (UTC).
Utility#Utility of Money has some pointers, apparently bounded and asymmetric about the origin, concave in the positive region. --Salix alba (talk) 14:19, 27 January 2008 (UTC)
I think governments can create new money how they see fit. In normal circumstances this will create inflation and cause all sorts of problems, but the point remains that the US government can decide to create 10^{21} dollars and give them to some person.
Disregarding all of this, a logarithmic model makes much more sense. If a person's entire net worth is 100$ and he gains another 100$, the increase in his utility is comparable to a person whose worth is 1M$ when he gains another 1M$. -- Meni Rosenfeld (talk) 14:24, 27 January 2008 (UTC)
James Heckman has a paper where he uses a Box-Cox transformation to answer this question for some particular data. Log comes out best. Plus, as Meni Rosenfeld points out, it has a much stronger theoretical framework. Pdbailey (talk) 16:20, 27 January 2008 (UTC)
The use of y=log(x) for utility function has the following unpleasant consequences. The utility of one dollar is zero, and the utility of zero dollars is minus infinity. The use of the power function y=x^a has the following pleasant consequences. The utility of zero dollars is zero, and the special case y=x, appearing for a=1, means that the utility of money equals the amount of money, and a small perturbation, a=0.95, means that the marginal utility of extra money, dy/dx, is 5% less than the average utility, y/x. Bo Jacoby (talk) 23:02, 27 January 2008 (UTC).
So, what are you saying? That the difference between 100$ and 200$ is the same as the difference between 1000000$ and 1000155$? That's absurd - the former is life-changing and the latter is barely noticeable. Your reservation about the extreme low end is irrelevant, since we can barely define what it means to have so little assets (keeping in mind that the potential to do work, even a demeaning one, is an asset), and we needn't concern ourselves with it. -- Meni Rosenfeld (talk) 14:49, 29 January 2008 (UTC)
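The numbers in this exchange can be checked with a short Python sketch. It assumes the power model y = x^0.95 and the natural logarithm for the log model; the function names are hypothetical. Under the power model the gain from 100$ to 200$ comes out roughly equal to the gain from 1000000$ to 1000155$, while under the log model doubling one's wealth gives the same gain at any level.

```python
import math

def power_utility(x, a=0.95):
    """Power-law utility y = x**a (a = 0.95 as in the comment above)."""
    return x ** a

def log_utility(x):
    """Logarithmic utility y = log(x), natural logarithm assumed."""
    return math.log(x)

# Power model: compare the two gains discussed in the thread.
gain_small_power = power_utility(200) - power_utility(100)
gain_large_power = power_utility(1_000_155) - power_utility(1_000_000)

# Log model: doubling wealth at two very different levels.
gain_small_log = log_utility(200) - log_utility(100)
gain_large_log = log_utility(2_000_000) - log_utility(1_000_000)

print(gain_small_power, gain_large_power)
print(gain_small_log, gain_large_log)
```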
Bo Jacoby, I read your two "unpleasant consequences" of the log utility function, and I don't understand why either is unpleasant. The utility of consuming zero (for me) is negative infinity, since I could not eat, breathe, et cetera, and would die immediately. As far as having a utility of zero, who cares. Milton Friedman used a utility function that was everywhere negative. If you think of utility as being a representation of preferences, then the answer is that it doesn't matter what function you use; any monotonic transformation of it will yield a function that represents the same preferences. Which means that we should refocus the question: what does he want to use a utility function for in the first place? It is typical to use one that follows von Neumann and Morgenstern's assumptions for money, but their goals may not overlap with Meni Rosenfeld's. Pdbailey (talk) 23:58, 29 January 2008 (UTC)
I am assuming we are aiming for a von Neumann-Morgenstern utility function (otherwise everything mentioned so far is identical, as they are monotonic transformations of each other). I thought about mentioning the "no assets = no food = death = negatively infinite utility" argument, but the truth is that the utility of death is not minus infinity. -- Meni Rosenfeld (talk) 13:23, 30 January 2008 (UTC)
Meni Rosenfeld, well, to make claims like "the utility of death is not minus infinity", I think you need to have criteria that are or are not met. I'm not sure where you are coming from, or what your criteria are, so it's unlikely I'd meet them. Pdbailey (talk) 04:28, 31 January 2008 (UTC)
I don't understand. Utility of -\infty would mean that people would never do anything that puts them in any risk of dying, however slight. This of course contradicts what people actually do. There are the rare but clear-cut cases of people sacrificing their lives for some cause, or people in a situation so dreadful that they'd rather die. And there are the mundane actions people do every day which put them at risk. When you walk on the street, are you escorted by a bodyguard? No? That's because the utility you lose by paying him would outweigh the slim (yet positive!) chance you will be attacked and he would save your life. -- Meni Rosenfeld (talk) 12:09, 31 January 2008 (UTC)
So, I still don't see what criterion you are using or what you are getting at. Are you considering a rational expectation? Sacrificing one's life can be taken care of with an altruistic consumption function. Anyway, if negative infinity really concerns you, why not just use log(c+1) with c in the positive reals? Pdbailey (talk) 05:24, 1 February 2008 (UTC)
I have no idea what you mean by "my criterion". I am talking about a von Neumann - Morgenstern utility function, and assuming that it roughly reflects the actual actions of people. I am not "getting at" anything except refuting the nonsensical claim that the utility of death is minus infinity. I am unfamiliar with the term "altruistic consumption", but if you mean that the benefit of others is integrated into the utility function of an individual, this surely is finite and doesn't detract anything from infinity. Shifting the log function to avoid the branch point at 0 is completely irrelevant to my point (it may have been relevant to Bo's point, but is so obvious I didn't feel it was worth mentioning). -- Meni Rosenfeld (talk) 12:58, 1 February 2008 (UTC)
Well, in the case of sacrificing yourself for a cause, if the potential gain is saving a life, that would be positive infinity and would (or, at least could, the arithmetic is obviously undefined) cancel out the negative infinity of losing your own life. There are less extreme cases where it still fails, though - the gain from driving a car is definitely finite, but the risk to your life is definitely non-zero. --Tango (talk) 22:18, 1 February 2008 (UTC)

(backdent) Meni Rosenfeld, the log is a perfectly fine utility function and is often used in theoretical and empirical work, and as I mentioned, there is empirical reason to believe that it is good. As far as it being the utility function used inside the integration operator (i.e. U(L) = \int \log(x) \, dF(x), where F is the CDF of the lottery L) of a von Neumann-Morgenstern expected utility function, it can't be, because the integrals wouldn't have the reals as a range. My point about a criterion is that you have to have a basis for rejecting it to reject it. A criterion is a method of judging something; without a method, it cannot be judged. If you want a vNM expected utility function, the log is right out for the reason mentioned above. Pdbailey (talk) 02:25, 2 February 2008 (UTC)
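One way to read the remark about the range of the integral is through a discrete lottery: as soon as the outcome x = 0 has positive probability, the expected log utility is minus infinity rather than a real number. The Python sketch below illustrates this reading; the function name and the representation of a lottery as (probability, payoff) pairs are assumptions made for the example.

```python
import math

def expected_log_utility(lottery):
    """Expected log utility of a discrete lottery.

    lottery is a list of (probability, payoff) pairs with payoffs >= 0;
    this representation is a hypothetical choice for illustration.
    """
    total = 0.0
    for p, x in lottery:
        if p == 0:
            continue  # impossible outcomes contribute nothing
        if x == 0:
            return float("-inf")  # log(0) with positive probability
        total += p * math.log(x)
    return total

print(expected_log_utility([(0.5, 100), (0.5, 200)]))
print(expected_log_utility([(0.999, 100), (0.001, 0)]))
```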

Factorising programs

I usually spend several days a week away from home in hotels and need something to amuse myself in between drinking at the bar. I have been using my laptop to search for large primes. I use uBasic to find factors of (n-1) or (n+1) and then use these factors in a helper file fed in to PrimeForm/GW. Unfortunately, some of the numbers I am currently trying to get a solution for will not factorise in the time I have available (I need to stop the program when it is time to go and do some work). Does anyone know where I can get a factorising program that is faster than uBasic, or alternatively, is able to store its interim results and pick up the calculation later. SpinningSpark 15:31, 27 January 2008 (UTC)

This will depend on what you mean by a large prime. If you mean really large, you could join the Great Internet Mersenne Prime Search, but that'll take several months per prime on most computers. Algebraist 16:17, 27 January 2008 (UTC)
You can try Yves Gallot's Proth.exe, which you can stop and resume later, but it only handles numbers of certain forms. There's a list of programs for finding primes at The Prime Pages from the University of Tennessee at Martin. —Bkell (talk) 16:48, 27 January 2008 (UTC)
Thanks Algebraist, but I am already in GIMPS and have every intention of being the first to a 10,000,000 digit number (actually it will be 11,000,000 judging by the numbers primenet is currently allocating to me). I run it 24/7 on a computer at home. I was really looking for something that could be run in brief periods on my laptop. For various reasons, I need my laptop clear of everything else while I am working, so I would not want Prime95 permanently loaded. Besides, as it practically runs itself it is not exactly going to provide any entertainment value for me.
I was looking for something like uBasic that is very general purpose but a bit faster. The specific numbers I was attempting recently are of the form 3^n-2. I had in mind attempting (wildly ambitious I know) the probable primes of this form listed at Henri & Renaud Lifchitz's website. I have been working my way through the known primes of this form listed in the OEIS to verify my method but got stuck on the very largest ones because I could not give uBasic sufficient time to get some factors. SpinningSpark 17:36, 27 January 2008 (UTC)
And just to clarify, I don't need a program to test for primality, PrimeForm/GW does Lucas-Lehmer tests etc and is very general purpose. What it cannot do is find factors of (n-1) over a certain size but if I know the factors I can tell PrimeForm about them in a helper file. That is why I need a factorising program. SpinningSpark 17:54, 27 January 2008 (UTC)
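The reason the factors of (n-1) matter can be shown with a small Python sketch of the classical Lucas n-1 test: if some base a satisfies a^(n-1) ≡ 1 (mod n) but a^((n-1)/q) ≢ 1 (mod n) for every prime q dividing n-1, then n is prime. This is an illustration of the principle only, not a description of how PrimeForm/GW uses its helper file, and the function name is hypothetical. The examples use 79 = 3^4 - 2 and 241 = 3^5 - 2.

```python
def lucas_witness(n, prime_factors):
    """Return a base a that proves n prime via the Lucas n-1 test, or None.

    prime_factors must list every distinct prime dividing n - 1.
    """
    for a in range(2, n):
        if pow(a, n - 1, n) != 1:
            return None  # Fermat test fails, so n is composite
        if all(pow(a, (n - 1) // q, n) != 1 for q in prime_factors):
            return a  # a has order n - 1, so n is prime
    return None

print(lucas_witness(79, [2, 3, 13]))   # 78 = 2 * 3 * 13
print(lucas_witness(241, [2, 3, 5]))   # 240 = 2^4 * 3 * 5
print(lucas_witness(25, [2, 3]))       # 25 = 3^3 - 2 is composite
```

For large n the loop itself is cheap; the expensive part is obtaining the complete list of prime factors of n-1, which is the factorising step asked about here.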

Machine-learning databases in Euclidean form

Hi. I am currently working on Semi-supervised learning, and in my framework the data is assumed to be points in a Euclidean space. I am interested in finding out if there are datasets originating from real-world applications that have a certain feature I am exploring.
I'm having trouble finding a dataset to even test this. The most commonly used data is images, and one would think their pixel values could be used as Euclidean coordinates, but the truth is that Euclidean distance is usually pretty meaningless when it comes to images, and this is thus not suitable for my purposes.
So my questions are - does anyone know of a way to turn an image into a point in a Euclidean space such that the Euclidean distance is a meaningful indication of the difference between the images? And, does anyone know where I could find datasets (based on images or anything else) for which this processing has already been made and they are meaningfully Euclidean "out of the box"?
Thanks. -- Meni Rosenfeld (talk) 16:39, 27 January 2008 (UTC)

Have a look at Eigenface, which is one method for getting meaningful Euclidean coordinates out of images. The key idea here is to first register the images, then perform Principal components analysis of the set of images. This reduces the dimension of the problem and you can use the coefficients as your Euclidean coordinates. --Salix alba (talk) 17:14, 27 January 2008 (UTC)
That's a good start, but it seems to me that the PCA already assumes that the original data is Euclidean. It might be able to "amplify" the "Euclideanity" of a dataset in which there is already some (such as faces in consistent conditions), but I suspect that it will fail miserably if applied to a set such as Caltech 101. I'll try this out, but I'll be happy to hear other suggestions. -- Meni Rosenfeld (talk) 17:42, 27 January 2008 (UTC)
Now wait just a minute here... The PCA is an orthogonal transformation, thus the distance between the PCA coordinate vectors of two points (if all components are kept) is exactly equal to the Euclidean distance between the points themselves. So this doesn't help. -- Meni Rosenfeld (talk) 19:17, 27 January 2008 (UTC)
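The distance-preservation claim can be checked on a toy example. The Python sketch below does PCA in two dimensions only, where the principal axis has a closed form, so no linear algebra library is needed; the sample points are made up for illustration. With both components kept, pairwise distances are unchanged; distances change only once components are dropped.

```python
import math
from itertools import combinations

def pca_2d(points):
    """Return the PCA scores of 2-D points (both components kept)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the direction of maximum variance (closed form in 2-D).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the centered data onto the principal axes.
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]

points = [(2.0, 1.0), (3.0, 4.0), (5.0, 4.5), (6.0, 7.0), (8.0, 7.5)]  # made-up data
scores = pca_2d(points)

for i, j in combinations(range(len(points)), 2):
    print(math.dist(points[i], points[j]), math.dist(scores[i], scores[j]))
```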
Ah I didn't pick up on the euclidean distance, this is likely to make some learning tasks harder, you probably can do some form of clustering but you are going to loose a lot of information. With PCA typically the first 10 or so components will be kept and the rest discarded. You could then use 10D vectors as input to the training phase. Is there a reason you need to use euclidean distance? --Salix alba (talk) 23:20, 27 January 2008 (UTC)
It's "lose". Anyway, clustering is the next step, after you already have a good representation of the data (and what I am actually working on is new methods for clustering). I'm simply asking about how to do the preliminary processing of taking an image and extracting its "features" or whatever to end up with a good representation (or better yet, how to find a dataset which is already composed of those features). It's actually not the Euclidean distance that is so important (though this is what I am using in my theoretical framework, so it is preferable) - what I really need is just any metric that is meaningful for SSL, in the sense that datapoints belonging to different classes will never (or at least rarely) be close, and datapoints belonging to the same class will sometimes be close. -- Meni Rosenfeld (talk) 10:01, 28 January 2008 (UTC)
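A rough way to probe whether a metric is "meaningful" in the sense just described is to check how often a point's nearest neighbour carries the same class label. The Python sketch below is a hypothetical diagnostic, not the poster's own criterion; it uses plain Euclidean distance and made-up toy data.

```python
import math

def nn_label_agreement(points, labels):
    """Fraction of points whose nearest other point shares their label."""
    hits = 0
    for i, p in enumerate(points):
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(points)

# Made-up toy data: two tight, well separated groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(nn_label_agreement(toy_points, ["a", "a", "b", "b"]))
print(nn_label_agreement(toy_points, ["a", "b", "a", "b"]))
```

A value near 1 suggests that close points rarely belong to different classes; a value near chance level suggests the distance carries little class information.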
Some form of Image registration may be an important first step. The basic idea here is translate, rotate and scale the images so they are in a standard position.
Have you thought of creating your own synthetic dataset? For example, take images of the letters A-Z and add a bunch of noise. This would eliminate a lot of the image processing problems and at least get some well understood data to test your algorithms on. --Salix alba (talk) 10:41, 28 January 2008 (UTC)
I suspect that image registration will be relevant only to images too simple to be suitable for my purposes. I've tried working with the MNIST database of handwritten digits (which I think is "registered" out of the box), but couldn't find what I was looking for - either because digits are inherently too simple, or because SSD of pixel values just doesn't capture the correct notion of similarity. Generating data myself will not work for my current goal. -- Meni Rosenfeld (talk) 13:32, 28 January 2008 (UTC)
If you can do pairwise image registration, with "elastic" transformations, and you assign a sensible symmetric measure for the amount of deformation needed to get the best agreement, do something similar in colour space to make luminances and hues agree, determine the remaining amount of discrepancy, and take a weighted sum of these components, you get a kind of continuous edit distance, which should be a metric.  --Lambiam 19:47, 28 January 2008 (UTC)
Sounds... Complicated (yes, I do realize this was inevitable after eliminating all simple suggestions). Do you know of a reference which discusses this? -- Meni Rosenfeld (talk) 20:06, 28 January 2008 (UTC)
Essentially standard image processing fare. Procrustes analysis describes such a system where the data is landmark points, rather than actual images, and only Euclidean transformations are allowed. The literature is vast; [2] may be a good way in. --Salix alba (talk) 20:36, 28 January 2008 (UTC)
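For two-dimensional landmark data, a Procrustes-style distance can be sketched with complex numbers in plain Python. This is a simplified illustration under assumptions: landmarks are matched by index, both shapes have at least two distinct points, and scale is normalised away in addition to translation and rotation (so it allows similarity transformations, slightly more than the Euclidean ones mentioned above). The function name and the example shapes are hypothetical.

```python
import math

def procrustes_distance(a, b):
    """Distance between two 2-D landmark sets after removing translation,
    scale and rotation. Landmarks are assumed to correspond by index."""
    def normalise(pts):
        z = [complex(x, y) for x, y in pts]
        mean = sum(z) / len(z)
        z = [p - mean for p in z]                       # remove translation
        norm = math.sqrt(sum(abs(p) ** 2 for p in z))
        return [p / norm for p in z]                    # remove scale

    za, zb = normalise(a), normalise(b)
    # The best rotation of za onto zb is the unit vector along sum(conj(p) * q).
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    rot = inner / abs(inner) if abs(inner) > 0 else 1
    return math.sqrt(sum(abs(p * rot - q) ** 2 for p, q in zip(za, zb)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
t = 0.7  # arbitrary rotation angle
moved = [(2 * (x * math.cos(t) - y * math.sin(t)) + 5,
          2 * (x * math.sin(t) + y * math.cos(t)) - 3) for x, y in square]
kite = [(0, 0), (2, 0), (1, 1), (0, 3)]

print(procrustes_distance(square, moved))
print(procrustes_distance(square, kite))
```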
Thanks, but I am hoping for something that will declare these 3 images of Basses ([3], [4], [5]) to be more or less close to each other, more so than to a typical non-bass image. Unless I am missing something, image registration of the kind you have linked to is not capable of that, since it is not the same scene that is depicted each time. -- Meni Rosenfeld (talk) 21:02, 28 January 2008 (UTC)

(Undent) You might stand a chance with 1 and 3; 2 is most probably beyond the current state of the art. A typical method might be to use an Active shape model to fit a spline to the boundary of the fishes; however, you would need a much bigger training set to build the model.

Computer vision/image processing is a very hard problem. It took about ten years of research by thousands of people to come up with an optimal edge detection algorithm, the Canny edge detector, and that left much to be desired. In the end simple edge detection was abandoned, and the trend about five years ago was to try to fit splines to the outlines of shapes. I worked with such a system which tried to recognise people walking; it had a very low success rate, about 75% in uncluttered scenes, and it could quite happily match parts of a tree. I would say that number-plate recognition systems are close to the state of the art, and these are far from perfect. Having spent a brief time in the field I can but marvel at how well humans do.

I would highly recommend lowering your sights a lot. As CV is not your primary focus, it might be better to work with some other dataset, landmark data where humans have identified anatomical points on a shape is quite nice data. I see Caltech 101 does provide landmark data, and there are some good anatomical datasets about. (I might be able to get hold of a small dataset of landmarks on rat skulls). --Salix alba (talk) 23:17, 28 January 2008 (UTC)

I am quite flexible about where the data come from - the only requirements are that it will be "real", rich (so a database of only rat skulls would probably be ineffective) and that I know how to calculate a metric which is meaningful in the sense described above (which isn't currently the case for landmark data, at least as far as the "know" part is concerned). My guess is that trying to do the necessary processing myself would require too much involvement, so I'll change the focus of my question to - where do I find datasets in which such a metric is readily calculable? -- Meni Rosenfeld (talk) 13:33, 29 January 2008 (UTC)
What is your notion of "rich" here? The combination of real + rich suggests by itself you will need relatively advanced computer vision techniques for segmentation of the images to separate the foreground figure of interest from the background. Relatively well-researched areas are (handwritten) character recognition and face recognition, but I don't know the present state of the art. It all also depends on what you ultimately aim to achieve.  --Lambiam 20:59, 29 January 2008 (UTC)
I don't really have a more precise notion of "rich" in mind. I guess I am biting off a little more than I can chew here. I posted the question in hopes there is something completely obvious I am missing that will solve my problems immediately. The long discussion this created indicates this is not the case, so I guess I will have to redirect my investigations elsewhere. Thank you both for your help. -- Meni Rosenfeld (talk) 21:13, 29 January 2008 (UTC)