Wikipedia:Reference desk/Archives/Mathematics/2006 October 29

From Wikipedia, the free encyclopedia

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


October 29

Learning update schedules

This is sort of also a computer science question.

I want to create an RSS reader that "learns" approximately the times of day at which an RSS feed is likely to be updated, and chooses when to check the feed according to the probability that it has been updated since it was last checked.

I'm sort of at a loss as to how to actually implement this, especially because (I think) the probability is based on the time of day, weekday, AND the amount of time since the feed was last checked.

Does anyone have any ideas here?

You could use Unix timestamps (Unix seconds) to average the times of recent updates, and of the ones before them, to create an estimate based on the prior updates.
I.e. one update was at 09:00 one day and at 09:30 the next. Therefore the program can "predict" that the next update is going to be at 09:15, but if you are trying to make a perfect AI that realizes, e.g., that the guy posting is always late on Wednesdays, that's a whole other can of worms. Avalean 01:53, 29 October 2006 (UTC)
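A minimal sketch of that averaging idea in Python (the function and the wrap-around caveat are my own illustration, not part of the suggestion above):
 # Hypothetical sketch: predict the next update from the average
 # time-of-day of recent updates (Unix timestamps, local time).
 SECONDS_PER_DAY = 86400

 def predict_next_update(update_timestamps):
     """Guess the next update as "tomorrow, at the mean observed time of day".
     Ignores wrap-around near midnight, weekday effects, etc."""
     times_of_day = [t % SECONDS_PER_DAY for t in update_timestamps]
     mean_time_of_day = sum(times_of_day) / len(times_of_day)
     last = update_timestamps[-1]
     next_day_start = last - (last % SECONDS_PER_DAY) + SECONDS_PER_DAY
     return next_day_start + mean_time_of_day

 # Updates seen at 09:00 and 09:30 on consecutive days -> predicts about 09:15.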
This looks like a lot of work; is checking for updates costly? There are (at least) three aspects to this: (1) Selection of a model; (2) Estimating the model parameters; (3) Computing the probability. You need a model for the random variable of the update times. Depending on what you observe, this may be like: typically each next update is about 2 hours after the last, except in the wee hours it's more like 4 hours between updates; or: every day around 10:00, noon, 15:00 and 22:00. In the first case the times are much more irregular, of course. These possibilities require very different models. The second class is easier: use separate r.v.s for the update time at each part of the day, such as "the afternoon update", which maybe is on the average at 14:52 with a spread of 12 minutes. A normal distribution may be good for this, but probably you also want to add a probability that the whole update is skipped. The parameters (μ and σ) may be made to differ not only depending on the part of day, but also on the weekday. For the first class things are more complicated. You may consider a log-normal distribution for the time between updates, but now we are in a situation where the parameters of the distribution change continually. One way of handling this might be to change the time scale instead, as if the clock ticks much slower in the wee hours than in busy periods. Estimating the model parameters can be done, if you have enough data, using maximum likelihood methods. The more parameters you have, the more data you will need. This can almost certainly not be done analytically; you will need to do numerical optimization, which is tricky. Now, assuming you have all the parameters and the data on recent observations (last known update at 10:05, no new update yet at 12:22), you should be able to compute the conditional probability of an update having happened given these recent observations.  --LambiamTalk 06:44, 29 October 2006 (UTC)
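For the "scheduled update" model, a rough sketch of the last step might look like the following (the function names and example parameter values are placeholders for illustration; in practice μ, σ and the skip probability would be estimated from the feed's history):
 import math

 def normal_cdf(x, mu, sigma):
     # Cumulative distribution function of Normal(mu, sigma).
     return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

 def prob_update_since_check(last_check, now, mu, sigma, p_posted):
     """P(update occurred in (last_check, now] | nothing seen by last_check),
     for one scheduled update per day that happens with probability p_posted
     and, if it happens, at a Normal(mu, sigma) time.
     All times are minutes after midnight."""
     out_by_last_check = p_posted * normal_cdf(last_check, mu, sigma)
     out_by_now = p_posted * normal_cdf(now, mu, sigma)
     return (out_by_now - out_by_last_check) / (1.0 - out_by_last_check)

 # "The afternoon update": mean 14:52, spread 12 minutes, skipped ~10% of days.
 # Last check at 12:22, current time 15:10.
 p = prob_update_since_check(last_check=12*60 + 22, now=15*60 + 10,
                             mu=14*60 + 52, sigma=12, p_posted=0.9)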
Checking for updates is costly because one of my design parameters is that the reader must be able to check hundreds of feeds (it's more like an aggregator in this respect). This can take a very long time unless you skip, some of the time, feeds that don't update very often (most of them). Currently I use a crude sort of reward-punish heuristic: if a feed hasn't been updated since the last check, be less likely to bother checking it on the next crawl, and vice versa; but of course this suffers from painfully obvious calendar effects (wasting needless checks on Saturday and skipping valuable checks on Monday, etc.). The total number of checks might be about right, but many of them will be at the wrong times. On the other hand, this has the power to "learn" very quickly: bloggers will keep updating late into the night if there is an event going on that interests them, such as an election, for political bloggers in that jurisdiction. Foobody 17:55, 30 October 2006 (UTC)
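For reference, the reward-punish heuristic described above can be sketched in a few lines (the particular factors and bounds are arbitrary placeholders, not what Foobody actually uses):
 import random

 def adjust_priority(priority, had_new_items, reward=1.5, punish=0.8,
                     lo=0.01, hi=1.0):
     # Multiplicatively raise or lower a feed's check probability
     # after each crawl, clamped to [lo, hi].
     factor = reward if had_new_items else punish
     return min(hi, max(lo, priority * factor))

 def should_check(priority):
     # Probabilistically decide whether to fetch the feed on this crawl.
     return random.random() < priority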
Do you get the exact update time from the server? If so, you could estimate the probability of an update occurring on a Wednesday by counting: check all the Wednesdays in your known history of the site, and divide the number of Wednesdays on which an update occurred by the total number of Wednesdays in the site history. You can do the same for updates occurring during each hour of the day, and similarly for updates less than 1 hour after the last update, 1–2 hours after the last update, etc.
Now, based on these estimates, and the known time of the last update, you can multiply the probability estimates into a heuristic unnormalized "probability" of an update occurring during each single hour since the last update. Also, you know that there was no update between the last update and the last unsuccessful check. Sum the hourly probability estimates from the hour of the last check up to the current hour, and divide by the sum from the hour of the last check to infinity. The result is your estimated likelihood of an update having occurred.
If you get division by zero in the last stage, you don't have enough data for the estimate, and need to use some simpler rule (or even the same rule without the hours-from-last-update term). I'm pretty sure it would be possible to develop a more rigorous approach along these lines, but the described approach should already be workable and relatively easy to implement.
Does this make any sense to you? Would you prefer a more rigorous method? 84.239.129.42 19:01, 30 October 2006 (UTC)
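A rough Python sketch of this counting scheme (the data structures, and a finite horizon standing in for "infinity", are my own guesses at an implementation, not part of the suggestion above):
 from datetime import timedelta

 def hourly_score(t, last_update, p_weekday, p_hour, p_gap):
     # Unnormalized "probability" of an update during the hour starting at t,
     # obtained by multiplying the per-weekday, per-hour-of-day and
     # hours-since-last-update frequency estimates.
     gap_hours = int((t - last_update).total_seconds() // 3600)
     return p_weekday[t.weekday()] * p_hour[t.hour] * p_gap.get(gap_hours, 0.0)

 def estimated_update_likelihood(last_update, last_check, now,
                                 p_weekday, p_hour, p_gap,
                                 horizon_hours=24 * 14):
     # Sum of hourly scores from the last check up to now, divided by the
     # sum from the last check out to a long horizon (in place of infinity).
     numer = denom = 0.0
     t = last_check
     for _ in range(horizon_hours):
         s = hourly_score(t, last_update, p_weekday, p_hour, p_gap)
         if t < now:
             numer += s
         denom += s
         t += timedelta(hours=1)
     return numer / denom if denom > 0 else None  # None: not enough data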
This sounds interesting. I've already been recording, by hour and day of week, whether each check was successful (local time; server times are not reliable). I just wasn't sure how to use the information, especially given the length of time since the last check. I'm going to try this and see if I get stuck along the way. Foobody 22:23, 31 October 2006 (UTC)
If you don't know the exact update times, then all you know is that the update occurred between two checks. Instead of 1 update on an exactly known hour, you could assume 1/m updates on each hour between the two checks, where m is the number of hours between the checks. Otherwise the heuristic should work out pretty much the same.
Thinking about it further, I'm afraid that treating the factor for time after update the way I described above will cause too many zero probabilities. You ought to get better results by using a cumulative probability estimate: count all the (fractional) updates that occur less than k hours after the last update, and divide by the total number of updates, and do this for all k. This gives you a factor that starts out at 0 right after an update, and gradually increases to 1. Thus, if there's been no update for a long time, your estimated likelihood consists of only the weekly and daily factors.
Moreover, after this correction it is obvious that the hourly likelihood factors cannot be summed up to infinity. You could treat the hourly events as independent with probabilities p_1, p_2, \dots and compute the final likelihood of at least one update during the hours i, i+1, \dots, j as 1-(1-p_i)(1-p_{i+1})\cdots(1-p_j). Or even easier, you could simply sum the likelihoods of the hours since the last check, and treat this not as a probability but as a "goodness" value which can be compared between different sites. 84.239.129.42 18:25, 1 November 2006 (UTC)
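Continuing the earlier sketch, with the gap factor replaced by the cumulative estimate and with the independent-hours combination, this might look like the following (again, the names and data structures are my own placeholders):
 from datetime import timedelta

 def hourly_prob(t, last_update, p_weekday, p_hour, cum_gap):
     # cum_gap[k] = fraction of past updates arriving less than k hours
     # after the previous one; rises from 0 toward 1 as the gap grows.
     gap_hours = int((t - last_update).total_seconds() // 3600)
     gap_factor = cum_gap[min(gap_hours, len(cum_gap) - 1)]
     return p_weekday[t.weekday()] * p_hour[t.hour] * gap_factor

 def prob_at_least_one_update(last_update, last_check, now,
                              p_weekday, p_hour, cum_gap):
     # Treat the hours since the last check as independent events:
     # P(at least one update) = 1 - prod(1 - p_i).
     no_update = 1.0
     t = last_check
     while t < now:
         p = hourly_prob(t, last_update, p_weekday, p_hour, cum_gap)
         no_update *= (1.0 - p)
         t += timedelta(hours=1)
     return 1.0 - no_update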

Fractions whose value is one

What is the name for a number of fractions whose value is one? —The preceding unsigned comment was added by 124.206.195.150 (talk • contribs) 17:59, October 29, 2006 (UTC).

Are you talking about these?
\frac{1}{1} = \frac{2}{2} = \frac{3}{3} = \frac{4}{4}
—The preceding unsigned comment was added by 202.168.50.40 (talk • contribs) 21:44, October 29, 2006 (UTC).
Last time I checked, they were called equivalent fractions... Titoxd(?!?) 16:22, 30 October 2006 (UTC)
Equivalent fractions are any two fractions that are equal in value. --WikiSlasher 06:34, 31 October 2006 (UTC)
Fractions equivalent to 1, perhaps? --Kjoonlee 14:30, 31 October 2006 (UTC)

Cartesian product of unit circle with itself

"It's indeed 'just' a torus" was how a topologist dismissed the following question (I'm interested in geometry as well as topology, and right now in differential geometry): since not even the tiniest region of the surface lives in a 3-dimensional space, what curvature can be assigned to it? Is it a tensor thing? Thanks, Rich 22:21, 29 October 2006 (UTC)

It embeds into 4-dimensional space (a two-dimensional subset of the three-dimensional sphere in R4) and has zero curvature everywhere. —David Eppstein 00:46, 30 October 2006 (UTC)
(after edit conflict) If by "circle" you mean the topological concept of a circle "up to homeomorphism", also known as the 1-sphere S1, and by "product" you mean the topological product, the concept of curvature simply does not apply. You get indeed the topological torus, which by definition is the topological product of two topological circles: S1 × S1. If your circles are embedded in the Euclidean plane R2, and you take the product into four-dimensional Euclidean space R4, it is possible to assign a meaning to the curvature (which then for reasons of symmetry must be the same everywhere).  --LambiamTalk 01:00, 30 October 2006 (UTC)
Moreover, if your circle had radius r, and hence arc curvature r^{-1} at each point, then the curvature tensor naturally associated with the topological product would be \begin{pmatrix}r^{-1}&0\\ 0&r^{-1}\end{pmatrix}. (More generally, one would take the direct sum of curvature tensors of the multiplicands.) So the mean curvature of the product is r^{-1} and the Gaussian curvature is r^{-2}. —Blotwell 01:12, 30 October 2006 (UTC)
I did mean of course the product of two planar "unit" circles, and it lives in 4-dimensional Euclidean space as, for example, (cos t, sin t, cos u, sin u), 0 ≤ t, u < 2π. I agree that the curvature is of course everywhere the same. I am a bit nonplussed to hear the curvature is zero everywhere, David Eppstein. Are you certain? Regards, Rich 01:21, 30 October 2006 (UTC)
        • Hold it -- my apologies, Lambiam; sorry for my curtness. I wasn't being as clear as I thought I was, due to the vivid picture I have in my head of the torus: (cos t, sin t, cos u, sin u) -- the distinguished boundary of the bidisc. Rich 23:21, 30 October 2006 (UTC)
Gaussian curvature is intrinsic, locally a circle is (in an intrinsic view) the same as a line, and so locally the product of two circles is the product of two lines, a plane, which has no curvature. For other kinds of curvature I imagine it would matter how it was embedded into R4, though. —David Eppstein 01:36, 30 October 2006 (UTC)
PS if you want some more verifiability than my word about the curvature being zero, try Googling for "flat torus". —David Eppstein 03:29, 30 October 2006 (UTC)
Cf. also Gauss-Bonnet theorem: if the Gaussian curvature is everywhere the same, it must be zero.--gwaihir 11:35, 30 October 2006 (UTC)
Just to clarify: if the Gaussian curvature is everywhere the same on the torus, it must be zero. Tesseran 13:37, 30 October 2006 (UTC)
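For concreteness, here is the short computation behind the flatness claim, using the parametrization f(t, u) = (\cos t, \sin t, \cos u, \sin u) given above (an illustrative check, not part of any of the signed comments):
f_t = (-\sin t, \cos t, 0, 0), \qquad f_u = (0, 0, -\sin u, \cos u)
E = f_t \cdot f_t = 1, \qquad F = f_t \cdot f_u = 0, \qquad G = f_u \cdot f_u = 1
Hence the induced metric is ds^2 = dt^2 + du^2, the flat Euclidean metric, and the intrinsic (Gaussian) curvature is zero at every point.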

I was figuring the types of curvature I had learned about for surfaces and curves in 3D space would be at best "incomplete" and perhaps somehow incorrect for a surface that doesn't fit into 3D space at all. Rich 02:32, 30 October 2006 (UTC)

If you look at the torus as a quotient of R2 (by the integer lattice, or by the group of integer translations), it inherits the Euclidean metric from R2. In this view, it is clear that the metric is flat. This is true of the cylinder too, for example, because it also can be seen as a quotient of the plane. (It is also S1 × R, in case you worry about the flatness of the cylinder; it can also be seen as a surface in R3 on which one of the principal curvatures is zero.)
Your question hits upon one of the fundamental issues in the development of mathematics: the distinction between intrinsic and extrinsic properties of objects. In group theory, groups were once thought of primarily as subgroups of permutation groups or matrix groups, and it was the step towards abstraction in defining groups axiomatically that allowed a lot of the advances in modern algebra. Furthermore, now we can study all the different ways that one abstract group can be realized as a subgroup of a matrix group—this is representation theory. In topology and geometry, surfaces and curves were thought of for a long time as subsets of some Euclidean space; moving to an abstract definition of a manifold allows us to distinguish between intrinsic properties of a manifold (like dimension, lengths, or angles) and extrinsic properties that depend upon the way the manifold is embedded into Euclidean space. One of my favorite results is that Gaussian curvature is an intrinsic property; though the principal curvatures may vary based on the embedding of a manifold, their product depends only on the metric. So you can, for example, calculate the curvature of the metric in any model that you like. Tesseran 13:37, 30 October 2006 (UTC)
Is it meaningful to talk of the Gaussian curvature of a surface in 4D? Our article only mentions the 3D case and does not talk of extensions. So I'm a little confused as to which notion of curvature we are referring to. --Salix alba (talk) 16:07, 30 October 2006 (UTC)
"intrinsic". It's a 2D thing and does not depend on ambient space.--gwaihir 16:10, 30 October 2006 (UTC)
Yes, 'sectional curvature' is the more general notion in this case, and for two-dimensional surfaces they coincide. \Mike(z) 16:25, 30 October 2006 (UTC)
Curvature raises questions because it has so many different meanings. The curvature of a curve at a point has to do with fitting a circle to the curve so that it "kisses" (osculates) at the point. For this definition, a circle of radius r has curvature κ = 1/r at every point. Smooth surfaces have a family of curves passing through a given point, and each curve can have a different osculating circle radius, which depends on how the surface sits in 3D Euclidean space; how then to assign a single number as curvature? Still, these radii must have a minimum and a maximum, giving us the two principal curvatures. We can average these two numbers to get "mean curvature", and we can multiply them to get "Gaussian curvature". Differential geometry is also able to study surfaces intrinsically, without reference to a containing space. In this more flexible setting we can still define Gaussian curvature, but not mean curvature.
Differential geometry does not limit itself to curves and surfaces, but considers differentiable manifolds of higher dimensions as well. Using intrinsic methods, we can define richer kinds of curvature, though a single number (per point) is hopelessly inadequate. This is where we encounter the curvature tensor field, and its reduced versions such as the Ricci curvature.
The "Cartesian product of a unit circle with itself" has a clear meaning for topology, but not for differential geometry. Nor is it clear what kind of curvature you seek. One geometry version is as follows: Take a unit circle in the xz plane centered at (1,0,0); sweep it rotationally around the z axis. The result is a torus with a pinch point at the origin, and can fairly be said to satisfy your request. --KSmrqT 20:48, 30 October 2006 (UTC)
  • I disagree with you. With the context that I gave, I still think it was unambiguous and clear. In any case, in a later edit (see above) I parametrized it as (cos t, sin t, cos u, sin u). Regards, Rich 22:00, 30 October 2006 (UTC)
There is a clear interpretation as an embedded submanifold of the product of the ambient spaces as well as an interpretation as a product of Riemannian manifolds, and these two notions agree. I don't see any ambiguity here.--gwaihir 22:32, 30 October 2006 (UTC)