Talk:Sigmoid function
From Wikipedia, the free encyclopedia
Call me crazy, but since when is the hyperbolic cosine considered "S shaped"? If this is a typo, I'm not sure what other function it was supposed to be. --65.147.0.105 15:25, 28 May 2004 (UTC)
- Right you are, I don't know what I was thinking. Fixed. (The error function is a proper sigmoid, right?)
Jorge Stolfi 10:24, 31 May 2004 (UTC)
Recent changes to this page
Isn't it better to redirect this page to the logistic function page? Or restore this page to its former glory? The current page is kinda pathetic.
local extrema?
Do you really mean "local minimum" and "local maximum"? The example function given clearly doesn't have any local minima or maxima (but it does have a global minimum of 0 and a global maximum of 1) -- Somebody
Perhaps what is meant is that the second derivative (curvature) has a local minimum and maximum? BTW, I do not agree that the function has global extrema, because as I learned them and as the article on them states, they are points in the domain of the function and are always also local extrema. This function has none. The image of the function has a supremum of 1 and an infimum of 0, though (i.e. the asymptotes of this function are y=0 and y=1). 82.103.198.180 10:03, 23 July 2006 (UTC)
Maybe it's for complex argument values? One is led to think of reals only because the plot is 2d, but maybe the text doesn't assume that. Coffee2theorems 18:22, 23 July 2006 (UTC)
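A quick numerical check of the points raised in this thread, as a minimal sketch. It assumes the example function under discussion is the standard logistic curve 1/(1 + e^(-x)), which the thread does not state explicitly; the names `logistic` and `second_derivative` are illustrative only.

```python
import math

def logistic(x):
    """Standard logistic curve (assumed to be the article's example function)."""
    return 1.0 / (1.0 + math.exp(-x))

def second_derivative(x):
    """Closed form f'' = f(1 - f)(1 - 2f) for the logistic f."""
    f = logistic(x)
    return f * (1.0 - f) * (1.0 - 2.0 * f)

# Sample the curve on [-10, 10] in steps of 0.01.
xs = [i / 100.0 for i in range(-1000, 1001)]
values = [logistic(x) for x in xs]

# Strictly increasing on the grid, so no interior local minimum or maximum.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

# The values approach 0 and 1 but never reach them on this grid.
stays_inside = 0.0 < min(values) and max(values) < 1.0

# The second derivative, by contrast, does have a local maximum and minimum.
curvature = [second_derivative(x) for x in xs]
x_at_max = xs[curvature.index(max(curvature))]
x_at_min = xs[curvature.index(min(curvature))]

print(strictly_increasing, stays_inside, x_at_max, x_at_min)
```

Under that assumption the curve itself has no extrema on the real line, while its second derivative peaks near x = -1.317 and bottoms out near x = +1.317 (that is, at x = ln(2 - √3) and x = ln(2 + √3)).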
Examples
I'd like to add a gallery of sigmoid-like curves to this article. The hemoglobin example is a nice one. Any others? --HappyCamper 17:22, 30 March 2007 (UTC)
Sign
For the double sigmoid function, do you mean sin?
I also think the double sigmoid function is wrong. What about this one?
or this one:
Image
I see that someone changed the image size recently in order to avoid resolution problems. Maybe the image should be replaced after all with the almost identical vector image? --Hagman-de 15:53, 16 June 2007 (UTC)
a slightly more useful definition?
I've used the sigmoid function on and off, for a long time (about 8 years), and what I use is of course similar to what is presented here, but I would suggest adding two elements into the definition -- a "gain" or "sharpness" factor "k" or "g" -- and a "threshold" or "slider" term that allows the function to be "slid" back and forth across the X-axis:
- Y(t) = 1/(1 + e^(k*(X - thr)))
The neat thing about this more expanded definition is the following:
- The "gain" at X = "thr" is of course the derivative there; its magnitude is 1/4 the value of k (as I remember), and in the form above its sign is opposite to that of k.
- The curve can be "flipped around" by changing the sign of k; thus the sigmoid can be made to act like a Boolean NOT if "thr" is 0.5 and k is positive.
- You see the failure of "the law of excluded middle" (LoEM) -- no matter how huge the k, the value of the function at X = "thr" is always 0.5. This violates the LoEM.
- You can build e.g. an OR gate by adding X1 and X2, subtracting "thr" = 0.5 and then squashing the sum with the sigmoid:
- OR(X1, X2) = 1/(1 + e^(-12*(X1 + X2 - 0.5)))
- Given that you can build an OR and a NOT you now can approximate any Boolean function.
- Similarly, in a plane, the value of Z(t) will be 0.5 all along a line (it looks like a folded plane)
- From Y = mX + b,
- Y/b - (m/b)*X = 1
- Z(t) = sig(Y/b - (m/b)*X - thr), which is 0.5 all along that line when thr = 1
- Two of the above Z(t), with reversed signs and slightly offset thresholds, added together make a ridge or trench along a line, like a mountain range or a canyon on a map. However, if you put three of these plateaus, i.e. "folded sheets" (a total of just 3 sigmoids), on the X-Y plane, get the signs of their k's right, add them together and pass the sum through a "second-layer" sigmoid, you have a "triangle". With higher values of the k's the triangle can be shrunk to make a single Matterhorn stick up anywhere on the plane (or to make a sink-hole).
- Given that you can make Matterhorns to your heart's content anywhere on the plane, you can add them together and approximate any curve by "bleeding" one into another. This summation proves that sigmoids can be used to approximate any arbitrary curve, much like a 2-D Fourier transform.
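The gain/threshold form and the two gates described in the list above can be tried out with a short sketch. It follows the sign convention used in the comment, 1/(1 + e^(k*(X - thr))), so a positive k gives a falling curve; the function names `squash`, `not_gate`, `or_gate` and `slope_at` are made up for illustration and the gain of 12 is simply taken from the OR example.

```python
import math

def squash(x, k, thr):
    """Sigmoid with gain k and threshold thr: 1 / (1 + e^(k*(x - thr))).
    In this sign convention a positive k makes the curve fall as x grows."""
    return 1.0 / (1.0 + math.exp(k * (x - thr)))

def not_gate(x, k=12.0):
    """Boolean NOT: threshold 0.5, positive k (falling curve)."""
    return squash(x, k, 0.5)

def or_gate(x1, x2, k=-12.0):
    """Boolean OR: add the inputs, threshold at 0.5, negative k (rising curve)."""
    return squash(x1 + x2, k, 0.5)

def slope_at(x, k, thr, h=1e-6):
    """Central-difference estimate of the derivative of squash at x."""
    return (squash(x + h, k, thr) - squash(x - h, k, thr)) / (2.0 * h)

for a in (0, 1):
    print("NOT", a, round(not_gate(a), 4))
for a in (0, 1):
    for b in (0, 1):
        print("OR", a, b, round(or_gate(a, b), 4))

# The slope at the threshold is -k/4 in this convention.
print("slope at thr for k=12:", slope_at(0.5, 12.0, 0.5))

# At x == thr the output is 0.5 whatever the gain.
print("value at thr:", squash(0.5, 1e6, 0.5))
```

The outputs are only approximately 0 and 1 (about 0.0025 and 0.9975 at a gain of 12), which is the "squashing" behaviour the comment relies on. Very large values of k*(x - thr) would overflow `math.exp`, so this sketch is meant for moderate gains only.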
Some of this stuff can be found in a book titled:
- Tom M. Mitchell, Machine Learning, WCB-McGraw-Hill, 1997, ISBN 0-07-042807-7
In particular see "Chapter 4: Artificial Neural Networks" where the Boolean abilities of "perceptrons" are defined as well. I happened onto the tricky business of adding three folded planes together to make a "triangle" (and passing them through a second-layer sigmoid) because a neural net showed me this (!). I've not seen it documented anywhere, but I did see the results of it in a journal once. I'm sure someone who knows the literature better could cite the source. Proofs similar to the above are mentioned in Mitchell. This stuff is easy to do in Excel. Wvbailey 18:39, 17 June 2007 (UTC)
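The "three folded planes plus a second-layer sigmoid" construction can be sketched as follows. This is only one way to realise it, under assumptions not in the comment: it uses the ordinary rising sigmoid 1/(1 + e^(-z)) rather than the k/thr form, a triangle with corners (0,0), (1,0), (0,1), and arbitrarily chosen gains of 40 and 20. The names `sig` and `bump` are illustrative.

```python
import math

def sig(z):
    """Rising sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bump(x, y, k=40.0, k2=20.0):
    """Approximate indicator of the triangle x >= 0, y >= 0, x + y <= 1.

    Each first-layer sigmoid is a "folded sheet" that is about 1 on one side
    of a line and about 0 on the other. Their sum is about 3 only where all
    three sheets are high, i.e. inside the triangle, and at most about 2
    elsewhere. The second-layer sigmoid thresholds the sum at 2.5.
    """
    sheets = sig(k * x) + sig(k * y) + sig(k * (1.0 - x - y))
    return sig(k2 * (sheets - 2.5))

print(bump(0.25, 0.25))   # inside the triangle
print(bump(1.0, 1.0))     # outside: fails x + y <= 1
print(bump(-1.0, -1.0))   # outside: fails x >= 0 and y >= 0
print(bump(2.0, -0.5))    # outside: fails two of the three conditions
```

Raising k sharpens the edges, and scaling or shifting the three lines moves or shrinks the triangle, which is the "Matterhorn" effect described above. Flipping the sign of k2 turns the peak into a sink-hole.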
Another sigmoid?
I wonder if it would be useful to list the following function among the sigmoids:
I have seen it used as a "hack" when a fast S-shaped function was needed, avoiding the (computer) evaluation of exp(x). Its derivative is flat at 0 and 1, and it is symmetrical with respect to the midpoint (meaning, f(1 − x) = 1 − f(x)). For many purposes it works fine, as long as you don't run outside the range [0,1]. —Preceding unsigned comment added by Pasmao (talk • contribs) 12:44, 27 October 2007 (UTC)
- It would be interesting to add something like this. I fiddled with this notion with respect to what would be required for mother nature to build a squasher for making neurological ANDs and ORs, and was able to get to some pretty nice approximations -- as long as you stay within the interval. Somewhere I actually worked out the math for this ... a problem arises because, to be useful, the AND etc. needs some "gain" in the middle (i.e. a slope > 1) but the more gain you put in the more difficult the design becomes. For an OR you need a range of -0.25 to +2.25 (i.e. if inputs are "a" and "b" that vary from 0 to 1, add them and squash their sum back to approximately 0 or 1). The first hack starts out with the odd function y = 1*(x-0.5) + 0.5 (just a straight line shifted to the right: yielding (0,0), (1,1) ). This clearly won't work. The trick then is to feed back a certain amount of x^2 to give you some "gain", etc, etc. As I remember this works best if it goes through two iterations. I'm working from memory here... bill Wvbailey (talk) 17:18, 13 January 2008 (UTC)
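The formula proposed at the top of this thread did not survive on this page, so the following is a guess rather than a reconstruction: the cubic 3x^2 - 2x^3 (often called "smoothstep") is one well-known polynomial that matches every property listed there (no exp, flat derivative at 0 and 1, f(1 - x) = 1 - f(x), only usable on [0,1]). The names `poly_sigmoid` and `poly_sigmoid_slope` are made up for this sketch.

```python
def poly_sigmoid(x):
    """Cubic S-curve 3x^2 - 2x^3 on [0, 1]; an assumed stand-in for the
    function proposed in this thread, whose formula is not preserved."""
    return x * x * (3.0 - 2.0 * x)

def poly_sigmoid_slope(x):
    """Derivative 6x - 6x^2, which is zero at x = 0 and x = 1."""
    return 6.0 * x * (1.0 - x)

# Endpoints, midpoint and the slopes there.
print(poly_sigmoid(0.0), poly_sigmoid(0.5), poly_sigmoid(1.0))
print(poly_sigmoid_slope(0.0), poly_sigmoid_slope(0.5), poly_sigmoid_slope(1.0))

# Symmetry about the midpoint: f(1 - x) = 1 - f(x).
print(poly_sigmoid(0.3) + poly_sigmoid(0.7))

# Outside [0, 1] the cubic runs away instead of saturating.
print(poly_sigmoid(2.0))
```

The maximum slope of this particular cubic is 1.5 at the midpoint, which is the kind of limited "gain" the reply above is concerned with. Inputs would need to be clamped to [0,1] before use if they can stray outside that range.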
Derivative Clarification
I'm pretty sure that not all sigmoid functions have the derivative:
Perhaps a minor clarification would be in order. —Preceding unsigned comment added by 128.111.110.55 (talk) 02:12, 11 December 2007 (UTC)
This formula is only for 1 / (1 − exp( − x)); tanh, for example, has a derivative of 1-tanh^2. This is also confusing, as f(...) can be mistaken for applying function f to (...), whereas in this case it means the result of multiplying function f by 1-f. dP/df = (P)*(1-P) would be clearer.
Jfmiller28 (talk) 23:09, 2 January 2008 (UTC)
- 1 / (1 − exp( − x)) is not even the special case of the logistic function mentioned in the text. How could the reader know which function the formula applies to? This part of the text is very confusing.130.234.198.85 (talk) 14:36, 7 January 2008 (UTC)
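The two derivative identities mentioned in this thread can be compared numerically. This sketch assumes the article's formula refers to the standard logistic function 1/(1 + e^(-x)), i.e. with a plus sign in the denominator; `logistic` and `numeric_derivative` are illustrative names.

```python
import math

def logistic(x):
    """Standard logistic function 1 / (1 + e^(-x)) (assumed target of the formula)."""
    return 1.0 / (1.0 + math.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    p = logistic(x)
    t = math.tanh(x)
    # Logistic: derivative equals P * (1 - P), a product, not a function call.
    print(x, numeric_derivative(logistic, x), p * (1.0 - p))
    # tanh: derivative equals 1 - tanh^2, a different identity.
    print(x, numeric_derivative(math.tanh, x), 1.0 - t * t)

# Applying the logistic identity to tanh gives the wrong answer:
# at x = 0, tanh(0) * (1 - tanh(0)) = 0, but the true slope is 1.
print(math.tanh(0.0) * (1.0 - math.tanh(0.0)), numeric_derivative(math.tanh, 0.0))
```

So the f(1 - f) form is specific to the logistic function, and other sigmoids such as tanh need their own expression.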
Are some of the sections talking about the logistic?
Please see my questions in comments. New Image Uploader 929 (talk) 00:50, 30 May 2008 (UTC)
- My text by Mitchell, which I listed on the article page (the only reference, BTW), equates the two:
- "σ(y) = 1/(1+e^(-y))
- "σ is often called the sigmoid function or, alternately, the logistic function. Note that its output ranges between 0 and 1 .... Because it maps a very large input domain to a small range of outputs, it is often referred to as the squashing function of the unit [cf. Figure 4.6, The sigmoid threshold unit; in this drawing, σ(y) = 1/(1+e^(-net)), where net = Σ_i (w_i*x_i) summed from i = 0, w_i is the ith weight for the ith input x_i, and x_0 is a constant -- x_0 is important(!)]. The sigmoid function has the useful property that its derivative is easily expressed in terms of its output..." (Mitchell 1997:96-97)
- My guess is that writers who distinguish between the two are (needlessly) splitting the hare (hair) and using two different names for the same function depending on where it is used. "Logistic" would seem to come from "logic" i.e. having 1 and 0 outcomes only; "Sigmoid" because of its shape as in "sigmoidoscopy". Anyway, as this is Wikipedia and we need sources to back up our claims, mine says they are the same thing. Bill Wvbailey (talk) 15:07, 30 May 2008 (UTC)
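For readers without the book at hand, the sigmoid unit described in the quotation can be sketched as below. The quotation only says that x_0 is a constant; fixing it to 1 here is an assumption of this sketch, and the names `sigmoid_unit` and `output_slope` are illustrative rather than Mitchell's.

```python
import math

def sigmoid_unit(weights, inputs):
    """Output of one sigmoid unit: sigma(net), net = sum of w_i * x_i from i = 0.

    weights[0] pairs with the constant input x_0, assumed to be 1 here;
    weights[1:] pair with the supplied inputs.
    """
    xs = [1.0] + list(inputs)          # prepend the constant input x_0 = 1
    net = sum(w * x for w, x in zip(weights, xs))
    return 1.0 / (1.0 + math.exp(-net))

def output_slope(output):
    """Derivative of sigma with respect to net, expressed via the output alone."""
    return output * (1.0 - output)

o = sigmoid_unit([-0.5, 1.0, 1.0], [0.0, 0.0])   # net = -0.5
print(o, output_slope(o))
print(sigmoid_unit([0.0, 0.0, 0.0], [1.0, 2.0]))  # net = 0
```

The second helper shows the property the quotation ends on: once the output is known, its slope with respect to `net` follows without evaluating another exponential.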