Talk:Binomial regression

Before you discuss here, please be sure that you have read the Wikipedia merging page, specifically the three paragraphs under the first title, Merging. The basic problem that I see is that binomial regression has many people interested in it, but finding all these models, or even more information about binomial regression, once one has found one article is difficult. As an example, Google 'probit' or 'logit' and you get to a page that doesn't help much. I think that all the information would be better edited and tighter if it were in one article, more like an encyclopedia. Pdbailey 14:45, 27 April 2006 (UTC)

I agree with some of the proposals, but not all of them. I basically have two objections:
  1. Why merge into binomial regression? This doesn't strike me as the most obvious place. Why not merge into generalized linear model instead?
  2. A number of articles are general enough to stand on their own. These include logit, probit, and logistic regression. Check "what links here" to see that "logit" and "probit" are referenced outside the context of generalized linear models. Also, while logistic regression could be seen as a special case of binomial regression, everything here is a special case of GLMs, but nobody is proposing merging everything into the GLM article. Certain topics have a life of their own and are large enough to warrant stand-alone articles.
--MarkSweep (call me collect) 18:17, 27 April 2006 (UTC)
MarkSweep, thanks for pointing out the usefulness of the "what links here" pages. There appear to be a number of uses for these functions. Based on looking at what links to logit and probit, they should stay. But I would argue that they both need to be disambiguation pages, because people appear to link to them with abandon. In general, these two appear to have three uses:
  • To model something that has a maximum value that it approaches first quickly, then linearly, and finally asymptotically.
  • To model a binomial or multinomial process.
  • To model the same process, but to use the model and data to allow for the estimation of parameters in the model.
I would suggest the present "logit" and "probit" be moved to the same titles with "function" added on the end. But otherwise, it seems that there is a lot of confusion about the relationship between "logit" (and, to a lesser extent, "probit") and the use of these functions as link functions for regression or for models that do not have fit parameters. It is interesting to note that Naive_Bayes_classifier, Rasch_model, and Mode_choice are similar articles on very disparate topics. At any rate, I'll leave logit and probit linked here for discussion. Pdbailey 19:12, 27 April 2006 (UTC)

I'd suggest that the main article should be called "Discrete regression", which is sufficiently general to encompass all the cases suggested for merger (note that logit doesn't give rise to a binomial distribution but to a logistic). The particular cases could either be treated in full in subsections or discussed briefly with a link to separate articles depending on how thorough the treatment is. JQ 07:10, 28 April 2006 (UTC)

That strikes me as a bit too narrow. If we want to generalize and merge, it seems to me that the right context would be the article on generalized linear models, which don't have to be discrete. It's in the context of GLMs that link functions can be discussed properly (where the logit is indeed the canonical link function for binomial regression models and its relationship to the logistic CDF is an afterthought). I agree with your general point on subsections: articles like Ordered Probit are very short and don't make a whole lot of sense as stand-alone entries. Even if that article was much more detailed, it's probably better to have it as a subsection of a larger section (or article) on probit models, to establish context and make it easier to compare and contrast. If a section becomes too long, it will be easy to spin it off into a separate article, but let's start off with a single coherent article instead of the current fragmentation. --MarkSweep (call me collect) 08:52, 29 April 2006 (UTC)
Given the current size of the articles, it would probably be reasonable to merge most of these into the article on the generalized linear model, with a section on ordinary linear regression and link to linear regression, and then sections on discrete regression and count models. But in the long run, it would be good to have full articles down to the level of Ordered Probit.

Yes, so the general idea was to merge the other pages in with this one, based on the basic structure I gave this one in just a few hours. I think GLM will be too laden if we include all the possible GLMs in it. That said, if others want it in there for now, I'm happy to move all this in there and then move it out if it gets to be too much. Pdbailey 05:13, 2 May 2006 (UTC)

I would be against merging logistic regression, either here or with GLM. The ol' Google test yields: 5.34 million hits for phrase "logistic regression", 75,700 hits for "binomial regression", and 170 thousand hits for "generalized linear model". So, people out in the world seem to care about logistic regression as its own entity, and will look for it as such in Wikipedia. hike395 13:24, 10 May 2006 (UTC)
So it looks like your logic is that Google finds more pages for the search "logistic regression" than for the categories that it falls into, so there should be a Wikipedia article under that name. I would argue first that there are more Google hits for "making gold" than there are for "alchemy", but that doesn't mean that making gold deserves its own article separate from alchemy, and second that Wikipedia is not a dictionary, and why would you want it to be? There is a lot more to an encyclopedia, in that it can bring topics together and give lots of context. So, redirecting to an overarching article makes a lot of sense. Pdbailey 20:00, 14 May 2006 (UTC)
Um, "making gold" -> 91,600 hits; "alchemy" has 41M hits. Alchemy clearly wins, and it is the name of the article.
WP:WINAD doesn't talk about titles of articles, it talks about content --- articles shouldn't be simply definitions, but full articles.
The relevant Wikipedia policy is Wikipedia:Naming conventions (common names), which directs articles to be titled with the most common name. Hence, my use of the Google test.
I think that my logic still stands. It doesn't exclude having an article about generalized linear model, but I think that removing/merging a logistic regression will disappoint a lot of people. -- hike395 03:52, 17 May 2006 (UTC)


Looking at the various components proposed for merger, I think there is already too much for a single article. What we need is for the more general articles (say on GLM and discrete choice) to have sections with links to more specific articles, giving rise to a natural hierarchy. JQ 05:31, 15 May 2006 (UTC)

Agree: there's nothing that prevents us from having both a GLM article and a logistic regression article. I just oppose removing the logistic regression article.


As a biologist who has used Probit and Logit analysis to determine dose response curves, amongst other things, in the past but who does not understand all the ins and outs of the maths, I think it would be far more useful if these articles stood alone rather than being rolled up into a larger one on the binomial theorem. Either way, I would suggest that someone checks out Probit analysis (1971) D.J.Finney, Cambridge University Press before making any changes. Maccheek 16:21, 5 June 2006 (UTC)

Maccheek, I guess exactly your situation is the one that worries me the most, that is, people who think of logit and probit as very different. It's a bit like thinking of (a) linear regression after taking the log of the response and (b) linear regression as completely different things. True, you need some math to link the two, but they are so closely related that having a separate page for each is not straightforward. I think I'm getting many negatives because I have not developed this page. I'll work on that now. Pdbailey 03:28, 6 June 2006 (UTC)
Pdbailey, my concern would be that some poor biology student looking for information on probit analysis for use in determining dose response curves would get completely lost in the maths. Probably what we need is something written into the article on dose response relationship on how to use probits. I support getting the math sensible but also making it accessible to those whose maths is not so hot but who may have to use it and want a basic understanding. Then again, maybe I am just the oddity as a half-numerate biologist. Most biologists, at least the agricultural ones I come across, would give up at the first equation; at least I can understand most of it if I try. (Many agricultural scientists employed in the agrochemical industry don't even seem to be able to calculate a mean properly!) On that, I'll probably bow out of this thread. Maccheek 20:18, 6 June 2006 (UTC)
  • The best way to structure a complex topic like this IMHO is to have a main article that gives a structured overview (= 1 introductory paragraph, not only links) of the different methods and models, plus separate articles on the individual sub-topics.
    1. Don't forget you are losing readers coming from Google as well as all of the Interwiki links if you simply merge all articles to one lemma.
    2. Merging too much into one big article is definitely no more digestible for students or beginners.
    3. This is not only a structure problem, it is a content problem as well. Most of the articles get too specific from the very first sentence and are ill-structured. Start with a short introduction that everybody can understand.
  • So please go step by step. For the moment it would be best if you merged Ordered Logit and Multinomial logit into logit, as well as Ordered Probit and maybe Probit model into probit. Large scale solutions rarely work in Wikipedia. PanchoS 15:17, 13 July 2006 (UTC)
I have clearly not done this, and nobody to date agrees. I will remove the proposed merge link. Pdbailey 13:26, 17 July 2006 (UTC)