Talk:Histogram

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: B Class High Priority  Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

Missing: calculating bin sizes

Should this article contain something about calculating bin sizes?

If by bin size you mean width, then absolutely something needs to be here. There's no explanation of why the widths differ, and in research elsewhere the widths are always equal, which makes the Wikipedia definition inconsistent with what's available elsewhere. - Sam N.

Bin widths do not have to be equal. In many instances they are equal simply because that is easier to handle, and any benefits from variable bin sizes are outweighed by the simplicity of having equal bin sizes. An example of variable bin sizes is the histogram method of white balancing digital images, where the bin widths of the histograms of brightness are adjusted so that each bin has a roughly equal number of elements. Mattopia 19:46, 29 March 2006 (UTC)
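To illustrate the equal-count idea in the comment above, here is a minimal sketch in plain Python. It is not taken from any white-balancing implementation; the function names `equal_frequency_edges` and `bin_counts` and the sample data are made up for illustration, and the cut points are placed by simple linear interpolation between sorted values, which is only one of several possible conventions.

```python
def equal_frequency_edges(values, k):
    """Return k + 1 bin edges so each bin holds roughly len(values) / k items."""
    data = sorted(values)
    n = len(data)
    if k < 1 or n == 0:
        raise ValueError("need at least one bin and one value")
    edges = [data[0]]
    for i in range(1, k):
        # Position of the i-th cut in the sorted data (linear interpolation).
        pos = i * (n - 1) / k
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        edges.append(data[lo] + frac * (data[hi] - data[lo]))
    edges.append(data[-1])
    return edges


def bin_counts(values, edges):
    """Count values per bin; bins are [left, right), except the last is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for j in range(len(counts)):
            is_last = j == len(counts) - 1
            if edges[j] <= v < edges[j + 1] or (is_last and v == edges[-1]):
                counts[j] += 1
                break
    return counts


# Hypothetical skewed data: squares of 0..99.
sample = [x * x for x in range(100)]
edges = equal_frequency_edges(sample, 4)
counts = bin_counts(sample, edges)
```

With skewed data like this, the bins come out with unequal widths but matching counts, which is the trade-off described above. Data with many repeated values can produce coinciding edges, so "roughly equal" is the most this sketch promises.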

Excessive whitespace

This article needs to be reformatted to eliminate the excessive whitespace. If the tables could be placed beside the graphs then that would tighten things up considerably.--Hooperbloob 05:16, 12 August 2005 (UTC)

I reworked the layout quite a bit, mostly merging the tables and reducing the graphs to thumbnails. It's not quite pretty yet, but methinks it should be a little easier to work with. --Pathoschild 01:29, September 21, 2005 (EST)

Added logical headings and reorganised appropriately. --Pathoschild 01:16, September 22, 2005 (EST)

Missing: interpretation of histogram shapes

Should the article contain something about the interpretation of various histogram shapes, e.g. normally distributed around a mean?

Missing: etymology of the word histogram

The article should contain something about the etymology of the word "histogram".

I've added a little something about the etymology. Aastrup 09:46, 12 February 2007 (UTC)

Missing: Basic example of a histogram

This article should contain a simple histogram to illustrate what it is. Currently it is a tad ambiguous with the 2000 census versions.

Confusion

"Actually, this document shows bar graphs, but they are not histograms since the bars are not adjacent."

More concrete and correct examples of histograms are required.

Histogram

Hi, what is the name given to the highest point in a histogram, and what is the name of the bars that are not the highest?

Density plots

This page needs links to density estimation, which is often superior to histograms; see, e.g., Simonoff, Smoothing Methods in Statistics.

Density plots are not superior to histograms - they are different. Histograms are generally more appropriate for exploratory data analysis as they make fewer assumptions about the underlying distribution, and they let you see the raw data more directly.

Hi, how do you know when to use a histogram and when to use a bar graph? What is the difference between a bar graph and a histogram in simple terms?

Most basically, a proper histogram has no spaces between the bars because they are meant to represent numbers on a continuous scale (theoretically infinitely precise decimals, any of which could be observed in your data), while bar graphs have spaces between the bars and are meant for comparing amounts for different categories that don't even have to be numerical (such as a survey about favourite colours). So, bar graphs for categorical variables and histograms for continuous numerical variables. For discrete numerical variables (e.g. a survey of the number of TV sets owned), the bar graph works fine, though nothing but your conscience can stop you from cheating and pretending the numbers are part of a continuous scale, as if someone could own, say, 2.75 TV sets. Nicknicknickandnick 05:23, 3 May 2007 (UTC)

Excel

As I have understood from this page, there is a difference between a bar chart and a histogram. Microsoft Excel 2003 calls a bar chart a histogram and a horizontal bar chart a bar chart... Maybe a warning that Excel gets the names messed up might be in order? Thanks --Squidonius 22:58, 28 January 2007 (UTC)

Yes, unfortunately Excel's histogram generator (in the Data Analysis ToolPak) doesn't really generate a proper histogram, because it uses the existing bar chart generator to attempt the task. Nicknicknickandnick 05:36, 3 May 2007 (UTC)

Bin and Count independence?

A colleague had a chart displaying the number of events occurring during each hour of the day, with 24 bins, one for each hour. He called it a histogram and referenced the definition in this entry. What bothers me is that the definition of the count does not seem sufficiently independent of the definition of the bins. Is there some aspect of a histogram that would require the definition of the count to be independent of the definition of the bins?

That is somewhat borderline between a histogram and a bar chart of counts (an example is this). Generally, the placement and number of histogram bins are set arbitrarily, guided by the distribution of the value, which is usually unknown (for example, measuring the heights of trees in a park). This is why there are multiple methods of determining the number of bins and the placement of breaks between bins. But for your example, the unit (hour) and range (0-23) of the value are known, so it is convenient to use the breaks of a day. In the end, the chart you described is interpreted the same as a histogram, so it would seem like an appropriate definition. +mwtoews 14:21, 3 May 2007 (UTC)
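The hour-of-day case discussed above can be sketched as follows. This is only an illustration: the function name `hourly_histogram` and the sample times are hypothetical, and times of day are assumed to be given as fractional hours in the range 0 up to (but not including) 24.

```python
def hourly_histogram(times_of_day):
    """Count events in 24 one-hour bins [h, h + 1); times are hours in [0, 24)."""
    counts = [0] * 24
    for t in times_of_day:
        if not 0 <= t < 24:
            raise ValueError(f"time of day out of range: {t}")
        # The bin breaks are fixed by the clock, not chosen from the data.
        counts[int(t)] += 1
    return counts


# Hypothetical event times: 00:00, 00:30, 13:15 and 23:59.
events = [0.0, 0.5, 13.25, 23.99]
counts = hourly_histogram(events)
```

The underlying variable (time of day) is continuous and the bins are adjacent and of equal width, so the result reads like a histogram even though the breaks were known in advance.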

Diagrams are inconsistent with the data tables

There are data tables on two subjects. There is only one "histogram" and that is on drive times. It appears twice. There is no "histogram" for the student data. 66.74.146.101 20:38, 17 October 2007 (UTC) LSquared Orange CA USA

It looks like the original source data for the "by proportion" example was replaced with data for absolute numbers in version 105847254. The edit was anonymous and no reason seems to have been given. Additionally, the image and the text were never changed to match the new data. I've reinstated the original data, so the article should make more sense now. I'm sort of surprised no one else caught it before. Undisputedloser (talk) 22:57, 26 December 2007 (UTC)

Bin count & convolution?

Does anyone ever do away with the bin-count question and just convolve the data set with, e.g., a Gaussian kernel? That leaves the question of what the standard deviation of the kernel should be, but it does away with the arbitrariness of bin size. In the context of displaying a sample (e.g., scores on a test), this seems more natural since one's score on a test includes noise, so a score of 92 really means the person's understanding is between, say, an 89 level and a 95 level. —Ben FrantzDale 17:33, 3 November 2007 (UTC)

It looks like yes. See kernel density estimation. 155.212.242.34 20:24, 6 November 2007 (UTC)
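For readers following this thread, the convolution idea can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not a reference implementation: the function name `gaussian_kde` and the sample scores are made up here, and the bandwidth (the standard deviation of the kernel) is passed in by hand rather than chosen by any rule.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    if bandwidth <= 0 or not samples:
        raise ValueError("need a positive bandwidth and at least one sample")
    n = len(samples)
    # Normalising constant so that the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical test scores, each smeared by a kernel with standard deviation 3.
scores = [71, 78, 85, 89, 92, 92, 95]
density = gaussian_kde(scores, bandwidth=3.0)
curve = [(x, density(x)) for x in range(60, 106)]
```

As noted in the question, this trades the choice of bin size for the choice of kernel width: a small bandwidth gives a spiky curve and a large one smooths away detail.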

Number of bins and width

This statement "The number of bins k can be calculated directly, or from a suggested bin width h:" seems misleading given the formula that follows on the page. In the formula displayed for the number of bins k, the denominator as shown is n; shouldn't it be h (which represents the calculated bin width)? Doesn't n represent the number of samples or observations? Hzlnt7 (talk) 23:31, 18 December 2007 (UTC)
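To make the distinction between h and n concrete, here is a small sketch. It assumes the intended relation is k = ceil((max - min) / h), which is what the quoted sentence suggests but which this thread does not confirm; the function names are hypothetical. Sturges' rule, k = ceil(log2 n) + 1, is included only as a contrasting example of a rule that computes k directly from the number of observations n.

```python
import math


def bins_from_width(values, h):
    """Number of bins k needed to cover the data range with bins of width h."""
    if h <= 0:
        raise ValueError("bin width h must be positive")
    span = max(values) - min(values)
    # At least one bin, even when all values are identical.
    return max(1, math.ceil(span / h))


def sturges_bins(n):
    """Sturges' rule: k from the sample size n alone, with no bin width involved."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1
```

In the first function the denominator is the bin width h; the sample size n only enters through rules like the second one.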