Give Me the Stats, Jack!

Sometimes you need to take a specific set of data, crunch some numbers, and base selections on a normal distribution. There are plenty of equations and mathematical processes for determining whether and where `Object A` falls in that distribution. Is it close to the mean of a given dataset? Or is it so far out that it really does not fit the curve?

Normalized Gaussian curves with expected value μ and variance σ^2. The corresponding parameters are a = 1/(σ√(2π)), b = μ, c = σ. Source: http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg

The image above shows Gaussian curves, which are used to describe normal distributions. In the graph, each curve is a Gaussian function with parameters a = 1/(σ√(2π)), b = μ, c = σ.
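To make those parameters concrete, here is a minimal Python sketch. It assumes the general Gaussian form f(x) = a·exp(−(x − b)² / (2c²)), and the function names `gaussian` and `normal_pdf` are illustrative choices, not part of any library.

```python
import math


def gaussian(x, a, b, c):
    """General Gaussian function: a * exp(-(x - b)^2 / (2 * c^2))."""
    return a * math.exp(-((x - b) ** 2) / (2 * c ** 2))


def normal_pdf(x, mu, sigma):
    """Normal density, i.e. the Gaussian with a = 1/(sigma*sqrt(2*pi)), b = mu, c = sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    a = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return gaussian(x, a, mu, sigma)
```

Calling `normal_pdf(mu, mu, sigma)` gives the height of the peak, and values of `x` further from `mu` give smaller densities, which matches the shape of the curves in the image.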

A “normal distribution” is defined as:

In probability theory and statistics, the normal distribution, or Gaussian distribution, is an absolutely continuous probability distribution whose cumulants of all orders above two are zero. The graph of the associated probability density function is “bell”-shaped, with peak at the mean, and is known as the Gaussian function or bell curve.

Source: http://en.wikipedia.org/wiki/Normal_distribution

So how is this useful? As you can see, the mean μ sits at the peak of each curve, and most of the data lies around it. The closer a data point is to μ, the more typical it is of the group; the further away it lies along the horizontal axis, the less typical it is. So let’s say you design a function that builds a bell curve from a Gaussian function describing pay grades for all “Widget” employees. Most of the employees will fall in an average group, with some lower-paid workers (such as interns) and some higher-paid workers (such as company heads). You can then take a single data point, find its location on the graph, and determine whether that worker is paid below average, about average, or above average compared to his or her peers.
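The pay-grade idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the salary figures are made up, the name `classify_pay` is hypothetical, and the cutoff of one standard deviation from the mean is an arbitrary choice for what counts as "average".

```python
import statistics

# Hypothetical annual salaries for "Widget" employees (made-up numbers).
SALARIES = [20000, 45000, 50000, 50000, 50000, 55000, 80000]


def classify_pay(value, data, threshold=1.0):
    """Label a salary by how many standard deviations it sits from the mean.

    threshold is an assumed cutoff: within +/- threshold standard
    deviations of the mean is treated as "average".
    """
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; no spread to compare against")
    z = (value - mu) / sigma  # distance from the mean in standard deviations
    if z < -threshold:
        return "below average"
    if z > threshold:
        return "above average"
    return "average"


for salary in (20000, 55000, 80000):
    print(salary, classify_pay(salary, SALARIES))
```

A tighter or looser `threshold` changes how wide the "average" band is, so it is worth choosing deliberately for the data at hand. Real pay data is often skewed rather than bell-shaped, in which case the labels should be read with some caution.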

Gaussian functions and distribution curves have a very wide range of applications. They are well established and widely used for statistical and analytical purposes across many fields.

One Response to “Give Me the Stats, Jack!”

  1. Zane Thorn says:

    Program it! :)
