# What is the overall mean and variance for partitioned data?

Suppose that:

• A normally distributed random variable $X$ is being sampled.
• There are k partitions, each with $n_i$ samples (where $i = 1, \ldots, k$).
• For each partition, the mean $\bar{x}_i$ and variance $s_i^2$ are known, but the original observations $x_{ij}$ are not available.
• The overall mean $\bar{x}$ and variance $s^2$ are desired.

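Computing the total sample count $n$ and the overall mean $\bar{x}$ is straightforward:

$$n = \sum_{i=1}^{k} n_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_i$$

However, because the overall mean $\bar{x}$ is not available within the partition where $x_{ij}$ is sampled, the formula for the variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2$$

must be rewritten in terms of the partition summaries, taking each $s_i^2$ to be the sample variance of partition $i$ (denominator $n_i - 1$):

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k}(n_i-1)\,s_i^2 + \sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^2\right]$$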
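As a minimal sketch of how the combined statistics might be computed in practice, the Python below takes only per-partition counts, means, and variances. The function name `combine_partitions` and the sample data are hypothetical, and the code assumes each partition variance is a sample variance (denominator of one less than the partition's sample count). The raw values are used only to check the result.

```python
import math
from statistics import mean, variance


def combine_partitions(counts, means, variances):
    """Return (n, overall mean, overall sample variance) from partition summaries.

    Assumption: every entry of `variances` is a sample variance
    (n_i - 1 denominator) and the total count is at least 2.
    """
    n = sum(counts)
    overall_mean = sum(c * m for c, m in zip(counts, means)) / n
    # Spread of the observations around their own partition means.
    within = sum((c - 1) * v for c, v in zip(counts, variances))
    # Spread of the partition means around the overall mean.
    between = sum(c * (m - overall_mean) ** 2 for c, m in zip(counts, means))
    return n, overall_mean, (within + between) / (n - 1)


# Hypothetical data, kept only so the summaries can be checked.
partitions = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 7.0, 9.0, 11.0]]
counts = [len(p) for p in partitions]
means = [mean(p) for p in partitions]
variances = [variance(p) for p in partitions]

n, overall_mean, overall_var = combine_partitions(counts, means, variances)

# The combined result should match statistics computed on the pooled raw data.
pooled = [x for p in partitions for x in p]
assert n == len(pooled)
assert math.isclose(overall_mean, mean(pooled))
assert math.isclose(overall_var, variance(pooled))
```

Only the three summary lists are passed to the function, which mirrors the situation where the original observations are no longer available.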

This formula may be derived as follows.

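First introduce $\bar{x}_i$ (the mean of partition $i$) into the formula for $s^2$, where $i$ indexes the partitions and $j$ indexes the samples within a partition:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[(x_{ij}-\bar{x}_i)+(\bar{x}_i-\bar{x})\right]^2$$

Then, simplify to remove $x_{ij}$ from the partition summation $\sum_{j=1}^{n_i}$:

Apply $(a+b)^2 = a^2 + 2ab + b^2$ with $a = x_{ij}-\bar{x}_i$ and $b = \bar{x}_i-\bar{x}$:

$$\sum_{j=1}^{n_i}\left[(x_{ij}-\bar{x}_i)^2 + 2(x_{ij}-\bar{x}_i)(\bar{x}_i-\bar{x}) + (\bar{x}_i-\bar{x})^2\right]$$

Express the summation of a sum as a sum of summations:

$$\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2 + \sum_{j=1}^{n_i}2(x_{ij}-\bar{x}_i)(\bar{x}_i-\bar{x}) + \sum_{j=1}^{n_i}(\bar{x}_i-\bar{x})^2$$

Because neither $\bar{x}_i$ nor $\bar{x}$ depends on j, factor them out of the summations over j:

$$\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2 + 2(\bar{x}_i-\bar{x})\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i) + n_i(\bar{x}_i-\bar{x})^2$$

Apply the definition of the partition variance, $(n_i-1)\,s_i^2 = \sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2$:

$$(n_i-1)\,s_i^2 + 2(\bar{x}_i-\bar{x})\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i) + n_i(\bar{x}_i-\bar{x})^2$$

For the remaining summation, express the summation of a sum as a sum of summations. Because $\bar{x}_i$ does not depend on j, factor out $\bar{x}_i$:

$$\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i) = \sum_{j=1}^{n_i}x_{ij} - \bar{x}_i\sum_{j=1}^{n_i}1$$

Apply $\sum_{j=1}^{n_i}x_{ij} = n_i\bar{x}_i$ and $\sum_{j=1}^{n_i}1 = n_i$; simplify:

$$\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i) = n_i\bar{x}_i - n_i\bar{x}_i = 0$$

The middle term therefore vanishes, and the partition summation no longer involves $x_{ij}$:

$$\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2 = (n_i-1)\,s_i^2 + n_i(\bar{x}_i-\bar{x})^2$$

Substitute this back into the formula for $s^2$ and express the summation of a sum as a sum of summations:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k}(n_i-1)\,s_i^2 + \sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x})^2\right]$$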
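The derivation rests on one fact: within a partition, the deviations from that partition's own mean sum to zero. The sum of squared deviations from any other constant then splits into a within-partition part and an offset part. The Python below is a small numerical check of that identity for a single partition. It is a sketch with hypothetical values, and `overall_mean` is an arbitrary stand-in constant rather than a computed overall mean.

```python
import math
import random

random.seed(0)

# Hypothetical partition: 50 draws from a normal distribution.
xs = [random.gauss(10.0, 2.0) for _ in range(50)]
n_i = len(xs)
xbar_i = sum(xs) / n_i
s2_i = sum((x - xbar_i) ** 2 for x in xs) / (n_i - 1)  # sample variance

# Arbitrary stand-in for the overall mean; the identity holds for any constant.
overall_mean = 3.7

# Deviations from the partition's own mean sum to zero (up to rounding).
cross = sum(x - xbar_i for x in xs)

# Sum of squared deviations from the constant, computed two ways.
lhs = sum((x - overall_mean) ** 2 for x in xs)
rhs = (n_i - 1) * s2_i + n_i * (xbar_i - overall_mean) ** 2

assert math.isclose(cross, 0.0, abs_tol=1e-9)
assert math.isclose(lhs, rhs)
```

Because the identity holds for any constant, each partition can compute its own contribution without knowing the overall mean in advance.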

#### Reference

This post was largely inspired by
http://stats.stackexchange.com/questions/10441/how-to-calculate-the-variance-of-a-partition-of-variables/10445#10445
which tantalizingly ended with the (under-)statement that these formulas are easy to derive by writing the desired variance as the scaled sum of $(x_{ij}-\bar{x})^2$, then introducing $\bar{x}_i$ to give $[(x_{ij}-\bar{x}_i)-(\bar{x}-\bar{x}_i)]^2$ (in this post's notation), using the square of difference formula, and simplifying.