# What is the overall mean and variance for partitioned data?

Suppose that:

• A normally distributed random variable $Latex formula$ is being sampled.
• There are k partitions, each with $Latex formula$ samples (where $Latex formula$).
• For each partition, the mean $Latex formula$ and variance $Latex formula$ are known, but the original observations $Latex formula$ are not available.
• The overall mean $Latex formula$ and variance $Latex formula$ are desired.

Computing $Latex formula$ and $Latex formula$ is straightforward:

$Latex formula$   [1]

$Latex formula$   [2]
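
For readers without the rendered formulas, the two quantities above can be sketched in one possible notation. The symbols here are assumptions, not necessarily those of the original post: $n_j$ and $\bar{x}_j$ are the sample count and mean of partition $j$, and $n$ and $\bar{x}$ are the overall count and mean.

```latex
% Assumed notation: n_j = samples in partition j, \bar{x}_j = mean of partition j
n = \sum_{j=1}^{k} n_j
\qquad
\bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j \, \bar{x}_j
```

The overall mean is the count-weighted average of the partition means, so it needs only the per-partition summaries.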

However, because the overall mean $Latex formula$ is not available within the partition where $Latex formula$ is sampled, the formula for the variance:

$Latex formula$   [3]

must be rewritten as

$Latex formula$   [4]
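
As a rough illustration, the combined variance can be computed from the per-partition summaries alone. The sketch below assumes the sample-variance (n - 1) convention used in the Stack Exchange answer cited under Reference; the function name `combine_partitions` and the `ddof` parameter are hypothetical, introduced only for this example.

```python
import math
import statistics


def combine_partitions(counts, means, variances, ddof=1):
    """Combine per-partition counts, means and variances.

    Assumption: each variance was computed with the same `ddof`
    (1 = sample variance, 0 = population variance).
    Returns (n, overall_mean, overall_variance).
    """
    n = sum(counts)
    if n - ddof <= 0:
        raise ValueError("not enough samples for the requested ddof")
    # Overall mean: count-weighted average of the partition means.
    mean = sum(c * m for c, m in zip(counts, means)) / n
    # Within-partition sum of squares, recovered from each variance.
    within = sum((c - ddof) * v for c, v in zip(counts, variances))
    # Between-partition sum of squares around the overall mean.
    between = sum(c * (m - mean) ** 2 for c, m in zip(counts, means))
    return n, mean, (within + between) / (n - ddof)


# Usage: summarise three partitions, then compare with the pooled raw data.
parts = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [10.0, 12.0]]
counts = [len(p) for p in parts]
means = [statistics.fmean(p) for p in parts]
variances = [statistics.variance(p) for p in parts]
n, mean, var = combine_partitions(counts, means, variances)
flat = [x for p in parts for x in p]
print(math.isclose(var, statistics.variance(flat)))
```

The raw observations appear only in the usage example, to check the result; the function itself never sees them.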

This formula may be derived as follows.

First, introduce $Latex formula$ into the formula for $Latex formula$:

$Latex formula$   [5]

Then, simplify to remove $Latex formula$ from the partition summation $Latex formula$:

Apply $Latex formula$

$Latex formula$   [6]

Replace $Latex formula$ with $Latex formula$

$Latex formula$   [7]

Replace $Latex formula$ with $Latex formula$

$Latex formula$   [8]

Distribute $Latex formula$

$Latex formula$   [9]

Apply $Latex formula$;
express summation of sum as sum of summations

$Latex formula$   [10]

Because neither $Latex formula$ nor $Latex formula$ depends on j

$Latex formula$   [11]

Apply $Latex formula$;
distribute $Latex formula$;
express summation of sum as sum of summations;
simplify

$Latex formula$   [12]

$Latex formula$   [13]

$Latex formula$   [14]

Apply $Latex formula$;
simplify

$Latex formula$   [15]

$Latex formula$   [16]

Express summation of sum as sum of summations;
simplify

$Latex formula$   [17]

Because $Latex formula$ does not depend on j, factor out $Latex formula$

$Latex formula$   [18]

Apply $Latex formula$ and $Latex formula$;
simplify

$Latex formula$   [19]

$Latex formula$   [20]
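
The derivation can also be checked numerically. The sketch below is not part of the original post: it uses a hypothetical helper, `decompose_sum_of_squares`, to split the total sum of squared deviations into a within-partition part, a between-partition part, and the cross term produced by expanding the square. The cross term should vanish because the deviations from a partition's own mean sum to zero.

```python
import math
import random
import statistics


def decompose_sum_of_squares(partitions):
    """Return (total, within, between, cross) for a list of partitions."""
    all_values = [x for part in partitions for x in part]
    overall_mean = statistics.fmean(all_values)
    # Total sum of squared deviations from the overall mean.
    total = sum((x - overall_mean) ** 2 for x in all_values)
    within = between = cross = 0.0
    for part in partitions:
        part_mean = statistics.fmean(part)
        # Deviations of the samples from their own partition mean.
        within += sum((x - part_mean) ** 2 for x in part)
        # Deviation of the partition mean from the overall mean.
        between += len(part) * (part_mean - overall_mean) ** 2
        # Cross term from expanding the square; the inner sum is ~0.
        cross += 2 * (part_mean - overall_mean) * sum(x - part_mean for x in part)
    return total, within, between, cross


random.seed(0)  # fixed seed so the check is repeatable
sample = [[random.gauss(10, 2) for _ in range(size)] for size in (5, 8, 3)]
total, within, between, cross = decompose_sum_of_squares(sample)
print(math.isclose(total, within + between), abs(cross) < 1e-9)
```

Dividing the total by n - 1 (or n, for the population convention) then gives the overall variance.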

#### Reference

This post was largely inspired by
http://stats.stackexchange.com/questions/10441/how-to-calculate-the-variance-of-a-partition-of-variables/10445#10445,
which tantalizingly ended with the (under-)statement:

> These formulas are easy to derive by writing the desired variance as the scaled sum of $Latex formula$, then introducing $Latex formula$: $Latex formula$, using the square of difference formula, and simplifying.