Suppose that:

- A normally distributed random variable is being sampled.
- There are $k$ partitions, each with $n_i$ samples (where $i = 1, \dots, k$).
- For each partition, the mean $\bar{x}_i$ and variance $s_i^2$ are known, but the original observations are not available.
- The overall mean $\bar{x}$ and variance $s^2$ are desired.

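Computing the total sample count $n$ and the overall mean $\bar{x}$ is straightforward:

$$n = \sum_{i=1}^{k} n_i \tag{1}$$

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_i \tag{2}$$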
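As a minimal sketch of this step, the snippet below combines per-partition counts and means into the overall count and mean (the sum of the counts, and the count-weighted average of the partition means). The function name `combine_means` and the sample data are illustrative assumptions, not part of the original post.

```python
from statistics import mean


def combine_means(counts, means):
    """Return (n, overall_mean) from per-partition counts and means."""
    n = sum(counts)
    # Weight each partition mean by its sample count.
    overall = sum(c * m for c, m in zip(counts, means)) / n
    return n, overall


# Hypothetical partitions, used only to produce the summaries.
partitions = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 7.0, 9.0, 11.0]]
counts = [len(p) for p in partitions]
means = [mean(p) for p in partitions]

n, overall = combine_means(counts, means)
print(n, overall)
```

Only `counts` and `means` are passed to the function, so the raw observations are never needed once the summaries exist.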

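However, because the overall mean $\bar{x}$ is not available within the partition where $x_{ij}$ (the $j$-th observation of partition $i$) is sampled, the formula for the variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2 \tag{3}$$

must be rewritten as

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k}(n_i-1)\,s_i^2 + \sum_{i=1}^{k} n_i\bar{x}_i^2 - n\bar{x}^2\right] \tag{4}$$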
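The following sketch computes the overall variance from partition summaries alone and compares it with the variance of the pooled raw data. It assumes each partition variance is the sample variance (denominator $n_i - 1$). It writes the between-partition term as $\sum_i n_i(\bar{x}_i-\bar{x})^2$, which is algebraically the same as $\sum_i n_i\bar{x}_i^2 - n\bar{x}^2$ but avoids subtracting two large, nearly equal numbers in floating point. The function name `combine_variance` and the data are illustrative assumptions.

```python
from math import isclose
from statistics import mean, variance


def combine_variance(counts, means, variances):
    """Overall sample variance from per-partition counts, means, variances."""
    n = sum(counts)
    overall_mean = sum(c * m for c, m in zip(counts, means)) / n
    # Within-partition sum of squares: (n_i - 1) * s_i^2 for each partition.
    within = sum((c - 1) * v for c, v in zip(counts, variances))
    # Between-partition sum of squares: n_i * (mean_i - overall_mean)^2.
    between = sum(c * (m - overall_mean) ** 2 for c, m in zip(counts, means))
    return (within + between) / (n - 1)


# Hypothetical partitions; each has at least two samples so that
# statistics.variance is defined.
partitions = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 7.0, 9.0, 11.0]]
counts = [len(p) for p in partitions]
means = [mean(p) for p in partitions]
variances = [variance(p) for p in partitions]

pooled = combine_variance(counts, means, variances)
direct = variance([x for p in partitions for x in p])
print(pooled, direct, isclose(pooled, direct))
```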

This formula may be derived as follows.

First introduce into the formula for

[5]

Then, simplify to remove from the partition summation :

Apply

[6]

Replace with

[7]

Replace with

[8]

Distribute

[9]

Apply ;

express summation of sum as sum of summations

[10]

Because neither nor depends on j

[11]

Apply ;

distribute ;

express summation of sum as sum of summations;

simplify

[12]

[13]

[14]

Apply ;

simplify

[15]

[16]

Express summation of sum as sum of summations;

simplify

[17]

Because does not depend on j, factor out

[18]

Apply and ;

simplify

[19]

[20]
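The derivation rests on two per-partition facts: the deviations from a partition's own mean sum to zero, and therefore the squared deviations from the overall mean split into a within-partition part $(n_i-1)s_i^2$ and a between-partition part $n_i(\bar{x}_i-\bar{x})^2$. The sketch below checks both numerically on made-up data; the variable names are illustrative assumptions, and the partition variances are taken to be sample variances (denominator $n_i - 1$).

```python
from math import isclose
from statistics import mean, variance

# Hypothetical partitions, each with at least two samples.
partitions = [[2.0, 4.0, 6.0], [1.0, 3.0], [5.0, 7.0, 9.0, 11.0]]
overall_mean = mean([x for p in partitions for x in p])

checks = []
for p in partitions:
    n_i, mean_i, var_i = len(p), mean(p), variance(p)
    # Deviations from the partition's own mean should sum to zero.
    cross = sum(x - mean_i for x in p)
    # Squared deviations from the overall mean, computed from raw data.
    lhs = sum((x - overall_mean) ** 2 for x in p)
    # The same quantity rebuilt from the partition summaries only.
    rhs = (n_i - 1) * var_i + n_i * (mean_i - overall_mean) ** 2
    checks.append((cross, lhs, rhs))

for cross, lhs, rhs in checks:
    print(isclose(cross, 0.0, abs_tol=1e-9), isclose(lhs, rhs, rel_tol=1e-9))
```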

#### Reference

This post was largely inspired by

http://stats.stackexchange.com/questions/10441/how-to-calculate-the-variance-of-a-partition-of-variables/10445#10445

which tantalizingly ended with the (under-)statement:

These formulas are easy to derive by writing the desired variance as the scaled sum of $(X_{ji}-\bar{X}_{1:g})^2$, then introducing $\bar{X}_j$: $[(X_{ji}-\bar{X}_j)-(\bar{X}_{1:g}-\bar{X}_j)]^2$, using the square of difference formula, and simplifying.