- A normally distributed random variable $x$ is being sampled.
- There are $k$ partitions; partition $i$ contains $n_i$ samples $x_{ij}$ (where $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$).
- For each partition, the mean $\bar{x}_i$ and variance $s_i^2$ are known, but the original observations $x_{ij}$ are not available.
- The overall mean $\bar{x}$ and variance $s^2$ are desired.
Computation of $n$ and $\bar{x}$ is straightforward:

$$n = \sum_{i=1}^{k} n_i, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_i$$
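As a minimal Python sketch of these two formulas, the function below (its name, `overall_count_and_mean`, is hypothetical) takes each partition's summary as an `(n_i, mean_i)` pair and never touches the raw observations:

```python
def overall_count_and_mean(partitions):
    """Combine per-partition (n_i, mean_i) pairs into the overall (n, mean).

    n    = sum of n_i
    mean = (sum of n_i * mean_i) / n
    """
    n = sum(n_i for n_i, _ in partitions)
    total = sum(n_i * mean_i for n_i, mean_i in partitions)
    return n, total / n


# Partitions [1, 2] and [4, 5, 6] summarize to (2, 1.5) and (3, 5.0).
n, mean = overall_count_and_mean([(2, 1.5), (3, 5.0)])
print(n, mean)  # 5 3.6
```

The weighted sum $\sum_i n_i \bar{x}_i$ recovers the grand total $\sum_i \sum_j x_{ij}$, which is why the per-partition means are enough here.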
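However, because the overall mean $\bar{x}$ is not available within the partition where $x_{ij}$ is sampled, the direct formula for the variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}\right)^2$$

must be rewritten in terms of the per-partition summaries $n_i$, $\bar{x}_i$, and $s_i^2$:

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k}(n_i - 1)\,s_i^2 + \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2\right]$$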
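The rewritten formula can be sketched in Python as follows. This assumes each partition is summarized as a hypothetical `(n_i, mean_i, var_i)` tuple, where `var_i` is the sample variance with denominator $n_i - 1$ (the same convention `statistics.variance` uses). The raw data appear only so the result can be checked against a direct computation:

```python
import math
import statistics


def combined_variance(partitions):
    """Overall sample variance from per-partition (n_i, mean_i, var_i) tuples.

    Assumes var_i uses the n_i - 1 denominator, so that
    (n - 1) s^2 = sum_i (n_i - 1) s_i^2 + sum_i n_i (mean_i - mean)^2.
    """
    n = sum(n_i for n_i, _, _ in partitions)
    mean = sum(n_i * m_i for n_i, m_i, _ in partitions) / n
    within = sum((n_i - 1) * v_i for n_i, _, v_i in partitions)
    between = sum(n_i * (m_i - mean) ** 2 for n_i, m_i, _ in partitions)
    return (within + between) / (n - 1)


# Raw partitions are kept here only to verify the result.
raw = [[1.0, 2.0], [4.0, 5.0, 6.0], [7.0, 9.0, 11.0, 13.0]]
summaries = [(len(p), statistics.mean(p), statistics.variance(p)) for p in raw]

combined = combined_variance(summaries)
direct = statistics.variance([x for p in raw for x in p])
print(math.isclose(combined, direct))  # True
```

A partition with a single sample contributes nothing to the within-partition term, since $(n_i - 1)s_i^2 = 0$, so its variance can be passed as 0 in this sketch (`statistics.variance` itself raises an error for fewer than two values).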
The rewritten formula for $s^2$ may be derived as follows.
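First multiply the formula for $s^2$ by $n - 1$ and introduce $\bar{x}_i$ by writing $x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})$:

$$(n-1)\,s^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\right]^2$$

Then simplify, with the goal of removing $\bar{x}$ from the partition summation $\sum_j$. Expanding the square gives:

$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[(x_{ij} - \bar{x}_i)^2 + 2(x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) + (\bar{x}_i - \bar{x})^2\right]$$

Express the summation of a sum as a sum of summations:

$$= \sum_{i=1}^{k}\left[\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 + \sum_{j=1}^{n_i}2(x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) + \sum_{j=1}^{n_i}(\bar{x}_i - \bar{x})^2\right]$$

Because neither $\bar{x}_i$ nor $\bar{x}$ depends on $j$, factor them out of $\sum_j$:

$$= \sum_{i=1}^{k}\left[\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 + 2(\bar{x}_i - \bar{x})\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i) + n_i(\bar{x}_i - \bar{x})^2\right]$$

Express the outer summation of a sum as a sum of summations:

$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 + 2\sum_{i=1}^{k}(\bar{x}_i - \bar{x})\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i) + \sum_{i=1}^{k}n_i(\bar{x}_i - \bar{x})^2$$

Express the inner summation of a difference as a difference of summations:

$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 + 2\sum_{i=1}^{k}(\bar{x}_i - \bar{x})\left(\sum_{j=1}^{n_i}x_{ij} - \sum_{j=1}^{n_i}\bar{x}_i\right) + \sum_{i=1}^{k}n_i(\bar{x}_i - \bar{x})^2$$

Because $\bar{x}_i$ does not depend on $j$, factor it out, so that $\sum_{j=1}^{n_i}\bar{x}_i = n_i\bar{x}_i$:

$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 + 2\sum_{i=1}^{k}(\bar{x}_i - \bar{x})\left(\sum_{j=1}^{n_i}x_{ij} - n_i\bar{x}_i\right) + \sum_{i=1}^{k}n_i(\bar{x}_i - \bar{x})^2$$

Apply $\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 = (n_i - 1)\,s_i^2$ and $\sum_{j=1}^{n_i}x_{ij} = n_i\bar{x}_i$; the middle (cross) term vanishes:

$$(n-1)\,s^2 = \sum_{i=1}^{k}(n_i - 1)\,s_i^2 + \sum_{i=1}^{k}n_i(\bar{x}_i - \bar{x})^2$$

Dividing both sides by $n - 1$ gives the rewritten formula for $s^2$.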
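To sanity-check the derivation numerically, the sketch below evaluates each term of the expanded sum on randomly generated normal data. The helper name `derivation_terms`, the seed, and the partition sizes are arbitrary choices for illustration. It confirms that the cross term vanishes up to floating-point rounding and that the total sum of squares splits into the within-partition and between-partition parts:

```python
import math
import random


def derivation_terms(partitions):
    """Evaluate the terms of the expanded (n - 1) s^2 from raw partitions.

    Returns (total, within, cross, between) where
      total   = sum_i sum_j (x_ij - xbar)^2
      within  = sum_i sum_j (x_ij - xbar_i)^2   (equals sum_i (n_i - 1) s_i^2)
      cross   = 2 sum_i (xbar_i - xbar) sum_j (x_ij - xbar_i)
      between = sum_i n_i (xbar_i - xbar)^2
    """
    n = sum(len(p) for p in partitions)
    xbar = sum(sum(p) for p in partitions) / n
    total = within = cross = between = 0.0
    for p in partitions:
        xbar_i = sum(p) / len(p)
        total += sum((x - xbar) ** 2 for x in p)
        within += sum((x - xbar_i) ** 2 for x in p)
        cross += 2 * (xbar_i - xbar) * sum(x - xbar_i for x in p)
        between += len(p) * (xbar_i - xbar) ** 2
    return total, within, cross, between


random.seed(0)
parts = [[random.gauss(0.0, 1.0) for _ in range(size)] for size in (3, 5, 8)]
total, within, cross, between = derivation_terms(parts)
print(abs(cross) < 1e-9)                      # True: the cross term vanishes
print(math.isclose(total, within + between))  # True
```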
This post was largely inspired by
which tantalizingly ended with the (under-)statement:
These formulas are easy to derive by writing the desired variance as the scaled sum of $(x_{ij} - \bar{x})^2$, then introducing $\bar{x}_i$: $x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) - (\bar{x} - \bar{x}_i)$, using the square of difference formula, and simplifying.