What is the overall mean and variance for partitioned data?

Suppose that:

  • A normally distributed random variable $X$ is being sampled.
  • There are $k$ partitions, each with $n_i$ samples (where $i = 1, \ldots, k$).
  • For each partition, the mean $\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$ and variance $s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2$ are known, but the original observations $X_{ij}$ ($j = 1, \ldots, n_i$) are not available.
  • The overall mean $\bar{X}$ and variance $s^2$ are desired.

Computation of $n$ and $\bar{X}$ is straightforward:

$$ n = \sum_{i=1}^{k} n_i $$   [1]

$$ \bar{X} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{X}_i $$   [2]
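As a minimal sketch (the function name and the sample summaries are hypothetical), equations [1] and [2] need only each partition's count and mean; `fractions.Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction


def overall_count_and_mean(counts, means):
    """Return (n, Xbar) from per-partition counts n_i and means Xbar_i.

    n    = sum of n_i                   (equation [1])
    Xbar = (1/n) * sum of n_i * Xbar_i  (equation [2])
    """
    n = sum(counts)
    xbar = sum(n_i * m_i for n_i, m_i in zip(counts, means)) / n
    return n, xbar


# Hypothetical summaries: partition 1 has 3 samples with mean 2,
# partition 2 has 2 samples with mean 5.
n, xbar = overall_count_and_mean([3, 2], [Fraction(2), Fraction(5)])
print(n, xbar)  # 5 16/5
```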

However, because the overall mean $\bar{X}$ is not available within the partition where $X_{ij}$ is sampled, the definition of the variance:

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}\right)^2 $$   [3]

must be rewritten in terms of the partition counts, means, and variances as

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k} n_i\bar{X}_i^2 - n\bar{X}^2\right) $$   [4]
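A sketch of equation [4] in Python, checked against the direct definition [3] on a small hypothetical data set. The helper name `combined_variance` is illustrative and assumes every $n_i \ge 2$; `statistics.mean` and `statistics.variance` (sample variance, divisor $n_i - 1$) supply the per-partition summaries, and with `Fraction` inputs they return exact `Fraction` results, so the comparison is exact:

```python
import statistics
from fractions import Fraction


def combined_variance(counts, means, variances):
    """Equation [4]: overall sample variance from per-partition (n_i, Xbar_i, s_i^2).

    Assumes every n_i >= 2 (so each s_i^2 is defined).
    """
    n = sum(counts)                                                  # [1]
    xbar = sum(n_i * m_i for n_i, m_i in zip(counts, means)) / n     # [2]
    within = sum((n_i - 1) * v_i for n_i, v_i in zip(counts, variances))
    squares_of_means = sum(n_i * m_i * m_i for n_i, m_i in zip(counts, means))
    return (within + squares_of_means - n * xbar * xbar) / (n - 1)


# Hypothetical raw data, used only to produce the summaries and to check the result.
partitions = [[Fraction(1), Fraction(2), Fraction(4)], [Fraction(5), Fraction(9)]]
counts = [len(p) for p in partitions]
means = [statistics.mean(p) for p in partitions]
variances = [statistics.variance(p) for p in partitions]

pooled = combined_variance(counts, means, variances)
direct = statistics.variance([x for p in partitions for x in p])  # equation [3]
print(pooled, direct, pooled == direct)  # 97/10 97/10 True
```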

This formula may be derived as follows.

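First introduce $\bar{X}_i$ into the formula for $s^2$ by adding and subtracting it inside the square:

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left((X_{ij} - \bar{X}_i) + (\bar{X}_i - \bar{X})\right)^2 $$   [5]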

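Then, simplify to remove $X_{ij}$ from the partition summation $\sum_{j=1}^{n_i}$:

Apply $(a + b)^2 = a^2 + 2ab + b^2$, where $a = X_{ij} - \bar{X}_i$ and $b = \bar{X}_i - \bar{X}$

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(a^2 + 2ab + b^2\right) $$   [6]

Replace $a$ with $X_{ij} - \bar{X}_i$

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left((X_{ij} - \bar{X}_i)^2 + 2(X_{ij} - \bar{X}_i)\,b + b^2\right) $$   [7]

Replace $b$ with $\bar{X}_i - \bar{X}$

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left((X_{ij} - \bar{X}_i)^2 + 2(X_{ij} - \bar{X}_i)(\bar{X}_i - \bar{X}) + (\bar{X}_i - \bar{X})^2\right) $$   [8]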

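Distribute $\sum_{j=1}^{n_i}$ over the three terms

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{k}\left(\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 + \sum_{j=1}^{n_i}2(X_{ij} - \bar{X}_i)(\bar{X}_i - \bar{X}) + \sum_{j=1}^{n_i}(\bar{X}_i - \bar{X})^2\right) $$   [9]

Apply $\sum_{i=1}^{k}(a + b + c) = \sum_{i=1}^{k} a + \sum_{i=1}^{k} b + \sum_{i=1}^{k} c$;
express summation of sum as sum of summations

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}2(X_{ij} - \bar{X}_i)(\bar{X}_i - \bar{X}) + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{X}_i - \bar{X})^2\right) $$   [10]

Because neither $\bar{X}_i$ nor $\bar{X}$ depends on $j$, factor them out of the inner summations

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 + 2\sum_{i=1}^{k}(\bar{X}_i - \bar{X})\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i) + \sum_{i=1}^{k}(\bar{X}_i - \bar{X})^2\sum_{j=1}^{n_i}1\right) $$   [11]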

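Apply $\sum_{j=1}^{n_i} 1 = n_i$;
distribute $\sum_{j=1}^{n_i}$ over $X_{ij} - \bar{X}_i$;
express summation of sum as sum of summations;
simplify

$$ \sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i) = \sum_{j=1}^{n_i}X_{ij} - \bar{X}_i\sum_{j=1}^{n_i}1 = \sum_{j=1}^{n_i}X_{ij} - n_i\bar{X}_i $$   [12]

Because $\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i}X_{ij}$, i.e. $\sum_{j=1}^{n_i}X_{ij} = n_i\bar{X}_i$, the deviations within each partition sum to zero

$$ \sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i) = n_i\bar{X}_i - n_i\bar{X}_i = 0 $$   [13]

so the middle term of [11] vanishes and its last term becomes $n_i(\bar{X}_i - \bar{X})^2$:

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 + \sum_{i=1}^{k}n_i(\bar{X}_i - \bar{X})^2\right) $$   [14]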
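The step from [11] to [14] hinges on [13]: deviations from a partition's own mean always sum to zero, so the cross term disappears whatever the data. A quick exact check on hypothetical partitions, using the standard-library `statistics` and `fractions` modules:

```python
import statistics
from fractions import Fraction

# Hypothetical partitions; Fraction keeps every sum exact.
partitions = [[Fraction(1), Fraction(2), Fraction(4)], [Fraction(5), Fraction(9)]]
xbar = statistics.mean([x for p in partitions for x in p])  # overall mean

# Equation [13]: within each partition, the sum of (X_ij - Xbar_i) is zero.
deviation_sums = [sum(x - statistics.mean(p) for x in p) for p in partitions]

# Middle term of equation [11]: 2 * sum_i (Xbar_i - Xbar) * sum_j (X_ij - Xbar_i).
cross_term = 2 * sum(
    (statistics.mean(p) - xbar) * d for p, d in zip(partitions, deviation_sums)
)
print(deviation_sums, cross_term)
```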

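Apply $\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 = (n_i - 1)s_i^2$ (the definition of $s_i^2$);
simplify

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k}n_i(\bar{X}_i - \bar{X})^2\right) $$   [15]

Apply $(a - b)^2 = a^2 - 2ab + b^2$ to $(\bar{X}_i - \bar{X})^2$;
distribute $n_i$

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k}\left(n_i\bar{X}_i^2 - 2n_i\bar{X}_i\bar{X} + n_i\bar{X}^2\right)\right) $$   [16]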

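Express summation of sum as sum of summations;
simplify

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k}n_i\bar{X}_i^2 - \sum_{i=1}^{k}2n_i\bar{X}_i\bar{X} + \sum_{i=1}^{k}n_i\bar{X}^2\right) $$   [17]

Because $\bar{X}$ does not depend on $i$, factor out $\bar{X}$

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k}n_i\bar{X}_i^2 - 2\bar{X}\sum_{i=1}^{k}n_i\bar{X}_i + \bar{X}^2\sum_{i=1}^{k}n_i\right) $$   [18]

Apply $\sum_{i=1}^{k}n_i = n$ (from [1]) and $\sum_{i=1}^{k}n_i\bar{X}_i = n\bar{X}$ (from [2]);
simplify

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k}n_i\bar{X}_i^2 - 2n\bar{X}^2 + n\bar{X}^2\right) $$   [19]

$$ s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}(n_i - 1)s_i^2 + \sum_{i=1}^{k}n_i\bar{X}_i^2 - n\bar{X}^2\right) $$   [20]

which is [4].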
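Equations [15] and [20] are algebraically identical but compute differently: [15] adds a within-partition and a between-partition sum of squares, while [20] subtracts $n\bar{X}^2$ from $\sum_i n_i\bar{X}_i^2$. The sketch below (function names and the random test data are hypothetical) implements both and confirms, with exact `Fraction` arithmetic, that each reproduces the direct definition [3]:

```python
import random
import statistics
from fractions import Fraction


def variance_eq15(counts, means, variances):
    """Equation [15]: pooled within-partition plus between-partition sums of squares."""
    n = sum(counts)
    xbar = sum(c * m for c, m in zip(counts, means)) / n
    within = sum((c - 1) * v for c, v in zip(counts, variances))
    between = sum(c * (m - xbar) ** 2 for c, m in zip(counts, means))
    return (within + between) / (n - 1)


def variance_eq20(counts, means, variances):
    """Equation [20] (same as [4]): uses sum of n_i * Xbar_i^2 minus n * Xbar^2."""
    n = sum(counts)
    xbar = sum(c * m for c, m in zip(counts, means)) / n
    within = sum((c - 1) * v for c, v in zip(counts, variances))
    squares_of_means = sum(c * m * m for c, m in zip(counts, means))
    return (within + squares_of_means - n * xbar ** 2) / (n - 1)


random.seed(0)
# Hypothetical data: k = 4 partitions of random integers, each with at least 2 samples.
partitions = [
    [Fraction(random.randint(-50, 50)) for _ in range(random.randint(2, 6))]
    for _ in range(4)
]
counts = [len(p) for p in partitions]
means = [statistics.mean(p) for p in partitions]
variances = [statistics.variance(p) for p in partitions]
direct = statistics.variance([x for p in partitions for x in p])  # equation [3]

print(variance_eq15(counts, means, variances) == direct,
      variance_eq20(counts, means, variances) == direct)  # True True
```

With floats instead of `Fraction`, the two forms can disagree: [20] subtracts two quantities that may be large and nearly equal when the means are large relative to the spread, so [15] is generally the safer choice numerically.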

Reference

This post was largely inspired by
http://stats.stackexchange.com/questions/10441/how-to-calculate-the-variance-of-a-partition-of-variables/10445#10445
which tantalizingly ended with the (under-)statement:

These formulas are easy to derive by writing the desired variance as the scaled sum of $(X_{ij} - \bar{X})^2$, then introducing $\bar{X}_i$: $(X_{ij} - \bar{X})^2 = \left((X_{ij} - \bar{X}_i) - (\bar{X} - \bar{X}_i)\right)^2$, using the square of difference formula, and simplifying.