29 November 2011

Statistics Bleg

This is a bit of a long-shot, but I'm in need of some statistics help.

Does anyone out there know how to calculate the Standard Error of the Mean (SEM) for a set of data given the SEM from two subsets of the data?

I'm looking at a paper here that divides the results up into two mutually exclusive and exhaustive subsets. Let's call them Red and Blue. I don't have the original data, but I have the mean and SEM of the Red and Blue samples, and I know the size of each. It's trivial to calculate the mean for the overall set, but can I recover the overall SEM?

Wikipedia tells me that "If the standard error of several individual quantities is known then the standard error of some function of the quantities can be easily calculated in many cases;" Is this as trivial as doing a weighted average of the two samples' SEMs the same way I would to calculate the overall mean?

Update: Solution can be found here.


  1. Let's say you have sets S1 and S2 of sizes N1 and N2 respectively. All observations are independent. Let's say the means are m1 and m2, respectively. And corresponding SEMs are SEM1 and SEM2. Now

    SEMi = STDi/sqrt(Ni)

    where i could be 1 or 2 and STD is the standard deviation (or its estimate). From here

    STDtotal_i = sqrt(STD1^2+STD2^2)
    = sqrt(SEM1^2*N1 + SEM2^2*N2)

    So we have

    SEMtotal = STDtotal_i/sqrt(N1+N2)
    = sqrt(SEM1^2*N1 + SEM2^2*N2)/sqrt(N1+N2)

    I hope it helps.

  2. Thanks! The approach seems so obvious now. I was trying to make it more difficult than it had to be.

    I think you do have an error in calculating STDtotal though. I used the formula given here. When I used that on some data I generated (so I would know the statistics of the two partitions as well as the total set) it all worked out fine.

  3. Yes, you are right. Sorry. I guess midway through my mind went off in the wrong direction, toward the standard deviation of the sum of two random variables.
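
For anyone who lands here later, the following is a minimal sketch of how the corrected calculation might look. It is not necessarily the exact formula the post links to; it uses the standard decomposition of the total sum of squares into within-group and between-group parts. It assumes the reported SEMs were computed from sample standard deviations with the N-1 denominator, and that Red and Blue do not overlap. The names `combine_mean_sem` and `sem` are made up for illustration.

```python
import math
import statistics


def combine_mean_sem(n1, mean1, sem1, n2, mean2, sem2):
    """Return (mean, SEM) of the union of two disjoint samples.

    Assumes each SEM is STD / sqrt(N), with STD the sample standard
    deviation (N - 1 denominator).
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n

    # Undo SEM = STD / sqrt(N) to recover each sample variance.
    var1 = sem1 ** 2 * n1
    var2 = sem2 ** 2 * n2

    # Total sum of squares about the overall mean:
    # within-group part plus between-group part.
    within = (n1 - 1) * var1 + (n2 - 1) * var2
    between = n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2
    var_total = (within + between) / (n - 1)

    return mean, math.sqrt(var_total / n)


def sem(values):
    """SEM computed directly from raw data, for checking only."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Made-up data, so that the subset and overall statistics are both known.
red = [1.0, 2.0, 3.0, 4.0]
blue = [10.0, 12.0, 14.0]

combined = combine_mean_sem(
    len(red), statistics.mean(red), sem(red),
    len(blue), statistics.mean(blue), sem(blue),
)
direct = (statistics.mean(red + blue), sem(red + blue))
print(combined, direct)
```

The between-group term is what a plain weighted average of the two SEMs leaves out: if the Red and Blue means differ, the overall spread is larger than either subset's spread alone would suggest.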