30 November 2011

Calculating the SEM of a set from the SEMs of two subsets

Before moving on to more relevant blogging, I should say that I got my statistics issue from last week sorted out, thanks to my anonymous commenter and a tip I got out-of-band.

The OOB idea was to generate artificial data sets with the given SEMs and then calculate the SEM of the combined data. That sort of experimental approach is the kind of thing I like.
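The experimental approach can be sketched in a few lines of Python. This is a minimal sketch, not the script actually used: the sizes, means and SEMs are made-up inputs, `sem` and `make_subset` are hypothetical helper names, and it assumes the SEM is the sample standard deviation (n - 1 denominator) divided by sqrt(n).

```python
import math
import random
import statistics

def sem(data):
    """Standard error of the mean, using the sample standard deviation (n - 1)."""
    return statistics.stdev(data) / math.sqrt(len(data))

def make_subset(rng, n, mean, target_sem):
    """Draw n normal values, then rescale so the mean and SEM match the targets."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    raw_mean = statistics.fmean(raw)
    raw_sd = statistics.stdev(raw)
    sd = target_sem * math.sqrt(n)  # standard deviation implied by the SEM
    return [mean + sd * (v - raw_mean) / raw_sd for v in raw]

rng = random.Random(0)
# Hypothetical inputs, chosen only for illustration.
x = make_subset(rng, n=12, mean=5.0, target_sem=0.4)
y = make_subset(rng, n=20, mean=6.5, target_sem=0.3)

sem_all = sem(x + y)  # SEM of the combined artificial data
print(sem(x), sem(y), sem_all)
```

The combined standard deviation depends only on each subset's size, mean and standard deviation, so changing the random seed should not change the combined SEM beyond rounding error.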

Anon pointed out what should have been immediately obvious to me: use the SEM of each subset to get their standard deviations, calculate the standard deviation of the whole set, then calculate the SEM of the whole set. Anon unfortunately had the incorrect method of finding the overall standard deviation, but the correct method is readily available. I used both formulas listed here, and both gave me the same results, as well as matching the "experimental" results I generated earlier. I confess to not really understanding what the difference is supposed to be between those two methods. (That's why I need to go back and learn stats for real, rather than the ad-hoc way I've gone about it up until now.)

For the record, here are the two ways you can get the SEM of the full set given sx = SEM(X), sy = SEM(Y), nx = |X| and ny = |Y|, with the mu's being the means of the respective sets.






[Image: the two formulas for the SEM of the combined set.]
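The three steps (SEM to standard deviation, combined standard deviation, SEM of the whole set) can also be written out as code. This is a sketch under stated assumptions, not necessarily either of the two formulas referred to above: it assumes each SEM came from the sample standard deviation (n - 1 denominator), and `combined_sem`, `mux` and `muy` are names introduced here for illustration.

```python
import math

def combined_sem(sx, nx, mux, sy, ny, muy):
    """SEM of the union of X and Y from each subset's SEM, size and mean."""
    # Step 1: recover each subset's sample variance from its SEM.
    var_x = sx ** 2 * nx
    var_y = sy ** 2 * ny

    # Step 2: sum of squares about the combined mean, which is the
    # within-subset parts plus a term for the gap between the two means.
    n = nx + ny
    ss = (nx - 1) * var_x + (ny - 1) * var_y + nx * ny / n * (mux - muy) ** 2
    var_all = ss / (n - 1)

    # Step 3: SEM of the whole set.
    return math.sqrt(var_all / n)

# Small check against data that can be worked by hand:
# X = [1, 2, 3] has mean 2 and sample variance 1; Y = [4, 6] has mean 5 and sample variance 2.
print(combined_sem(math.sqrt(1 / 3), 3, 2.0, 1.0, 2, 5.0))
```

For that check, the combined set [1, 2, 3, 4, 6] has mean 3.2 and sum of squares 14.8 about it, which is the same 14.8 that step 2 produces (2 + 2 + 10.8).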


