Comparing the empirical distribution of a variable across different groups is a common problem in data science. In particular, in causal inference, the problem often arises when we have to assess the quality of randomization.
When we want to assess the causal effect of a policy (or UX feature, ad campaign, drug, …), the gold standard in causal inference is the randomized controlled trial, also known as an A/B test. In practice, we select a sample for the study, randomly split it into a control and a treatment group, and compare the outcomes between the two groups. Randomization ensures that, on average, the only difference between the two groups is the treatment, so that we can attribute outcome differences to the treatment effect.

The problem is that, despite randomization, the two groups are never identical. Sometimes, they are not even "similar". For example, we might have more males in one group, or older people (we usually call these characteristics covariates or control variables). When that happens, we can no longer be certain that the difference in the outcome is due only to the treatment rather than to the imbalanced covariates. Therefore, after randomization, it is always important to check whether all observed variables are balanced across groups and whether there are no systematic differences. Another option, to be certain ex ante that certain covariates are balanced, is stratified sampling.

In this blog post, we are going to see different ways to compare two (or more) distributions and assess the magnitude and significance of their difference. We are going to consider two different approaches, visual and statistical. The two approaches generally trade off intuition with rigor: from plots, we can quickly assess and explore differences, but it's hard to tell whether these differences are systematic or due to noise.

Example

Let's assume we need to perform an experiment on a group of individuals and we have randomized them into a treatment and a control group. We would like them to be as comparable as possible, in order to attribute any difference between the two groups to the treatment effect alone.
We have also divided the treatment group into different arms for testing different treatments (e.g. slight variations of the same drug). For this example, I have simulated a dataset of 1000 individuals, for whom we observe a set of characteristics. I import the data generating process together with some plotting functions and libraries:

from src.utils import *

We have information on 1000 individuals, for whom we observe gender, age and weekly income. Each individual is assigned either to the treatment or the control group, and treated individuals are distributed across four treatment arms.

Two Groups — Plots

Let's start with the simplest setting: we want to compare the distribution of income across the treatment and control groups. We first explore visual approaches and then statistical approaches. The advantage of the first is intuition, while the advantage of the second is rigor. For most visualizations, I am going to use Python's seaborn library.

Boxplot

A first visual approach is the boxplot. The boxplot is a good trade-off between summary statistics and data visualization. The center of the box represents the median, while the borders represent the first (Q1) and third quartile (Q3), respectively.
The whiskers instead extend to the first data points that are more than 1.5 times the interquartile range (Q3 − Q1) outside the box. The points that fall outside of the whiskers are plotted individually and are usually considered outliers. Therefore, the boxplot provides both summary statistics (the box and the whiskers) and direct data visualization (the outliers).

sns.boxplot(data=df, x='Group', y='Income');

It seems that the income distribution in the treatment group is slightly more dispersed: the orange box is larger and its whiskers cover a wider range. However, the issue with the boxplot is that it hides the shape of the data, telling us some summary statistics but not showing us the actual data distribution.

Histogram

The most intuitive way to plot a distribution is the histogram. The histogram groups the data into equally wide bins and plots the number of observations within each bin.

sns.histplot(data=df, x='Income', hue='Group', bins=50);

There are multiple issues with this plot: since the two groups have a different number of observations, the raw counts are not directly comparable, and the number of bins is arbitrary.
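The comparability issue can be checked numerically: np.histogram with density=True rescales each group's histogram so that it integrates to one, which makes groups of different sizes comparable. A sketch on simulated incomes (the bin range and distribution parameters are my own assumptions, not the post's data):

```python
import numpy as np

rng = np.random.default_rng(0)
income_c = rng.lognormal(6.5, 0.4, size=600)  # control: larger group
income_t = rng.lognormal(6.5, 0.5, size=400)  # treatment: smaller group

bins = np.linspace(0, 5000, 51)  # 50 equally wide bins

# Raw counts are not comparable: the control group simply has more observations
counts_c, _ = np.histogram(income_c, bins=bins)
counts_t, _ = np.histogram(income_t, bins=bins)

# Densities are comparable: each histogram integrates to one on its own
dens_c, _ = np.histogram(income_c, bins=bins, density=True)
dens_t, _ = np.histogram(income_t, bins=bins, density=True)

width = bins[1] - bins[0]
print(dens_c.sum() * width, dens_t.sum() * width)  # both ≈ 1
```

This is exactly what the stat='density' and common_norm=False options do in seaborn below.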
We can solve the first issue using the stat option to plot the density instead of the count, and setting the common_norm option to False to normalize each histogram separately.

sns.histplot(data=df, x='Income', hue='Group', bins=50, stat='density', common_norm=False);

Now the two histograms are comparable! However, an important issue remains: the size of the bins is arbitrary. In the extreme, if we bunch the data less, we end up with bins holding at most one observation; if we bunch the data more, we end up with a single bin. In both cases, if we exaggerate, the plot loses informativeness. This is a classical bias-variance trade-off.

Kernel Density

One possible solution is to approximate the histogram with a continuous function, using kernel density estimation (KDE).

sns.kdeplot(x='Income', data=df, hue='Group', common_norm=False);

From the plot, it seems that the estimated kernel density of income has "fatter tails" (i.e. higher variance) in the treatment group, while the average seems similar across groups. The issue with kernel density estimation is that it is a bit of a black box and might mask relevant features of the data.

Cumulative Distribution

A more transparent representation of the two distributions is their cumulative distribution function. At each point of the x-axis (income) we plot the percentage of data points that have an equal or lower value.
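The empirical CDF behind this plot is easy to compute by hand. A sketch (my own helper, not from the post's utilities):

```python
import numpy as np

def ecdf(sample, x):
    """Fraction of observations in `sample` with a value <= x."""
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, x, side='right') / len(sample)

values = np.array([10, 20, 30, 40, 50])
print(ecdf(values, 30))  # 0.6: three of the five values are <= 30
```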
The main advantages of the cumulative distribution function are that we do not need to make any arbitrary choice (e.g. the number of bins) and we do not need to perform any approximation (e.g. with KDE): all data points are represented.

sns.histplot(x='Income', data=df, hue='Group', bins=len(df), stat='density', element='step', fill=False, cumulative=True, common_norm=False);

How should we interpret the graph? The two lines cross around the median, suggesting similar centers, but the treatment line lies above the control line on the left end and below it on the right end, suggesting fatter tails in the treatment group.
Q-Q Plot

A related method is the Q-Q plot, where Q stands for quantile. The Q-Q plot plots the quantiles of the two distributions against each other. If the distributions are the same, we should get a 45-degree line. There is no native Q-Q plot function in Python and, while the statsmodels package provides a qqplot function, it is quite cumbersome. Therefore, we will do it by hand. First, we compute the quantiles of the two groups, using the np.percentile function.

income = df['Income'].values

Now we can plot the two quantile distributions against each other, plus the 45-degree line, representing the benchmark perfect fit.

plt.figure(figsize=(8, 8))

The Q-Q plot delivers a very similar insight to the cumulative distribution plot: income in the treatment group has the same median (the lines cross in the center) but wider tails (the dots are below the line on the left end and above on the right end).

Two Groups — Tests

So far, we have seen different ways to visualize differences between distributions. The main advantage of visualization is intuition: we can eyeball the differences and intuitively assess them. However, we might want to be more rigorous and try to assess the statistical significance of the difference between the distributions, i.e. answer the question "is the observed difference systematic or due to sampling noise?". We are now going to analyze different tests to discern two distributions from each other.

T-test

The first and most common test is the Student t-test. T-tests are generally used to compare means. In this case, we want to test whether the mean of the income distribution is the same across the two groups.
The test statistic for the two-means comparison test is given by:

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

where x̄ is the sample mean, s is the sample standard deviation and n is the sample size of each group. Under mild conditions, the test statistic is asymptotically distributed as a Student t distribution. We use the ttest_ind function from scipy to perform the t-test. The function returns both the test statistic and the implied p-value.

from scipy.stats import ttest_ind

The p-value of the test is 0.12; therefore, we do not reject the null hypothesis of no difference in means across treatment and control groups.
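A self-contained sketch of the full call, on simulated incomes rather than the post's df (the equal_var=False option gives Welch's version, which does not assume equal variances):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
income_t = rng.lognormal(6.5, 0.5, size=500)  # treatment
income_c = rng.lognormal(6.5, 0.4, size=500)  # control

# Welch's t-test: does not assume equal variances across groups
stat, p_value = ttest_ind(income_t, income_c, equal_var=False)

# The statistic matches the formula above
manual = (income_t.mean() - income_c.mean()) / np.sqrt(
    income_t.var(ddof=1) / len(income_t) + income_c.var(ddof=1) / len(income_c))
print(f"t = {stat:.3f}, p = {p_value:.3f}")
```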
Standardized Mean Difference (SMD)

In general, it is good practice to always perform a test for differences in means on all variables across the treatment and control group when we are running a randomized controlled trial or A/B test. However, since the denominator of the t-test statistic depends on the sample size, the t-test has been criticized for making p-values hard to compare across studies. In fact, we may obtain a significant result in an experiment with a very small magnitude of difference but a large sample size, while we may obtain a non-significant result in an experiment with a large magnitude of difference but a small sample size. One solution that has been proposed is the standardized mean difference (SMD). As the name suggests, this is not a proper test statistic, but just a standardized difference, which can be computed as:

SMD = (x̄₁ − x̄₂) / √((s₁² + s₂²) / 2)

Usually, a value below 0.1 is considered a "small" difference.

It is good practice to collect the average values of all variables across treatment and control groups, together with a measure of distance between the two — either the t-test or the SMD — into a table that is called a balance table. We can use the create_table_one function from the causalml library to generate it. As the name of the function suggests, the balance table should always be the first table you present when performing an A/B test.

from causalml.match import create_table_one

In the first two columns, we can see the averages of the different variables across the treatment and control groups, with standard errors in parentheses. In the last column, the values of the SMD indicate a standardized difference of more than 0.1 for all variables, suggesting that the two groups are probably different.

Mann–Whitney U Test

An alternative test is the Mann–Whitney U test.
The null hypothesis for this test is that the two groups have the same distribution, while the alternative hypothesis is that one group has larger (or smaller) values than the other. Differently from the other tests we have seen so far, the Mann–Whitney U test is robust to outliers and concentrates on the center of the distribution. The test procedure is the following: pool all observations and rank them; compute the sums of the ranks in each sample, R₁ and R₂; then compute U₁ = R₁ − n₁(n₁ + 1)/2 and U₂ = R₂ − n₂(n₂ + 1)/2, where n₁ and n₂ are the two sample sizes.
Under the null hypothesis of no systematic rank differences between the two distributions (i.e. same median), the test statistic is asymptotically normally distributed with known mean and variance. The intuition behind the computation of R and U is the following: if the values in the first sample were all smaller than the values in the second sample, then R₁ = n₁(n₁ + 1)/2 and, as a consequence, U₁ would be zero (the minimum attainable value). Otherwise, if the two samples were similar, U₁ and U₂ would both be very close to n₁n₂/2 (their expected value under the null). We perform the test using the mannwhitneyu function from scipy.

We get a p-value of 0.6, which implies that we do not reject the null hypothesis that the distribution of income is the same in the treatment and control groups.
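The rank computation can be checked against scipy on simulated data (a sketch; depending on the scipy version, mannwhitneyu reports either U₁ or min(U₁, U₂)):

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

rng = np.random.default_rng(2)
x = rng.normal(500, 80, size=30)    # e.g. control incomes
y = rng.normal(500, 120, size=40)   # e.g. treatment incomes: same center, wider spread

# Manual computation: rank the pooled sample, sum the ranks of the first group
ranks = rankdata(np.concatenate([x, y]))
r1 = ranks[:len(x)].sum()
u1 = r1 - len(x) * (len(x) + 1) / 2

u_scipy, p_value = mannwhitneyu(x, y)
print(u1, u_scipy, p_value)
```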
Permutation Tests

A non-parametric alternative is permutation testing. The idea is that, under the null hypothesis, the two distributions should be the same, therefore shuffling the group labels should not significantly alter any statistic. We can choose any statistic and check how its value in the original sample compares with its distribution across group-label permutations. For example, let's use as a test statistic the difference in sample means between the treatment and control groups.

The permutation test gives us a p-value of 0.056, implying a marginal non-rejection of the null hypothesis at the 5% level. How do we interpret the p-value? It means that the difference in means in the data is larger than 1 − 0.056 = 94.4% of the differences in means across the permuted samples. We can visualize the test by plotting the distribution of the test statistic across permutations against its sample value.

As we can see, the sample statistic is quite extreme with respect to the values in the permuted samples, but not excessively so.

Chi-Squared Test

The chi-squared test is a very powerful test that is mostly used to test differences in frequencies. One of the least known applications of the chi-squared test is testing the similarity between two distributions. The idea is to bin the observations of the two groups. If the two distributions were the same, we would expect the same frequency of observations in each bin. Importantly, we need enough observations in each bin in order for the test to be valid.
I generate bins corresponding to deciles of the distribution of income in the control group and then compute the expected number of observations in each bin in the treatment group, if the two distributions were the same.

We can now perform the test by comparing the expected (E) and observed (O) number of observations in the treatment group across bins. The test statistic is given by:

χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ

where the bins are indexed by i, Oᵢ is the observed number of data points in bin i and Eᵢ is the expected number of data points in bin i. Since we generated the bins using deciles of the distribution of income in the control group, we expect the number of observations per bin in the treatment group to be the same across bins. The test statistic is asymptotically distributed as a chi-squared distribution. To compute the test statistic and the p-value of the test, we use the chisquare function from scipy.

Differently from all the other tests so far, the chi-squared test strongly rejects the null hypothesis that the two distributions are the same. Why? The reason lies in the fact that the two distributions have a similar center but different tails, and the chi-squared test assesses similarity along the whole distribution, not only in the center, as we were doing with the previous tests. This result tells a cautionary tale: it is very important to understand what you are actually testing before drawing blind conclusions from a p-value!

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is probably the most popular non-parametric test to compare distributions. The idea of the Kolmogorov-Smirnov test is to compare the cumulative distributions of the two groups.
In particular, the Kolmogorov-Smirnov test statistic is the maximum absolute difference between the two cumulative distributions:

D = supₓ |F₁(x) − F₂(x)|

where F₁ and F₂ are the two cumulative distribution functions and x ranges over the values of the underlying variable. The asymptotic distribution of the Kolmogorov-Smirnov test statistic is the Kolmogorov distribution.

To better understand the test, let's plot the cumulative distribution functions and the test statistic. First, we compute the cumulative distribution functions. We then need to find the point where the absolute distance between the two cumulative distribution functions is largest, and we can visualize the value of the test statistic by plotting the two cumulative distribution functions together with that distance.

From the plot, we can see that the value of the test statistic corresponds to the distance between the two cumulative distributions at income ≈ 650. For that value of income, we have the largest imbalance between the two groups. We can now perform the actual test using the kstest function from scipy.

The p-value is below 5%: we reject the null hypothesis that the two distributions are the same, with 95% confidence.
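The statistic can be computed by hand and checked against scipy's two-sample routine, ks_2samp. A sketch on simulated data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
income_c = rng.normal(500, 80, size=400)   # control
income_t = rng.normal(500, 120, size=400)  # treatment: same mean, fatter tails

# Manual KS statistic: largest gap between the two empirical CDFs,
# evaluated at every observed data point
grid = np.sort(np.concatenate([income_c, income_t]))
cdf_c = np.searchsorted(np.sort(income_c), grid, side='right') / len(income_c)
cdf_t = np.searchsorted(np.sort(income_t), grid, side='right') / len(income_t)
d_manual = np.abs(cdf_c - cdf_t).max()

d_scipy, p_value = ks_2samp(income_c, income_t)
print(d_manual, d_scipy, p_value)
```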
Multiple Groups — Plots

So far we have only considered the case of two groups: treatment and control. But what if we had multiple groups? Some of the methods we have seen above scale well, while others don't. As a working example, we are now going to check whether the distribution of income is the same across treatment arms.

Boxplot

The boxplot scales very well when the number of groups is in the single digits, since we can put the different boxes side by side. From the plot, it looks like the distribution of income is different across treatment arms, with higher-numbered arms having a higher average income.

Violin Plot

A very nice extension of the boxplot that combines summary statistics and kernel density estimation is the violin plot. The violin plot displays separate densities along the y-axis so that they don't overlap. By default, it also adds a miniature boxplot inside. As for the boxplot, the violin plot suggests that income is different across treatment arms.

Ridgeline Plot

Lastly, the ridgeline plot plots multiple kernel density distributions along the x-axis, making them more intuitive than the violin plot, but partially overlapping them. Unfortunately, there is no default ridgeline plot in either matplotlib or seaborn, so we need to import it from the joypy package. Again, the ridgeline plot suggests that higher-numbered treatment arms have higher income. From this plot, it is also easier to appreciate the different shapes of the distributions.

Multiple Groups — Tests

Lastly, let's consider hypothesis tests to compare multiple groups.
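The workhorse here is one-way ANOVA, available as f_oneway in scipy. A sketch with four simulated arms (parameters are my own assumptions), checking scipy's statistic against the by-hand variance decomposition:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(6)
# Four simulated treatment arms with increasing mean income
arms = [rng.normal(500 + 20 * k, 100, size=250) for k in range(4)]

f_stat, p_value = f_oneway(*arms)

# By-hand check: between-group variance over within-group variance
grand_mean = np.concatenate(arms).mean()
G, N = len(arms), sum(len(a) for a in arms)
between = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arms) / (G - 1)
within = sum(((a - a.mean()) ** 2).sum() for a in arms) / (N - G)
print(f"F = {f_stat:.2f} (manual {between / within:.2f}), p = {p_value:.5f}")
```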
For simplicity, we will concentrate on the most popular one: the F-test.

F-test

With multiple groups, the most popular test is the F-test. The F-test compares the variance of a variable across different groups. This analysis is also called analysis of variance, or ANOVA. In practice, the F-test statistic is given by:

F = [Σ_g n_g (x̄_g − x̄)² / (G − 1)] / [Σ_g Σ_i (x_ig − x̄_g)² / (N − G)]

where G is the number of groups, N is the number of observations, x̄ is the overall mean and x̄_g is the mean within group g. Under the null hypothesis of group independence, the F-statistic is F-distributed.

The test p-value is basically zero, implying a strong rejection of the null hypothesis of no differences in the income distribution across treatment arms.

Conclusion

In this post, we have seen a ton of different ways to compare two or more distributions, both visually and statistically. This is a primary concern in many applications, but especially in causal inference, where we use randomization to make treatment and control groups as comparable as possible. We have also seen how different methods might be better suited for different situations. Visual methods are great for building intuition, but statistical methods are essential for decision-making, since we need to be able to assess the magnitude and statistical significance of the differences.

References

[1] Student, The Probable Error of a Mean (1908), Biometrika.
[2] F. Wilcoxon, Individual Comparisons by Ranking Methods (1945), Biometrics Bulletin.
[3] B. L. Welch, The generalization of "Student's" problem when several different population variances are involved (1947), Biometrika.
[4] H. B. Mann, D. R. Whitney, On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other (1947), The Annals of Mathematical Statistics.
[5] E. Brunner, U. Munzel, The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small-Sample Approximation (2000), Biometrical Journal.
[6] A. N. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione [On the empirical determination of a distribution law] (1933), Giornale dell'Istituto Italiano degli Attuari.
[7] H. Cramér, On the composition of elementary errors (1928), Scandinavian Actuarial Journal.
[8] R. von Mises, Wahrscheinlichkeit, Statistik und Wahrheit [Probability, Statistics and Truth] (1936), Bulletin of the American Mathematical Society.
[9] T. W. Anderson, D. A. Darling, Asymptotic Theory of Certain "Goodness of Fit" Criteria Based on Stochastic Processes (1953), The Annals of Mathematical Statistics.
Code

You can find the original Jupyter Notebook here: Blog-Posts/distr.ipynb at main · matteocourthoud/Blog-Posts (github.com).

Thank you for reading! I really appreciate it! 🤗 If you liked the post and would like to see more, consider following me. I post once a week on topics related to causal inference and data analysis. I try to keep my posts simple but precise, always providing code, examples, and simulations. Also, a small disclaimer: I write to learn, so mistakes are the norm, even though I try my best. Please, when you spot them, let me know. I also appreciate suggestions on new topics!