Mediation with Dichotomous Outcomes

So you want to test a mediation hypothesis...
but you have dichotomous outcome variables...
and you've heard that the standard analysis won't work...

Don't worry -- It can be done!
(And the goal of this page is to make it accessible to non-statisticians)

The information on this site was derived from the equations presented by David A. Kenny and is used with permission from the author.
The equations come primarily from MacKinnon & Dwyer (1993).
This page was created and is maintained by Nathaniel R. Herr. Feel free to e-mail me with any questions or comments.

Ok, you have two choices from this point. You can take the long road and learn what is going on with this whole logistic regression/dichotomous outcome mediation thing...

Or, you can take the short road and just get to the part where you actually do the analysis.
This link takes you to the SPSS and Excel tools you can use to run your own mediation tests.

I recommend at least looking over the stats, but who am I to tell you what to do?

(the magical triangle)

Mediation has been wonderfully described by David Kenny on his website, so I refer you there if you want a detailed description of the procedure.

Statistically, mediation is calculated with three equations that are typically represented with the diagrams to the right. X is considered the causal variable, M is the mediator, and Y is the outcome. The equations are:

Y = cX + E1 <--- This is the effect of X on Y (ignoring M)
M = aX + E2 <--- This is the effect of X on M
Y = bM + c'X + E3 <--- The unique effect of both X and M on Y

a, b, c, and c' are the "coefficients" and understanding them is the key to understanding mediation. I want to point out two critical things that often trip people up:
• The first is that the "b" coefficient is NOT simply the effect of M on Y. It is actually the effect of M on Y CONTROLLING for X. When you calculate the b coefficient, you need to put both the X and M variables in simultaneously. Doing this evaluates whether you meet Kenny's "step 3" when testing mediation.

• The second is understanding the difference between c and c'. Kenny's "step 1" requires that you show that there is a significant association between your X and Y variables, IGNORING your M variable. That association is represented by the c coefficient (see top part of diagram). The c' ("c prime") coefficient is not the same thing as the c coefficient. It is the effect of X on Y CONTROLLING for M. This is Kenny's "step 4" requirement. Ideally, this is a non-significant relationship, which would indicate complete mediation. Furthermore, note that b and c' are calculated at the SAME TIME! (i.e. you only have to do one SPSS run to get both values)
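
If it helps to see the three equations as arithmetic, here is a minimal sketch for the ordinary case where M and Y are both continuous. It is not the SPSS procedure used on this site. The function names (cov, mediation_coefficients) and the data are made up for illustration, and intercepts are left out because only the slopes matter here.

```python
from statistics import mean

def cov(u, v):
    """Sample covariance (n - 1 denominator); cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def mediation_coefficients(x, m, y):
    """Least-squares slopes for the three mediation equations."""
    vx, vm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    c = cxy / vx                      # Y = cX + E1 (M ignored)
    a = cxm / vx                      # M = aX + E2
    denom = vx * vm - cxm ** 2        # shared by b and c'
    b = (cmy * vx - cxy * cxm) / denom        # effect of M controlling for X
    c_prime = (cxy * vm - cmy * cxm) / denom  # effect of X controlling for M
    return a, b, c, c_prime

# Made-up illustrative data (hypothetical, not from any study)
x = [1, 2, 3, 4, 5, 6, 7, 8]
m = [2, 1, 4, 3, 6, 5, 8, 7]
y = [1, 3, 2, 5, 4, 7, 6, 9]

a, b, c, c_prime = mediation_coefficients(x, m, y)
print("a =", a, " b =", b, " c =", c, " c' =", c_prime)
```

Notice that b and c' come out of the same calculation (they share a denominator), which is the "one run" point made above. With continuous variables and least squares, c equals c' + a*b exactly.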

Ok, you have to be familiar with things so far if you are going to move on, because there's nothing but more diagrams and "primes" ahead. Check David Kenny's website if you're still not feeling surefooted.

Mediation and Logistic Regression
(the broken triangle)

The rumors are true...there is a problem with doing a mediation analysis when you have a dichotomous mediator, outcome, or both. MacKinnon and Dwyer (1993) described the issue and presented a statistical solution that seems to solve the problem.

Note: If you are testing mediation with a dichotomous outcome variable and are using the Sobel test, you do not need to use the procedures described below because the Sobel test can handle dichotomous outcomes (Thank you to Andrew Hayes for this advice). These procedures are for calculating the proportion of the effect mediated by the indirect path, as described by David Kenny.

To understand the problem, you first need to think about the variables in a different way. In each of the equations above there are "predictor" variables and there are "outcome" variables. Put simply, whatever variable is to the left of the "=" is the outcome and all variables to the right are the predictors. In the diagrams, arrows point from predictors to outcomes.
• Note that "X" is always a predictor variable, "Y" is always an outcome variable, and "M" is both (outcome of X, but predictor of Y)

Logistic regression creates a problem because when outcomes are dichotomous the coefficients in your mediation analyses end up being on different scales. This is not a problem if your predictors are dichotomous...so if you have data where only the X variable is binary, then stop reading immediately!! You can follow the traditional mediation procedure and move on with your life!

The rest of us will continue on ahead. To appropriately indicate the fact that the variables differ when they are predictors versus when they are outcomes, the three equations must be modified slightly:

Y' = cX + E1
M' = aX + E2
Y" = bM + c'X + E3

Primes are added to M or Y to show that M' is on a different scale than M. Because Y is the outcome variable in two equations, it gets the coveted "double-prime" to show that the scale of Y differs from Y' which differs from Y".

To depict the equations I have broken the triangle apart into its pieces (diagrams above). It's not as pretty as the other diagram, but it hopefully helps to visualize the equations better.

The next step is to make the coefficients comparable across the equations. This is accomplished by multiplying each coefficient by the standard deviation (SD) of the predictor variable in the equation and then dividing by the SD of the outcome variable. Here are the equations for the comparable ("comp") coefficients:
• comp a = a * SD(X)/SD(M')
• comp b = b * SD(M)/SD(Y")
• comp c = c * SD(X)/SD(Y')
• comp c' = c' * SD(X)/SD(Y")

You can get the SD of X and M by looking at descriptive statistics in SPSS. The other values are not as easy, but David Kenny has derived equations from MacKinnon and Dwyer's paper to solve for the variances of Y', M', and Y". (Remember that to get the SD, you simply have to take the square root of the variance.) Here are those equations:
• Var(Y') = c^2 * Var(X) + π^2/3
• Var(M') = a^2 * Var(X) + π^2/3
• Var(Y") = c'^2 * Var(X) + b^2 * Var(M) + 2*b*c'*Cov(X,M) + π^2/3

I know what you're thinking...why the heck is pi (π = 3.14...) in there?? Well, it turns out that pi squared divided by three is the variance of the standard logistic distribution. Say that ten times fast.
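
For anyone who would rather see the variance and "comp" equations as code, here is a minimal sketch. It assumes both M and Y are dichotomous, so all three equations came from logistic regression. The function name comparable_coefficients and every input number are hypothetical. This is not the Excel spreadsheet's actual implementation.

```python
import math

# Variance of the standard logistic distribution: pi squared over three
LOGISTIC_VAR = math.pi ** 2 / 3

def comparable_coefficients(a, b, c, c_prime, var_x, var_m, cov_xm):
    """Rescale logistic coefficients onto a common metric.

    Assumes M and Y are both dichotomous (an assumption of this sketch).
    """
    var_y1 = c ** 2 * var_x + LOGISTIC_VAR                  # Var(Y')
    var_m1 = a ** 2 * var_x + LOGISTIC_VAR                  # Var(M')
    var_y2 = (c_prime ** 2 * var_x + b ** 2 * var_m
              + 2 * b * c_prime * cov_xm + LOGISTIC_VAR)    # Var(Y")
    sd_x, sd_m = math.sqrt(var_x), math.sqrt(var_m)
    return {
        "a": a * sd_x / math.sqrt(var_m1),
        "b": b * sd_m / math.sqrt(var_y2),
        "c": c * sd_x / math.sqrt(var_y1),
        "c_prime": c_prime * sd_x / math.sqrt(var_y2),
    }

# Hypothetical inputs: coefficients from your regression output,
# variances and covariance from descriptive statistics
comp = comparable_coefficients(a=1.0, b=0.5, c=0.8, c_prime=0.4,
                               var_x=1.0, var_m=0.25, cov_xm=0.1)
for name, value in comp.items():
    print(f"comp {name} = {value:.4f}")
```

Note that comp b and comp c' are divided by the same SD(Y"), because they come from the same equation.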

The comparable SEs are calculated in similar fashion to the comparable coefficients. Here are the equations:
• SE(comp a) = SE(a) * SD(X)/SD(M')
• SE(comp b) = SE(b) * SD(M)/SD(Y")
• SE(comp c) = SE(c) * SD(X)/SD(Y')
• SE(comp c') = SE(c') * SD(X)/SD(Y")
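
Here is a small sketch of the SE rescaling. All numbers and names are hypothetical. The sobel_z function uses the standard first-order Sobel formula, and it is an assumption on my part that this is the version you want; check it against whichever tool you actually use.

```python
import math

def comparable_se(se, sd_predictor, sd_outcome):
    """Rescale a standard error the same way as its coefficient."""
    return se * sd_predictor / sd_outcome

def sobel_z(a, se_a, b, se_b):
    """First-order Sobel z for the indirect effect a*b (assumed formula)."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# Hypothetical values for the a path
a, se_a = 0.9, 0.3
sd_x, sd_m_prime = 0.5, 1.87   # SD(X) and SD(M'), made up for illustration

comp_a = a * sd_x / sd_m_prime
comp_se_a = comparable_se(se_a, sd_x, sd_m_prime)

# The coefficient and its SE are scaled by the same factor,
# so the ratio coefficient / SE does not change.
print("a / SE(a)           =", a / se_a)
print("comp a / SE(comp a) =", comp_a / comp_se_a)
```

Because the coefficient and its SE are multiplied by the same ratio, the significance test for each individual path is unaffected by the rescaling. What changes is that the paths become comparable to one another.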

At this point the rugged individualists can work to solve those equations by hand. Everyone else can move on to the Tools section where I have provided files to help you simplify the process.

Tools for automatically calculating logistic regression mediation
(well, almost automatically)

I have created an Excel spreadsheet that prompts you for all of the necessary numbers that you must collect in order to run a mediation analysis with dichotomous M or Y variables. After you fill in all of the blanks, the percentage (proportion) of effect mediated will be calculated for you.

I also provide you with the result of the Sobel test, but if you are using that method, you might as well calculate it directly on Kristopher Preacher's site.

To simplify things even more, I have created an SPSS syntax file that will run all of the analyses you need to do in order to get the values to plug into the spreadsheet. To use the SPSS files, simply download them, then choose "replace" under the edit menu when they are open. Replace all "xvar" with the name of your X variable, all "mvar" with the name of your M variable, and all "yvar" with the name of your Y variable. There are also notes in the syntax that direct you to the correct values.

Actually, there are three SPSS syntax files, because the analyses you run depend on whether your mediator is continuous or dichotomous, and you may have ONLY a dichotomous mediator. This difference does not impact your results, however.

Dichotomous Mediator and Outcome Syntax
Continuous Mediator, Dichotomous Outcome Syntax
Dichotomous Mediator, Continuous Outcome Syntax

Reference

MacKinnon, D. P., & Dwyer, J. H. (1993). Estimating mediated effects in prevention studies. Evaluation Review, 17, 144-158.

Feel free to e-mail Nate at nherr@american.edu with any questions or comments about the use of the files or the site in general. 