All other things being held constant, what is the change in the dependent variable for a unit change in a single independent variable?

We’ve covered Simple Linear Regression, where “Simple” means just one independent variable. Next we’ll talk about Multiple Linear Regression, where “Multiple” means two or more independent variables.

\[Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \ldots\]

This is actually quite a straightforward extension, once we get the hang of the interpretation. We simply extend our linear model to also include \(X_2\), \(X_3\), and so forth, and now \(b_1\) is called the coefficient on \(X_1\), or the partial regression coefficient on \(X_1\).
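To make this concrete, here is a minimal sketch of fitting such a model by ordinary least squares with numpy. The sample size, predictors, and "true" coefficients are all invented for illustration:

```python
# Fit Y = b0 + b1*X1 + b2*X2 + b3*X3 by ordinary least squares on
# simulated data. The true coefficients (3.0, 1.5, -2.0, 0.5) are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                      # columns are X1, X2, X3
y = 3.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

design = np.column_stack([np.ones(n), X])        # prepend an intercept column
b, *_ = np.linalg.lstsq(design, y, rcond=None)   # b = (b0, b1, b2, b3)
print(b)  # estimates should land close to (3.0, 1.5, -2.0, 0.5)
```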

Here’s the most difficult part.

When we interpret each partial coefficient, its value is interpreted holding all the other independent variables constant. So \(b_1\) represents the expected change in \(Y\) when \(X_1\) increases by one unit, holding all the other variables constant.

This is so important that I’ll say it twice more. If you really understand this point, then you know how to do multiple regression.

The partial regression coefficients represent the expected change in the dependent variable when the associated independent variable is increased by one unit while the values of all other independent variables are held constant.

And a third time, in different words:

Each coefficient \(b_i\) estimates the mean change in the dependent variable (\(Y\)) per unit increase in \(X_i\), when all other predictors are held constant.

Here’s an example:

\[\text{Profit} = -2000 + 2.5 \times \text{ExpenditureOnAdvertising} + 32 \times \text{NumberOfProductsSold}\]

Coefficient interpretation:

\(b_0\): Monthly profit is -$2000 with no money spent on advertising and zero products sold.
\(b_1\): Holding the number of products sold constant, every dollar spent on advertising increases profit by $2.50.
\(b_2\): Holding advertising expenditure constant, every product sold increases profit by $32.
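A quick numeric check makes the "holding constant" part of those interpretations tangible. This sketch just re-evaluates the fitted equation above at inputs that differ by one unit:

```python
# The (hypothetical) fitted profit equation from the example above.
def profit(advertising, products_sold):
    return -2000 + 2.5 * advertising + 32 * products_sold

# Bump one input by a unit while pinning the other: the change in the
# prediction is exactly that input's coefficient.
print(profit(1001, 50) - profit(1000, 50))  # 2.5  (b1: one more ad dollar)
print(profit(1000, 51) - profit(1000, 50))  # 32.0 (b2: one more product sold)
```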

Held constant at what value? Technically, any value will do. However, when I teach this I usually tell people that you are getting the effect of a one-unit change in $X_j$ when all other variables are held at their respective means. I believe this is a common way to explain it that is not specific to me.

I usually go on to mention that if you don't have any interactions, $\beta_j$ will be the effect of a one-unit change in $X_j$, no matter what values the other variables take. But I like to start with the mean formulation. The reason is that there are two effects of including multiple variables in a regression model. First, you get the effect of $X_j$ controlling for the other variables. Second, the presence of the other variables (typically) reduces the residual variance of the model, making your variables (including $X_j$) 'more significant'.

It is hard for people to understand how this works if the other variables have values that are all over the place; that seems like it would increase the variability somehow. If you think of adjusting each data point up or down for the value of every other variable until all the remaining $X$ variables have been moved to their respective means, it is easier to see that the residual variability has been reduced.
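Here is a small simulation sketch of that second effect; the data-generating values are invented. Adding a relevant second predictor shrinks the residual variance, and with it the standard error on the coefficient for $X_1$:

```python
# Compare residual variance and SE(b1) for y ~ x1 versus y ~ x1 + x2,
# on simulated data where x2 genuinely matters.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 3.0 * x2 + rng.normal(scale=1.0, size=n)

def resid_var_and_se(design, y):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    s2 = resid @ resid / (len(y) - design.shape[1])  # residual variance
    cov = s2 * np.linalg.inv(design.T @ design)      # var-cov matrix of b-hat
    return s2, np.sqrt(cov[1, 1])                    # (residual var, SE of b1)

print(resid_var_and_se(np.column_stack([np.ones(n), x1]), y))      # large, large
print(resid_var_and_se(np.column_stack([np.ones(n), x1, x2]), y))  # both shrink
```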

I don't get to interactions until a class or two after I've introduced the basics of multiple regression. However, when I do get to them, I return to this material. The above applies when there are no interactions. When there are interactions, it is more complicated: in that case, the interacting variable or variables are being held constant (very specifically) at $0$, and at no other value.

If you want to see how this plays out algebraically, it is rather straightforward. We can start with the no-interaction case. Let's determine the change in $\hat Y$ when all other variables are held constant at their respective means. Without loss of generality, say there are three $X$ variables and we want to understand how the change in $\hat Y$ is associated with a one-unit change in $X_3$, holding $X_1$ and $X_2$ constant at their respective means:

\begin{align} \hat Y_i &= \hat\beta_0 + \hat\beta_1\bar X_1 + \hat\beta_2\bar X_2 + \hat\beta_3X_{3i} \\ \hat Y_{i'} &= \hat\beta_0 + \hat\beta_1\bar X_1 + \hat\beta_2\bar X_2 + \hat\beta_3(X_{3i}\!+\!1) \\ ~ \\ &\text{subtracting the first equation from the second:} \\ ~ \\ \hat Y_{i'} - \hat Y_i &= \hat\beta_0 - \hat\beta_0 + \hat\beta_1\bar X_1 - \hat\beta_1\bar X_1 + \hat\beta_2\bar X_2 - \hat\beta_2\bar X_2 + \hat\beta_3(X_{3i}\!+\!1) - \hat\beta_3X_{3i} \\ \Delta \hat Y &= \hat\beta_3X_{3i} + \hat\beta_3 - \hat\beta_3X_{3i} \\ \Delta \hat Y &= \hat\beta_3 \end{align}

Now it is obvious that we could have put any value in for $X_1$ and $X_2$ in the first two equations, so long as we put the same value of $X_1$ (and likewise of $X_2$) in both of them. That is, so long as we are holding $X_1$ and $X_2$ constant.
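A numerical check of that claim on a fitted no-interaction model (simulated data, invented coefficients): the predicted change for a one-unit bump in $X_3$ is $\hat\beta_3$ whether $X_1$ and $X_2$ sit at their means or at any other fixed values:

```python
# Fit a three-predictor model and compare predictions that differ by one
# unit in X3, with (X1, X2) pinned at the means and then at arbitrary values.
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)

def yhat(x1, x2, x3):
    return b[0] + b[1] * x1 + b[2] * x2 + b[3] * x3

x1bar, x2bar = X[:, 0].mean(), X[:, 1].mean()
print(yhat(x1bar, x2bar, 1.0) - yhat(x1bar, x2bar, 0.0))  # = b3-hat
print(yhat(7.0, -4.0, 1.0) - yhat(7.0, -4.0, 0.0))        # same number
print(b[3])                                               # the coefficient itself
```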

On the other hand, it does not work out this way if you have an interaction. Here I show the case where there is an $X_1X_3$ interaction term:

\begin{align} \hat Y_i &= \hat\beta_0 + \hat\beta_1\bar X_1 + \hat\beta_2\bar X_2 + \hat\beta_3X_{3i} \quad\quad\ \! + \hat\beta_4\bar X_1X_{3i} \\ \hat Y_{i'} &= \hat\beta_0 + \hat\beta_1\bar X_1 + \hat\beta_2\bar X_2 + \hat\beta_3(X_{3i}\!+\!1) + \hat\beta_4\bar X_1(X_{3i}\!+\!1) \\ ~ \\ &\text{subtracting the first equation from the second:} \\ ~ \\ \hat Y_{i'} - \hat Y_i &= \hat\beta_0 - \hat\beta_0 + \hat\beta_1\bar X_1 - \hat\beta_1\bar X_1 + \hat\beta_2\bar X_2 - \hat\beta_2\bar X_2 + \hat\beta_3(X_{3i}\!+\!1) - \hat\beta_3X_{3i} + \\ &\quad\ \hat\beta_4\bar X_1(X_{3i}\!+\!1) - \hat\beta_4\bar X_1X_{3i} \\ \Delta \hat Y &= \hat\beta_3X_{3i} + \hat\beta_3 - \hat\beta_3X_{3i} + \hat\beta_4\bar X_1 X_{3i} + \hat\beta_4\bar X_1 - \hat\beta_4\bar X_1X_{3i} \\ \Delta \hat Y &= \hat\beta_3 + \hat\beta_4\bar X_1 \end{align}

In this case, it is not possible to hold all else constant. Because the interaction term is a function of $X_1$ and $X_3$, it is not possible to change $X_3$ without the interaction term changing as well. Thus, $\hat\beta_3$ equals the change in $\hat Y$ associated with a one-unit change in $X_3$ only when the interacting variable ($X_1$) is held at $0$ rather than at $\bar X_1$ or any other nonzero value; in that case the last term in the bottom equation drops out.
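Running the same numerical check with an $X_1X_3$ interaction (again on invented, simulated data) shows the dependence directly: the one-unit effect of $X_3$ equals $\hat\beta_3 + \hat\beta_4X_1$, and collapses to $\hat\beta_3$ alone only at $X_1 = 0$:

```python
# Fit y ~ x1 + x2 + x3 + x1:x3 and compute the one-unit effect of x3
# at several fixed values of x1.
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 - 1.0 * x2 + 2.0 * x3 + 1.5 * x1 * x3 + rng.normal(size=n)
design = np.column_stack([np.ones(n), x1, x2, x3, x1 * x3])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

def yhat(v1, v2, v3):
    return b[0] + b[1] * v1 + b[2] * v2 + b[3] * v3 + b[4] * v1 * v3

for v1 in (0.0, x1.mean(), 2.0):                 # where x1 is held
    delta = yhat(v1, 0.0, 1.0) - yhat(v1, 0.0, 0.0)
    print(v1, delta, b[3] + b[4] * v1)           # delta = b3 + b4*v1
```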

In this discussion, I have focused on interactions, but more generally the issue arises whenever one variable is a function of another, such that it is not possible to change the value of the first without also changing the value of the second. In such cases, the meaning of $\hat\beta_j$ becomes more complicated. For example, if you had a model with $X_j$ and $X_j^2$, then $\hat\beta_j$ is the derivative $\frac{d\hat Y}{dX_j}$ holding all else equal, and holding $X_j = 0$. Other, still more complicated formulations are possible as well.
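For the quadratic example, a one-line derivative makes the special status of $X_j = 0$ explicit (writing $\hat\beta_{jj}$, a label of my choosing, for the coefficient on the squared term):

\begin{align} \hat Y &= \hat\beta_0 + \hat\beta_jX_j + \hat\beta_{jj}X_j^2 + \ldots \\ \frac{d\hat Y}{dX_j} &= \hat\beta_j + 2\hat\beta_{jj}X_j \end{align}

which equals $\hat\beta_j$ only when $X_j = 0$.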