Extended Marketing Mix Models

Marketing Mix Models are a common tool to help understand the impact of media investments, but in order to become truly useful, they need to be extended!

What are Extended Marketing Mix Models?

Marketing involves many decisions, each with a large number of possible options: Should I invest in branded search? On which TV channels should I run my ads? To make sense of all the data generated by the various marketing activities, one uses a marketing mix model (MMM). It tries to attribute changes in a specific KPI, such as sales, to the various media channels.

Today, multiple linear regression is commonly used for this. It is based on maximum likelihood estimation and stems from frequentist statistics. The challenge is that one normally wants to put more variables into the model than there are time entries (data points). The data also tends to be sparse: one typically doesn't run TV commercials every day on every channel, so media spending is zero on most days. This is a recipe for a phenomenon called overfitting, which means that the model fits the training data so closely that it loses the ability to predict on data it has not seen before.
Fig. 1: Underfitting, overfitting and a perfect fit, showing how different models fit the same data. (source)
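
To make the overfitting problem concrete, here is a small sketch on synthetic data (all numbers are invented for illustration): 40 sparse media variables but only 20 training days. An ordinary least-squares fit, which is the maximum likelihood estimate under Gaussian noise, reproduces the training days essentially perfectly, but that perfect fit does not carry over to the 10 held-out days.

```python
import numpy as np

# Synthetic toy data (not real marketing data): more media variables than days.
rng = np.random.default_rng(0)
n_days, n_channels = 30, 40

# Sparse spend: each channel is active on roughly 30% of days, zero otherwise.
active = rng.random((n_days, n_channels)) < 0.3
spend = rng.random((n_days, n_channels)) * active

# Only the first three channels truly drive sales; the rest have no effect.
true_beta = np.zeros(n_channels)
true_beta[:3] = [2.0, 1.0, 0.5]
sales = spend @ true_beta + rng.normal(0.0, 0.5, n_days)

# Fit on the first 20 days, hold out the last 10.
X_train, X_test = spend[:20], spend[20:]
y_train, y_test = sales[:20], sales[20:]

# Least squares = maximum likelihood under Gaussian noise. With 40 unknowns and
# only 20 equations, lstsq returns a minimum-norm solution that fits the
# training days (essentially) exactly, noise included.
beta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_mse = np.mean((X_train @ beta_hat - y_train) ** 2)
test_mse = np.mean((X_test @ beta_hat - y_test) ** 2)
print(f"train MSE: {train_mse:.2e}  held-out MSE: {test_mse:.2f}")
```

The near-zero training error is the warning sign: the model has memorized the noise in the training days rather than learned which channels actually matter.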

Bayesian hierarchical modeling

Luckily, there is a great tool to address these issues: Bayesian statistics, specifically Bayesian hierarchical modeling. It enables us to encode domain knowledge into the model, such as "media channels should have a positive ROI, and it is likely less than 10" or "macroeconomic factors should not account for more than 10% of sales". Not only that, but one can also pool information across variables. For example, one can assume that the TV channels Discovery 1 and Discovery 2 will produce a similar response to the media ads. By incorporating this assumption into the model, it can make better sense of the sparse data.
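
The pooling idea can be illustrated with the classic normal-normal hierarchical model. This sketch makes simplifying assumptions: the per-channel ROI estimates and their standard errors are made-up numbers, the group-level spread `tau` is fixed rather than learned, and the group mean is plugged in as a point estimate, whereas a full Bayesian hierarchical model would place priors on both and infer them jointly.

```python
import numpy as np

# Hypothetical raw ROI estimates for two related TV channels, each with a
# standard error reflecting how much data we have (numbers are made up).
channels = ["Discovery 1", "Discovery 2"]
raw_roi = np.array([3.0, 8.0])   # per-channel estimates from the data alone
std_err = np.array([0.5, 4.0])   # Discovery 2 aired rarely -> noisy estimate

# Assumed group-level spread: related channels' true ROIs differ by about 1.
tau = 1.0

# Group mean: precision-weighted average of the channel estimates, where each
# channel's total variance is its sampling variance plus the group variance.
w = 1.0 / (std_err**2 + tau**2)
mu = np.sum(w * raw_roi) / np.sum(w)

# Conditional posterior mean per channel: a precision-weighted compromise
# between the channel's own estimate and the group mean mu.
prec_data = 1.0 / std_err**2
prec_prior = 1.0 / tau**2
pooled_roi = (prec_data * raw_roi + prec_prior * mu) / (prec_data + prec_prior)

for name, raw, pooled in zip(channels, raw_roi, pooled_roi):
    print(f"{name}: raw ROI {raw:.2f} -> partially pooled ROI {pooled:.2f}")
```

Discovery 1, with plenty of data, barely moves (3.00 to about 3.07), while the noisy Discovery 2 estimate is pulled from 8.00 to about 3.62, toward what the related channel suggests.
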
Fig. 2: The prior assumptions and the likelihood combine to form the posterior. (source)
That is not all Bayesian statistics provides. It also gives us a posterior distribution over the weights, a probability distribution that allows for proper uncertainty estimates of the model's parameters and predictions. This means the model can quantify how certain it is. For example, suppose you ran an ad on a TV channel for only 5 days, and that is all the advertising you have ever done on that channel. The model should be uncertain about the ROI of that channel, as there is so little information. On the other hand, if one has been running ads on the same channel for years with similar creatives, the model should be confident about that channel's ROI.
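
A toy illustration of this, under strong simplifying assumptions: a single channel, sales modeled as `beta * spend` plus Gaussian noise with a known standard deviation, and a normal prior on `beta` (standing in for ROI). Because the prior is conjugate, the posterior is available in closed form, so 5 days of data can be compared directly with 3 years of data. All data here is synthetic, and `posterior_coef` is a hypothetical helper, not a library function.

```python
import numpy as np

def posterior_coef(spend, sales, prior_mean=2.0, prior_sd=3.0, noise_sd=1.0):
    """Conjugate normal posterior for beta in sales_t = beta * spend_t + noise.

    Assumes Gaussian noise with known `noise_sd` and a normal prior on beta.
    Returns (posterior mean, posterior standard deviation).
    """
    prior_prec = 1.0 / prior_sd**2
    post_prec = prior_prec + np.sum(spend**2) / noise_sd**2
    post_mean = (prior_prec * prior_mean + np.sum(spend * sales) / noise_sd**2) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)

rng = np.random.default_rng(1)
true_beta = 3.0  # synthetic "true" sales per unit of spend
results = {}
for label, n_days in [("5 days", 5), ("3 years", 3 * 365)]:
    spend = rng.uniform(0.0, 1.0, n_days)
    sales = true_beta * spend + rng.normal(0.0, 1.0, n_days)
    results[label] = posterior_coef(spend, sales)
    mean, sd = results[label]
    print(f"{label:>7}: posterior mean {mean:.2f}, posterior sd {sd:.3f}")
```

With only 5 days, the posterior standard deviation stays large (it cannot drop below about 0.44 here, since daily spend is below 1), while three years of data shrink it to around 0.05.
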

Correct Media Dynamics

This is why Bayesian hierarchical modeling is at the core of what we do. However, even the best inference method in the world is not enough on its own; one also needs a model that captures real-world dynamics correctly. Without that, the model will still over- or underfit, and what it learns will not be correct. We have therefore put a lot of effort into correctly capturing complex dynamics for various data types. For example, in media one often works with the concept of adstock: when one advertises on a given day, not all of the effect is seen that same day; part of it carries over to the following days.
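
A minimal sketch of the carry-over idea, assuming the simplest common form, geometric decay, where a fixed share of yesterday's effect survives to today. The function name and the decay value are illustrative, not taken from any specific library.

```python
import numpy as np

def geometric_adstock(spend, decay):
    """Carry-over via geometric decay: adstock[t] = spend[t] + decay * adstock[t-1].

    `decay` in [0, 1) is the share of yesterday's effect that survives to today.
    """
    adstock = np.zeros(len(spend))
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        adstock[t] = carry
    return adstock

# 100 units spent on day 0 only; half of the effect carries over each day.
spend = np.array([100.0, 0.0, 0.0, 0.0, 0.0])
effect = geometric_adstock(spend, decay=0.5)  # 100, 50, 25, 12.5, 6.25
```

In an MMM, a transformed series like `effect` is what enters the regression in place of raw spend, so a single day of advertising also influences the following days.
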

Fig. 3: Different types of adstock functions
This is one area where it is common to go with the “simple” solution and use the traditional adstock (see column 1 in Fig. 3), which assumes the greatest effect occurs on the same day as the ad. This is clearly unrealistic when, for example, the ad runs late on a Friday night and the shop that ran it is closed until Monday. To address this, people started using the delayed adstock function (column 2 in Fig. 3), which shifts the effect a bit into the future but still assumes a single big peak that then decays. In reality, one needs something that allows both ramp-up and ramp-down behavior, as different people respond to an ad differently.
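
The shapes discussed above can be compared as lag-weight kernels. This is a sketch, not our production model: the traditional kernel uses geometric decay, the delayed kernel uses weights `decay ** ((lag - peak) ** 2)` (a commonly used parameterization, e.g. in Jin et al., 2017, Google), and the gamma-shaped kernel is just one hypothetical way to get a gradual ramp-up and ramp-down. All parameter values, including the 14-day window `MAX_LAG`, are illustrative assumptions.

```python
import numpy as np

MAX_LAG = 14  # assumed carry-over window in days (illustrative)

def geometric_weights(decay, max_lag=MAX_LAG):
    # Traditional adstock: largest weight at lag 0, then geometric decay.
    return decay ** np.arange(max_lag)

def delayed_weights(decay, peak, max_lag=MAX_LAG):
    # Delayed adstock: weight decay ** ((lag - peak) ** 2), largest at lag == peak.
    lags = np.arange(max_lag)
    return decay ** ((lags - peak) ** 2)

def gamma_weights(shape, scale, max_lag=MAX_LAG):
    # Hypothetical ramp-up/ramp-down kernel: an (unnormalized) gamma density
    # evaluated at lag midpoints, giving a gradual rise and a long tail.
    x = np.arange(max_lag) + 0.5
    return x ** (shape - 1) * np.exp(-x / scale)

def apply_adstock(spend, weights):
    # Normalize the kernel, then apply a causal convolution:
    # adstock[t] = sum over lags l of w[l] * spend[t - l].
    w = weights / weights.sum()
    return np.convolve(spend, w)[: len(spend)]

# A single ad on day 0 (say, late Friday night), nothing afterwards.
spend = np.zeros(21)
spend[0] = 1.0

kernels = {
    "traditional": geometric_weights(decay=0.5),
    "delayed": delayed_weights(decay=0.6, peak=3),
    "ramp (gamma)": gamma_weights(shape=3.0, scale=2.0),
}
peaks = {}
for name, weights in kernels.items():
    effect = apply_adstock(spend, weights)
    peaks[name] = int(np.argmax(effect))
    print(f"{name:>13}: peak effect on day {peaks[name]}")
```

With these example settings, the effect of a single ad peaks on day 0 under the traditional kernel, on day 3 (Friday to Monday) under the delayed kernel, and on day 4 under the gamma kernel, which also rises and falls more gradually than the sharply peaked delayed kernel.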


For Marketing Mix Models to be used effectively, one needs to go beyond current standard practice in order to avoid overfitting and to include domain knowledge in the model. The way to do this is to use Bayesian hierarchical modeling combined with models that capture realistic media behavior.

If you are interested in going beyond media mix modeling and seeing how we can help model all the dynamics of a business at once, have a look at our blog post about modeling business dynamics.
