Coarsened Exact Matching

1 minute read

In my current position we recently launched a new product, and we wanted to see how the product impacted users’ engagement on the platform.

This is tricky, because we are working with a business line that is sensitive to changes (so A/B tests are typically off the table, since we can’t just randomize who sees what), and we are working with very limited data. We also want to capture the causal impact of using the product, vs. simply inherent differences between users who chose to use the product and those who did not.

So, I approached this problem using coarsened exact matching. I’ve worked with matching before (while TA’ing the Causal Inference course during my Master’s degree), but we focused on propensity score and exact matching at the time.

I began by consulting with stakeholders and doing some EDA to determine which variables best capture “engagement” with the platform. We went with the number of enrollments, number of item completions, number of course compilations, and minutes spent on the platform.

I then “bucketed” the learners before the introduction of the product into five groups per variable (based on the distribution of each variable), such that learners who enroll a lot are all in one bucket, those who enroll less are in another, etc. After that, I exactly matched the learners who went on to use the new product with five controls who were in the same buckets on each variable of consideration. Finally, the activity levels of the matched groups were compared after the treatment group used the new product.

This allowed us to get a statistically significant estimate of the causal impact of using the new product, since the covariates between the exposure groups were balanced, and confounding therefore overcome to at least some degree. While we cannot match for intrinsic motivation, which likely plays a role and obviously isn’t match-able, this approach did give us a reasonable upper bound on the impact of the new product.

I really liked this approach as it was rigorous, simple to understand and explain to stakeholders, and fast to implement.