2 Years of PhD Research: Stein Discrepancies with a Twist
Do you want to estimate a truncated density? Do you have access to a functional form of the boundary? I didn’t think so.

This is a blog post detailing Approximate Stein Classes for Truncated Density Estimation, a paper by my supervisor, Song Liu, and me, which was recently accepted to ICML 2023.
Introduction
Pretend, for a moment, that you are the kind of person who likes to see where animals live, and you go out for the day to find where all the animal habitats are. You are interested in the broader picture; the general spread of habitat locations across a certain region. What you would be doing is looking to model a density based on each observation of a habitat. However, you might find that these habitats arbitrarily stop after some point, and you don’t have an exact reason why. In a similar way, you might not be allowed to cross into a neighbouring country to continue measurements. In both of these scenarios, you are prohibited from viewing a full picture of your dataset due to some unknown circumstances - but you do have access to something, which is a collection of points that roughly make up the ‘edge’ of your domain, where your data are truncated. How do you estimate your density now?
Background
Up until the introduction of this work, to estimate the density of your wildlife habitat locations, you would probably try to use TruncSM [1], a very fine work which uses Score Matching [2] to do truncated density estimation. This work is quite interesting if you are a fan of this kind of thing. If you want to read more about it, I also wrote a blog post last year which goes into a few more details, or you can read the full paper here.
The gist of the method is that our true density,
Our method, Truncated Kernelised Stein Discrepancies (what a mouthful, we’ll call it TKSD from now on), uses the same broad strokes as Score Matching, which, roughly speaking, means we also use the score function,
Instead of minimising the score matching divergence like TruncSM, we want to construct a discrepancy based on Minimum Stein discrepancies [3]. If we want to make the two densities,
TKSD: How does he do it?
Well, we described above what we want to use, but we can’t actually use it. All because of that pesky truncation. The issue is due to (1) not actually holding when the density is truncated in a way which we do not know (recall the aim of this project is to be able to estimate the density when we do not have an exact form of the truncation boundary, and instead access it through a set of points). The actual cause is complicated, but involves the derivation of (1), and a boundary condition on an integration by parts not holding when the density is truncated. So, we have to do something slightly different.
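To make the boundary problem concrete, here is a minimal one-dimensional sketch, assuming that (1) is a Stein-identity-type statement of the form E[score(x) f(x) + f'(x)] = 0. The standard normal density, the truncation to x > 0, the test function cos(x) and the helper name `stein_mean` are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stein_mean(x, score, f, df):
    """Monte Carlo estimate of E[score(x) * f(x) + f'(x)]."""
    return np.mean(score(x) * f(x) + df(x))

# Score of N(0, 1). Inside the truncated region the score is unchanged,
# because truncation only rescales the density by a constant.
score = lambda x: -x

# Hypothetical test function and its derivative (assumption for illustration).
f = np.cos
df = lambda x: -np.sin(x)

x_full = rng.standard_normal(400_000)   # samples from the untruncated density
x_trunc = x_full[x_full > 0]            # same density, truncated to x > 0

full = stein_mean(x_full, score, f, df)
trunc = stein_mean(x_trunc, score, f, df)

print(f"untruncated: {full:+.3f}")
print(f"truncated:   {trunc:+.3f}")
```

In the untruncated case the estimate sits near zero. In the truncated case the integration by parts leaves a boundary term, -q(0) f(0) / P(x > 0) = -sqrt(2/pi), roughly -0.8, so the identity fails unless the test function vanishes at the boundary.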
Two lemmas, one proposition, one remark and one final theorem later, we get the following:
Note that (3) is not an exact analogue of (1) from before, but instead,
🚨🚨 Caution: Long Equation Ahead 🚨🚨
We can minimise in the same way as (2). Two theorems and a long analytic solution later we obtain our objective function,
Yes this is quite a lot. No it is not important to understand every detail. The key takeaway is that we have a loss function, consisting only of linear algebra operations, which we can minimise to obtain a truncated density estimate when the boundary is not known fully! 🎉🎉🎉
(There are also two assumptions for one final theorem which proves this is a consistent estimator. You think this sounds like a lot of theorems? This is only mild, as far as statistics papers go.)
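To give a flavour of what a loss "consisting only of linear algebra operations" looks like, here is a rough sketch of the standard, untruncated kernelised Stein discrepancy [4] used as a minimum-discrepancy estimator in the spirit of [3]. It is not the TKSD objective: there is no truncation handling and no boundary points. The model N(theta, 1), the RBF kernel, the bandwidth `h`, the grid search and the function name `ksd_v_stat` are all assumptions chosen for illustration.

```python
import numpy as np

def ksd_v_stat(x, theta, h=1.0):
    """V-statistic estimate of KSD^2 between samples x and the model N(theta, 1)."""
    s = -(x - theta)                      # model score evaluated at each sample
    d = x[:, None] - x[None, :]           # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h**2))        # RBF kernel matrix k(x_i, x_j)
    dk_dx = -d / h**2 * k                 # derivative w.r.t. the first argument
    dk_dy = d / h**2 * k                  # derivative w.r.t. the second argument
    d2k = (1 / h**2 - d**2 / h**4) * k    # mixed second derivative
    # Stein kernel: s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    u = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + d2k)
    return u.mean()

rng = np.random.default_rng(1)
x = rng.normal(loc=1.5, scale=1.0, size=300)   # synthetic, untruncated data

# Crude grid search over the mean parameter (a gradient method would also work).
thetas = np.linspace(-1.0, 4.0, 201)
losses = np.array([ksd_v_stat(x, t) for t in thetas])
theta_hat = thetas[np.argmin(losses)]

print(f"estimated mean: {theta_hat:.2f}")
```

Everything inside `ksd_v_stat` is elementwise arithmetic on kernel matrices, which is the sense in which such a loss is cheap to evaluate and minimise. On this untruncated toy data the minimiser should land close to the true mean of 1.5.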
Finally something interesting, results!
I know, I know, you must be thinking “Is the estimation error across a range of experiments comparable to previous implementations of truncated density estimators considering the use of an approximate set of boundary points instead of an exact functional form?”.
Or maybe you are just thinking “Is it better than the state-of-the-art?”. Same question, really. The answer is yes, it does pretty well.
Simulation Study

This plot shows mean estimation error over 64 trials in a simple task of estimating the mean of Gaussian distribution truncated within a

This second plot shows the same experiment setup but for truncation of the


The next set of experiments uses a more complex setup: estimating multiple modes of a Gaussian mixture distribution. The setup is similar to before, except that we estimate 2, 3 or 4 mixture means at the same time. Figure 3 shows the experiment visually; as we vary the number of mixture modes, the distribution becomes more complex and thus harder to estimate accurately.
Figure 4 shows the mean estimation error across 64 trials for TKSD and TruncSM (exact). We vary the number of mixture modes (left) from 2-4, and measure how that changes the error across both methods. We also fix the number of mixture modes as 2, and vary sample size
Regression Example

Let’s look at one specific example before we go, a simple linear regression. Since TKSD is a density estimation method, we can use it to estimate parameters of the (conditional) mean of a Normal distribution, given some feature variables. Truncation happens in the
The second plot is an experiment on a real-world dataset, given by UCLA: Statistical Consulting Group [5]. This dataset contains student test scores in a school for which the acceptance threshold is 40/100, and therefore the response variable (the test scores) is truncated below by 40 and above by 100. Since no scores get close to 100, we only consider one-sided truncation at
Conclusions
This work has taken up the majority of my PhD, around 2 years. It is more complicated than I have given it credit for in this post, and please do read the full paper if you want more detail. Even with all the detail, it is not a work that would normally take 2 years. It started as a way of extending the previous implementation we developed in TruncSM, to try and adaptively solve for what we were calling a ‘boundary function’. It became clear that score matching was holding us back, and then we kept having to add extra constraints and details to an implementation around Stein discrepancies. Amongst loads of different ideas, things also kept going wrong, so it is a great relief to see this research finished, working, and even performing extremely well, not to mention being accepted to ICML!
Anyway, why would you be interested in TKSD in general? If you care about truncated densities, and want something that
- is adaptive to the dataset at hand
- requires no prior knowledge about the boundary, except being able to obtain samples from it
- performs better in more complicated scenarios
- has nice theoretical and empirical results
- is an acronym
then look no further than TKSD!
References
[1] Liu, S., Kanamori, T., and Williams, D. J. Estimating density models with truncation boundaries using score matching. Journal of Machine Learning Research, 23(186):1–38, 2022.
[2] Hyvärinen, A. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(24):695–709, 2005.
[3] Barp, A., Briol, F.-X., Duncan, A., Girolami, M., and Mackey, L. Minimum stein discrepancy estimators. In Advances in Neural Information Processing Systems, volume 32, 2019.
[4] Chwialkowski, K., Strathmann, H., and Gretton, A. A kernel test of goodness of fit. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp. 2606–2615. PMLR, 2016.
[5] UCLA: Statistical Consulting Group. Truncated regression — stata data analysis examples. URL https://stats.oarc.ucla.edu/stata/dae/truncated-regression/. Accessed March 17, 2023.