Practical advice on non-parametric density estimation.

Always start from the histogram: any non-parametric density estimation method is essentially a fancier version of a histogram.

Compare the problem of choosing an optimal bin size in a histogram with the choice of the bandwidth h in a kernel estimator.

The number of bins is too small. Important features of this distribution, such as the mode, are not revealed.

Optimal number of bins (optimal according to Sturges’ rule, but the rule is beside the point).

The number of bins is too large. The distribution is overfitted.
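The three histogram cases above can be reproduced with a minimal sketch. It assumes a hypothetical bimodal sample generated with NumPy (the data behind the original figures are not available), and the bin counts 2 and 200 are arbitrary choices for "too few" and "too many". Sturges' rule is taken as k = ceil(log2(n)) + 1.

```python
import numpy as np

# Hypothetical bimodal sample; the post's own data are not available.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(2.0, 0.7, 300)])

def sturges_bins(n):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins for n observations."""
    return int(np.ceil(np.log2(n))) + 1

k_opt = sturges_bins(data.size)

# Same data, three bin counts: too few, Sturges' rule, too many.
for label, k in [("too few", 2), ("Sturges", k_opt), ("too many", 200)]:
    counts, edges = np.histogram(data, bins=k)
    width = edges[1] - edges[0]
    empty = int(np.sum(counts == 0))
    print(f"{label:>8}: {k:3d} bins, width {width:.3f}, empty bins {empty}")
```

With two bins the two modes cannot be seen at all, while with 200 bins the counts mostly track sampling noise. Plotting each set of counts (for example with matplotlib) makes the contrast visible.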

The point of the exercise is to reveal all the features of the data, and that is what is important to keep in mind.

The bandwidth h is too large. Local features of this distribution are not revealed.

The bandwidth h is selected by a rule of thumb called the normal reference bandwidth.

The bandwidth h is too small. The distribution is overfitted.
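The same comparison for the kernel estimator can be sketched as follows. This is an illustration under stated assumptions, not the code behind the original figures: the sample is a hypothetical bimodal one, the kernel is Gaussian, the normal reference bandwidth is taken as h = 1.06 * sigma * n^(-1/5), and the factors 4 and 1/10 for "too large" and "too small" are arbitrary. The helper `count_modes` is a made-up convenience that counts local maxima on the evaluation grid.

```python
import numpy as np

def gaussian_kde(x_grid, sample, h):
    """Kernel density estimate with a Gaussian kernel and fixed bandwidth h."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (sample.size * h * np.sqrt(2 * np.pi))

def normal_reference_bandwidth(sample):
    """Rule of thumb for a Gaussian kernel: h = 1.06 * sigma * n^(-1/5)."""
    return 1.06 * sample.std(ddof=1) * sample.size ** (-1 / 5)

def count_modes(density):
    """Number of strict local maxima of the estimate on the grid."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid > density[2:])))

# Hypothetical bimodal sample (an assumption, not the post's data).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(2.0, 0.7, 300)])

h_ref = normal_reference_bandwidth(data)
grid = np.linspace(-15.0, 15.0, 3001)

modes = {}
for label, h in [("too large", 4 * h_ref), ("reference", h_ref), ("too small", h_ref / 10)]:
    density = gaussian_kde(grid, data, h)
    modes[label] = count_modes(density)
    print(f"{label:>9}: h = {h:.3f}, local maxima = {modes[label]}")
```

The number of local maxima plays the role that the bin count played for the histogram: an oversmoothed estimate merges real modes, and an undersmoothed one invents spurious ones.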

And now take a look at a perfect application of the idea in:

Nissanov, Zoya, and Maria Grazia Pittau. “Measuring changes in the Russian middle class between 1992 and 2008: a nonparametric distributional analysis.” Empirical Economics 50.2 (2016): 503-530.

Comparison between income distributions in the period 1992–2008. Authors’ calculation on weighted household income data from RLMS. Kernel density estimates are obtained using adaptive bandwidth.
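The caption does not say which adaptive scheme the authors used, so the following is only a sketch of one common variant: a pilot estimate with a fixed bandwidth, followed by per-observation bandwidths scaled by (pilot density / geometric mean)^(-alpha) with alpha = 0.5. The lognormal sample is a hypothetical stand-in for skewed income data, survey weights are ignored, and all function names are made up for illustration.

```python
import numpy as np

def fixed_kde(x_grid, sample, h):
    """Gaussian kernel density estimate with a single bandwidth h."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (sample.size * h * np.sqrt(2 * np.pi))

def adaptive_kde(x_grid, sample, h, alpha=0.5):
    """Adaptive estimate: each observation gets its own bandwidth.

    Bandwidths grow where the pilot density is low (sparse tails)
    and shrink where it is high (near the mode).
    """
    pilot = fixed_kde(sample, sample, h)        # pilot density at each observation
    g = np.exp(np.mean(np.log(pilot)))          # geometric mean of the pilot values
    local_h = h * (pilot / g) ** (-alpha)       # per-observation bandwidths
    u = (x_grid[:, None] - sample[None, :]) / local_h[None, :]
    kernels = np.exp(-0.5 * u**2) / (local_h[None, :] * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1), local_h

# Hypothetical right-skewed sample standing in for income data.
rng = np.random.default_rng(1)
income = rng.lognormal(mean=0.0, sigma=0.75, size=500)

h0 = 1.06 * income.std(ddof=1) * income.size ** (-1 / 5)  # normal reference pilot bandwidth
grid = np.linspace(-10.0, 40.0, 5001)
density, local_h = adaptive_kde(grid, income, h0)

print(f"pilot bandwidth: {h0:.3f}")
print(f"local bandwidths: min {local_h.min():.3f}, max {local_h.max():.3f}")
```

For skewed data of this kind, a single bandwidth tends to be either too wide near the mode or too narrow in the long right tail, which is the motivation for letting it vary.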

Going back to the advice: keep in mind that you are doing this to reveal features of the data, and the result has to be strictly more informative than a histogram; otherwise the computational cost is not justified.

Published by Sergey Alexeev