Binning rules

compute_edges computes bin edges independently per dimension using one of three rules. Let \(n\) be the number of samples, \([lo, hi]\) the data range, and \(IQR\) the interquartile range.

Freedman-Diaconis (fd, default)

\[h = 2 \cdot IQR \cdot n^{-1/3}, \quad k = \left\lceil \frac{hi - lo}{h} \right\rceil\]

Robust to outliers and recommended for most datasets. When \(IQR = 0\) the bin width \(h\) would be zero, so the rule falls back to \(\sqrt{n}\) bins.
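As a rough illustration of the rule above, the following standard-library sketch computes the Freedman-Diaconis bin count for one dimension. The function name `fd_bin_count` is hypothetical and is not part of the documented API. The quartile convention (linear interpolation) and rounding the fallback up with a ceiling are assumptions, since the text does not specify them.

```python
import math
import statistics


def fd_bin_count(values):
    """Freedman-Diaconis bin count for one dimension (illustrative sketch)."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Quartiles by linear interpolation; this convention is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Zero IQR would give h = 0, so fall back to sqrt(n) bins
        # (rounded up here, which is also an assumption).
        return math.ceil(math.sqrt(n))
    h = 2 * iqr * n ** (-1 / 3)       # bin width
    return math.ceil((hi - lo) / h)   # number of bins


print(fd_bin_count(list(range(1, 101))))          # 5
print(fd_bin_count([1, 1, 1, 1, 1, 1, 1, 1, 9]))  # 3 (IQR is 0, so sqrt(9))
```

The second call shows why the fallback matters: a single outlier stretches the range while the quartiles coincide, so the width formula alone would divide by zero.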

Sturges (sturges)

\[k = \left\lceil \log_2(n) + 1 \right\rceil\]

Assumes near-Gaussian data. Because \(k\) grows only logarithmically, it yields very few bins for large \(n\), so it is best suited to small samples (\(n < 200\)).
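To make the slow growth concrete, here is a small sketch of the Sturges formula evaluated at a few sample sizes. The helper name `sturges_bin_count` is hypothetical and chosen only for this example.

```python
import math


def sturges_bin_count(n):
    """Sturges bin count: ceil(log2(n) + 1) (illustrative sketch)."""
    return math.ceil(math.log2(n) + 1)


for n in (50, 200, 10_000, 1_000_000):
    print(n, sturges_bin_count(n))
# 50 7
# 200 9
# 10000 15
# 1000000 21
```

Going from 200 to a million samples adds only 12 bins, which is why the rule tends to oversmooth large datasets.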

Square root (sqrt)

\[k = \left\lceil \sqrt{n} \right\rceil\]

Fast heuristic with no distribution assumption. A reasonable choice when the data is roughly uniform or its distribution is unknown.
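A minimal sketch of the square-root rule, using a hypothetical helper name `sqrt_bin_count`, shows how quickly the count grows with \(n\):

```python
import math


def sqrt_bin_count(n):
    """Square-root bin count: ceil(sqrt(n)) (illustrative sketch)."""
    return math.ceil(math.sqrt(n))


for n in (10, 1_000, 1_000_000):
    print(n, sqrt_bin_count(n))
# 10 4
# 1000 32
# 1000000 1000
```

Unlike Sturges, the count here keeps growing with the sample size, so very large datasets can produce far more bins than are useful without an upper limit.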


All methods clamp \(k\) to at least 2 and at most max_bins (default 200), which prevents excessively fine grids on large datasets. The one exception is a constant dimension (where \(lo = hi\)), which produces a single bin \([lo - 0.5, hi + 0.5]\) rather than failing.
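The limits and the constant-dimension case could be combined roughly as follows. This is a sketch under stated assumptions, not the actual compute_edges implementation: the helper name `finalize_edges` is hypothetical, equal-width bins are assumed, and so is applying the cap before the lower limit.

```python
def finalize_edges(lo, hi, k, max_bins=200):
    """Clamp a raw bin count and return bin edges (illustrative sketch)."""
    if lo == hi:
        # Constant dimension: a single bin of width 1 around the value.
        return [lo - 0.5, hi + 0.5]
    # Cap at max_bins, then enforce at least 2 bins (order is an assumption).
    k = max(2, min(k, max_bins))
    width = (hi - lo) / k
    # k equal-width bins need k + 1 edges; the last edge is pinned to hi
    # so floating-point rounding cannot push it off the data range.
    return [lo + i * width for i in range(k)] + [hi]


print(finalize_edges(3.0, 3.0, 10))          # [2.5, 3.5]
print(finalize_edges(0.0, 1.0, 1))           # [0.0, 0.5, 1.0]
print(len(finalize_edges(0.0, 1.0, 1000)))   # 201 (200 bins after the cap)
```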