Display non-uniform data with a Gaussian curve (a bit like kernel density estimation)
I've got this kind of non-uniform data:
[{'time':0,'sum':0},{'time':600,'sum':2},{'time':700,'sum':4},{'time':1200,'sum':1},{'time':1300,'sum':3},{'time':1600,'sum':1},{'time':2000,'sum':0}];
"time" is on the x axis and "sum" on the y axis. If I draw an area, I get these shapes (curved in red, not curved in white): https://codepen.io/kilden/pen/podadRW
But this conveys the wrong meaning. I have to interpret the "missing" data, a bit like a kernel density estimation chart (example here: https://bl.ocks.org/mbostock/4341954), where the value is zero where there is no data, but there is a "fall-off" (a Gaussian curve) around each point that has data. It's hard to explain in words (and English is not my mother tongue), so I made this second codepen to show the shape I have in mind. The red area is the shape I want (the white one is the reference from the first codepen):
https://codepen.io/kilden/pen/VwrQrbo
I wonder if there is a way to draw this kind of cumulative Gaussian curve with a (hidden?) d3 function or some trick.
Solution 1:[1]
A. You're cheating yourself when you use the Epanechnikov kernel, evaluate it on a rather coarse grid and then apply a smooth line interpolation so that it looks Gaussian. Just use a Gaussian kernel to begin with.
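To illustrate point A, here is a minimal sketch in plain JavaScript comparing the two kernels, both normalized to integrate to 1. The Epanechnikov kernel is a parabola that is exactly zero beyond the bandwidth, so any bell shape it shows on a chart comes from interpolating between coarse grid points; the Gaussian kernel is smooth and positive everywhere. The function names and the bandwidth of 25 are illustrative assumptions; the Epanechnikov formula used is the standard textbook one, not code copied from the linked example.

```javascript
// Epanechnikov kernel with bandwidth k: a parabola with compact support.
// It is exactly 0 once |u| > k.
const epanechnikov = k => u => {
  const v = u / k;
  return Math.abs(v) <= 1 ? (0.75 * (1 - v * v)) / k : 0;
};

// Gaussian kernel with bandwidth k: smooth, and positive for every u.
const gaussianKernel = k => u =>
  Math.exp(-0.5 * (u / k) ** 2) / (k * Math.sqrt(2 * Math.PI));

const k = 25;
for (const u of [0, 10, 20, 25, 30, 50]) {
  console.log(
    `u=${u}`,
    `epanechnikov=${epanechnikov(k)(u).toFixed(5)}`,
    `gaussian=${gaussianKernel(k)(u).toFixed(5)}`
  );
}
```

Past u = 25 the Epanechnikov column is flat zero while the Gaussian column keeps a small positive tail, which is why a smoothed Epanechnikov estimate only imitates a Gaussian.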
B. You're comparing apples and oranges. A kernel density estimate is an estimate of a probability density; it cannot be compared with a count of observations. Its integral is always equal to 1. You can scale the estimate by the total number of observations, but even then your curve would not "stick" to the points, since the kernel spreads each observation away from its point.
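Point B can be checked numerically. The sketch below is a classic kernel density estimate with a normalized Gaussian kernel, in plain JavaScript without d3. The sample times (a subset of the times in the answer's snippet), the bandwidth of 25 and the helper names are illustrative assumptions. A Riemann sum shows the area under the estimate is 1; multiplying by the number of observations makes the area equal to that count, but the height at an observation is still far from any count.

```javascript
// Normalized Gaussian kernel with bandwidth h: it integrates to 1 over the real line.
const normalGaussian = h => u =>
  Math.exp(-0.5 * (u / h) ** 2) / (h * Math.sqrt(2 * Math.PI));

// Classic kernel density estimate: the average of one kernel per observation.
// Only the positions matter here; there is no per-observation weight.
const kde = (kernel, samples) => x =>
  samples.reduce((acc, s) => acc + kernel(x - s), 0) / samples.length;

// Illustrative observation times (hypothetical sample, not the asker's data).
const times = [200, 250, 500, 600, 1500];
const density = kde(normalGaussian(25), times);

// Riemann sum with step 1 over a range wide enough to hold essentially all the mass.
let integral = 0;
for (let x = -500; x <= 2500; x += 1) {
  integral += density(x) * 1;
}

console.log(integral.toFixed(3)); // "1.000": the area is 1 however many observations there are

// Scaling by the count makes the total area 5, but the height at an
// observation is still nowhere near a count of observations.
console.log((times.length * density(200)).toFixed(4));
```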
C. What comes close to what you want to achieve is implemented below. Use a Gaussian curve that equals 1 at 0, i.e. without the normalizing factor and without rescaling by the bandwidth. The bandwidth then scales only the width of the curve, not its height. Then take your original data array and add up all these curves, each weighted by the sum from your data array.
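Written out, approach C swaps the normalized estimate from point B for a weighted sum of unnormalized Gaussian bumps. In this sketch, t_i and s_i denote the time and sum of the i-th entry of the data array and h denotes BANDWIDTH; this notation is introduced here for illustration. The first line is the kernel density estimate, whose integral is 1; the second is approach C; the third evaluates approach C at an observation t_j:

$$
\hat f(t) = \frac{1}{n\,h\sqrt{2\pi}} \sum_{i=1}^{n} \exp\left(-\frac{(t - t_i)^2}{2h^2}\right),
\qquad \int_{-\infty}^{\infty} \hat f(t)\,dt = 1
$$

$$
f(t) = \sum_{i=1}^{n} s_i \exp\left(-\frac{(t - t_i)^2}{2h^2}\right)
$$

$$
f(t_j) = s_j + \sum_{i \neq j} s_i \exp\left(-\frac{(t_j - t_i)^2}{2h^2}\right)
$$

So f(t_j) equals s_j only when the remaining terms are negligible, i.e. when no other observation lies within a few multiples of h of t_j.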
This will match your data points as long as there are no clustered observations. Naturally, when two observations are close to each other, their individual Gaussian curves add up to something bigger than either observation.
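A quick numeric check of that remark, as a minimal sketch that reimplements approach C in plain JavaScript without d3. The data array and the bandwidth of 25 are copied from the snippet in this answer; the helper names `bump` and `weightedCurve` are hypothetical. The isolated point at time 1800 keeps its sum of 3, while the points at 200 and 250 (50 apart, i.e. two bandwidths) push each other up by a factor of exp(-2).

```javascript
// Unnormalized Gaussian bump: equals 1 at u = 0 whatever the bandwidth h.
const bump = h => u => Math.exp(-0.5 * (u / h) ** 2);

// Approach C: add up one bump per observation, each weighted by its `sum`.
const weightedCurve = (data, h) => t =>
  data.reduce((acc, d) => acc + d.sum * bump(h)(t - d.time), 0);

// Data array and bandwidth copied from the snippet in this answer.
const data = [
  { time: 0, sum: 0 }, { time: 200, sum: 4 }, { time: 250, sum: 2 },
  { time: 500, sum: 1 }, { time: 600, sum: 2 }, { time: 1500, sum: 5 },
  { time: 1600, sum: 4 }, { time: 1800, sum: 3 }, { time: 2000, sum: 0 },
];
const f = weightedCurve(data, 25);

console.log(f(1800).toFixed(3)); // "3.000": isolated point, the curve sticks to its sum
console.log(f(200).toFixed(3));  // "4.271": 4 + 2 * exp(-2), pushed up by the point at 250
console.log(f(250).toFixed(3));  // "2.541": 2 + 4 * exp(-2), pushed up by the point at 200
```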
DISCLAIMER: As I already pointed out in the comments, this just produces a pretty chart and is mathematical nonsense. I strongly recommend working out the mathematics behind what you really want to achieve. Only then should you make a chart of your data.
```javascript
const WIDTH = 600;
const HEIGHT = 150;
const BANDWIDTH = 25;

let data = [
  {time: 0, sum: 0},
  {time: 200, sum: 4},
  {time: 250, sum: 2},
  {time: 500, sum: 1},
  {time: 600, sum: 2},
  {time: 1500, sum: 5},
  {time: 1600, sum: 4},
  {time: 1800, sum: 3},
  {time: 2000, sum: 0},
];

// svg
const svg = d3.select("body")
  .append("svg")
  .attr("width", WIDTH)
  .attr("height", HEIGHT)
  .style("background-color", "grey");

// scales
const x_scale = d3.scaleLinear()
  .domain([0, 2000])
  .range([0, WIDTH]);

const y_scale = d3.scaleLinear()
  .range([HEIGHT, 0]);

// curve interpolator
const line = d3.line()
  .x(d => x_scale(d.time))
  .y(d => y_scale(d.sum))
  .curve(d3.curveMonotoneX);

// evaluate the curve at every integer time from 0 to 2000
const grid = [...Array(2001).keys()];

svg.append("path")
  .style("fill", "rgba(255,255,255,0.4)");

// gaussian "kernel": exp(-x^2 / (2k^2)), equal to 1 at x = 0 for every bandwidth k
const gaussian = k => x => Math.exp(-0.5 * (x / k) * (x / k));

// similar to a kernel density estimate, but weighted by d.sum and not normalized
function estimate(kernel, grid) {
  return obs => grid.map(x => ({time: x, sum: d3.sum(obs, d => d.sum * kernel(x - d.time))}));
}

function render(data) {
  data = data.sort((a, b) => a.time - b.time);

  // make curve estimate with these kernels
  const curve_estimate = estimate(gaussian(BANDWIDTH), grid)(data);

  // set endpoints to zero so the filled path closes along the baseline
  curve_estimate[0].sum = 0;
  curve_estimate[curve_estimate.length - 1].sum = 0;

  y_scale.domain([0, 1.5 * Math.max(d3.max(data, d => d.sum), d3.max(curve_estimate, d => d.sum))]);

  svg.select("path")
    .attr("d", line(curve_estimate));

  // raw observations: newly added points in red, existing ones in white
  const circles = svg.selectAll("circle")
    .data(data, d => d.time)
    .join(
      enter => enter.append("circle")
        .attr("fill", "red"),
      update => update.attr("fill", "white")
    )
    .attr("r", 2)
    .attr("cx", d => x_scale(d.time))
    .attr("cy", d => y_scale(d.sum));
}

render(data);

function randomData() {
  data = [];
  for (let i = 0; i < 10; i++) {
    data.push({
      time: Math.round(2000 * Math.random()),
      sum: Math.round(10 * Math.random()) + 1,
    });
  }
  render(data);
}

function addData() {
  data.push({
    time: Math.round(2000 * Math.random()),
    sum: Math.round(10 * Math.random()) + 1,
  });
  render(data);
}

d3.select("#random_data").on("click", randomData);
d3.select("#add_data").on("click", addData);
```
```html
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/7.3.0/d3.min.js"></script>
<button id="random_data">
  Random Data
</button>
<button id="add_data">
  Add data point
</button>
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | deristnochda |
