[D] I need to interpolate some of my data and have to make some design decisions about where in my pipeline this should happen.
I am working with a database spread across 7 tables, and for ML work I need to join them together. However, since some rows are sampled less frequently than others, the join leaves a lot of nulls for some features. I want to interpolate those values, but I'm not sure of the most efficient way to do it. In other words, say I have feature X sampled every 10 ms, feature Y every 1 second, and a third feature Z every 15 seconds. I could precompute the interpolated values and store them in the database, but I don't know if that kind of storage capacity is feasible for us. Alternatively, I could compute them for each row when I fetch batches for training, but I'm afraid that will become a bottleneck depending on how fast the interpolation is. Is there some obvious way of interpolating this efficiently that I'm not thinking of that would also save on storage space?
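One common middle ground (not from the post, just a sketch of the on-the-fly option) is to join everything onto the densest time grid and interpolate the sparse columns per batch, so nothing interpolated is ever stored. A minimal pandas sketch, assuming linear interpolation in time and made-up feature names and sampling rates taken from the question (X every 10 ms, Y every 1 s, Z every 15 s):

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the three tables; timestamps and values are made up.
base = pd.Timestamp("2024-01-01")
x = pd.DataFrame({"t": pd.date_range(base, periods=6000, freq="10ms"),
                  "x": np.arange(6000.0)})          # sampled every 10 ms
y = pd.DataFrame({"t": pd.date_range(base, periods=60, freq="1s"),
                  "y": np.arange(60.0)})            # sampled every 1 s
z = pd.DataFrame({"t": pd.date_range(base, periods=4, freq="15s"),
                  "z": np.arange(4.0)})             # sampled every 15 s

# Join onto the densest grid (X's), which leaves NaNs in the sparse columns,
# then fill them with time-weighted linear interpolation.
df = x.set_index("t").join(y.set_index("t")).join(z.set_index("t"))
df[["y", "z"]] = df[["y", "z"]].interpolate(method="time")
```

If this is done per training batch rather than over the whole table, the cost is linear in batch size and usually cheap compared to a training step; for a custom loader without pandas, `np.interp(batch_times, sparse_times, sparse_values)` does the same job column by column. Whether that beats precomputing and storing the dense table depends on your storage budget, so treat this as one option to benchmark, not the answer.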
submitted by /u/zcleghern