FP64 performance isn't nearly as important for compute as people make it out to be. Most scientists use it by default because they don't know or care whether they actually need it.
I have access to four P100s, a 1080 Ti, a 1080, and a 980 Ti. FP64 is nice on the P100s because I can use mixed precision for even more performance (each SM has half as many FP64 units as FP32 cores, so you get 50% more ALUs if you use both), but otherwise I run in single precision.
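To make the precision trade-off concrete, here's a rough CPU-side sketch, not GPU code and not what I actually run on the P100s. It emulates FP32 by rounding through `struct`, and compares a pure FP32 sum, a mixed sum (FP32 inputs, FP64 accumulator), and a pure FP64 sum. The function names are made up for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """FP32 inputs, FP32 accumulator."""
    acc = 0.0
    for v in values:
        # Round after every add to emulate single-precision addition.
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_mixed(values):
    """FP32 inputs, FP64 accumulator (the mixed-precision variant)."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return acc


def sum_f64(values):
    """FP64 inputs, FP64 accumulator."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


if __name__ == "__main__":
    n = 100_000
    data = [0.1] * n
    exact = n / 10  # the mathematically exact sum
    for name, fn in [("fp32", sum_f32), ("mixed", sum_mixed), ("fp64", sum_f64)]:
        print(f"{name:>5}: abs error = {abs(fn(data) - exact):.3e}")
```

The pure FP32 sum drifts the most, and the mixed version gets most of the accuracy back while the bulk of the data stays in single precision. Whether that matters depends entirely on the workload, which is kind of my point.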
What I find strange is that earlier cards had such strong FP64 performance to begin with. The reason we see a 1:32 FP64-to-FP32 rate these days is to stay compatible with APIs like OpenCL and CUDA.
It has to be used in some cases where the precision is really needed. There are even higher-precision formats than 64-bit, such as 128-bit quadruple precision and even 256-bit octuple precision, if you really want to be insanely precise.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format