Spark can compute a histogram, but the functionality is tucked away in low-level and somewhat obscure classes. In this post I'll give you a function that takes a DataFrame and returns the values you need. In the official documentation, the only mention of histograms is in DoubleRDDFunctions.
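Before getting to Spark itself, it helps to see what such a histogram returns. The `histogram` method in DoubleRDDFunctions accepts a bucket count and gives back a pair: the bucket boundaries, evenly spaced between the minimum and maximum, and the number of values in each bucket. Below is a minimal plain-Python sketch of that behaviour. The helper name `even_histogram` is made up for illustration, and this is not Spark's implementation, only an approximation of the semantics as I understand them (half-open buckets, with the last one closed so the maximum is counted).

```python
from typing import List, Sequence, Tuple


def even_histogram(
    values: Sequence[float], bucket_count: int
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: evenly spaced histogram over plain Python numbers."""
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    if not values:
        raise ValueError("cannot build a histogram from no values")

    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical: a single bucket holds everything.
        return [lo, hi], [len(values)]

    width = (hi - lo) / bucket_count
    # bucket_count + 1 boundaries; the last is pinned to hi to avoid float drift.
    boundaries = [lo + i * width for i in range(bucket_count)] + [hi]

    counts = [0] * bucket_count
    for v in values:
        # Buckets are half-open [a, b); the maximum lands in the last bucket.
        idx = min(int((v - lo) / width), bucket_count - 1)
        counts[idx] += 1
    return boundaries, counts


boundaries, counts = even_histogram([0, 25, 50, 75, 100], 2)
print(boundaries, counts)
```

With two buckets over the values 0 to 100, the boundaries split at 50, and the value 50 itself is counted in the second bucket because the first one is half-open.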