CUME_DIST function

In this post, I would like to explain a function that is not well known. CUME_DIST analytic function is used for calculating the cumulative distribution of a data in the group of data. It returns value greater than 0 and less than or equal to 1. Let’s create our test table.

Set Up

create table t_test
id number primary key,
team varchar2(50),
point number

insert into t_test values(1, 'Real Madrid', 85);

insert into t_test values(2, 'Barcelona', 72);

insert into t_test values(3, 'Manchester United', 75);

insert into t_test values(4, 'Arsenal', 55);

insert into t_test values(5, 'Chelsea', 67);

insert into t_test values(6, 'Milan', 69);

insert into t_test values(7, 'Juventus', 59);

insert into t_test values(8, 'Bayern Munih', 69);

	cume_dist() over(order by point) pct,
    count(*) over(order by point) / count(*) over() pct2,
    count(*) over(order by point range between unbounded preceding and current row) / count(*) over() pct3
from t_test t
order by point desc, id;

As shown above, it is easy to use CUME_DIST function in order to calculate score percentile. Equivalent solutions are also shown by PCT2 and PCT3 columns. Note that in PCT3, range between is used instead of rows between. Otherwise, percentile of Milan and Bayern Munih would have been different. In addition, this also shows that the default value of analytical functions is range between. I mentioned this in my post on Analytic functions & Windowing clause.

I hope this helps 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *