Analytic functions & Windowing clause

Analytic functions have been used from the early versions of Oracle. It is used in many tasks, especially reporting. In this article, I am going to use EMP table for my examples. If you don’t have EMP table, you can find the script from the following link.

https://livesql.oracle.com/apex/livesql/file/content_O5AEB2HE08PYEPTGCFLZU9YCV.html

You can also find more about analytic function syntax in the docs.

https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions004.htm#SQLRF06174

Analytic functions are similar to aggregate functions. However, the output of the query does not reduce the rows. In other words, unlike aggregate functions you can see the whole result set when using the analytic functions.

SELECT 
    DEPTNO, 
    AVG(SAL) 
FROM EMP GROUP BY DEPTNO;
aggregate function
SELECT 
    EMPNO, 
    ENAME, 
    JOB, 
    SAL, 
    AVG(SAL) OVER(PARTITION BY DEPTNO) AVG_DEPT_SAL 
FROM EMP;
analytic function

The most used sort functions are the followings. So, you can look at the result set below to see the difference between each other. ROW_NUMBER function is the conventional way to sort the data. RANK function is like much more gradation and as the name suggests, DENSE_RANK sorts the data without leave a blank.

SELECT 
    EMPNO, 
    ENAME, 
    JOB, 
    DEPTNO,
    SAL, 
    ROW_NUMBER() OVER(PARTITION BY DEPTNO ORDER BY SAL) RN,
    RANK() OVER(PARTITION BY DEPTNO ORDER BY SAL) RNK,
    DENSE_RANK() OVER(PARTITION BY DEPTNO ORDER BY SAL) DRNK 
FROM EMP ORDER BY DEPTNO;
ROW_NUMBER

Which function would you use if you want to get the EMPNO of the person with the highest salary? Actually, there are several ways to get this data. Assuming that only one person gets the highest salary, the following queries get the same result. However, I would prefer to use the KEEP one.

Option 1
SELECT 
    EMPNO 
FROM (
    SELECT T.*, ROW_NUMBER() OVER(ORDER BY SAL DESC) RN FROM EMP T
) WHERE RN = 1;
Option 2
SELECT 
    EMPNO 
FROM EMP 
WHERE SAL = (SELECT MAX(SAL) FROM EMP);
Option 3
SELECT 
    MAX(EMPNO) KEEP(DENSE_RANK FIRST ORDER BY SAL DESC) 
FROM EMP;

No matter which function you use MAX, MIN or AVG. In KEEP clause, you get the same result as long as you get only one row for the highest salary record.

SELECT 
    MAX(EMPNO) KEEP(DENSE_RANK FIRST ORDER BY SAL DESC) mx,
    MIN(EMPNO) KEEP(DENSE_RANK FIRST ORDER BY SAL DESC) mn,
    AVG(EMPNO) KEEP(DENSE_RANK FIRST ORDER BY SAL DESC) ag
FROM EMP;
KEEP CLAUSE

Windowing Clause

It can only be used when ORDER BY clause is present and after the ORDER BY clause. It has two basic formats with using several combinations like the following.

RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 
RANGE BETWEEN UNBOUNDED PRECEDING AND offset FOLLOWING 
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING 
RANGE BETWEEN CURRENT ROW AND CURRENT ROW
RANGE BETWEEN CURRENT ROW AND offset FOLLOWING
RANGE BETWEEN offset PRECEDING AND UNBOUNDED FOLLOWING 
RANGE BETWEEN offset PRECEDING AND CURRENT ROW 
RANGE BETWEEN offset PRECEDING AND offset FOLLOWING 
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN UNBOUNDED PRECEDING AND offset FOLLOWING
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING 
ROWS BETWEEN CURRENT ROW AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND offset FOLLOWING
ROWS BETWEEN offset PRECEDING AND UNBOUNDED FOLLOWING 
ROWS BETWEEN offset PRECEDING AND CURRENT ROW
ROWS BETWEEN offset PRECEDING AND offset FOLLOWING

The default value is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, don’t get confused with ROWS clause. RANGE and ROWS have the different meanings.

SELECT 
    EMPNO, 
    ENAME, 
    JOB, 
    DEPTNO,
    SAL, 
    AVG(SAL) OVER(ORDER BY DEPTNO RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) range_sal, --equivalent to AVG(SAL) OVER(ORDER BY DEPTNO)
    AVG(SAL) OVER(ORDER BY DEPTNO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rows_sal
FROM EMP;
Windowing clause

As can be seen in the following example, using the WINDOWING clause, the FIRST_VALUE and LAST_VALUE functions can be used instead of the LAG and LEAD functions.

SELECT 
    EMPNO, 
    ENAME, 
    JOB, 
    DEPTNO,
    SAL, 
    LAG(SAL, 1, SAL) OVER(ORDER BY SAL) lag_val,
    FIRST_VALUE(SAL) OVER(ORDER BY SAL ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) first_val,
    LEAD(SAL, 1, SAL) OVER(ORDER BY SAL) lead_val,
    LAST_VALUE(SAL) OVER(ORDER BY SAL ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) last_val
FROM EMP;
windowing clause

Lastly, I also would like to mention one of the most useful analytic function that introduced in Oracle 11g Release 2, LISTAGG function. Basically, it is used like a pivot process. It aggregates strings easily. It also order the elements in the concatenated list.

SELECT 
    JOB, 
    LISTAGG(ENAME, ', ') WITHIN GROUP(ORDER BY ENAME) ENAMES
FROM EMP GROUP BY JOB;
LISTAGG

I usually use this function when I have to specify all column names instead of using * sign in SELECT clause for a table that too many columns.

SELECT 
    LISTAGG(COLUMN_NAME, ', ') WITHIN GROUP(ORDER BY COLUMN_ID) 
FROM USER_TAB_COLS 
WHERE TABLE_NAME = 'T_TAB1' 
AND COLUMN_NAME != 'ID'; --Column names that not intended to be used in the SELECT section

NTILE Function

Basically, NTILE function divides an ordered result set into specified number of equal groups or buckets. In the following example, 10 rows result set split 4 groups. The reason why the first buckets have 3 lines is due to the incomplete ratio of the number of lines to the section. In this case, some buckets may have more lines according to the specified sort priority in the function.

SELECT 
    EMPNO, 
    ENAME, 
    NTILE(4) OVER(ORDER BY ENAME) BUCKET 
FROM EMP;

If the number of buckets specified is less than the number of rows in the result set. Then, the number of buckets will be reduced. So there is one row per bucket.

SELECT 
    EMPNO, 
    ENAME, 
    NTILE(20) OVER(ORDER BY ENAME) BUCKET 
FROM EMP;

The query below shows the employees who are in the middle of the salary scale.

SELECT * FROM (
    SELECT 
        EMPNO, 
        DEPTNO, 
        SAL, 
        NTILE(3) OVER(ORDER BY SAL) BUCKET 
    FROM EMP
) WHERE BUCKET = 2;
SELECT 
    EMPNO, 
    ENAME, 
    SAL, 
    DEPTNO, 
    NTILE(3) OVER(PARTITION BY DEPTNO ORDER BY SAL) BUCKET 
FROM EMP;

RATIO_TO_REPORT

As the name suggests, RATIO_TO_REPORT function computes the ratio of the column data according to the sum of the grouping data set. If query partition clause is omitted then the function is computed over the whole data set of the column. It doesn’t support the order or windowing clauses. So, the below query shows that the percentage rate of each employee’s salary to the sum of the all salaries.

SELECT 
    EMPNO, 
    ENAME, 
    JOB, 
    SAL, 
    ROUND(RATIO_TO_REPORT(SAL) OVER(), 2) * 100 SAL_RATIO,
    ROUND((SAL / SUM(SAL) OVER()), 2) * 100 SAL_RATIO2 --Equivalent of RATIO_TO_REPORT column
FROM EMP;
RATIO_TO_REPORT example

The following example gives the percentage of each employee’s salary by each job title.

SELECT 
    EMPNO, 
    ENAME, 
    JOB, 
    SAL, 
    ROUND(RATIO_TO_REPORT(SAL) OVER(PARTITION BY JOB), 2) * 100 SAL_RATIO
FROM EMP;
RATIO_TO_REPORT example

So, there are many analytic functions that used for different purposes. I hope this article will help you.

Posted in SQL
Leave a Reply

Your email address will not be published. Required fields are marked *