3 Types of Ranking Problems in MySQL

/ / How To

Edited by Yifei Xu

This is a recap of a FocusKPI Analytics Leadership Forum event. Join our  LinkedIn group to learn more.


Speake Wu

Expertise: Data analytics, statistical analysis, machine learning, data engineering with Python, SAS, SQL, R, C++, Java and UNIX.

Image for post

Ranking is an important method in business world. A sales manager has to rank the sales performance by quarter or region in his report. A marketing manager has to evaluate the contribution of different channels in a campaign. A human resource manager has to assess department and employee KPI. Ranking problem is widely used as it is the most intuitive way in appraisal, and more importantly, not so tricky as it may seem.

Today we are going to simulate a sales report dataset to explore various real-world ranking problems in MySQL. It is a very simple one, having 77 observations and columns of region, time period, row id, and revenue of three sales channels. There is no null values but always remember to check it when you are working on a real case.

Image for post

Q1: How to generate a ranking column of the sum of sales within each quarter?

In simple terms, we need to complete three steps before query the data. The first is to store the sum-up revenue of three channels. Then we need to group the data by quarter. Finally we sort the data within each group and generate a ranking column.

According to the statement above, we could make a temporary table which documents revenue sum as below:

SELECT 
quarter, region, product, Sale_TV, Sale_DM, Sale_online,
RANK() OVER (PARTITION BY quarter
ORDER BY sale_tot DESC
) DM_rank_tie
FROM
(

SELECT * ,(sale_DM+sale_TV+sale_online) sale_tot
FROM sales
) a
;

Or we could just simplify it into one query:

SELECT 
quarter,region,product,Sale_TV,Sale_DM,Sale_online,
RANK() OVER (PARTITION BY quarter
ORDER BY (sale_DM+sale_TV+sale_online) desc
)
DM_rank_tie
FROM
sales
;

Image for post

One thing worth noticing is that the rank() over() function automatically does a tied ranking, that is when two or more observations have the same score, they will get the same ranking and the one following them will be ranked originally.

Q2: What is the largest/lowest/3rd largest sales of quarterly Sale_TV?

This question itself is easy. We just need to find the maximum value of TV channel in each group of quarter. But we cannot simply query the maximum value because there might be equal values. We need to find all largest sales of quarterly Sale_TV. So, we apply the join syntax to fetch all max values

SELECT
s.*
FROM
sales s
JOIN
(
select

quarter, max(sale_TV) max_TV
FROM sales
GROUP BY sales.quarter
) m
ON
m.quarter = s.quarter
AND
s.sale_tv = m.max_tv
;
Image for post

Similarly, we could call the minimum sales data.

SELECT
s.*
FROM
sales s
JOIN
(
SELECT

quarter, min(sale_TV) min_TV
FROM sales
GROUP BY sales.quarter
) m
ON
m.quarter = s.quarter
AND
s.sale_tv = m.min_tv
;
Image for post

What is the third highest sales revenue of TV channel then? This question could be tricky. We could copy the method we employed in the former question. That is to generate a ranking column and target the third positions.

SELECT 
quarter,region,product,Sale_TV,Sale_DM,Sale_online
,RANK() OVER (PARTITION BY quarter
ORDER BY sale_TV DESC
)
TV_rank_tie
FROM
sales
WHERE
TV_rank_tie = 3
;

However, you will be retrieving an error message, because the code actually is against MySQL syntax, rank() over() function is a calculated field and where clause does not allow it.

So, it naturally come to us that we could build a temporary table to store the result of rankings and select the data again from it. Usually, temporary tables would occupy lots of RAM, so please be cautious when you are dealing with real big data.

SELECT
*
FROM
(
SELECT

quarter,region,product,Sale_TV,Sale_DM,Sale_online
,RANK() OVER (PARTITION BY quarter
ORDER BY sale_TV DESC
)
TV_rank_tie
FROM
sales
)a
WHERE
TV_rank_tie = 3
;

Image for post

Q3: Please rank the table according to direct market sales revenue column.

One type of answer could be easy as we just use the rank() over() function.

SELECT
quarter,region,product,Sale_TV,Sale_DM,Sale_online
,RANK() OVER (
ORDER BY
sale_DM DESC
) DM_rank_tie
FROM
sales
;

Image for post

We would also like to introduce the variable method. It is like a loop as we set the start number as 0 (including table title) and each time the cursor moves to a new row, the row number pluses one and becomes the ranking order. You may have noticed that it is not a tied ranking, but you could try self-join and adding conditions to realize it.

SET @row_number = 0;
SELECT
(@row_number:=@row_number + 1) AS DM_rank,
a. *
FROM
(SELECT * FROM sales ORDER BY sale_dm DESC) a
;

Image for post

This script can also be combined with limit and offset. For example, if you put limit m,n at the end, MySQL will return a table with n observations starting from the m+1 in the above table.

Below is an example. You could play with it to get clear how the syntax works.

SET @row_number = 0;
SELECT
(@row_number:=@row_number + 1) AS DM_rank,
a. *
FROM
(SELECT * FROM sales ORDER BY sale_dm DESC) a
LIMIT 2,3;

Image for post

Ranking questions are very practical. Combined with line charts or bar charts, you could easily tell the trend of your object or the comparison of dimensions, depending on the layout. In addition, if you have other features associated with the object, you could make a prediction and build a model to forecast the business. Still, the statistics could go into your business intelligence dashboard to let your business growth and performance under control and help you react accordingly and rapidly.


If you are interested in this topic or have any related questions about our event and service, please reach out to NEWS@FOCUSKPI.COM.