SQL Queries, Tips for Writing Complex Grouping Operators

/ / Data Science

Edited by Yifei Xu, Tatyana Frid

This is a recap of a FocusKPI Analytics at Work chat. Join our LinkedIn group to learn more.


Speaker: Jin Zhou

Jin is well versed in using marketing, political campaign and social media data to build predictive models and conduct statistical analysis for Fortune 500 companies and world-renowned organizations. Jin has developed statistical models, algorithms and web tools in direct marketing, campaign measurement, customer behavioral segmentation, multi-channel attribution, response and incremental modeling, marketing mix modeling, text mining, social media sentiment analysis, and customer life time value prediction for clients such as eBay, Staples, TWC and Shutterfly.


Joseph: Oftentimes a senior manager assigns you a task where you have to do individual transactions, you will have to aggregate them and there are several ways to do that. Jin is going to show us one of the easiest ways to do this in SQL today.

As a data analyst, writing complicated queries is one of the most important tasks. You have to know different combinations of common expressions.

For writing these queries, you often use the group by clause, in this particular example, we are going to look at what if we need to include different combinations of the group by clause in one single query: Using a simulation of real-world business data to show how you could apply complex grouping operators.

Simulation of the real-world business dataset

This table documented the transaction date, customer id, customer demographic (consumer crew), sales channel, and sales amount of every single transaction (deal). We could see each deal as customer behavior. So how do we summarize or extract information based on different aggregation requests?

1st Question: Report the total amount of sales of different consumer crews in this time period.

the total amount of sales of different consumer crew

SELECT Business_Consumer, sum(Sale_Amt) as sale_amount

FROM data

GROUP BY 1

This report is quite basic, just follow the words, and there comes the script. Note that the ‘1’ here means the first column in the output.

2nd Question: Report the total amount of sales of different consumer crews and the number of overall sales in this time period.

the total amount of sales of different consumer crew and overall

SELECT Business_Consumer, sum(Sale_Amt) as sale_amount

FROM data

GROUP BY 1

UNION

SELECT ‘Overall’, sum(sale_amt) as sale _amount

FROM data

Based on what we have done in report 1, we just need to union a line (or you can say an observation) documenting the amount of the overall sales. Note that by using ‘’ we can manually add the overall characters to the output.

SELECT Business_Consumer, sum(Sale_Amt) as sale_amount

FROM data

GROUP BY GROUPING SETS(1,0)

UNION

ORDER BY 1 nulls last

Think about this far less tedious version. This time we use the grouping sets clause. Let’s figure out how it works. So, within the parentheses, 1 still means the first column in the output, while the empty parentheses yield the overall line, basically meaning including all or summarizing all the aggregated results. However, the output will not show ‘overall’ this, but a blank instead.

3rd Question: Report the total amount of sales of different consumer crews through different sales channels and the amount of the overall sales by consumer crews and all in this time period.

the total amount of sales of different consumer crews through different sales channels and the amount of the overall sales by consumer crews and all

SELECT Business_Consumer, sum(Sale_Amt) as sale_amount

FROM data

GROUP BY GROUPING SETS((1,2),1,0)

ORDER BY 1 nulls last, 2 nulls last

This question helps us understand grouping sets clause more easily. (1,2) returns us with four unique combinations of two types of business_consumers and tow sales channels (BD, BR, CD, and CR). The single 1 refers to the group of business_consumers, which is to get the overall sum of business consumers later. And still blank parentheses mean getting all.

By using grouping sets and by using the different combinations. That gives you different results than just using group by. No need to union and there lies the efficiency of your query.

What are the unique combination result of 3 fields and each field has 2 unique values then? 8. Can you figure out the reason?

Related Question: What happens if you lose some of your observations? Like you’re missing a value for the channel? Will it include that in null last? Or will it give you a separate one for the null values of the channel? And then null last would be the aggregation of Business_Consumer field. How does it handle if you have missing data?

Answer: That would more tricky. There is definitely conflict with our grouping sets clause here. So, in that case, you need more of “special treatment”. So if you want to know about this because it’s much more detailed, I would recommend googling some grouping methods to figure out how to handle these specific situations.

The complex grouping has many practical values in the business world, marketing, sales, supply chain, as you name it. Using it in a proper way saves you time and fewer errors perhaps.


If you are interested in this topic or have any related questions about our event and service, please reach out to NEWS@FOCUSKPI.COM.