Aggregate functions in SQL are super dope. When combining these functions with clauses such as `GROUP BY` and `HAVING`, we discover ways to view our data from completely new perspectives. Instead of looking at the same old endless flat table, we can use these functions to give us entirely new insights; aggregate functions help us to understand bigger-picture things. Those things might include finding outliers in datasets, or simply figuring out which employee with a family to feed should be terminated, based on some arbitrary metric such as sales numbers.

With the basics of `JOIN`s under our belts, this is when SQL starts feel really, really powerful. Our plain two-dimensional tables suddenly gain this power to be combined, aggregated, folded on to themselves, expand infinitely outward as the universe itself, and even transcend into the fourth dimension.*

*Needs citation

## Our Base Aggregation Functions

First up, let's see what we mean by "aggregate functions" anyway. These simple functions provide us with a way to mathematically quantify what exactly is in our database. Aggregate functions are performed on table columns to give us the make-up of said column. On their own, they seem quite simple:

• `AVG`: The average of a set of values in a column.
• `COUNT`: Number of rows a column contains in a specified table or view.
• `MIN`: The minimum value in a set of values.
• `MAX`: The maximum value in a set of values.
• `SUM`: The sum of values.

### DISTINCT Aggregations

A particularly useful way of using aggregate functions on their own is when we'd like to know the number of `DISTINCT` values. While aggregate values take all records into account, using `DISTINCT` limits the data returned to specifically refer to unique values. `COUNT(column_name)` will return the number of all records in a column, where `COUNT(DISTINCT column_name)` will ignore counting records where the value in the counted column is repeated.

## Using GROUP BY

The `GROUP BY` statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns.

To demonstrate how aggregate functions work moving forward, I'll be using a familiar database: the database which contains all the content for this very blog. Let's look at how we can use aggregate functions to find which authors have been posting most frequently:

``````SELECT
COUNT(title), author_id
FROM
posts
GROUP BY author_id;
``````

And the result:

Count author_id
102 1
280 5c12c3821345c22dced9f591
17 5c12c3821345c22dced9f592
5 5c12c3821345c22dced9f593
2 5c12c3821345c22dced9f594
2 5c12c3821345c22dced9f595

Oh look, a real-life data problem to solve! It seems like authors are represented in Ghost's posts table simply by their IDs. This isn't very useful. Luckily, we've already learned enough about JOINs to know we can fill in the missing information from the users table!

``````SELECT
COUNT(posts.title),
users.name
FROM
posts
ON
(posts.author_id = users.id)
GROUP BY users.id
ORDER BY COUNT(posts.title) DESC;
``````

Let's see the results this time around:

Count author_id
280 Matthew Alhonte
102 Todd Birchard
17 Max Mileaf
2 Graham Beckley
2 David Aquino

Now that's more like it! Matt is crushing the game with his Lynx Roundup series, with myself in second place. Max had respectable numbers for a moment but has presumably moved on to other hobbies, such as living his life.

For the remainder, well, I've got nothing to say other than we're hiring. We don't pay though. In fact, there's probably zero benefits to joining us.

### Conditional Grouping With "HAVING"

`HAVING` is like the `WHERE` of aggregations. We can't use `WHERE` on aggregate values, so that's why `HAVING` exists. `HAVING` can't accept any conditional value, but instead it must accept a numerical conditional derived from a `GROUP BY`. Perhaps this would be easier to visualize in a query:

``````SELECT
tags.name,
COUNT(DISTINCT posts_tags.post_id)
FROM posts_tags
LEFT JOIN tags ON tags.id = posts_tags.tag_id
LEFT JOIN posts ON posts.id = posts_tags.post_id
GROUP BY
tags.id
HAVING
COUNT(DISTINCT posts_tags.post_id) > 10
ORDER BY
COUNT(DISTINCT posts_tags.post_id)
DESC;
``````

In this scenario, we want to see which tags on our blog have the highest number of associated posts. The query is very similar to the one we made previously, only this time we have a special guest:

``````HAVING
COUNT(DISTINCT posts_tags.post_id) > 10
``````

This usage of `HAVING` only gives us tags which have ten posts or more. This should clean up our report by letting Darwinism takes its course. Here's how it worked out:

tag Count
Roundup 263
Python 80
Machine Learning 29
DevOps 28
Data Science 28
Software Development 27
Data Engineering 23
Excel 19
SQL 18
Architecture 18
REST APIs 16
#Adventures in Excel 16
Pandas 15
If we hadn't included our `HAVING` statement, this list would be much longer, filled with tags nobody cares about. Thanks to explicit omission, now we don't need to experience the dark depression that comes when confronting those sad pathetic tags. Out of sight, out of mind.