Understand Group by in Django with SQL

Originally published at https://hakibenita.com on February 11, 2020.
Read
this article on by blog with proper syntax highlighting.

Aggregation is a source of confusion in any type of ORM and Django is no different. The documentation provides a variety of examples and cheat-sheets that demonstrate how to group and aggregate data using the ORM, but I decided to approach this from a different angle.

In this article I put QuerySets and SQL side by side. If SQL is where you are most comfortable, this is the Django GROUP BY cheat-sheet for you.

Image for post
Image for post

How to Group By in Django

To demonstrate different GROUP BY queries, I will use models from Django’s built-in app.

>>> from django.contrib.auth.models import User

Django ORM produces SQL statements with long aliases. For brevity, I will show a cleaned-up, but equivalent, version of what Django executes.

How to Count Rows

Let’s count how many users we have:

SELECT
COUNT(*)
FROM
auth_user;
User.objects.count()

Counting rows is so common that Django includes a function for it right on the QuerySet. Unlike other QuerySets we’ll see next, returns a number.

How to Use Aggregate Functions

Django offers two more ways to count rows in a table.

We’ll start with aggregate:

SELECT
COUNT(id) AS id__count
FROM
auth_user;
from django.db.models import Count

User.objects.aggregate(Count('id'))

To use we imported the aggregate function . The function accepts an expression to count. In this case, we used the name of the primary key column to count all the rows in the table.

The result of is a dict:

>>> from django.db.models import Count
>>> User.objects.aggregate(Count('id'))
{"id__count": 891}

The name of the key is derived from the name of the field and the name of the aggregate. In this case, it’s . It's a good idea not to rely on this naming convention, and instead provide your own name:

SELECT
COUNT(id) as total
FROM
auth_user;
>>> from django.db.models import Count
>>> User.objects.aggregate(total=Count('id'))
{"total": 891}

The name of the argument to is also the name of the key in the resulting dictionary.

How to Group By

Using we got the result of applying the aggregate function on the entire table. This is useful, but usually we want to apply the aggregation on groups of rows.

Let’s count users by their active status:

SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
(User.objects
.values('is_active')
.annotate(total=Count('id')))

This time we used the function . To produce a GROUP BY we use a combination of and :

  • : what to group by
  • : what to aggregate

The order is important: failing to call before will not produce aggregate results.

Just like , the name of the argument to is the key in the result of the evaluated QuerySet. In this case it's .

How to Filter a QuerySet With Group By

To apply aggregation on a filtered query you can use anywhere in the query. For example, count only staff users by their active status:

SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
WHERE
is_staff = True
GROUP BY
is_active
(User.objects
.values('is_active')
.filter(is_staff=True)
.annotate(total=Count('id')))

How to Sort a QuerySet With Group By

Like filter, to sort a queryset use anywhere in the query:

SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
ORDER BY
is_active,
total
(User.objects
.values('is_active')
.annotate(total=Count('id'))
.order_by('is_staff', 'total'))

Notice that you can sort by both the GROUP BY key and the aggregate field.

How to Combine Multiple Aggregations

To produce multiple aggregations of the same group, add multiple annotations:

SELECT
is_active,
COUNT(id) AS total,
MAX(date_joined) AS last_joined
FROM
auth_user
GROUP BY
is_active
from django.db.models import Max

(User.objects
.values('is_active')
.annotate(
total=Count('id'),
last_joined=Max('date_joined'),
))

The query will produce the number of active and inactive users, and the last date a user joined in each group.

How to Group by Multiple Fields

Just like performing multiple aggregations, we might also want to group by multiple fields. For example, group by active status and staff status:

SELECT
is_active,
is_staff,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active,
is_staff
(User.objects
.values('is_active', 'is_staff')
.annotate(total=Count('id')))

The result of this query includes , and the number of users in each group.

How to Group by an Expression

Another common use case for GROUP BY is to group by an expression. For example, count the number of users that joined each year:

SELECT
EXTRACT('year' FROM date_joined),
COUNT(id) AS total
FROM
auth_user
GROUP BY
EXTRACT('year' FROM date_joined)
(User.objects
.values('date_joined__year')
.annotate(total=Count('id')))

Notice that to get the year from the date we used the special expression in the first call to . The result of the query is a dict, and the name of the key will be .

Sometimes, the built-in expressions are not enough, and you need to aggregate on a more complicated expression. For example, group by users that have logged in since they signed-up:

SELECT
last_login > date_joined AS logged_since_joined,
COUNT(id) AS total
FROM
auth_user
GROUP BY
last_login > date_joined
from django.db.models import (
ExpressionWrapper,
Q, F, BooleanField,
)

(User.objects
.annotate(
logged_since_joined=ExpressionWrapper(
Q(last_login__gt=F('date_joined')),
output_field=BooleanField(),
)
)
.values('logged_since_joined')
.annotate(total=Count('id'))
.values('logged_since_joined', 'total')

The expression here is fairly complicated. We first use to built the expression, and we mark it as a GROUP BY key by referencing the expression in the following call to . From here on, it's exactly the same.

How to Use Conditional Aggregation

Using conditional aggregation, you can aggregate only a part of the group. Conditions come in handy when you have multiple aggregates. For example, count the number of staff and non-staff users by the year they signed-up:

SELECT
EXTRACT('year' FROM date_joined),

COUNT(id) FILTER (
WHERE is_staff = True
) AS staff_users,

COUNT(id) FILTER (
WHERE is_staff = False
) AS non_staff_users

FROM
auth_user
GROUP BY
EXTRACT('year' FROM date_joined)
from django.db.models import F, Q

(User.objects
.values('date_joined__year')
.annotate(
staff_users=(
Count('id', filter=Q(is_staff=True))
),
non_staff_users=(
Count('id', filter=Q(is_staff=False))
),
))

The SQL above is from PostgreSQL, which along with SQLite is currently the only database backend that supports the syntax shortcut (formally called "selective aggregates"). For other database backends, the ORM will use instead.

I previously wrote about aggregations with filters. Check out my 9 Django tips for working with databases.

How to Use Having

The clause is used to filter on the result of an aggregate function. For example, find the years in which more than a 100 users joined:

SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
HAVING
COUNT(id) > 100
(User.objects
.annotate(year_joined=F('date_joined__year'))
.values('is_active')
.annotate(total=Count('id'))
.filter(total__gt=100))

The filter on the annotated field added an HAVING clause in the generated SQL.

How to Group by Distinct

For some aggregate functions such as , it is sometimes desirable to only count distinct occurrences. For example, how many different last names are there per user active status:

SELECT
is_active,
COUNT(id) AS total,
COUNT(DISTINCT last_name) AS unique_names
FROM
auth_user
GROUP BY
is_active
(User.objects
.values('is_active')
.annotate(
total=Count('id'),
unique_names=Count('last_name', distinct=True),
))

Notice the use of in the call to .

How to Create Expressions Using Aggregate Fields

Aggregate fields are often just the first step to a greater question. For example, what is the percent of unique last names by user active status:

SELECT
is_active,
COUNT(id) AS total,
COUNT(DISTINCT last_name) AS unique_names,
(COUNT(DISTINCT last_name)::float
/ COUNT(id)::float) AS pct_unique_names
FROM
auth_user
GROUP BY
is_active
from django.db.models import FloatField
from django.db.models.functions import Cast

(User.objects
.values('is_active')
.annotate(
total=Count('id'),
unique_names=Count('last_name', distinct=True),
)
.annotate(pct_unique_names=(
Cast('unique_names', FloatField())
/ Cast('total', FloatField())
))

The first defines the aggregate fields. The second uses the aggregate function to construct an expression.

How to Group By Across Relations

So far we’ve used only data in a single model, but aggregates are often used across relations. The simpler scenario is of a one-to-one or a foreign key relation. For example, say we have a with a one-to-one relationship to the User, and we want to count users by the type of profile:

SELECT
p.type,
COUNT(u.id) AS total
FROM
auth_user u
JOIN user_profile p ON u.id = p.user_id
GROUP BY
p.type
(User.objects
.values('user_profile__type')
.annotate(total=Count('id')))

Just like GROUP BY expressions, using relations in will group by that field. Note that the name of the user profile type in the result will be 'user_profile__type'.

How to Group By a Many to Many Relationship

A more complicated type of relation is the many to many relationship. For example, count in how many groups each user is a member:

SELECT
u.id,
COUNT(ug.group_id) AS memberships
FROM
auth_user
LEFT OUTER JOIN auth_user_groups ug ON (
u.id = ug.user_id
)
GROUP BY
u.id
(User.objects
.annotate(memberships=Count('groups'))
.values('id', 'memberships'))

A user can be a member of more than one group. To count the number of groups the user is member of we used the related name “groups” in the model. If the related name is not explicitly set (and not explicitly disabled), Django will automatically generate a name in the format . For example, .

Going Further

To dig deeper into the ORM and GROUP BY in particular, check out these links:

Originally published at https://hakibenita.com on February 11, 2020.

Written by

Full Stack Developer, Team Leader, Independent. More from me at https://hakibenita.com

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store