PostgreSQL – Select Distinct on One Column
In PostgreSQL, SELECT DISTINCT eliminates duplicate rows from a query result. When you want one row per unique value of a specific column while still returning other columns, PostgreSQL offers the DISTINCT ON clause, which gives finer control over which rows are kept.
This tutorial explains how to use DISTINCT ON to retrieve distinct rows based on one column while controlling which row supplies the values of the other columns. Practical examples show how it behaves.
Syntax for SELECT DISTINCT ON
SELECT DISTINCT ON (column1) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...;
Explanation:
DISTINCT ON (column1): Keeps only the first row for each unique value in column1.
ORDER BY column1, column2, ...: Determines which row counts as the first one for each unique value in column1. The DISTINCT ON expression must match the leftmost ORDER BY expression. Without an ORDER BY, the row returned for each value is unpredictable.
Unlike SELECT DISTINCT, which compares every column in the select list, DISTINCT ON compares only the expressions listed in its parentheses, which offers more flexibility.
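To make the difference concrete, here is a minimal sketch that runs both forms against the same rows. The sample CTE and its three rows are hypothetical, made up for this comparison only, so the snippet runs on its own without any table.

```sql
-- Hypothetical inline data: customer 1 has two rows, customer 2 has one.
-- SELECT DISTINCT compares whole rows, so all three rows are returned.
WITH sample (customer_id, order_date) AS (
    VALUES
        (1, DATE '2024-01-01'),
        (1, DATE '2024-01-03'),
        (2, DATE '2024-01-02')
)
SELECT DISTINCT customer_id, order_date
FROM sample
ORDER BY customer_id, order_date;

-- DISTINCT ON compares only customer_id, so one row per customer is returned.
-- The ORDER BY decides which row that is: here, the latest order_date.
WITH sample (customer_id, order_date) AS (
    VALUES
        (1, DATE '2024-01-01'),
        (1, DATE '2024-01-03'),
        (2, DATE '2024-01-02')
)
SELECT DISTINCT ON (customer_id) customer_id, order_date
FROM sample
ORDER BY customer_id, order_date DESC;
```

The first query returns three rows because no two rows are identical in both columns. The second returns two rows, one per customer_id.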
Example 1: Basic Usage of DISTINCT ON
Let’s create a table named orders and insert some sample data:
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount NUMERIC(10, 2)
);

INSERT INTO orders (customer_id, order_date, total_amount)
VALUES
    (1, '2024-01-01', 100.00),
    (2, '2024-01-02', 200.00),
    (1, '2024-01-03', 150.00),
    (3, '2024-01-04', 300.00),
    (2, '2024-01-05', 250.00),
    (1, '2024-01-06', 180.00);
To retrieve the latest order for each customer based on order_date, use the following query:
SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;
Explanation: The DISTINCT ON (customer_id) clause ensures that only one row is returned for each unique customer_id. ORDER BY customer_id, order_date DESC sorts each customer's orders from newest to oldest, so the row with the latest order_date is the one kept.
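Output:
The result includes the most recent order for each customer. With the sample data above, that is:
 customer_id | order_date | total_amount
-------------+------------+--------------
           1 | 2024-01-06 |       180.00
           2 | 2024-01-05 |       250.00
           3 | 2024-01-04 |       300.00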
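The sample data has no customer with two orders on the same date, so the query above is deterministic here. If ties on order_date were possible, which of the tied rows PostgreSQL keeps would not be specified. A minimal sketch of one way to handle that, assuming a higher order_id means a later order, is to add a tie-breaker to the ORDER BY:

```sql
-- Same query as above, with order_id as a tie-breaker.
-- Assumption: among orders on the same date, the highest order_id is the latest.
SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC, order_id DESC;
```

Any column that makes the ordering unambiguous works as a tie-breaker. order_id is used here only because it is the table's primary key.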
Example 2: Combining DISTINCT ON with Additional Columns
To include more details about each customer’s latest order, you can add additional columns to the query:
SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;
Explanation: This query adds the order_id column to the result, showing the ID of the latest order for each customer. All returned columns come from the same row, the one that the ORDER BY clause ranks first for each customer_id.
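Output:
The result now includes the order_id, providing more information about the selected orders. Assuming the table was freshly created, so that order_id values run from 1 to 6 in insertion order, the result is:
 customer_id | order_id | order_date | total_amount
-------------+----------+------------+--------------
           1 |        6 | 2024-01-06 |       180.00
           2 |        5 | 2024-01-05 |       250.00
           3 |        4 | 2024-01-04 |       300.00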
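Because the ORDER BY alone decides which row is kept, reversing the sort direction selects a different row per customer. As a small variation on the query above, this sketch returns each customer's earliest order instead of the latest:

```sql
-- Sorting order_date ascending makes the oldest order the first row
-- for each customer_id, so DISTINCT ON keeps that one.
SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date ASC;
```

With the sample data, customer 1's row is now the 2024-01-01 order, customer 2's is the 2024-01-02 order, and customer 3's single order is unchanged.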
Example 3: Using DISTINCT ON with Filtering
You can combine DISTINCT ON with a WHERE clause to filter rows before the distinct logic is applied. For example, retrieve the latest order for each customer, considering only orders placed after 2024-01-02:
SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
FROM orders
WHERE order_date > '2024-01-02'
ORDER BY customer_id, order_date DESC;
Explanation: The WHERE clause filters the rows to include only orders placed after 2024-01-02. The DISTINCT ON clause then returns one row per customer, chosen from the filtered rows.
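Output:
A customer whose orders all fall on or before 2024-01-02 would be excluded from the result. In the sample data every customer has at least one later order, so all three still appear:
 customer_id | order_date | total_amount
-------------+------------+--------------
           1 | 2024-01-06 |       180.00
           2 | 2024-01-05 |       250.00
           3 | 2024-01-04 |       300.00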
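The ORDER BY of a DISTINCT ON query has to start with customer_id, so the result always comes back sorted by customer first. If a different final order is needed, one common approach is to wrap the query in a subquery and sort the outer result. A minimal sketch, where the alias latest_orders is a hypothetical name chosen for this example:

```sql
-- Inner query: pick the latest qualifying order per customer.
-- Outer query: re-sort those rows by total_amount, largest first.
SELECT customer_id, order_date, total_amount
FROM (
    SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
    FROM orders
    WHERE order_date > '2024-01-02'
    ORDER BY customer_id, order_date DESC
) AS latest_orders
ORDER BY total_amount DESC;
```

With the sample data this lists customer 3 first (300.00), then customer 2 (250.00), then customer 1 (180.00).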
Conclusion
The DISTINCT ON clause in PostgreSQL provides a powerful way to retrieve unique rows based on a single column while keeping control over which row supplies the other columns. By combining it with ORDER BY and WHERE, you can build queries that meet specific data requirements. Practice the examples above to learn how to use DISTINCT ON effectively in PostgreSQL.