PostgreSQL – Select Distinct on One Column

In PostgreSQL, SELECT DISTINCT removes duplicate rows from a query result. When you want one row per unique value of a specific column while still returning other columns, PostgreSQL offers the DISTINCT ON clause, a PostgreSQL extension to standard SQL that lets you control which row is kept for each distinct value.

This tutorial explains how to use DISTINCT ON to return one row per distinct value of a column and how to choose which row that is. The examples below show it in practice.


Syntax for SELECT DISTINCT ON

SELECT DISTINCT ON (column1) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...;

Explanation:

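  • DISTINCT ON (column1): Returns only the first row for each unique value in column1.
  • ORDER BY column1, column2, ...: Determines which row counts as "first" for each unique value in column1. The DISTINCT ON expressions must match the leftmost ORDER BY expressions; the columns after them decide which row in each group is returned. Without an ORDER BY, the row returned for each group is unpredictable.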

Unlike plain SELECT DISTINCT, which compares every column in the select list, DISTINCT ON compares only the expressions in parentheses, so the remaining columns may differ between the rows that are collapsed into one.
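
To make the contrast concrete, here is a minimal sketch that uses a hypothetical inline VALUES list. The names t, category, and item are illustrative only and are not part of this tutorial's sample table.

```sql
-- Hypothetical inline data; t, category, and item are illustrative names.
-- SELECT DISTINCT compares whole rows, so all three rows are kept.
SELECT DISTINCT category, item
FROM (VALUES ('fruit', 'apple'), ('fruit', 'banana'), ('dairy', 'milk'))
     AS t(category, item);

-- DISTINCT ON (category) keeps one row per category:
-- the first row of each group in ORDER BY order.
SELECT DISTINCT ON (category) category, item
FROM (VALUES ('fruit', 'apple'), ('fruit', 'banana'), ('dairy', 'milk'))
     AS t(category, item)
ORDER BY category, item;
-- Returns (dairy, milk) and (fruit, apple).
```

The second query drops (fruit, banana) because it sorts after (fruit, apple) within the same category.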


Example 1: Basic Usage of DISTINCT ON

Let’s create a table named orders and insert some sample data:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount NUMERIC(10, 2)
);

INSERT INTO orders (customer_id, order_date, total_amount)
VALUES 
(1, '2024-01-01', 100.00),
(2, '2024-01-02', 200.00),
(1, '2024-01-03', 150.00),
(3, '2024-01-04', 300.00),
(2, '2024-01-05', 250.00),
(1, '2024-01-06', 180.00);

To retrieve the latest order for each customer based on the order_date, use the following query:

SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;

Explanation: The DISTINCT ON (customer_id) clause ensures that only one row is returned for each unique customer_id. The ORDER BY customer_id, order_date DESC specifies that the row with the latest order_date for each customer is selected.

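Output:

 customer_id | order_date | total_amount
-------------+------------+--------------
           1 | 2024-01-06 |       180.00
           2 | 2024-01-05 |       250.00
           3 | 2024-01-04 |       300.00
(3 rows)

The result contains one row per customer: that customer's most recent order.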
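
PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, so the result cannot be sorted by order_date alone at the same query level. One possible workaround, sketched here, wraps the query in a subquery and sorts in the outer query. The alias latest is an illustrative name.

```sql
-- This version fails with:
--   ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
-- SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
-- FROM orders
-- ORDER BY order_date DESC;

-- Pick the latest order per customer in the subquery,
-- then sort those rows by date in the outer query.
SELECT customer_id, order_date, total_amount
FROM (
    SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
    FROM orders
    ORDER BY customer_id, order_date DESC
) AS latest
ORDER BY order_date DESC;
```

With the sample data, the outer query returns the same three rows, newest order first.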


Example 2: Combining DISTINCT ON with Additional Columns

To show more details about each customer’s latest order, add more columns to the select list:

SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;

Explanation: This query adds the order_id column to the result, showing the unique order ID of the latest order for each customer. The ORDER BY clause ensures the latest order is selected for each unique customer_id.

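Output:

 customer_id | order_id | order_date | total_amount
-------------+----------+------------+--------------
           1 |        6 | 2024-01-06 |       180.00
           2 |        5 | 2024-01-05 |       250.00
           3 |        4 | 2024-01-04 |       300.00
(3 rows)

The result now includes the order_id of each selected order. The values shown assume the rows were inserted into a freshly created table, so the SERIAL column numbered them 1 through 6.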
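
If a customer had two orders on the same date, ORDER BY customer_id, order_date DESC alone would not say which of them comes first, and the row returned would be arbitrary. The sample data has no such ties, so the following is only a sketch of how a tie-breaker might be added, assuming a higher order_id means a later order.

```sql
-- order_id DESC breaks ties between orders that share an order_date.
-- Assumption: order_id values increase over time (true for the SERIAL inserts above).
SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC, order_id DESC;
```

With the sample data this returns the same rows as the query in Example 2. The extra sort key only matters once ties exist.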


Example 3: Using DISTINCT ON with Filtering

You can combine DISTINCT ON with a WHERE clause to filter rows before the distinct logic is applied. For example, to retrieve each customer's latest order among those placed after 2024-01-02:

SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
FROM orders
WHERE order_date > '2024-01-02'
ORDER BY customer_id, order_date DESC;

Explanation: The WHERE clause filters the rows to include only orders placed after 2024-01-02. The DISTINCT ON clause then ensures that only one row per customer is returned, based on the filtered rows.

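Output:

 customer_id | order_date | total_amount
-------------+------------+--------------
           1 | 2024-01-06 |       180.00
           2 | 2024-01-05 |       250.00
           3 | 2024-01-04 |       300.00
(3 rows)

With the sample data, all three customers still appear because each has at least one order after 2024-01-02. A customer whose orders were all placed on or before 2024-01-02 would be excluded from the result.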
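
To see a customer actually drop out of the result, here is a sketch with a later cutoff date (2024-01-04 is chosen only for illustration). Customer 3's only order is dated 2024-01-04, so the WHERE clause removes it before DISTINCT ON is applied.

```sql
-- Same query as above with a later, illustrative cutoff date.
SELECT DISTINCT ON (customer_id) customer_id, order_date, total_amount
FROM orders
WHERE order_date > '2024-01-04'
ORDER BY customer_id, order_date DESC;
-- With the sample data this returns two rows:
--   customer 1: 2024-01-06, 180.00
--   customer 2: 2024-01-05, 250.00
```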


Conclusion

The DISTINCT ON clause in PostgreSQL returns one row per distinct value of a column (or of several expressions), while ORDER BY controls which row is returned for each value. Combined with WHERE, it handles common "latest row per group" queries in a single statement. Practice the examples above to get comfortable using DISTINCT ON in PostgreSQL.