Debugging duplicated data in SQL query results
What to do when your query returns data with duplicated rows or columns.
Where is your data being duplicated?
Duplicated rows
Before you start, make sure you know the schemas of your source tables or nested queries.
- Are you missing a
GROUP BY
clause? -
Check if your source tables or nested queries have duplicated rows. You’ll need to repeat steps 3 and 4 for every table or query result that contains duplicate rows.
-- If the row_count is greater than 1, -- you have duplicated rows in your results. SELECT < your_columns >, COUNT(*) AS row_count FROM < your_table_or_upstream_query > GROUP BY < your_columns > ORDER BY row_count DESC;
- Check your table below to see how your join type interacts with your table relationships.
- Change your join type or reduce your table relationships.
Explanation
Rows can get duplicated by accident when data gets refreshed in upstream systems or ETL jobs.
Some tables have rows that look like duplicates at a glance. This is common with tables that track state changes (e.g., an order status table that adds a row every time the status changes). State tables may have rows that look exactly the same, except for the timestamp of the row. It can be difficult to detect if you have tables with a lot of columns, so be sure to run through Step 2 above or ask your database admin if you’re unsure.
If you’ve written your joins assuming a one-to-one relationship for tables that actually have a one-to-many or many-to-many relationship, you’ll get duplicated rows for each match in the “many” table.
Further reading
- Common reasons for unexpected query results
- Combining tables with joins
- Common problems with SQL joins
- What is a schema?
- Database table relationships
Join types and table relationships
This table summarizes how join types interact with table relationships to produce duplicates when matching rows are found.
A is one-to-one with B | A is one-to-many with B | A is many-to-many with B | |
---|---|---|---|
A INNER JOIN B | No duplicate rows. | No duplicate rows. | Duplicated rows from A or B. |
A LEFT JOIN B | No duplicate rows. | Possible duplicates from table B. | Duplicated rows from A or B. |
B LEFT JOIN A | No duplicate rows. | Possible duplicates from table B. | Duplicated rows from A or B. |
A OUTER JOIN B | No duplicate rows. | Possible duplicates from table B. | Duplicated rows from A or B. |
A FULL JOIN B | No duplicate rows. | Duplicate rows from table B. | Duplicated rows from A or B. |
How to reduce table relationships
If you have duplicated rows because you’re assuming a one-to-one relationship when you actually have tables that are one-to-many or many-to-many, you can remove the duplicates using:
- An INNER JOIN for a one-to-many relationship.
- A CTE with an aggregate function for a one-to-many or many-to-many relationship.
For example:
-- Assume table_a is a one-to-many with table_b.
-- The query below will duplicate rows from table_b
-- for every matching row in table_a.
SELECT
< your_columns >
FROM
table_a
LEFT JOIN table_b ON key_a = key_b;
Option 1: Use an INNER JOIN
with a one-to-many relationship
-- The query below will get one row from table_b
-- for every matching row in table_a.
SELECT
< your_columns >
FROM
table_a
INNER JOIN table_b ON key_a = key_b;
Option 2: Use a CTE to reduce the table relationship
-- The query below will get aggregated values from table_b
-- for every matching row in table_a.
WITH table_b_reduced AS (
SELECT
AGGREGATE_FUNCTION (< your_columns >)
FROM
table_b_reduced
GROUP BY
< your_columns >
)
SELECT
< your_columns >
FROM
table_a
JOIN table_b_reduced ON key_a = key_b_reduced;
Duplicated columns
- If you are joining data, check if your
SELECT
statement is including both primary key and foreign key columns. - Check if your columns are duplicated at the source by following the steps under Debugging SQL logic.
- Learn more about common reasons for unexpected query results.
Do you have a different problem?
- My result has missing data.
- My aggregations (counts, sums, etc.) are wrong.
- My dates and times are wrong.
- My data isn’t up to date.
- I have a SQL syntax error.
- I have an error message that isn’t specific to my SQL query or syntax.
Are you still stuck?
Search or ask the Metabase community.
Next: Debugging missing data in SQL query results
What to do when your query returns data that's missing rows or columns.