Debugging missing data in SQL query results
What to do when your query returns data that's missing rows or columns.
Where is your data missing?
Missing rows
Before you start, make sure you know the schemas of your source tables or nested queries.
- Check if your source tables or nested queries have missing rows.
- Check the table below to see if you are missing rows because of your join type.
-
Check your join conditions in the
ON
clause. For example:-- The join condition below will filter out -- all transactions from the Orders table -- where the product category is 'Gizmo'. SELECT * FROM orders o JOIN products p ON o.product_id = p.id AND p.category <> 'Gizmo';
-
Check if your
WHERE
clause is interacting with yourJOIN
clause. For example:-- The WHERE clause below will filter out -- all transactions from the Orders table -- where the product category is 'Gizmo'. SELECT * FROM orders o JOIN products p ON o.product_id = p.id AND p.category = 'Gizmo' WHERE p.category <> 'Gizmo'
- If you want to add rows to your query result to fill in data that is empty, zero, or
NULL
, go to How to fill in data for missing report dates.
How joins filter out unmatched rows
Join type | If join condition isn’t met |
---|---|
A INNER JOIN B | Rows filtered out from both A and B. |
A LEFT JOIN B | Rows filtered out from B. |
B LEFT JOIN A | Rows filtered out from A. |
A OUTER JOIN B | Rows filtered out from both A and B. |
A FULL JOIN B | No rows are filtered out. |
Explanation
The order of the tables in your JOIN
clause affects which rows the query returns.
For example, when you write a LEFT JOIN
, the table that comes before the LEFT JOIN
clause in your query is “on the left”. The rows from the table “on the right” (the table after the LEFT JOIN
clause) will be filtered out if they don’t meet your join condition(s) in the ON
clause.
The execution order of the query may combine your join conditions and WHERE
clauses in ways that you might not expect.
Further reading
- Common reasons for unexpected query results
- Common problems with SQL joins
- Join types
- ETLs, ELTs, and Reverse ETLs
How to fill in data for missing report dates
If your source tables or nested queries only store rows for dates where something has happened, you’ll get results with missing report dates.
For example, the Orders
table in the Sample Database only stores rows for dates where orders were created. It doesn’t store any rows for dates where there was no order activity.
-- The query below calculates the total sales
-- for each day that had at least one order.
-- For example, note that there is no row
-- in the query results for May 5, 2016.
SELECT
DATE_TRUNC('day', o.created_at)::date AS "order_created_date",
SUM(p.price) AS "total_sales"
FROM
orders o
JOIN products p ON o.product_id = p.id
WHERE
o.created_at BETWEEN'2016-05-01'::date
AND '2016-05-30'::date
GROUP BY
"order_created_date"
ORDER BY
"order_created_date" ASC;
If you want a result like the table below, you’ll need to start your JOIN
with a table or column that has all the dates (or any other sequence) you want. Ask your database admin if there’s a table you can use for this.
+--------------------+-------------+
| report_date | total_sales |
+--------------------+-------------+
| May 4, 2016 | 98.78 |
+--------------------+-------------+
| May 5, 2016 | 0.00 |
+--------------------+-------------+
| May 6, 2016 | 87.29 |
+--------------------+-------------+
| May 7, 2016 | 0.00 |
+--------------------+-------------+
| May 8, 2016 | 81.61 |
+--------------------+-------------+
If your SQL dialect supports the GENERATE_SERIES
function, you can create a temporary column that stores your report dates.
-- The query below calculates the total sales
-- for every day in the report period,
-- including days with 0 orders.
-- The date_series CTE generates one row
-- per date that you want in your final result.
WITH date_series AS (
SELECT
*
FROM
GENERATE_SERIES('2016-05-01'::date, '2020-05-30'::date, '1 day'::interval) report_date
)
-- The fact_orders CTE generates the total sales
-- for each date that had an order.
, fact_orders AS (
SELECT
DATE_TRUNC('day', o.created_at)::date AS "order_created_date",
SUM(p.price) AS "total_sales"
FROM
orders o
JOIN products p ON o.product_id = p.id
GROUP BY
"order_created_date"
ORDER BY
"order_created_date" ASC
)
-- The main query joins the two CTEs together
-- and uses the COALESCE function to fill in the dates
-- where there were no orders (i.e. a total sales value of 0).
SELECT
d.report_date,
o.order_created_date,
COALESCE(o.total_sales, 0) AS total_sales
FROM
date_series d
LEFT JOIN fact_orders o ON d.date = o.order_created_date
;
Missing columns
- If you’re joining data, check if your
SELECT
statement contains the columns you want.- Are you using the correct table aliases?
- Are you missing the table in your
FROM
clause?
- Check if your source tables or query results have missing columns by following step 1 under Debugging SQL logic.
- Learn more about common reasons for unexpected query results.
Do you have a different problem?
- My result has duplicated data.
- My aggregations (counts, sums, etc.) are wrong.
- My dates and times are wrong.
- My data isn’t up to date.
- I have a SQL syntax error.
- I have an error message that isn’t specific to my SQL query or syntax.
Are you still stuck?
Search or ask the Metabase community.