Core Module
12 min forge

SQL Subqueries: Correlated vs Non-Correlated

Mastering the art of nesting queries. Performance implications and the EXISTS vs IN debate.

SQL Subqueries: Correlated vs Non-Correlated

πŸ›‘οΈ What is a Subquery?

A subquery (or inner query) is a query nested inside another SQL statement. Subqueries can be used in SELECT, FROM, WHERE, and HAVING clauses.

⏰ When to Use

  • Non-Correlated: When the inner query can be executed independently of the outer query. It runs once and provides a fixed result set (e.g., getting the average salary to filter employees).
  • Correlated: When the inner query refers to a column from the outer query. It is executed repeatedly, once for each candidate row processed by the outer query.

πŸ“Š Complexity & Performance

  • Non-Correlated: Often efficiently optimized by the engine into a join or a temporary table.
  • Correlated: Can be extremely slow ($O(N \times M)$) if not optimized. Modern optimizers try to "de-correlate" these into joins, but manual refactoring is sometimes necessary.
  • EXISTS vs IN: EXISTS is usually faster for correlated subqueries because it returns as soon as a match is found (semi-join), while IN may materialize the entire subquery result.

πŸ’» Code Example: The Performance Choice

sql Standard
-- 1. Non-Correlated (Independent) -- Get employees earning more than the average SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees); -- 2. Correlated (Dependent) -- Find employees earning more than the average in THEIR OWN department SELECT e1.name, e1.salary, e1.dept_id FROM employees e1 WHERE e1.salary > ( SELECT AVG(salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id -- Reference to outer table ); -- 3. The EXISTS Pattern (Optimized) SELECT name FROM departments d WHERE EXISTS ( SELECT 1 FROM employees e WHERE e.dept_id = d.id AND e.performance = 'High' );

⚠️ Interview Pitfalls

  1. The NULL Trap with NOT IN: If a subquery returns even one NULL value, NOT IN will return zero results for the entire outer query. This is a very common "gotcha" question. Use NOT EXISTS instead.
  2. Scalar vs Multi-row: Ensure you don't use = with a subquery that returns multiple rows; use IN or ANY.
  3. Derived Table Aliasing: In many SQL dialects (like MySQL), subqueries used in the FROM clause must have an alias.
  4. Performance: Be prepared to explain how to convert a correlated subquery into a JOIN or a LATERAL JOIN for better performance.