SQL Interview Questions & Answers

Master your SQL interviews with our comprehensive collection of frequently asked questions

1
Beginner

What is SQL and what is it used for?

Fundamentals 2 min read

SQL (Structured Query Language) is a standardized programming language used for managing and manipulating relational databases. It enables users to create, read, update, and delete data in database systems.

Key Points:

  • SQL is declarative - you specify what you want, not how to get it
  • Used across all major database systems (MySQL, PostgreSQL, Oracle, SQL Server)
  • Four main categories: DDL, DML, DCL, and TCL
  • SQL keywords are case-insensitive (conventionally written in uppercase); identifier case-sensitivity varies by database
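The four command categories can be illustrated with one statement each (the table, column, and role names here are hypothetical):

Command Categories Example:
-- DDL (Data Definition Language): defines structure
CREATE TABLE projects (project_id INT PRIMARY KEY, name VARCHAR(100));
-- DML (Data Manipulation Language): manipulates data
INSERT INTO projects (project_id, name) VALUES (1, 'Apollo');
-- DCL (Data Control Language): manages permissions
GRANT SELECT ON projects TO analyst_role;
-- TCL (Transaction Control Language): manages transactions
COMMIT;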
Example Query:
SELECT first_name, last_name, email
FROM employees
WHERE department = 'Engineering'
ORDER BY last_name;
2
Beginner

What is the difference between WHERE and HAVING clauses?

Filtering 3 min read

WHERE is used to filter rows before grouping, while HAVING is used to filter groups after the GROUP BY operation. WHERE cannot be used with aggregate functions, but HAVING can.

WHERE Example:
SELECT * FROM employees
WHERE salary > 50000;
HAVING Example:
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;

Remember:

  • WHERE filters individual rows
  • HAVING filters aggregated results
  • WHERE executes before GROUP BY
  • HAVING executes after GROUP BY
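WHERE and HAVING can also appear in the same query; a sketch using the same employees table (the hire_date column is assumed):

Combined Example:
SELECT department, AVG(salary) as avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'   -- row filter, applied before grouping
GROUP BY department
HAVING AVG(salary) > 60000;       -- group filter, applied after grouping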
3
Intermediate

Explain different types of JOINs in SQL

JOINs 5 min read

JOINs are used to combine rows from two or more tables based on a related column. There are several types of JOINs in SQL:

JOIN Types:

  • INNER JOIN: Returns matching rows from both tables
  • LEFT JOIN: Returns all rows from left table and matching rows from right
  • RIGHT JOIN: Returns all rows from right table and matching rows from left
  • FULL OUTER JOIN: Returns all rows from both tables, with NULLs where there is no match
  • CROSS JOIN: Returns Cartesian product of both tables
INNER JOIN Example:
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
LEFT JOIN Example:
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
4
Intermediate

What is a Primary Key and Foreign Key?

Constraints 4 min read

A Primary Key uniquely identifies each record in a table and cannot contain NULL values. A Foreign Key is a field that links to the Primary Key of another table, establishing relationships between tables.

Primary Key Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);
Foreign Key Example:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    order_date DATE,
    employee_id INT,
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);

Key Differences:

  • Primary Key: Unique identifier, cannot be NULL
  • Foreign Key: Can have duplicate values and NULL values
  • Primary Key: Only one per table
  • Foreign Key: Can have multiple per table
5
Advanced

What are Window Functions and how do they differ from GROUP BY?

Window Functions 6 min read

Window functions perform calculations across a set of rows that are related to the current row, but unlike GROUP BY, they don't collapse rows. Each row retains its identity while still accessing aggregate information.

Window Function Example:
SELECT 
    employee_name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;
Compared to GROUP BY:
SELECT 
    department,
    AVG(salary) as dept_avg_salary
FROM employees
GROUP BY department;

Key Advantages:

  • Maintains row-level detail while computing aggregates
  • Can use ranking functions (ROW_NUMBER, RANK, DENSE_RANK)
  • Supports moving averages and running totals
  • More flexible for complex analytical queries
6
Advanced

Explain the difference between UNION and UNION ALL

Set Operations 4 min read

UNION and UNION ALL are both used to combine results from multiple SELECT queries, but they differ in how they handle duplicates and performance.

UNION Example (removes duplicates):
SELECT employee_name FROM employees_ny
UNION
SELECT employee_name FROM employees_sf;
UNION ALL Example (keeps duplicates):
SELECT employee_name FROM employees_ny
UNION ALL
SELECT employee_name FROM employees_sf;

Key Differences:

  • UNION: Removes duplicate rows; the deduplication step makes it slower (results are not guaranteed sorted unless you add ORDER BY)
  • UNION ALL: Keeps all rows including duplicates; no deduplication, so faster
  • Use UNION when you need distinct results
  • Use UNION ALL when you need all rows or know there are no duplicates
7
Expert

What are Common Table Expressions (CTEs) and when should you use them?

CTEs, Recursive Queries 7 min read

A Common Table Expression (CTE) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs make complex queries more readable and maintainable.

Simple CTE Example:
WITH high_earners AS (
    SELECT employee_id, name, salary
    FROM employees
    WHERE salary > 100000
)
SELECT * FROM high_earners
ORDER BY salary DESC;
Recursive CTE Example (for hierarchical data):
WITH RECURSIVE org_hierarchy AS (
    -- Anchor member
    SELECT employee_id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive member
    SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy;

When to Use CTEs:

  • Breaking down complex queries into simpler parts
  • Recursive queries (organizational charts, tree structures)
  • Multiple references to the same subquery
  • Improving query readability and maintainability
  • Temporary result sets needed for multiple joins
8
Expert

What is database normalization and what are its normal forms?

Database Design 8 min read

Database normalization is the process of organizing data to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them.

Normal Forms:

  • 1NF (First Normal Form): Eliminate repeating groups, ensure atomic values
  • 2NF (Second Normal Form): Remove partial dependencies (meet 1NF + all non-key attributes depend on entire primary key)
  • 3NF (Third Normal Form): Remove transitive dependencies (meet 2NF + no non-prime attribute depends on another non-prime attribute)
  • BCNF (Boyce-Codd Normal Form): Stricter version of 3NF; every determinant must be a candidate key
  • 4NF (Fourth Normal Form): Remove multi-valued dependencies
  • 5NF (Fifth Normal Form): Remove join dependencies
Denormalized Table (Before):
CREATE TABLE employees (
    employee_id INT,
    name VARCHAR(100),
    department VARCHAR(100),
    dept_location VARCHAR(100),
    skill1 VARCHAR(50),
    skill2 VARCHAR(50),
    skill3 VARCHAR(50)
);
Normalized Tables (After 3NF):
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    name VARCHAR(100),
    location VARCHAR(100)
);

CREATE TABLE employee_skills (
    employee_id INT,
    skill VARCHAR(50),
    PRIMARY KEY (employee_id, skill)
);

Benefits of Normalization:

  • Reduces data redundancy
  • Improves data integrity
  • Simplifies data maintenance
  • Eliminates data anomalies (insert, update, delete)
9
Intermediate

What is the difference between DELETE, TRUNCATE, and DROP?

DML/DDL 4 min read

These three commands are used for removing data, but they work differently in terms of scope, speed, and reversibility.

DELETE Example:
-- Removes specific rows, can be rolled back
DELETE FROM employees WHERE department = 'HR';
-- Removes all rows but table structure remains
DELETE FROM employees;
TRUNCATE Example:
-- Removes all rows but keeps the table; faster than DELETE
-- (implicitly commits in MySQL/Oracle; can be rolled back in PostgreSQL/SQL Server)
TRUNCATE TABLE employees;
DROP Example:
-- Removes entire table including structure
DROP TABLE employees;

Key Differences:

  • DELETE: DML command, row-by-row removal, can use WHERE, can be rolled back, slower
  • TRUNCATE: DDL command, removes all rows, faster, resets identity columns, minimal logging; cannot be rolled back in MySQL/Oracle, but is transactional in PostgreSQL/SQL Server
  • DROP: DDL command, removes table completely including structure, fastest
  • Use DELETE when you need conditional removal or transaction rollback
  • Use TRUNCATE when you need to quickly remove all data but keep table structure
  • Use DROP when you want to remove the table entirely
10
Intermediate

What is a CROSS JOIN and when would you use it?

CROSS JOIN 3 min read

A CROSS JOIN produces the Cartesian product of two tables, meaning it combines each row from the first table with every row from the second table.

CROSS JOIN Example:
-- Create sample tables
CREATE TABLE colors (color_name VARCHAR(20));
INSERT INTO colors VALUES ('Red'), ('Green'), ('Blue');

CREATE TABLE sizes (size_name VARCHAR(20));
INSERT INTO sizes VALUES ('Small'), ('Medium'), ('Large');

-- CROSS JOIN to get all combinations
SELECT color_name, size_name
FROM colors
CROSS JOIN sizes;

Result (9 rows):

  • Red - Small
  • Red - Medium
  • Red - Large
  • Green - Small
  • Green - Medium
  • Green - Large
  • Blue - Small
  • Blue - Medium
  • Blue - Large

When to Use CROSS JOIN:

  • Generating all possible combinations (colors × sizes)
  • Creating test data with all permutations
  • Mathematical operations requiring Cartesian product
  • Generating calendar or grid data
  • Warning: Can produce very large result sets (m × n rows)
11
Intermediate

What are aggregate functions in SQL? List commonly used ones.

Aggregate Functions 3 min read

Aggregate functions perform calculations on multiple rows and return a single value. They are typically used with GROUP BY clauses.

Common Aggregate Functions:
SELECT 
    COUNT(*) as total_employees,
    SUM(salary) as total_salary,
    AVG(salary) as average_salary,
    MAX(salary) as highest_salary,
    MIN(salary) as lowest_salary,
    STDDEV(salary) as salary_stddev,
    VARIANCE(salary) as salary_variance
FROM employees;
With GROUP BY:
SELECT 
    department,
    COUNT(*) as emp_count,
    AVG(salary) as avg_salary,
    SUM(salary) as dept_budget
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

Commonly Used Aggregate Functions:

  • COUNT(*) - Counts all rows; COUNT(column) counts non-NULL values only
  • SUM() - Calculates total sum
  • AVG() - Calculates average value
  • MAX() - Finds maximum value
  • MIN() - Finds minimum value
  • GROUP_CONCAT() - Concatenates strings (MySQL)
  • STRING_AGG() - Concatenates strings (PostgreSQL)
  • STDDEV() - Calculates standard deviation
  • VARIANCE() / VAR() - Calculates variance (name varies by database)
12
Beginner

What is NULL in SQL and how is it different from empty string or zero?

NULL Values 3 min read

NULL represents missing, unknown, or inapplicable data. It's not the same as empty string ('') or zero (0).

NULL vs Empty String vs Zero:
-- Sample data
INSERT INTO employees (id, name, phone, salary) VALUES
(1, 'John Doe', NULL, 50000),      -- phone is unknown
(2, 'Jane Smith', '', 60000),      -- phone is empty string
(3, 'Bob Wilson', '555-1234', 0);  -- phone exists, salary is 0

-- Different behaviors
SELECT * FROM employees WHERE phone IS NULL;     -- Returns John only
SELECT * FROM employees WHERE phone = '';        -- Returns Jane only
SELECT * FROM employees WHERE salary = 0;        -- Returns Bob only

Important NULL Characteristics:

  • NULL represents "unknown" or "not applicable"
  • NULL = NULL evaluates to UNKNOWN, never TRUE (use IS NULL or IS NOT NULL to test for NULL)
  • Arithmetic with NULL returns NULL (10 + NULL = NULL)
  • Aggregate functions ignore NULL values (except COUNT(*))
  • NULL sort order is database-specific: NULLs sort last in ASC in PostgreSQL/Oracle, but first in MySQL/SQL Server
  • Use COALESCE() (standard) or ISNULL()/IFNULL() (vendor-specific) to handle NULL values
Handling NULL values:
-- Use COALESCE to provide default values
SELECT name, COALESCE(phone, 'No phone') as contact
FROM employees;

-- Use NULLIF to treat specific values as NULL
SELECT name, NULLIF(salary, 0) as actual_salary
FROM employees;
13
Advanced

What are correlated subqueries and how do they work?

Subqueries 5 min read

A correlated subquery is a subquery that references columns from the outer query. It executes once for each row processed by the outer query.

Correlated Subquery Example:
-- Find employees whose salary is above average in their department
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department
);
Non-Correlated Subquery (for comparison):
-- Find employees earning more than company average (executes once)
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Key Characteristics:

  • Correlated: References outer query, executes for each row
  • Non-Correlated: Independent, executes once
  • Correlated subqueries can be performance-intensive
  • Often can be rewritten as JOINs for better performance
  • Useful for row-by-row comparisons
Correlated subquery as JOIN (better performance):
-- Same query rewritten as JOIN
SELECT e.name, e.department, e.salary
FROM employees e
JOIN (
    SELECT department, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department
) dept_avg ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_salary;
14
Intermediate

What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

Window Functions 4 min read

These are window functions used for ranking rows. They differ in how they handle ties and gaps in ranking.

Comparison Example:
SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,
    RANK() OVER (ORDER BY salary DESC) as rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees
WHERE department = 'Sales';
Sample Output for salaries: [100000, 90000, 90000, 80000]
Name       | Salary | ROW_NUMBER | RANK | DENSE_RANK
-----------|--------|------------|------|-----------
Alice      | 100000 |     1      |  1   |     1
Bob        |  90000 |     2      |  2   |     2
Charlie    |  90000 |     3      |  2   |     2
David      |  80000 |     4      |  4   |     3

Key Differences:

  • ROW_NUMBER(): Always unique sequential numbers, no ties
  • RANK(): Same rank for ties, leaves gaps (1, 2, 2, 4)
  • DENSE_RANK(): Same rank for ties, no gaps (1, 2, 2, 3)
  • Use ROW_NUMBER() for unique identifiers
  • Use RANK() for competition rankings with gaps
  • Use DENSE_RANK() for rankings without gaps
15
Advanced

What is a FULL OUTER JOIN and how does it differ from other joins?

FULL OUTER JOIN 4 min read

FULL OUTER JOIN returns all rows from both tables, matching them where possible and filling with NULLs where there's no match.

FULL OUTER JOIN Example:
-- Sample data
CREATE TABLE customers (id INT, name VARCHAR(50));
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');

CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL);
INSERT INTO orders VALUES (101, 1, 100), (102, 2, 200), (103, 4, 300);

-- FULL OUTER JOIN
SELECT 
    c.name as customer_name,
    o.id as order_id,
    o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;

Result:

  • Alice - 101 - 100 (matched)
  • Bob - 102 - 200 (matched)
  • Charlie - NULL - NULL (customer with no orders)
  • NULL - 103 - 300 (order with no customer)

Comparison with Other JOINs:

  • INNER JOIN: Only matching rows from both tables
  • LEFT JOIN: All rows from left + matching rows from right
  • RIGHT JOIN: All rows from right + matching rows from left
  • FULL OUTER JOIN: All rows from both tables
  • Note: MySQL doesn't support FULL OUTER JOIN directly (use UNION of LEFT and RIGHT JOIN)
FULL OUTER JOIN in MySQL (workaround):
-- Using UNION of LEFT and RIGHT JOIN
SELECT c.name, o.id, o.amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
UNION
SELECT c.name, o.id, o.amount
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;
16
Intermediate

What is query execution plan and how do you read it?

Execution Plan 5 min read

An execution plan shows how the database will execute a query. It helps identify performance bottlenecks and optimization opportunities.

Getting Execution Plan:
-- MySQL
EXPLAIN SELECT * FROM employees WHERE department = 'Sales';

-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM employees WHERE department = 'Sales';

-- SQL Server
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM employees WHERE department = 'Sales';
GO
SET SHOWPLAN_TEXT OFF;
Sample EXPLAIN Output (MySQL):
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    10.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+

Key Things to Look For:

  • Type: ALL = full table scan (bad); const/ref/range = index lookups (good); index = full index scan
  • Possible Keys: Indexes that could be used
  • Key: Index actually being used
  • Rows: Estimated rows examined
  • Extra: Additional info (Using filesort, Using temporary = bad)
  • Look for full table scans (type=ALL) - may need indexes
  • Check if appropriate indexes are being used
  • Watch for filesort or temporary table usage

Optimization Tips from EXPLAIN:

  • If "Using filesort" appears, consider an index matching the ORDER BY columns
  • If "Using temporary" appears, the query may need rewriting
  • If type=ALL, add an index on the WHERE clause columns
  • If rows examined is high but filtered is low, improve the WHERE conditions or supporting indexes
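A typical follow-up is adding an index that serves both the filter and the sort; a sketch with column names assumed from the earlier examples:

Index Fix Example:
-- Composite index covering the WHERE filter and the ORDER BY column
CREATE INDEX idx_employees_dept_lastname ON employees(department, last_name);

-- Re-running EXPLAIN should now show type=ref instead of ALL,
-- and "Using filesort" should disappear for this query
EXPLAIN SELECT * FROM employees
WHERE department = 'Sales'
ORDER BY last_name;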
17
Expert

What are database transactions and ACID properties?

Transactions 6 min read

A transaction is a logical unit of work that contains one or more SQL statements. ACID properties ensure reliable processing of database transactions.

Transaction Example:
BEGIN TRANSACTION;

-- Transfer $100 from account A to B
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';

-- Check for errors (SQL Server T-SQL syntax)
IF @@ERROR != 0
    ROLLBACK TRANSACTION;
ELSE
    COMMIT TRANSACTION;

ACID Properties:

  • Atomicity: All or nothing - either all operations succeed or all fail
  • Consistency: Database moves from one valid state to another
  • Isolation: Concurrent transactions don't interfere with each other
  • Durability: Once committed, changes persist even after system failure

Transaction Isolation Levels:

  • Read Uncommitted: Can read uncommitted changes (dirty reads)
  • Read Committed: Can only read committed changes (default in many DBs)
  • Repeatable Read: Same read returns same data within transaction
  • Serializable: Highest isolation; transactions behave as if executed serially
  • Lower isolation = better performance but more concurrency issues
  • Higher isolation = fewer issues but worse performance
Setting Isolation Level:
-- SQL Server
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- MySQL
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- PostgreSQL
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
18
Beginner

What is the difference between clustered and non-clustered indexes?

Indexing 4 min read

Clustered indexes determine the physical order of data in a table, while non-clustered indexes are separate structures that point to the data.

Creating Indexes:
-- Clustered index (only one per table, SQL Server syntax)
CREATE CLUSTERED INDEX idx_employees_id 
ON employees(id);

-- Non-clustered index (multiple per table)
CREATE NONCLUSTERED INDEX idx_employees_email 
ON employees(email);

CREATE NONCLUSTERED INDEX idx_employees_dept_salary 
ON employees(department, salary);

Clustered Index:

  • Determines physical storage order of rows
  • Only one per table (the table IS the clustered index)
  • Faster for range queries (data is physically ordered)
  • Primary key is often the clustered index
  • Slower for inserts (may require data reorganization)
  • Example: a dictionary, organized alphabetically by word

Non-Clustered Index:

  • Separate structure from table data
  • Contains index columns + pointer to data
  • Multiple can exist per table
  • Faster for exact match lookups
  • Additional storage required
  • Example: a book's index, pointing to page numbers

When to Use Which:

  • Use clustered index for columns frequently used in range queries
  • Use clustered index for columns with sequential values
  • Use non-clustered index for columns in WHERE, JOIN, ORDER BY
  • Use non-clustered index for foreign key columns
  • Avoid too many indexes - each slows down INSERT/UPDATE/DELETE
19
Intermediate

What are string functions in SQL? Provide examples.

String Functions 4 min read

String functions manipulate text data. Different databases have similar functions but sometimes with different names.

Common String Functions:
SELECT 
    name,
    -- Length/Position functions
    LENGTH(name) as name_length,
    CHAR_LENGTH(name) as char_count,  -- Same as LENGTH for single-byte
    POSITION(' ' IN name) as space_position,
    INSTR(name, ' ') as space_pos_mysql,
    
    -- Case conversion
    UPPER(name) as upper_name,
    LOWER(name) as lower_name,
    
    -- Substring/extraction
    SUBSTRING(name, 1, 5) as first_5_chars,
    LEFT(name, 3) as first_3_chars,
    RIGHT(name, 3) as last_3_chars,
    
    -- Trimming
    TRIM(name) as trimmed_name,
    LTRIM(name) as left_trimmed,
    RTRIM(name) as right_trimmed,
    
    -- Replacement
    REPLACE(name, ' ', '_') as name_with_underscores
    
FROM employees;
More String Functions:
-- Concatenation
SELECT CONCAT(first_name, ' ', last_name) as full_name FROM employees;
-- MySQL: CONCAT(), PostgreSQL: || operator, SQL Server: + operator

-- Pattern matching
SELECT name FROM employees WHERE name LIKE 'J%';  -- Starts with J
SELECT name FROM employees WHERE name LIKE '%son'; -- Ends with son
SELECT name FROM employees WHERE name LIKE '%oh%'; -- Contains oh

-- Padding
SELECT LPAD(salary::text, 10, '0') as padded_salary FROM employees; -- Left pad
SELECT RPAD(name, 20, '.') as padded_name FROM employees; -- Right pad

-- Soundex (phonetic matching)
SELECT name FROM employees WHERE SOUNDEX(name) = SOUNDEX('Jon');

Database-Specific Functions:

  • MySQL: CONCAT(), CONCAT_WS(), GROUP_CONCAT()
  • PostgreSQL: || operator, STRING_AGG(), SPLIT_PART()
  • SQL Server: + operator, STRING_AGG(), STRING_SPLIT()
  • Oracle: || operator, LISTAGG(), REGEXP functions
  • Always check database documentation for exact function names
20
Advanced

What is database normalization and denormalization? When to use each?

Database Design 5 min read

Normalization organizes data to reduce redundancy, while denormalization intentionally adds redundancy to improve performance.

Normalized Design (3NF):
-- Separate tables for related data
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Denormalized Design:
-- Combined table for faster reads
CREATE TABLE order_details (
    order_id INT,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    order_date DATE,
    product_name VARCHAR(100),
    quantity INT,
    unit_price DECIMAL(10,2),
    total_price DECIMAL(10,2)  -- Redundant but pre-calculated
);

When to Normalize:

  • OLTP systems (Online Transaction Processing)
  • When data integrity is critical
  • When write performance is important
  • When storage space is limited
  • When data relationships are complex

When to Denormalize:

  • OLAP systems (Online Analytical Processing)
  • Data warehouses and reporting databases
  • When read performance is critical
  • For frequently joined tables
  • For summary/reporting tables
  • When joins are too expensive

Common Denormalization Techniques:

  • Adding calculated columns (total_price = quantity * unit_price)
  • Flattening hierarchies into single table
  • Creating summary tables with pre-aggregated data
  • Adding redundant foreign key data
  • Creating materialized views
  • Using NoSQL databases for specific use cases
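A materialized view is one sketch of that technique (a PostgreSQL/Oracle feature; MySQL lacks it), using the orders table from the normalized design above:

Materialized View Example:
-- Pre-aggregated summary stored on disk, unlike a regular view
CREATE MATERIALIZED VIEW daily_order_counts AS
SELECT order_date, COUNT(*) AS order_count
FROM orders
GROUP BY order_date;

-- Refresh periodically to pick up new orders
REFRESH MATERIALIZED VIEW daily_order_counts;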
21
Intermediate

What is the difference between IN and EXISTS operators?

Subqueries 4 min read

Both IN and EXISTS are used for subqueries, but they work differently in terms of performance and use cases.

IN operator example:
-- Find employees in specific departments
SELECT name, department
FROM employees
WHERE department IN ('Sales', 'Marketing', 'Engineering');

-- Using subquery with IN
SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
);
EXISTS operator example:
-- Find customers who have placed orders
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id
);

-- Using NOT EXISTS to find customers without orders
SELECT customer_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id
);

Key Differences:

  • IN: Checks if value matches any value in a list/subquery
  • EXISTS: Checks if subquery returns any rows (returns boolean)
  • Performance: EXISTS often faster with large datasets
  • NULL handling: an IN list/subquery containing NULL makes non-matches evaluate to UNKNOWN (so NOT IN can return no rows); EXISTS only tests for rows and is unaffected
  • Use IN for static lists or small result sets
  • Use EXISTS for correlated subqueries or when checking for existence
  • Use NOT EXISTS instead of NOT IN (better NULL handling)
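The NULL pitfall behind the last bullet can be sketched with the same customers and orders tables; assume at least one orders.customer_id is NULL:

NOT IN vs NOT EXISTS Example:
-- Returns zero rows: comparing against a list containing NULL
-- yields UNKNOWN for every non-matching customer
SELECT customer_name FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

-- Returns the expected customers without orders
SELECT customer_name FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);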
22
Intermediate

What are date and time functions in SQL? Provide examples.

Date Functions 5 min read

Date and time functions allow manipulation and extraction of temporal data. Different databases have similar but sometimes different syntax.

Common Date Functions:
-- Current date/time
SELECT 
    CURRENT_DATE as today_date,          -- 2025-12-02
    CURRENT_TIMESTAMP as now,            -- 2025-12-02 14:30:45
    NOW() as mysql_now,                  -- MySQL specific
    GETDATE() as sqlserver_now,          -- SQL Server specific
    SYSDATE as oracle_now;               -- Oracle specific

-- Date extraction
SELECT 
    EXTRACT(YEAR FROM order_date) as order_year,
    EXTRACT(MONTH FROM order_date) as order_month,
    EXTRACT(DAY FROM order_date) as order_day,
    DATE_PART('dow', order_date) as day_of_week,  -- 0=Sunday
    DAY(order_date) as day_mysql,                 -- MySQL
    DATEPART(month, order_date) as month_sqlserver; -- SQL Server
Date Arithmetic and Formatting:
-- Date arithmetic
SELECT 
    order_date,
    order_date + INTERVAL '7 days' as due_date,        -- PostgreSQL
    DATE_ADD(order_date, INTERVAL 7 DAY) as due_date_mysql, -- MySQL
    DATEADD(day, 7, order_date) as due_date_sqlserver, -- SQL Server
    order_date - INTERVAL '1 month' as last_month,     -- PostgreSQL
    DATEDIFF(CURRENT_DATE, order_date) as days_ago,    -- MySQL: DATEDIFF(later, earlier)
    DATE_DIFF(CURRENT_DATE, order_date, DAY) as days_diff; -- BigQuery

-- Date formatting
SELECT 
    TO_CHAR(order_date, 'YYYY-MM-DD') as iso_format,
    TO_CHAR(order_date, 'Month DD, YYYY') as long_format,
    DATE_FORMAT(order_date, '%W, %M %e, %Y') as mysql_format,
    FORMAT(order_date, 'dd/MM/yyyy') as sqlserver_format;

-- Date validation
SELECT 
    order_date,
    order_date::date as date_only,  -- Cast to date
    ISDATE(order_date_str) as is_valid_date;  -- SQL Server

Practical Examples:

-- Find orders from last 30 days
SELECT * FROM orders 
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Group by month-year
SELECT 
    TO_CHAR(order_date, 'YYYY-MM') as month,
    COUNT(*) as order_count,
    SUM(amount) as total_amount
FROM orders
GROUP BY TO_CHAR(order_date, 'YYYY-MM')
ORDER BY month;

-- Calculate age
SELECT 
    name,
    birth_date,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date)) as age
FROM employees;
23
Expert

What are window functions with PARTITION BY and ORDER BY?

Window Functions 6 min read

Window functions perform calculations across a set of rows related to the current row. PARTITION BY divides rows into groups, ORDER BY determines order within partitions.

Basic Window Function Syntax:
SELECT 
    employee_id,
    name,
    department,
    salary,
    -- Running total within department
    SUM(salary) OVER (
        PARTITION BY department 
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total,
    
    -- Rank within department
    RANK() OVER (
        PARTITION BY department 
        ORDER BY salary DESC
    ) as dept_salary_rank,
    
    -- Compare to department average
    salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg,
    
    -- Moving average of salary over each department's three most recent hires
    AVG(salary) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg_salary
FROM employees;

Window Frame Clauses:

  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Running total
  • ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING: 4-row window
  • RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW: 7-day window (PostgreSQL syntax)
  • ROWS UNBOUNDED PRECEDING: All previous rows in partition
  • ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: All following rows
Practical Use Cases:
-- Find top 3 earners in each department
WITH ranked_employees AS (
    SELECT 
        name,
        department,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department 
            ORDER BY salary DESC
        ) as salary_rank  -- "rank" alone is a reserved word in MySQL 8+
    FROM employees
)
SELECT * FROM ranked_employees WHERE salary_rank <= 3;

-- Calculate month-over-month growth
SELECT 
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) as prev_month_rev,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) as growth,
    (revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0 / 
    LAG(revenue, 1) OVER (ORDER BY month) as growth_pct
FROM monthly_sales;
24
Advanced

What is database locking and what are different lock types?

Concurrency 5 min read

Database locking prevents multiple transactions from modifying the same data simultaneously, ensuring data consistency.

Types of Locks:

  • Shared Lock (Read Lock): Multiple transactions can read, but none can write
  • Exclusive Lock (Write Lock): Only one transaction can read/write
  • Intent Lock: Indicates intention to lock at a finer granularity
  • Schema Lock: Lock on table structure (DDL operations)
  • Update Lock: Intermediate lock before upgrading to exclusive
Locking Examples:
-- SQL Server explicit locking
BEGIN TRANSACTION;
-- Get exclusive lock on table
SELECT * FROM employees WITH (TABLOCKX);
-- Or row-level lock
SELECT * FROM employees WITH (ROWLOCK, XLOCK) WHERE id = 1;
COMMIT;

-- MySQL table locking (note: LOCK TABLES implicitly commits any open
-- transaction, so do not wrap it in START TRANSACTION)
LOCK TABLES employees WRITE, departments READ;
-- Perform operations
UPDATE employees SET salary = salary * 1.1 WHERE department = 'Engineering';
-- Release locks (also commits)
UNLOCK TABLES;

-- MySQL row-level locking within a transaction
START TRANSACTION;
SELECT * FROM employees WHERE id = 1 FOR UPDATE;
UPDATE employees SET salary = salary * 1.1 WHERE id = 1;
COMMIT;

-- PostgreSQL advisory locks
BEGIN;
SELECT pg_advisory_lock(123);  -- Application-level lock
-- Critical section
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SELECT pg_advisory_unlock(123);
COMMIT;

Lock Granularity:

  • Row-level locks: Most granular, best concurrency
  • Page-level locks: Locks database pages (8KB chunks)
  • Table-level locks: Locks entire table, poor concurrency
  • Database-level locks: Locks entire database

Common Locking Issues:

  • Deadlock: Two transactions waiting for each other
  • Blocking: One transaction blocking others
  • Lock escalation: Many row locks converted to table lock
  • Lock timeout: Transaction waiting too long for lock
  • Mitigations: keep transactions short, access tables in a consistent order, use appropriate isolation levels
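Blocking is easy to reproduce. Below is a minimal sketch using Python's sqlite3 (SQLite locks the whole database file for writes, so any second writer blocks; the table and values are invented for illustration):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file; a short busy timeout makes
# the blocked writer fail fast instead of waiting.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0.1, isolation_level=None)
b = sqlite3.connect(path, timeout=0.1, isolation_level=None)
a.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INT)")
a.execute("INSERT INTO accounts VALUES (1, 100)")

a.execute("BEGIN IMMEDIATE")  # session A acquires the write lock
a.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

try:
    b.execute("BEGIN IMMEDIATE")  # session B is blocked and times out
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

a.execute("COMMIT")  # releasing A's lock unblocks B
b.execute("BEGIN IMMEDIATE")
b.execute("UPDATE accounts SET balance = balance + 5 WHERE id = 1")
b.execute("COMMIT")
print(blocked)  # True
```

Keeping transactions short, as recommended above, shrinks the window during which other writers are blocked.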
25
Beginner

What are SQL constraints and what types exist?

Constraints 4 min read

Constraints enforce rules on data columns to maintain data integrity and accuracy.

Common Constraints:
CREATE TABLE employees (
    -- NOT NULL: Column cannot contain NULL
    id INT NOT NULL,
    
    -- UNIQUE: All values must be different
    email VARCHAR(100) UNIQUE,
    
    -- PRIMARY KEY: Unique identifier for table
    employee_id INT PRIMARY KEY,
    
    -- FOREIGN KEY: Links to another table
    department_id INT,
    CONSTRAINT fk_department 
        FOREIGN KEY (department_id) 
        REFERENCES departments(id),
    
    -- CHECK: Validates data based on condition
    salary DECIMAL(10,2) CHECK (salary >= 0),
    age INT CHECK (age >= 18 AND age <= 65),
    
    -- DEFAULT: Sets default value if not specified
    hire_date DATE DEFAULT CURRENT_DATE,
    status VARCHAR(20) DEFAULT 'active'
);
Adding Constraints After Table Creation:
-- Add NOT NULL constraint (MySQL syntax; PostgreSQL uses
-- ALTER TABLE employees ALTER COLUMN name SET NOT NULL)
ALTER TABLE employees MODIFY COLUMN name VARCHAR(100) NOT NULL;

-- Add UNIQUE constraint
ALTER TABLE employees ADD CONSTRAINT uc_email UNIQUE (email);

-- Add CHECK constraint
ALTER TABLE employees ADD CONSTRAINT chk_salary 
    CHECK (salary >= 0);

-- Add FOREIGN KEY constraint
ALTER TABLE employees ADD CONSTRAINT fk_dept
    FOREIGN KEY (department_id) REFERENCES departments(id);

-- Add DEFAULT constraint
ALTER TABLE employees ALTER COLUMN status 
    SET DEFAULT 'active';

Constraint Types:

  • NOT NULL: Column must have a value
  • UNIQUE: All values in column must be different
  • PRIMARY KEY: Unique + NOT NULL, identifies each row
  • FOREIGN KEY: References PRIMARY KEY in another table
  • CHECK: Validates data based on expression
  • DEFAULT: Provides default value for column
  • COMPOSITE PRIMARY KEY: Primary key using multiple columns
  • CASCADE constraints: Automatic updates/deletes on related tables
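Constraint enforcement can be observed directly. Here is a small demonstration using Python's sqlite3 (note SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the tables and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: FKs are off by default
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        email TEXT UNIQUE,
        salary REAL CHECK (salary >= 0),
        department_id INTEGER REFERENCES departments(id)
    )""")
conn.execute("INSERT INTO departments VALUES (1)")
conn.execute("INSERT INTO employees VALUES (1, 'a@x.com', 50000, 1)")

violations = []
for stmt in [
    "INSERT INTO employees VALUES (2, 'a@x.com', 60000, 1)",   # UNIQUE
    "INSERT INTO employees VALUES (3, 'b@x.com', -5, 1)",      # CHECK
    "INSERT INTO employees VALUES (4, 'c@x.com', 40000, 99)",  # FOREIGN KEY
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as e:
        violations.append(type(e).__name__)

print(violations)  # all three inserts are rejected
```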
26
Intermediate

What is the CASE statement and how is it used?

Conditional Logic 3 min read

The CASE statement provides conditional logic in SQL queries, similar to if-else statements in programming languages.

Simple CASE (value comparison):
SELECT 
    name,
    department,
    CASE department
        WHEN 'Sales' THEN 'Revenue Department'
        WHEN 'Marketing' THEN 'Brand Department'
        WHEN 'Engineering' THEN 'Tech Department'
        ELSE 'Other Department'
    END as dept_category,
    
    CASE 
        WHEN salary >= 100000 THEN 'High'
        WHEN salary >= 60000 THEN 'Medium'
        WHEN salary >= 30000 THEN 'Low'
        ELSE 'Very Low'
    END as salary_grade
FROM employees;
Searched CASE (condition evaluation):
SELECT 
    order_id,
    amount,
    CASE 
        WHEN amount > 1000 THEN 'Large Order'
        WHEN amount > 500 THEN 'Medium Order'
        WHEN amount > 100 THEN 'Small Order'
        ELSE 'Mini Order'
    END as order_size,
    
    CASE 
        WHEN DATEDIFF(day, order_date, shipped_date) > 7 THEN 'Delayed'
        WHEN DATEDIFF(day, order_date, shipped_date) <= 2 THEN 'Express'
        ELSE 'Standard'
    END as shipping_status
FROM orders;
CASE in Aggregate Functions:
-- Count by category
SELECT 
    COUNT(CASE WHEN department = 'Sales' THEN 1 END) as sales_count,
    COUNT(CASE WHEN department = 'Engineering' THEN 1 END) as eng_count,
    COUNT(CASE WHEN salary > 80000 THEN 1 END) as high_earners,
    SUM(CASE WHEN status = 'active' THEN salary END) as active_payroll,
    AVG(CASE WHEN department = 'Marketing' THEN salary END) as marketing_avg
FROM employees;

-- PIVOT-like functionality
SELECT 
    EXTRACT(YEAR FROM hire_date) as hire_year,
    COUNT(CASE WHEN department = 'Sales' THEN 1 END) as sales,
    COUNT(CASE WHEN department = 'Engineering' THEN 1 END) as engineering,
    COUNT(CASE WHEN department = 'Marketing' THEN 1 END) as marketing
FROM employees
GROUP BY EXTRACT(YEAR FROM hire_date);
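The conditional-aggregation pattern runs on any engine; here is a quick sanity check in Python's sqlite3 (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INT);
    INSERT INTO employees VALUES
        ('A','Sales',50000),('B','Sales',90000),
        ('C','Engineering',85000),('D','Marketing',60000);
""")
# COUNT ignores NULLs, so the CASE without ELSE counts only matches
row = conn.execute("""
    SELECT COUNT(CASE WHEN department = 'Sales' THEN 1 END),
           COUNT(CASE WHEN salary > 80000 THEN 1 END)
    FROM employees
""").fetchone()
print(row)  # (2, 2)
```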
27
Expert

What are stored procedures and functions? What's the difference?

Stored Procedures 5 min read

Stored procedures and functions are database objects containing SQL statements, but they serve different purposes and have different capabilities.

Stored Procedure Example:
-- Create stored procedure
CREATE OR REPLACE PROCEDURE update_salary(
    p_employee_id INT,
    p_percentage DECIMAL(5,2)
)
LANGUAGE plpgsql  -- PostgreSQL
AS $$
BEGIN
    UPDATE employees 
    SET salary = salary * (1 + p_percentage/100)
    WHERE employee_id = p_employee_id;
    
    IF NOT FOUND THEN
        RAISE NOTICE 'Employee % not found', p_employee_id;
    END IF;
    
    COMMIT;
END;
$$;

-- Call stored procedure
CALL update_salary(123, 10.0);  -- Give 10% raise
Function Example:
-- Create function that returns a value
CREATE OR REPLACE FUNCTION calculate_bonus(
    p_salary DECIMAL,
    p_performance_rating INT
) 
RETURNS DECIMAL
LANGUAGE plpgsql
AS $$
DECLARE
    v_bonus DECIMAL;
BEGIN
    CASE p_performance_rating
        WHEN 1 THEN v_bonus := p_salary * 0.20;  -- 20%
        WHEN 2 THEN v_bonus := p_salary * 0.15;  -- 15%
        WHEN 3 THEN v_bonus := p_salary * 0.10;  -- 10%
        ELSE v_bonus := p_salary * 0.05;         -- 5%
    END CASE;
    
    RETURN v_bonus;
END;
$$;

-- Use function in query
SELECT 
    name,
    salary,
    calculate_bonus(salary, performance_rating) as bonus
FROM employees;

Key Differences:

  • Stored Procedure: Can perform actions, doesn't return value (but can have OUT parameters)
  • Function: Must return a value, can be used in SELECT statements
  • Transaction control: Procedures can have COMMIT/ROLLBACK, functions cannot
  • DML operations: Functions may have restrictions on DML in some databases
  • Calling: Procedures called with CALL/EXEC, functions used in expressions
  • Performance: Both can be precompiled for better performance
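SQLite has no stored procedures, but its application-defined functions illustrate the "usable in a SELECT" half of the distinction. A sketch in Python mirroring the bonus logic above (names and data are invented):

```python
import sqlite3

# Application-side version of calculate_bonus, registered as a SQL function
def calculate_bonus(salary, rating):
    rate = {1: 0.20, 2: 0.15, 3: 0.10}.get(rating, 0.05)
    return salary * rate

conn = sqlite3.connect(":memory:")
conn.create_function("calculate_bonus", 2, calculate_bonus)
conn.execute("CREATE TABLE employees (name TEXT, salary REAL, rating INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Alice", 100000, 1), ("Bob", 80000, 3)])

# The function is used in an expression, just like a SQL function
rows = conn.execute(
    "SELECT name, calculate_bonus(salary, rating) FROM employees ORDER BY name"
).fetchall()
print(rows)  # [('Alice', 20000.0), ('Bob', 8000.0)]
```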
28
Advanced

What is query caching and how does it work?

Performance 4 min read

Query caching stores the results of SQL queries so subsequent identical queries can be served faster without re-execution.

How Query Caching Works:

  • Database parses and optimizes query
  • Checks cache for identical query (exact text match)
  • If cached, returns result immediately
  • If not cached, executes query and stores result in cache
  • Cache invalidated when underlying data changes
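The steps above can be sketched as a tiny exact-text result cache in Python over sqlite3 (the `CachingDB` wrapper is hypothetical, but its keying and write-invalidation mirror MySQL's old query cache):

```python
import sqlite3

class CachingDB:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.cache = {}
        self.hits = 0

    def query(self, sql):
        if sql in self.cache:      # exact text match, like MySQL's cache
            self.hits += 1
            return self.cache[sql]
        result = self.conn.execute(sql).fetchall()
        self.cache[sql] = result   # store result for identical queries
        return result

    def execute(self, sql):        # any write invalidates the whole cache
        self.conn.execute(sql)
        self.cache.clear()

db = CachingDB()
db.execute("CREATE TABLE t (x INT)")
db.execute("INSERT INTO t VALUES (1)")
db.query("SELECT * FROM t")             # miss: executes and caches
db.query("SELECT * FROM t")             # hit: served from cache
db.execute("INSERT INTO t VALUES (2)")  # write clears the cache
db.query("SELECT * FROM t")             # miss again: data changed
print(db.hits)  # 1
```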
MySQL Query Cache Configuration:
-- Check query cache status
SHOW VARIABLES LIKE 'query_cache%';

-- Typical configuration in my.cnf (MySQL 5.7 and earlier)
[mysqld]
query_cache_type = 1              # Enable cache
query_cache_size = 64M            # Cache size
query_cache_limit = 2M            # Max result size to cache
query_cache_min_res_unit = 4K     # Minimum allocation unit

-- Check cache performance
SHOW STATUS LIKE 'Qcache%';

-- Hint query to use cache (MySQL 5.7 and earlier only; the query
-- cache was removed entirely in MySQL 8.0)
SELECT SQL_CACHE * FROM employees WHERE department = 'Sales';

-- Force query to NOT use cache
SELECT SQL_NO_CACHE * FROM employees WHERE department = 'Sales';
PostgreSQL (no built-in query cache):
-- PostgreSQL relies on shared buffers and OS cache
-- Check buffer cache hit ratio
SELECT 
    sum(blks_hit) * 100.0 / (sum(blks_hit) + sum(blks_read)) as cache_hit_ratio
FROM pg_stat_database;

-- Use pg_prewarm to load data into cache
CREATE EXTENSION pg_prewarm;
SELECT pg_prewarm('employees');

-- Materialized views as manual cache
CREATE MATERIALIZED VIEW mv_employee_stats AS
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- Refresh cache periodically
REFRESH MATERIALIZED VIEW mv_employee_stats;

When Query Caching is Effective:

  • Read-heavy applications with repetitive queries
  • Queries with complex joins or aggregations
  • Data that changes infrequently
  • Small to medium result sets
  • Applications with many identical queries

When Query Caching is Ineffective:

  • Write-heavy applications (frequent cache invalidation)
  • Queries with NOW(), RANDOM(), CURRENT_TIMESTAMP
  • Large result sets that consume cache memory
  • Highly dynamic data
  • Queries with user-specific parameters
29
Intermediate

What is the difference between INNER JOIN and OUTER JOIN?

JOIN Types 4 min read

INNER JOIN returns only matching rows from both tables, while OUTER JOIN returns all rows from one or both tables with NULLs for non-matching rows.

Sample Data:
-- Customers table
customer_id | name
----------- | ----------
1           | Alice
2           | Bob
3           | Charlie

-- Orders table
order_id | customer_id | amount
-------- | ----------- | ------
101      | 1          | 100
102      | 2          | 200
103      | 4          | 300   -- No matching customer
INNER JOIN (only matches):
SELECT c.name, o.order_id, o.amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- Result (2 rows):
-- Alice - 101 - 100
-- Bob   - 102 - 200
LEFT OUTER JOIN (all customers + orders):
SELECT c.name, o.order_id, o.amount
FROM customers c
LEFT OUTER JOIN orders o ON c.customer_id = o.customer_id;

-- Result (3 rows):
-- Alice    - 101 - 100
-- Bob      - 102 - 200
-- Charlie  - NULL - NULL
RIGHT OUTER JOIN (all orders + customers):
SELECT c.name, o.order_id, o.amount
FROM customers c
RIGHT OUTER JOIN orders o ON c.customer_id = o.customer_id;

-- Result (3 rows):
-- Alice  - 101 - 100
-- Bob    - 102 - 200
-- NULL   - 103 - 300
FULL OUTER JOIN (all from both tables):
SELECT c.name, o.order_id, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

-- Result (4 rows):
-- Alice    - 101 - 100
-- Bob      - 102 - 200
-- Charlie  - NULL - NULL
-- NULL     - 103 - 300

Venn Diagram Visualization:

  • INNER JOIN: Intersection of both circles
  • LEFT JOIN: Entire left circle + intersection
  • RIGHT JOIN: Entire right circle + intersection
  • FULL JOIN: Both entire circles
  • OUTER JOIN is optional keyword (LEFT JOIN = LEFT OUTER JOIN)
  • MySQL doesn't support FULL OUTER JOIN (use UNION of LEFT and RIGHT)
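The UNION workaround from the last bullet can be verified against the sample data above; a sketch using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INT, name TEXT);
    CREATE TABLE orders (order_id INT, customer_id INT, amount INT);
    INSERT INTO customers VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie');
    INSERT INTO orders VALUES (101,1,100),(102,2,200),(103,4,300);
""")

# FULL OUTER JOIN emulated as a UNION of two LEFT JOINs
rows = conn.execute("""
    SELECT c.name, o.order_id FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    UNION
    SELECT c.name, o.order_id FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
""").fetchall()
print(len(rows))  # 4: the matched pairs plus Charlie/NULL and NULL/103
```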
30
Expert

What are database triggers and when should they be used?

Triggers 5 min read

Triggers are database objects that automatically execute in response to specific events (INSERT, UPDATE, DELETE) on tables.

Basic Trigger Syntax:
-- PostgreSQL trigger example
CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.modified_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_employees_modified
    BEFORE UPDATE ON employees
    FOR EACH ROW
    EXECUTE FUNCTION update_modified_column();
MySQL Trigger Example:
-- Audit trail trigger
DELIMITER $$
CREATE TRIGGER audit_employee_changes
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
    IF OLD.salary != NEW.salary THEN
        INSERT INTO salary_audit (
            employee_id, 
            old_salary, 
            new_salary, 
            changed_by, 
            change_date
        ) VALUES (
            NEW.employee_id,
            OLD.salary,
            NEW.salary,
            USER(),
            NOW()
        );
    END IF;
END$$
DELIMITER ;
SQL Server Trigger Example:
-- Instead of trigger
CREATE TRIGGER trg_instead_of_delete
ON employees
INSTEAD OF DELETE
AS
BEGIN
    -- Archive instead of delete
    INSERT INTO employees_archive
    SELECT *, GETDATE(), SYSTEM_USER
    FROM deleted;
    
    -- Then delete from main table
    DELETE e
    FROM employees e
    INNER JOIN deleted d ON e.employee_id = d.employee_id;
END;

Trigger Types:

  • BEFORE trigger: Executes before the event
  • AFTER trigger: Executes after the event
  • INSTEAD OF trigger: Replaces the event action
  • ROW-level trigger: Executes for each affected row
  • STATEMENT-level trigger: Executes once per statement

When to Use Triggers:

  • Audit logging and change tracking
  • Data validation/complex constraints
  • Maintaining derived/calculated columns
  • Enforcing business rules
  • Synchronizing related tables
  • Implementing soft deletes

When to Avoid Triggers:

  • Complex business logic (use application code instead)
  • Performance-critical operations
  • Cascading triggers (hard to debug)
  • When same logic can be in stored procedure
  • For simple data validation (use CHECK constraints)
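A trigger's behavior is easy to verify end to end. Here is the salary-audit idea rewritten for SQLite via Python (schema simplified and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INT);
    CREATE TABLE salary_audit (employee_id INT, old_salary INT, new_salary INT);

    -- Row-level AFTER UPDATE trigger, fired only when salary changes
    CREATE TRIGGER audit_salary AFTER UPDATE ON employees
    WHEN OLD.salary <> NEW.salary
    BEGIN
        INSERT INTO salary_audit VALUES (NEW.id, OLD.salary, NEW.salary);
    END;

    INSERT INTO employees VALUES (1, 50000);
    UPDATE employees SET salary = 55000 WHERE id = 1;
    UPDATE employees SET salary = 55000 WHERE id = 1;  -- no change, no audit row
""")
audit = conn.execute("SELECT * FROM salary_audit").fetchall()
print(audit)  # [(1, 50000, 55000)]
```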
31
Advanced

What is database partitioning and what are its benefits?

Partitioning 5 min read

Partitioning divides a large table into smaller, more manageable pieces while maintaining logical unity. Each partition can be stored and accessed separately.

Range Partitioning Example (PostgreSQL):
-- Create partitioned table
CREATE TABLE sales (
    sale_id SERIAL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2),
    region VARCHAR(50)
) PARTITION BY RANGE (sale_date);

-- Create partitions for each year
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE sales_2025 PARTITION OF sales
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Create default partition
CREATE TABLE sales_default PARTITION OF sales DEFAULT;
List Partitioning Example (MySQL):
-- Partition by region (MySQL requires every PRIMARY/UNIQUE key to
-- include the partitioning columns, and has no DEFAULT list
-- partition: a row matching no list raises an error)
CREATE TABLE customers (
    customer_id INT,
    name VARCHAR(100),
    region VARCHAR(50),
    email VARCHAR(100),
    PRIMARY KEY (customer_id, region)
)
PARTITION BY LIST COLUMNS(region) (
    PARTITION p_north VALUES IN ('NY', 'NJ', 'CT'),
    PARTITION p_south VALUES IN ('TX', 'FL', 'GA'),
    PARTITION p_west VALUES IN ('CA', 'OR', 'WA')
);

Partitioning Strategies:

  • Range Partitioning: Based on value ranges (dates, numbers)
  • List Partitioning: Based on discrete values (regions, status)
  • Hash Partitioning: Based on hash function (even distribution)
  • Composite Partitioning: Combination of methods

Benefits:

  • Improved Performance: Query only relevant partitions
  • Easier Maintenance: Manage partitions independently
  • Better Availability: Failure affects only one partition
  • Efficient Archiving: Drop/archive old partitions easily
  • Parallel Processing: Partitions can be processed in parallel
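Where native partitioning is unavailable, the routing idea can be sketched in application code. Below is a toy range-partitioning router over per-year SQLite tables (`insert_sale` is a hypothetical helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for year in (2023, 2024, 2025):
    conn.execute(f"CREATE TABLE sales_{year} (sale_date TEXT, amount REAL)")

def insert_sale(sale_date, amount):
    year = int(sale_date[:4])  # range routing on the partition key
    conn.execute(f"INSERT INTO sales_{year} VALUES (?, ?)",
                 (sale_date, amount))

insert_sale("2023-06-01", 100)
insert_sale("2024-03-15", 250)
insert_sale("2024-11-30", 75)

# "Partition pruning": a query filtered to 2024 touches only sales_2024
rows = conn.execute("SELECT COUNT(*) FROM sales_2024").fetchone()
print(rows)  # (2,)
```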
32
Expert

What are Common Table Expressions (CTEs) vs Temporary Tables?

CTEs 4 min read

CTEs and temporary tables both help organize complex queries but differ in scope, performance, and use cases.

CTE Example:
-- Simple CTE
WITH high_earners AS (
    SELECT employee_id, name, salary
    FROM employees
    WHERE salary > 100000
),
department_stats AS (
    SELECT 
        department,
        COUNT(*) as emp_count,
        AVG(salary) as avg_salary
    FROM employees
    GROUP BY department
)
SELECT 
    h.name,
    h.salary,
    d.department,
    d.avg_salary
FROM high_earners h
JOIN employees e ON h.employee_id = e.employee_id
JOIN department_stats d ON e.department = d.department;
Temporary Table Example:
-- Create temporary table
CREATE TEMPORARY TABLE temp_high_earners AS
SELECT employee_id, name, salary
FROM employees
WHERE salary > 100000;

-- Add index for performance
CREATE INDEX idx_temp_earners ON temp_high_earners(employee_id);

-- Use in multiple queries
SELECT * FROM temp_high_earners;

-- Complex operation on temp table
UPDATE temp_high_earners 
SET salary = salary * 1.1 
WHERE employee_id IN (
    SELECT employee_id FROM employees WHERE performance_rating = 'A'
);

-- Clean up (optional, auto-dropped at session end)
DROP TABLE temp_high_earners;

Key Differences:

  • Scope: CTEs are query-scoped, temp tables are session-scoped
  • Performance: CTEs may be optimized away, temp tables have physical storage
  • Indexes: Temp tables can have indexes, CTEs cannot
  • Reusability: Temp tables can be reused in multiple queries
  • Storage: CTEs are typically inlined into the plan (or materialized at the optimizer's discretion); temp tables always have physical storage (tempdb/disk)
  • Use CTEs for readability and one-time use
  • Use temp tables for intermediate results in complex workflows
33
Intermediate

What is the difference between UNION and JOIN?

Set Operations 3 min read

UNION combines rows from multiple queries vertically, while JOIN combines columns from multiple tables horizontally.

UNION Example (vertical combination):
-- Combine employees from different cities
SELECT name, 'New York' as location FROM employees_ny
UNION ALL
SELECT name, 'San Francisco' as location FROM employees_sf
UNION ALL
SELECT name, 'Chicago' as location FROM employees_chi;

-- UNION vs UNION ALL
SELECT product_id FROM current_products
UNION  -- Removes duplicates
SELECT product_id FROM discontinued_products;

SELECT product_id FROM current_products
UNION ALL  -- Keeps duplicates
SELECT product_id FROM popular_products;
JOIN Example (horizontal combination):
-- Combine employee info with department info
SELECT 
    e.name,
    e.salary,
    d.department_name,
    m.name as manager_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
LEFT JOIN employees m ON e.manager_id = m.employee_id;

-- Multiple JOIN types
SELECT 
    c.customer_name,
    o.order_date,
    p.product_name,
    oi.quantity
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;

Key Differences:

  • UNION: Stacks rows, requires same columns, combines result sets
  • JOIN: Merges columns, requires relationship, combines tables
  • Result Shape: UNION adds rows, JOIN adds columns
  • Use Case: UNION for similar data from different sources, JOIN for related data
  • Performance: UNION must deduplicate (via sort or hash), which UNION ALL avoids; JOINs can use indexes on the join keys
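The duplicate-handling difference between UNION and UNION ALL shows up directly in row counts; a quick check in Python's sqlite3 (tables and values invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_products (product_id INT);
    CREATE TABLE discontinued_products (product_id INT);
    INSERT INTO current_products VALUES (1),(2),(3);
    INSERT INTO discontinued_products VALUES (3),(4);
""")
# UNION removes the duplicate product_id 3; UNION ALL keeps it
union = conn.execute(
    "SELECT product_id FROM current_products UNION "
    "SELECT product_id FROM discontinued_products").fetchall()
union_all = conn.execute(
    "SELECT product_id FROM current_products UNION ALL "
    "SELECT product_id FROM discontinued_products").fetchall()
print(len(union), len(union_all))  # 4 5
```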
34
Intermediate

What are PIVOT and UNPIVOT operations in SQL?

Pivoting 4 min read

PIVOT rotates rows to columns (transposing), while UNPIVOT rotates columns to rows. Useful for reporting and data transformation.

PIVOT Example (SQL Server):
-- Sample sales data
SELECT * FROM monthly_sales;

-- year | month | amount
-- 2024 | Jan   | 1000
-- 2024 | Feb   | 1500
-- 2024 | Mar   | 1200
-- 2025 | Jan   | 1100
-- 2025 | Feb   | 1600

-- Pivot to show months as columns
SELECT *
FROM monthly_sales
PIVOT (
    SUM(amount)
    FOR month IN ([Jan], [Feb], [Mar], [Apr])
) AS pivoted_sales;

-- Result:
-- year | Jan  | Feb  | Mar  | Apr
-- 2024 | 1000 | 1500 | 1200 | NULL
-- 2025 | 1100 | 1600 | NULL | NULL
Manual PIVOT (without PIVOT operator):
-- Using CASE statements
SELECT 
    year,
    SUM(CASE WHEN month = 'Jan' THEN amount END) as Jan,
    SUM(CASE WHEN month = 'Feb' THEN amount END) as Feb,
    SUM(CASE WHEN month = 'Mar' THEN amount END) as Mar,
    SUM(CASE WHEN month = 'Apr' THEN amount END) as Apr
FROM monthly_sales
GROUP BY year;
UNPIVOT Example:
-- Sample pivoted data
CREATE TABLE sales_pivot (
    year INT,
    Q1 DECIMAL,
    Q2 DECIMAL,
    Q3 DECIMAL,
    Q4 DECIMAL
);

-- Unpivot quarterly data
SELECT year, quarter, amount
FROM sales_pivot
UNPIVOT (
    amount FOR quarter IN (Q1, Q2, Q3, Q4)
) AS unpivoted_sales;

-- Result:
-- year | quarter | amount
-- 2024 | Q1      | 1000
-- 2024 | Q2      | 1500
-- 2024 | Q3      | 1200
-- 2024 | Q4      | 1800

Use Cases:

  • PIVOT: Create cross-tab reports, summarize data by categories
  • UNPIVOT: Normalize denormalized data, prepare data for ETL
  • Reporting: Convert row-based data to columnar format
  • Data Analysis: Compare metrics across different dimensions
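The manual CASE pivot works on engines without a PIVOT operator; here it is verified in SQLite from Python, using the sample rows shown above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (year INT, month TEXT, amount INT);
    INSERT INTO monthly_sales VALUES
        (2024,'Jan',1000),(2024,'Feb',1500),(2024,'Mar',1200),
        (2025,'Jan',1100),(2025,'Feb',1600);
""")
# Months become columns; missing combinations come back as NULL
rows = conn.execute("""
    SELECT year,
           SUM(CASE WHEN month='Jan' THEN amount END) AS Jan,
           SUM(CASE WHEN month='Feb' THEN amount END) AS Feb,
           SUM(CASE WHEN month='Mar' THEN amount END) AS Mar
    FROM monthly_sales GROUP BY year ORDER BY year
""").fetchall()
print(rows)  # [(2024, 1000, 1500, 1200), (2025, 1100, 1600, None)]
```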
35
Advanced

What are covering indexes and how do they improve performance?

Indexing 4 min read

A covering index includes all columns needed for a query, allowing the database to satisfy the query entirely from the index without accessing the table.

Creating Covering Indexes:
-- Query needing multiple columns
SELECT employee_id, name, department, salary
FROM employees
WHERE department = 'Engineering'
AND hire_date > '2023-01-01';

-- Non-covering index (requires table lookup)
CREATE INDEX idx_dept_hire ON employees(department, hire_date);

-- Covering index (includes all needed columns)
CREATE INDEX idx_covering ON employees(department, hire_date)
INCLUDE (employee_id, name, salary);

-- Or include columns in index key
CREATE INDEX idx_covering_complete ON employees(
    department, 
    hire_date, 
    employee_id, 
    name, 
    salary
);
How It Works:
-- Without covering index:
1. Use idx_dept_hire to find matching rows
2. For each match, read table (heap/clustered index) to get employee_id, name, salary
3. Return results

-- With covering index:
1. Use idx_covering to find matching rows
2. All needed columns are in index
3. Return results directly from index
4. No table access needed!

Benefits:

  • Eliminates table lookups: Biggest performance gain
  • Reduces I/O: Index only vs index + table
  • Improves cache efficiency: More rows fit in memory
  • Faster queries: Especially for range scans

When to Use Covering Indexes:

  • Frequently executed queries with specific column sets
  • Queries that return many rows (reduces many table lookups)
  • Reporting queries with aggregations
  • When table is large and scattered (high fragmentation)
  • For read-heavy tables where writes are infrequent

Trade-offs:

  • Larger index size (more columns = more storage)
  • Slower INSERT/UPDATE/DELETE (more indexes to update)
  • Increased maintenance overhead
  • May not help if query patterns change
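SQLite's planner reports covering-index use explicitly, which makes the effect easy to see; a sketch via Python (the column set is chosen so the index covers the query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, department TEXT, salary INT)")
conn.execute("CREATE INDEX idx_cover ON employees(department, salary)")

# EXPLAIN QUERY PLAN shows whether the base table is touched at all
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT department, salary FROM employees WHERE department = 'Engineering'"
).fetchall()
detail = " ".join(row[3] for row in plan)
print(detail)  # mentions a COVERING INDEX scan: no table access needed
```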
36
Expert

What are database views with CHECK OPTION?

Views 4 min read

CHECK OPTION ensures that data inserted or updated through a view satisfies the view's defining condition, maintaining data integrity.

View with CHECK OPTION Example:
-- Create view for active employees (status must appear in the view's
-- select list so inserts through the view can set it)
CREATE VIEW active_employees AS
SELECT employee_id, name, department, salary, hire_date, status
FROM employees
WHERE status = 'active'
WITH CHECK OPTION;

-- This will succeed (status = 'active')
INSERT INTO active_employees (name, department, salary, status)
VALUES ('John Doe', 'Engineering', 80000, 'active');

-- This will FAIL due to CHECK OPTION
INSERT INTO active_employees (name, department, salary, status)
VALUES ('Jane Smith', 'Sales', 70000, 'inactive');
-- Error: new row violates check option for view "active_employees"

-- Update that maintains view condition (succeeds)
UPDATE active_employees 
SET salary = 85000 
WHERE employee_id = 123;

-- Update that violates view condition (fails)
UPDATE active_employees 
SET status = 'inactive' 
WHERE employee_id = 123;
-- Error: new row violates check option
CASCADED vs LOCAL CHECK OPTION:
-- Base view
CREATE VIEW ny_employees AS
SELECT * FROM employees WHERE location = 'NY'
WITH CHECK OPTION;

-- View based on another view
CREATE VIEW ny_engineers AS
SELECT * FROM ny_employees WHERE department = 'Engineering';

-- WITH CASCADED CHECK OPTION (default)
CREATE VIEW ny_engineers_cascaded AS
SELECT * FROM ny_employees WHERE department = 'Engineering'
WITH CASCADED CHECK OPTION;

-- Insert must satisfy ALL underlying view conditions
INSERT INTO ny_engineers_cascaded (location, department)
VALUES ('NY', 'Engineering');  -- Success
INSERT INTO ny_engineers_cascaded (location, department)
VALUES ('CA', 'Engineering');  -- Fails (location <> 'NY')
INSERT INTO ny_engineers_cascaded (location, department)
VALUES ('NY', 'Sales');        -- Fails (department <> 'Engineering')

-- WITH LOCAL CHECK OPTION
CREATE VIEW ny_engineers_local AS
SELECT * FROM ny_employees WHERE department = 'Engineering'
WITH LOCAL CHECK OPTION;

-- LOCAL skips only checks inherited from underlying views that lack
-- their own CHECK OPTION; ny_employees declared one, so this still fails
INSERT INTO ny_engineers_local (location, department)
VALUES ('CA', 'Engineering');  -- Fails (ny_employees' check still applies)

Use Cases:

  • Row-level security (users only see/insert their data)
  • Data validation through views
  • Ensuring data integrity for specific subsets
  • Simplifying complex constraints
  • Application interfaces with controlled access
37
Beginner

What are SQL aliases and why are they used?

Aliases 3 min read

Aliases give temporary names to tables, columns, or expressions, improving query readability and enabling self-joins.

Column Aliases:
-- Basic column aliases
SELECT 
    first_name || ' ' || last_name AS full_name,
    salary * 12 AS annual_salary,
    department AS dept
FROM employees;

-- Alias on an aggregate
SELECT department AS dept, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

-- Aliases with special characters or spaces
SELECT 
    name AS "Employee Name",
    salary AS "Monthly Salary (USD)",
    hire_date AS "Date Hired"
FROM employees;

-- Aliases in ORDER BY
SELECT name, salary * 1.1 AS new_salary
FROM employees
ORDER BY new_salary DESC;
Table Aliases:
-- Simple table aliases
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

-- Self-join using aliases (essential!)
SELECT 
    e.name AS employee_name,
    m.name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

-- Multiple table aliases in complex queries
SELECT 
    c.name AS customer_name,
    o.order_date,
    p.name AS product_name,
    oi.quantity,
    s.name AS supplier_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN suppliers s ON p.supplier_id = s.supplier_id;
Subquery Aliases:
-- Subquery in FROM clause requires alias
SELECT dept_stats.department, dept_stats.avg_salary
FROM (
    SELECT 
        department,
        AVG(salary) AS avg_salary,
        COUNT(*) AS emp_count
    FROM employees
    GROUP BY department
) AS dept_stats
WHERE dept_stats.emp_count > 10;

-- CTE aliases
WITH department_summary AS (
    SELECT 
        department,
        AVG(salary) AS avg_sal,
        MAX(salary) AS max_sal
    FROM employees
    GROUP BY department
)
SELECT * FROM department_summary
WHERE avg_sal > 60000;

Benefits of Aliases:

  • Improves query readability
  • Enables self-joins (same table referenced twice)
  • Shortens long table names
  • Provides meaningful names for expressions
  • Required for derived tables and some subqueries
  • Helps avoid column name conflicts
38
Advanced

What is database connection pooling and its benefits?

Connection Pooling 4 min read

Connection pooling maintains a cache of database connections that can be reused by multiple applications, reducing overhead of establishing new connections.

Connection Pool Configuration (Java - HikariCP):
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
config.setUsername("user");
config.setPassword("password");
config.setMaximumPoolSize(20);       // Max connections in pool
config.setMinimumIdle(10);           // Minimum idle connections
config.setConnectionTimeout(30000);  // 30 seconds timeout
config.setIdleTimeout(600000);       // 10 minutes idle timeout
config.setMaxLifetime(1800000);      // 30 minutes max lifetime
config.setConnectionTestQuery("SELECT 1");

HikariDataSource dataSource = new HikariDataSource(config);
Python (psycopg2 with pooling):
import psycopg2
from psycopg2 import pool

# Create connection pool
connection_pool = psycopg2.pool.SimpleConnectionPool(
    5, 20,  # minconn, maxconn
    host="localhost",
    database="mydb",
    user="user",
    password="password",
    port=5432
)

# Get connection from pool
conn = connection_pool.getconn()
try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM employees")
    results = cur.fetchall()
finally:
    # Return connection to pool (don't close!)
    connection_pool.putconn(conn)

Benefits:

  • Reduced Latency: Reuse existing connections (no TCP handshake)
  • Resource Efficiency: Limits concurrent connections to database
  • Improved Performance: Avoids connection establishment overhead
  • Better Scalability: Handles connection spikes gracefully
  • Connection Management: Automatic validation and cleanup

Common Pool Parameters:

  • maxPoolSize: Maximum connections in pool
  • minPoolSize: Minimum connections kept ready
  • connectionTimeout: Max wait time for connection
  • idleTimeout: How long idle connections stay in pool
  • maxLifetime: Total lifetime of a connection
  • validationQuery: Test query to validate connection

When to Use Connection Pooling:

  • Web applications with many concurrent users
  • Microservices architecture
  • Applications with frequent database calls
  • When database connection limits are low
  • To prevent connection leaks and timeouts
39
Intermediate

What are recursive queries and how are they written?

Recursive Queries 5 min read

Recursive queries use Common Table Expressions (CTEs) to query hierarchical or graph-like data, such as organizational charts or bill of materials.

Organizational Hierarchy Example:
-- Employees table with manager hierarchy
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT REFERENCES employees(employee_id)
);

-- Recursive CTE to get hierarchy
WITH RECURSIVE org_chart AS (
    -- Anchor member: top-level employees (no manager)
    SELECT 
        employee_id,
        name,
        manager_id,
        1 as level,
        name as path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive member: join with previous level
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        oc.level + 1 as level,
        oc.path || ' -> ' || e.name as path
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT 
    employee_id,
    name,
    level,
    path
FROM org_chart
ORDER BY path;
Bill of Materials (BOM) Example:
-- Parts explosion (component hierarchy)
CREATE TABLE bom (
    assembly_id INT,
    component_id INT,
    quantity INT
);

WITH RECURSIVE parts_explosion AS (
    -- Anchor: top-level assemblies
    SELECT 
        assembly_id,
        component_id,
        quantity,
        1 as level,
        CAST(component_id AS VARCHAR) as path
    FROM bom
    WHERE assembly_id = 1000  -- Starting assembly
    
    UNION ALL
    
    -- Recursive: explode components
    SELECT 
        b.assembly_id,
        b.component_id,
        pe.quantity * b.quantity as quantity,  -- cumulative quantity (column names come from the anchor)
        pe.level + 1 as level,
        pe.path || '.' || CAST(b.component_id AS VARCHAR)
    FROM bom b
    INNER JOIN parts_explosion pe ON b.assembly_id = pe.component_id
)
SELECT * FROM parts_explosion
ORDER BY level, path;
Finding All Ancestors (Bottom-up):
-- Find all managers above an employee
WITH RECURSIVE manager_chain AS (
    -- Start with specific employee
    SELECT 
        employee_id,
        name,
        manager_id,
        0 as level
    FROM employees
    WHERE employee_id = 123  -- Starting employee
    
    UNION ALL
    
    -- Move up to manager
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        mc.level + 1
    FROM employees e
    INNER JOIN manager_chain mc ON e.employee_id = mc.manager_id
)
SELECT * FROM manager_chain
ORDER BY level DESC;

Recursive CTE Components:

  • Anchor Member: Initial query, provides starting point
  • Recursive Member: References CTE itself, continues recursion
  • Termination Condition: When no more rows are returned
  • UNION/UNION ALL: Combines anchor and recursive results
  • WITH RECURSIVE: Keyword to declare recursive CTE
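The org-chart recursion above runs as-is on SQLite, which supports WITH RECURSIVE; a compact check from Python (three invented employees):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INT, name TEXT, manager_id INT);
    INSERT INTO employees VALUES
        (1,'CEO',NULL),(2,'VP Eng',1),(3,'Engineer',2);
""")
rows = conn.execute("""
    WITH RECURSIVE org_chart AS (
        -- Anchor: top-level employee (no manager)
        SELECT employee_id, name, 1 AS level, name AS path
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: attach direct reports of the previous level
        SELECT e.employee_id, e.name, oc.level + 1,
               oc.path || ' -> ' || e.name
        FROM employees e
        JOIN org_chart oc ON e.manager_id = oc.employee_id
    )
    SELECT name, level, path FROM org_chart ORDER BY level
""").fetchall()
print(rows[-1])  # ('Engineer', 3, 'CEO -> VP Eng -> Engineer')
```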
40
Expert

What are database sharding strategies and their trade-offs?

Sharding 6 min read

Sharding horizontally partitions data across multiple databases/servers to distribute load and enable scalability beyond single server limits.

Range-based Sharding:
-- Shard by customer_id ranges
-- Shard 1: customer_id 1-1000000
-- Shard 2: customer_id 1000001-2000000
-- Shard 3: customer_id 2000001-3000000

-- Application determines shard based on range
function get_shard_for_customer(customer_id) {
    if (customer_id <= 1000000) return 'shard1';
    if (customer_id <= 2000000) return 'shard2';
    return 'shard3';
}

-- Query specific shard
SELECT * FROM shard1.customers WHERE customer_id = 500000;
SELECT * FROM shard2.orders WHERE customer_id = 1500000;
Hash-based Sharding:
-- Shard by hash of customer_id
-- Number of shards: 4
-- Shard = customer_id % 4

-- Hash-based routing (simple modulo hashing; unlike true consistent
-- hashing, changing the shard count remaps most keys)
function get_shard_for_customer(customer_id) {
    const shard_count = 4;
    const hash = md5(customer_id.toString());
    const shard_index = parseInt(hash.substring(0, 8), 16) % shard_count;
    return `shard${shard_index + 1}`;
}

-- Hash sharding distributes rows evenly; with plain modulo on the id:
-- customer_id 123 → shard index 3 (123 % 4 = 3)
-- customer_id 456 → shard index 0 (456 % 4 = 0)
-- customer_id 789 → shard index 1 (789 % 4 = 1)
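The hash-routing function above translates almost line for line into Python; a sketch (the 4-shard count and shard names are the same illustrative assumptions as above):

```python
import hashlib

SHARD_COUNT = 4

def get_shard_for_customer(customer_id: int) -> str:
    """Route a customer to a shard by hashing the id (modulo hashing)."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    shard_index = int(digest[:8], 16) % SHARD_COUNT
    return f"shard{shard_index + 1}"

# The same id always maps to the same shard, and ids spread across shards
assert get_shard_for_customer(123) == get_shard_for_customer(123)
shards = {get_shard_for_customer(i) for i in range(1000)}
print(sorted(shards))  # all four shards receive traffic
```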
Directory-based Sharding:
-- Maintain shard mapping in lookup table
CREATE TABLE shard_mapping (
    customer_id INT PRIMARY KEY,
    shard_id INT,
    shard_host VARCHAR(100),
    shard_database VARCHAR(50)
);

-- Look up shard before querying
SELECT shard_host, shard_database 
FROM shard_mapping 
WHERE customer_id = 12345;

-- Result: 'db-server-3', 'customers_db_3'

-- Application connects to correct shard
-- Connection string: jdbc:mysql://db-server-3/customers_db_3

Sharding Strategies Comparison:

  • Range-based: Easy to implement, supports range queries, but can cause hotspots
  • Hash-based: Even distribution, no hotspots, but range queries difficult
  • Directory-based: Most flexible, easy rebalancing, but single point of failure
  • Geography-based: Shard by region/country, low latency, but uneven distribution
  • Tenant-based: Separate shard per customer/organization, good isolation

Challenges with Sharding:

  • Cross-shard queries: Difficult joins across shards
  • Transactions: Distributed transactions complex
  • Rebalancing: Moving data between shards is hard
  • Hotspots: Uneven load distribution
  • Operational complexity: More servers to manage
  • Backup/Recovery: Coordinating across shards
41
Intermediate

What are SQL analytic functions vs aggregate functions?

Analytic Functions 4 min read

Analytic functions compute aggregate values while retaining individual row details, unlike aggregate functions that collapse multiple rows into one.

Aggregate Functions (collapse rows):
-- Returns single row with totals
SELECT 
    department,
    COUNT(*) as emp_count,
    AVG(salary) as avg_salary,
    SUM(salary) as total_salary
FROM employees
GROUP BY department;

-- Result (collapsed):
-- department | emp_count | avg_salary | total_salary
-- Sales      | 25        | 65000      | 1625000
-- Engineering| 50        | 85000      | 4250000
Analytic Functions (keep rows):
-- Returns all rows with additional analytic columns
SELECT 
    name,
    department,
    salary,
    -- Analytic functions
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
    SUM(salary) OVER (PARTITION BY department) as dept_total_salary,
    salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank,
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) as percentile,
    LAG(salary, 1) OVER (PARTITION BY department ORDER BY hire_date) as prev_salary
FROM employees;

-- Result (individual rows preserved):
-- name | department | salary | dept_avg | dept_total | diff | rank | ...
-- John | Sales      | 70000  | 65000    | 1625000    | 5000 | 1    | ...
-- Jane | Sales      | 60000  | 65000    | 1625000    | -5000| 2    | ...

Common Analytic Functions:

  • ROW_NUMBER(): Sequential numbering of rows
  • RANK(), DENSE_RANK(): Ranking with/without gaps
  • NTILE(n): Divide rows into n buckets
  • LEAD(), LAG(): Access subsequent/previous rows
  • FIRST_VALUE(), LAST_VALUE(): First/last in window
  • PERCENT_RANK(), CUME_DIST(): Relative rankings
  • Aggregates with OVER(): SUM, AVG, COUNT, MIN, MAX

Key Differences:

  • Aggregate: Collapses rows, requires GROUP BY
  • Analytic: Preserves rows, uses OVER() clause
  • Result Set: Aggregate reduces rows, analytic adds columns
  • Use Case: Aggregate for summaries, analytic for row-level analysis
  • Performance: Analytic can be more expensive but more flexible
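The collapse-vs-preserve distinction is easy to verify with SQLite (window functions require SQLite 3.25+, bundled with modern Python); a sketch with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INT);
    INSERT INTO employees VALUES
        ('John', 'Sales', 70000),
        ('Jane', 'Sales', 60000),
        ('Ann',  'Eng',   90000);
""")

# Aggregate: one row per department
agg = conn.execute("""
    SELECT department, AVG(salary) FROM employees GROUP BY department
""").fetchall()

# Analytic: every input row kept, the aggregate added as a column
win = conn.execute("""
    SELECT name, salary,
           AVG(salary) OVER (PARTITION BY department) AS dept_avg,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
    FROM employees ORDER BY name
""").fetchall()

print(len(agg), len(win))  # 2 3
```

The aggregate query collapses three rows into two groups; the analytic query keeps all three rows and adds columns.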
42
Advanced

What is database connection multiplexing and its benefits?

Connection Management 4 min read

Connection multiplexing allows multiple logical client connections to share a single physical database connection, reducing connection overhead.

Traditional vs Multiplexed Connections:
-- Traditional: Each client has dedicated connection
Client1 → Connection1 → Database
Client2 → Connection2 → Database
Client3 → Connection3 → Database
-- 3 physical connections, each with memory/CPU overhead

-- Multiplexed: Multiple clients share connections
Client1 → \
Client2 → → Multiplexer → Single Connection → Database
Client3 → /
-- 3 logical connections, 1 physical connection
How Multiplexing Works:
// Pseudo-code for connection multiplexer
class ConnectionMultiplexer {
    private ConnectionPool physicalPool;
    private Map<String, VirtualConnection> virtualConnections;
    
    executeQuery(clientId, sql) {
        // Get or create virtual connection for client
        VirtualConnection vconn = virtualConnections.get(clientId);
        if (vconn == null) {
            vconn = new VirtualConnection(clientId);
            virtualConnections.put(clientId, vconn);
        }
        
        // Borrow physical connection from pool
        PhysicalConnection pconn = physicalPool.borrow();
        try {
            // Execute on physical connection
            ResultSet rs = pconn.execute(sql);
            
            // Map results back to virtual connection
            return vconn.processResults(rs);
        } finally {
            // Return physical connection to pool
            physicalPool.release(pconn);  // ('return' is a reserved word in Java)
        }
    }
}
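A runnable, deliberately simplified Python version of the pseudocode above: one physical SQLite connection shared by several logical clients, serialized with a lock (a sketch, not a production multiplexer):

```python
import sqlite3
import threading

class ConnectionMultiplexer:
    """Many logical clients share one physical connection (a sketch)."""

    def __init__(self, dsn=":memory:"):
        self._conn = sqlite3.connect(dsn, check_same_thread=False)
        self._lock = threading.Lock()          # serialize access to the one connection
        self.logical_clients = set()

    def execute(self, client_id, sql, params=()):
        self.logical_clients.add(client_id)    # track logical connections
        with self._lock:                       # "borrow" the physical connection
            return self._conn.execute(sql, params).fetchall()

mux = ConnectionMultiplexer()
mux.execute("client1", "CREATE TABLE t (x INT)")
mux.execute("client2", "INSERT INTO t VALUES (1)")
rows = mux.execute("client3", "SELECT x FROM t")
print(len(mux.logical_clients), rows)  # 3 [(1,)]
```

Three logical clients, one physical connection; a real multiplexer would also have to isolate session state and transactions per client, which is exactly the challenge listed below.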

Benefits:

  • Reduced Connection Count: Fewer physical connections to database
  • Lower Memory Usage: Each connection consumes 1-10MB
  • Better Scalability: Support more clients with same resources
  • Connection Pool Efficiency: Better utilization of pooled connections
  • Reduced Latency: Reuse established connections

Challenges:

  • Session State: Temporary tables, session variables may conflict
  • Transactions: Need to manage transaction isolation
  • Error Handling: Errors from one client shouldn't affect others
  • Load Balancing: Need to distribute queries evenly
  • Complexity: More complex than simple connection pooling

When to Use:

  • Microservices with many instances
  • Serverless functions (Lambda) with database access
  • Applications hitting database connection limits
  • Read-heavy workloads with short queries
  • When database memory is constrained
43
Beginner

What are SQL data types and their common uses?

Data Types 5 min read

SQL data types define what kind of data a column can hold. Choosing the right type affects storage, performance, and data integrity.

Numeric Data Types:
CREATE TABLE example_numbers (
    -- Integer types
    id INT PRIMARY KEY,              -- 4 bytes, -2B to 2B
    small_id SMALLINT,               -- 2 bytes, -32K to 32K
    big_id BIGINT,                   -- 8 bytes, huge range
    
    -- Exact numeric (decimal)
    price DECIMAL(10, 2),            -- 10 total digits, 2 decimal places
    precise NUMERIC(15, 5),          -- Same as DECIMAL
    
    -- Floating point (approximate; sizes vary by database)
    weight FLOAT,                    -- ~7 digits precision (4 bytes in MySQL, 8 in PostgreSQL)
    measurement DOUBLE PRECISION,    -- 8 bytes, ~15 digits precision
    rating REAL,                     -- 4 bytes, ~6 digits precision
    
    -- Specialized
    serial_id SERIAL,                -- Auto-incrementing integer (PostgreSQL)
    money_amount MONEY               -- Currency type (SQL Server/PostgreSQL)
);
Character/String Data Types:
CREATE TABLE example_strings (
    -- Fixed length (padded with spaces)
    country_code CHAR(2),            -- Exactly 2 characters
    ssn CHAR(9),                     -- Exactly 9 characters
    
    -- Variable length
    name VARCHAR(100),               -- Max 100 characters
    description TEXT,                -- Very large text (unlimited in some DBs)
    
    -- Unicode support
    unicode_name NVARCHAR(100),      -- SQL Server unicode
    postgres_unicode VARCHAR(100),   -- PostgreSQL (always unicode)
    
    -- Specialized
    email VARCHAR(255) CHECK (email LIKE '%@%'),
    url VARCHAR(2083),               -- Max URL length
    json_data JSON,                  -- JSON document (PostgreSQL/MySQL 5.7+)
    xml_data XML                     -- XML document
);
Date/Time Data Types:
CREATE TABLE example_dates (
    -- Date only
    birth_date DATE,                 -- '2023-12-25'
    
    -- Time only
    meeting_time TIME,               -- '14:30:00'
    meeting_time_tz TIME WITH TIME ZONE, -- '14:30:00-05'
    
    -- Date and time
    created_at TIMESTAMP,            -- '2023-12-25 14:30:00'
    updated_at TIMESTAMPTZ,          -- With timezone
    
    -- Specialized
    appointment TIMESTAMP(0),        -- No fractional seconds
    interval_field INTERVAL,         -- Time interval '1 day 2 hours'
    
    -- Database-specific
    year_field YEAR,                 -- MySQL only
    datetime_field DATETIME,         -- MySQL/SQL Server
    smalldatetime_field SMALLDATETIME -- SQL Server
);

Best Practices:

  • Use smallest data type that fits your data
  • Use VARCHAR instead of CHAR for variable-length strings
  • Use DECIMAL for financial data (exact precision)
  • Use TIMESTAMP WITH TIME ZONE for timezone-aware dates
  • Consider storage requirements and performance implications
  • Use appropriate constraints (CHECK, NOT NULL) with data types
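The "DECIMAL for financial data" rule can be demonstrated without a database: Python's decimal module has the same exact-precision semantics as SQL DECIMAL, while binary floats accumulate rounding error:

```python
from decimal import Decimal

# Binary floating point is approximate: the classic rounding surprise
print(0.1 + 0.2 == 0.3)                        # False

# Exact decimal arithmetic, as a DECIMAL(10,2) column would behave
total = Decimal("0.10") + Decimal("0.20")
print(total == Decimal("0.30"))                # True
```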
44
Expert

What are database cursors and when should they be avoided?

Cursors 5 min read

Cursors allow row-by-row processing of result sets, but they are generally inefficient and should be avoided in favor of set-based operations.

Cursor Example (SQL Server):
DECLARE @employee_id INT;
DECLARE @employee_name VARCHAR(100);
DECLARE @salary DECIMAL(10,2);

-- Declare cursor
DECLARE employee_cursor CURSOR FOR
SELECT employee_id, name, salary
FROM employees
WHERE department = 'Sales';

-- Open cursor
OPEN employee_cursor;

-- Fetch first row
FETCH NEXT FROM employee_cursor INTO @employee_id, @employee_name, @salary;

-- Process rows
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Process each row
    IF @salary < 50000
    BEGIN
        UPDATE employees
        SET salary = salary * 1.10
        WHERE employee_id = @employee_id;
        
        PRINT 'Updated salary for ' + @employee_name;
    END
    
    -- Fetch next row
    FETCH NEXT FROM employee_cursor INTO @employee_id, @employee_name, @salary;
END

-- Clean up
CLOSE employee_cursor;
DEALLOCATE employee_cursor;
Set-Based Alternative (Recommended):
-- Same operation, set-based (much faster)
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Sales'
AND salary < 50000;

-- Print affected rows
PRINT 'Updated ' + CAST(@@ROWCOUNT AS VARCHAR) + ' employees';

Why Cursors Are Inefficient:

  • Row-by-row processing: Multiple round trips to database
  • Locking overhead: May hold locks longer
  • Memory usage: Cursor result sets stored in tempdb
  • Network overhead: Multiple small operations vs one bulk
  • Poor scalability: Performance degrades with data volume

When Cursors Might Be Acceptable:

  • Admin/ETL scripts (not production queries)
  • Complex business logic requiring row-by-row decisions
  • When calling stored procedures for each row
  • Migrating data between incompatible schemas
  • Debugging/testing scenarios
  • Very small datasets (few hundred rows)

Cursor Types and Performance:

  • Forward-only, read-only: Fastest cursor type
  • Static: Copy of result set, insensitive to changes
  • Dynamic: Reflects all changes, most expensive
  • Keyset: Middle ground, tracks keys only
  • Always use fastest cursor that meets requirements
  • Consider table variables or temp tables as alternatives
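The cursor-vs-set-based contrast can be reproduced in Python with SQLite: the loop issues one statement per row, while the set-based form is a single statement (table and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        (1, 'Sales', 40000), (2, 'Sales', 60000), (3, 'Eng', 45000);
""")

# Cursor-style: fetch matching ids, then update one row at a time
for (emp_id,) in conn.execute(
        "SELECT employee_id FROM employees "
        "WHERE department = 'Sales' AND salary < 50000").fetchall():
    conn.execute("UPDATE employees SET salary = salary * 1.10 "
                 "WHERE employee_id = ?", (emp_id,))

# Set-based: the same kind of change as ONE statement (preferred)
cur = conn.execute("UPDATE employees SET salary = salary * 1.10 "
                   "WHERE department = 'Eng' AND salary < 50000")
print(cur.rowcount)  # 1
```

The set-based form also reports the affected-row count directly, much like @@ROWCOUNT in the SQL Server example above.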
45
Advanced

What is query plan caching and parameter sniffing problem?

Query Optimization 5 min read

Parameter sniffing occurs when SQL Server creates an execution plan optimized for the first parameter values, which may be suboptimal for subsequent calls with different values.

Parameter Sniffing Example:
-- Stored procedure with parameter
CREATE PROCEDURE GetOrdersByStatus
    @status VARCHAR(20)
AS
BEGIN
    SELECT * FROM orders
    WHERE order_status = @status
    ORDER BY order_date DESC;
END;

-- First execution (with 'Completed' - few rows)
EXEC GetOrdersByStatus @status = 'Completed';
-- Creates plan optimized for few rows (uses index seek)

-- Second execution (with 'Pending' - many rows)
EXEC GetOrdersByStatus @status = 'Pending';
-- Uses same plan (seek), but table scan would be better
-- Result: Poor performance for 'Pending' status
Solutions to Parameter Sniffing:
-- 1. Use OPTION (RECOMPILE) - recompile each time
CREATE PROCEDURE GetOrdersByStatus
    @status VARCHAR(20)
AS
BEGIN
    SELECT * FROM orders
    WHERE order_status = @status
    ORDER BY order_date DESC
    OPTION (RECOMPILE);  -- New plan each execution
END;

-- 2. Use OPTIMIZE FOR UNKNOWN
CREATE PROCEDURE GetOrdersByStatus
    @status VARCHAR(20)
AS
BEGIN
    SELECT * FROM orders
    WHERE order_status = @status
    ORDER BY order_date DESC
    OPTION (OPTIMIZE FOR UNKNOWN);  -- Use average statistics
END;

-- 3. Use OPTIMIZE FOR specific value
CREATE PROCEDURE GetOrdersByStatus
    @status VARCHAR(20)
AS
BEGIN
    SELECT * FROM orders
    WHERE order_status = @status
    ORDER BY order_date DESC
    OPTION (OPTIMIZE FOR (@status = 'Pending'));  -- Optimize for common case
END;

-- 4. Use local variables (disables sniffing)
CREATE PROCEDURE GetOrdersByStatus
    @status VARCHAR(20)
AS
BEGIN
    DECLARE @local_status VARCHAR(20) = @status;
    
    SELECT * FROM orders
    WHERE order_status = @local_status
    ORDER BY order_date DESC;
END;

How Query Plan Caching Works:

  • SQL Server compiles query and creates execution plan
  • Plan cached in memory (plan cache)
  • Subsequent identical queries reuse cached plan
  • Parameter values sniffed during first compilation
  • Plan optimized for those specific parameter values

When Parameter Sniffing Helps vs Hurts:

  • Helps: When parameter distribution is uniform
  • Hurts: When data distribution varies widely
  • Example: Status='Deleted' (few rows) vs Status='Active' (many rows)
  • Solution: Monitor performance, use appropriate mitigation
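Plan caching plus sniffing can be simulated in a few lines: cache whatever plan fits the first parameter value, then reuse it blindly (an illustrative sketch with made-up row counts, not SQL Server internals):

```python
# Made-up row counts per status value -- a heavily skewed distribution
ROWS = {"Completed": 50, "Pending": 900_000}

plan_cache = {}

def get_plan(proc_name, status):
    """Compile on first call, then reuse -- exactly how sniffing goes wrong."""
    if proc_name not in plan_cache:
        # 'Sniff' the first parameter: few rows -> index seek, many -> scan
        plan_cache[proc_name] = "index seek" if ROWS[status] < 1000 else "table scan"
    return plan_cache[proc_name]

print(get_plan("GetOrdersByStatus", "Completed"))  # index seek  (good fit)
print(get_plan("GetOrdersByStatus", "Pending"))    # index seek  (reused -- poor fit!)
```

OPTION (RECOMPILE) corresponds to clearing the cache entry before every call; OPTIMIZE FOR picks the cached plan from a chosen value instead of the first one seen.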
46
Intermediate

What is the difference between ANY, ALL, and SOME operators?

Comparison Operators 4 min read

These operators compare a value to each value in a list or subquery, with different comparison semantics.

ANY/SOME Operator (same meaning):
-- Find employees earning more than ANY manager
SELECT name, salary
FROM employees
WHERE salary > ANY (
    SELECT salary 
    FROM employees 
    WHERE title = 'Manager'
);
-- Equivalent to: salary > MIN(manager_salaries)

-- Using SOME (synonym for ANY)
SELECT name, salary
FROM employees
WHERE salary > SOME (
    SELECT salary 
    FROM employees 
    WHERE title = 'Manager'
);

-- ANY with IN-like behavior
SELECT product_name
FROM products
WHERE category_id = ANY (SELECT category_id FROM popular_categories);
-- Equivalent to: category_id IN (...)
ALL Operator:
-- Find employees earning more than ALL managers
SELECT name, salary
FROM employees
WHERE salary > ALL (
    SELECT salary 
    FROM employees 
    WHERE title = 'Manager'
);
-- Equivalent to: salary > MAX(manager_salaries)

-- ALL with comparison
SELECT product_name, price
FROM products
WHERE price <= ALL (
    SELECT price 
    FROM products 
    WHERE category_id = 5
);
-- Finds cheapest product in category 5

-- ALL with NOT
SELECT customer_name
FROM customers
WHERE customer_id != ALL (
    SELECT customer_id 
    FROM orders 
    WHERE order_date > '2024-01-01'
);
-- Customers with no recent orders
Practical Examples:
-- Find departments where ALL employees earn > 50000
SELECT department
FROM employees e1
WHERE 50000 < ALL (
    SELECT salary 
    FROM employees e2 
    WHERE e2.department = e1.department
)
GROUP BY department;

-- Using EXISTS alternative for ALL
SELECT department
FROM employees e1
WHERE NOT EXISTS (
    SELECT 1 
    FROM employees e2 
    WHERE e2.department = e1.department 
    AND e2.salary <= 50000
)
GROUP BY department;

Key Differences:

  • ANY/SOME: True if comparison holds for AT LEAST ONE value
  • ALL: True if comparison holds for ALL values
  • SOME = ANY: They are synonyms
  • Empty subquery: ANY returns FALSE, ALL returns TRUE
  • Use ANY for "at least one" conditions
  • Use ALL for "every" conditions
  • Often rewritable with MIN/MAX or EXISTS
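These semantics, including the empty-subquery edge case, mirror Python's built-in any()/all() applied to a generator of comparisons:

```python
manager_salaries = [55000, 70000, 80000]
salary = 72000

# salary > ANY (...)  behaves like  salary > MIN(...)
assert any(salary > m for m in manager_salaries) == (salary > min(manager_salaries))

# salary > ALL (...)  behaves like  salary > MAX(...)
assert all(salary > m for m in manager_salaries) == (salary > max(manager_salaries))

# Empty subquery: ANY is false, ALL is vacuously true
print(any(salary > m for m in []), all(salary > m for m in []))  # False True
```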
47
Expert

What are database snapshots and their use cases?

Snapshots 5 min read

Database snapshots provide read-only, static views of a database at a point in time, using copy-on-write mechanism for storage efficiency.

Creating Snapshots (SQL Server):
-- Create database snapshot
CREATE DATABASE SalesDB_Snapshot_1200
ON (
    NAME = SalesDB_Data,
    FILENAME = 'D:\Snapshots\SalesDB_1200.ss'
)
AS SNAPSHOT OF SalesDB;

-- Multiple snapshots possible
CREATE DATABASE SalesDB_Snapshot_1400
ON (
    NAME = SalesDB_Data,
    FILENAME = 'D:\Snapshots\SalesDB_1400.ss'
)
AS SNAPSHOT OF SalesDB;

-- Query snapshot like regular database
SELECT * FROM SalesDB_Snapshot_1200.dbo.orders;

-- Restore database from snapshot
USE master;
RESTORE DATABASE SalesDB
FROM DATABASE_SNAPSHOT = 'SalesDB_Snapshot_1200';

-- Drop snapshot
DROP DATABASE SalesDB_Snapshot_1200;
How Snapshots Work (Copy-on-Write):
-- Before modification:
-- Source page: [A B C D]
-- Snapshot:    [empty]

-- After first modification (change C to X):
-- Source saves original page to snapshot
-- Source page: [A B X D]  (changed)
-- Snapshot:    [A B C D]  (original saved)

-- Subsequent reads from snapshot:
-- If reading unchanged data (A,B,D): read from source
-- If reading changed data (C): read from snapshot

-- Storage efficiency: Only changed pages stored
-- If 1% of pages change, snapshot uses 1% of source size
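The copy-on-write mechanism above can be sketched in a few lines of Python, treating the database as a dict of pages (purely illustrative):

```python
class SnapshotDB:
    """Copy-on-write snapshot over a dict of pages (illustrative only)."""

    def __init__(self, pages):
        self.pages = pages        # live source pages
        self.snapshot = {}        # originals of changed pages only

    def write(self, page_no, value):
        if page_no not in self.snapshot:
            self.snapshot[page_no] = self.pages[page_no]  # save original once
        self.pages[page_no] = value

    def read_snapshot(self, page_no):
        # Changed pages come from the snapshot, unchanged ones from the source
        return self.snapshot.get(page_no, self.pages[page_no])

db = SnapshotDB({0: "A", 1: "B", 2: "C", 3: "D"})
db.write(2, "X")
print(db.pages[2], db.read_snapshot(2), len(db.snapshot))  # X C 1
```

Only the one changed page is stored, which is why a snapshot of a barely-modified database stays small.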

Use Cases:

  • Reporting: Consistent view for long-running reports
  • Recovery: Quick recovery from user errors
  • Testing: Test against production-like data
  • Auditing: Historical data analysis
  • Data Protection: Guard against accidental changes
  • Migration Testing: Test migrations without affecting production

Limitations:

  • Read-only (cannot modify snapshot)
  • Performance impact on source database (copy-on-write overhead)
  • Storage requirements grow as source changes
  • Cannot create snapshot of system databases
  • Snapshot must be on same server as source
  • Not a backup replacement (if source lost, snapshots lost)
48
Intermediate

What are SQL mathematical and statistical functions?

Mathematical Functions 4 min read

SQL provides various mathematical and statistical functions for numeric data analysis and calculations.

Basic Mathematical Functions:
SELECT 
    -- Basic arithmetic
    10 + 5 as addition,
    10 - 5 as subtraction,
    10 * 5 as multiplication,
    10 / 5 as division,
    10 % 3 as modulus,
    
    -- Power and roots
    POWER(2, 3) as power,          -- 2^3 = 8
    SQRT(16) as square_root,       -- 4
    EXP(1) as exponential,         -- e^1 ≈ 2.718
    LOG(100, 10) as logarithm,     -- log base 10 of 100 = 2 (SQL Server argument order; PostgreSQL/MySQL use LOG(base, x))
    LN(EXP(1)) as natural_log,     -- ln(e) = 1
    
    -- Rounding
    ROUND(123.4567, 2) as rounded, -- 123.46
    CEILING(123.45) as ceiling,    -- 124 (smallest integer ≥ value)
    FLOOR(123.45) as floor,        -- 123 (largest integer ≤ value)
    TRUNC(123.456, 1) as truncate, -- 123.4 (PostgreSQL)
    
    -- Sign and absolute
    ABS(-123) as absolute,         -- 123
    SIGN(-123) as sign,            -- -1
    SIGN(0) as sign_zero,          -- 0
    SIGN(123) as sign_positive;    -- 1
Trigonometric Functions:
SELECT 
    -- Angles in radians
    PI() as pi,                    -- 3.14159
    RADIANS(180) as radians,       -- π
    DEGREES(PI()) as degrees,      -- 180
    
    -- Basic trig
    SIN(PI()/2) as sine,           -- 1
    COS(PI()) as cosine,           -- -1
    TAN(PI()/4) as tangent,        -- 1
    
    -- Inverse trig
    ASIN(1) as arcsine,            -- π/2
    ACOS(0) as arccosine,          -- π/2
    ATAN(1) as arctangent,         -- π/4
    
    -- Hyperbolic
    SINH(1) as hyperbolic_sine,
    COSH(0) as hyperbolic_cosine,
    TANH(0.5) as hyperbolic_tangent;
Statistical Functions:
-- Basic statistics
SELECT 
    AVG(salary) as mean_salary,
    MEDIAN(salary) as median_salary,  -- Some databases
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) as q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) as q3,
    STDDEV(salary) as std_deviation,
    VARIANCE(salary) as variance,
    CORR(salary, commission) as correlation,
    COVAR_POP(salary, commission) as covariance
FROM employees;

-- Window statistical functions
SELECT 
    name,
    salary,
    AVG(salary) OVER () as overall_mean,
    salary - AVG(salary) OVER () as diff_from_mean,
    (salary - AVG(salary) OVER ()) / STDDEV(salary) OVER () as z_score
FROM employees;
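The z-score calculation in the last query can be checked by hand with Python's statistics module (pstdev matches SQL's population STDDEV; the sample values are invented):

```python
import statistics

salaries = [60000, 70000, 80000]

mean = statistics.mean(salaries)
stdev = statistics.pstdev(salaries)   # population std. dev., i.e. STDDEV_POP

z_scores = [round((s - mean) / stdev, 3) for s in salaries]
print(z_scores)  # [-1.225, 0.0, 1.225]
```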

Database-Specific Notes:

  • MySQL: Limited statistical functions, use aggregate or window functions
  • PostgreSQL: Rich set including MEDIAN, MODE, statistical aggregates
  • SQL Server: Good statistical support, PERCENTILE_CONT, STDEV, VAR
  • Oracle: Extensive statistical package, advanced analytics
  • Check documentation for exact function availability
49
Advanced

What is database connection failover and high availability?

High Availability 5 min read

Failover automatically switches to a standby database when the primary fails, ensuring high availability and minimal downtime.

Database Mirroring (SQL Server):
-- Configure database mirroring
ALTER DATABASE MyDB SET PARTNER = 'TCP://primary-server:5022';
ALTER DATABASE MyDB SET WITNESS = 'TCP://witness-server:5023';

-- On secondary server
ALTER DATABASE MyDB SET PARTNER = 'TCP://secondary-server:5022';

-- Check mirroring status
SELECT 
    database_name,
    mirroring_state_desc,
    mirroring_role_desc,
    mirroring_safety_level_desc
FROM sys.database_mirroring
WHERE mirroring_state IS NOT NULL;

-- Manual failover
ALTER DATABASE MyDB SET PARTNER FAILOVER;
PostgreSQL Streaming Replication:
-- Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB

-- Primary pg_hba.conf
host replication repuser secondary-ip/32 md5

-- Secondary setup
# Create base backup
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U repuser -v -P

-- Secondary recovery.conf (PostgreSQL 12+ uses postgresql.conf)
primary_conninfo = 'host=primary-host user=repuser password=secret'
promote_trigger_file = '/tmp/promote_to_primary'

-- Check replication status on primary
SELECT * FROM pg_stat_replication;

-- Promote secondary to primary
# On secondary server
touch /tmp/promote_to_primary
# Or: pg_ctl promote -D /var/lib/postgresql/data

High Availability Patterns:

  • Active-Passive: Primary handles traffic, standby ready
  • Active-Active: Both handle traffic, load balanced
  • Multi-Master: All nodes accept writes, complex sync
  • Read Replicas: Scale reads, async replication

Failover Strategies:

  • Automatic Failover: Detects failure, switches automatically
  • Manual Failover: Admin initiates switch
  • Planned Failover: For maintenance, zero data loss
  • Forced Failover: When primary unreachable, possible data loss

Connection String with Failover:

-- JDBC connection string with failover
jdbc:postgresql://primary:5432,secondary:5432/mydb
?targetServerType=primary
&loadBalanceHosts=true
&hostRecheckSeconds=10

-- Application logic
try {
    // Connect to primary
    executeQuery();
} catch (SQLException e) {
    // Connection failed, retry with secondary
    updateConnectionString();
    executeQuery();
}
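The retry logic in the application sketch above might look like this in Python: try each host in order and fall back on connection failure (the hostnames and the stub connect() are illustrative):

```python
# Client-side failover sketch: try hosts in order (names are illustrative)
HOSTS = ["primary:5432", "secondary:5432"]

def connect(host):
    """Stand-in for a real driver connect(); the primary is 'down' here."""
    if host.startswith("primary"):
        raise ConnectionError(f"cannot reach {host}")
    return f"connected to {host}"

def connect_with_failover(hosts):
    last_error = None
    for host in hosts:
        try:
            return connect(host)    # first reachable host wins
        except ConnectionError as e:
            last_error = e          # remember the failure, try the next host
    raise last_error                # every host failed

print(connect_with_failover(HOSTS))  # connected to secondary:5432
```

Real drivers (e.g. PgJDBC's multi-host URLs above) implement this same loop internally, plus health rechecks.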
50
Expert

What are database materialized view logs and fast refresh?

Materialized Views 5 min read

Materialized view logs track changes to base tables, enabling fast incremental refreshes instead of complete rebuilds.

Oracle Materialized View Log Example:
-- Create materialized view log on base table
CREATE MATERIALIZED VIEW LOG ON employees
WITH ROWID, SEQUENCE
(employee_id, department_id, salary, hire_date)
INCLUDING NEW VALUES;

-- Create fast refreshable materialized view
CREATE MATERIALIZED VIEW mv_employee_summary
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT 
    department_id,
    COUNT(*) as emp_count,
    AVG(salary) as avg_salary,
    SUM(salary) as total_salary
FROM employees
GROUP BY department_id;

-- Manual refresh
BEGIN
    DBMS_MVIEW.REFRESH('mv_employee_summary', 'F');
    -- 'F' = Fast refresh (incremental)
    -- 'C' = Complete refresh (rebuild)
    -- '?' = Force refresh (try fast, fallback to complete)
END;

-- Check refresh status
SELECT mview_name, last_refresh_type, staleness
FROM user_mviews;
How Fast Refresh Works:
-- 1. Materialized view log captures changes
-- When base table changes:
-- INSERT: Logs new row with ROWID and values
-- UPDATE: Logs old and new values
-- DELETE: Logs deleted row ROWID

-- 2. Refresh uses log to update materialized view
-- Instead of: SELECT * FROM base_table (full scan)
-- It does: Apply logged changes to materialized view

-- Example: Add 3 rows, update 2 rows, delete 1 row
-- Complete refresh: Processes all 1,000,000 rows
-- Fast refresh: Processes only 6 changed rows

-- 3. Log cleaned after refresh
DELETE FROM mlog$_employees WHERE ...;
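The log-apply step can be sketched in Python: the materialized view keeps COUNT and SUM per group, and a refresh replays only the logged deltas instead of rescanning the base table (an illustrative sketch):

```python
from collections import defaultdict

# Materialized "view": COUNT and SUM of salaries per department
mv = defaultdict(lambda: {"emp_count": 0, "total_salary": 0})

# Change log captured on the base table: (operation, department, salary)
change_log = [("INSERT", "Sales", 60000),
              ("INSERT", "Sales", 70000),
              ("DELETE", "Sales", 60000),
              ("INSERT", "Eng",   90000)]

def fast_refresh(view, log):
    """Apply only the logged deltas -- no rescan of the base table."""
    for op, dept, salary in log:
        sign = 1 if op == "INSERT" else -1
        view[dept]["emp_count"] += sign
        view[dept]["total_salary"] += sign * salary
    log.clear()    # the log is purged after a successful refresh

fast_refresh(mv, change_log)
print(dict(mv))
```

Maintaining COUNT alongside SUM is what keeps averages incrementally refreshable (AVG = SUM / COUNT), which is also why Oracle requires COUNT columns in fast-refreshable aggregate views.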

Benefits of Fast Refresh:

  • Performance: Much faster than complete refresh
  • Reduced Overhead: Minimal impact during refresh
  • Fresher Data: Can refresh more frequently
  • Less Locking: Shorter duration of locks
  • Scalability: Works well with large tables

Limitations:

  • Storage overhead for logs
  • Not all queries support fast refresh
  • Complex joins/aggregations may require complete refresh
  • Log maintenance required
  • Database-specific implementation

When to Use:

  • Large tables with incremental changes
  • Frequently refreshed materialized views
  • Real-time or near-real-time reporting
  • When complete refresh is too expensive
  • For ON COMMIT refresh (immediate consistency)