Master your SQL interviews with our comprehensive collection of frequently asked questions
SQL (Structured Query Language) is a standardized programming language used for managing and manipulating relational databases. It enables users to create, read, update, and delete data in database systems.
SELECT first_name, last_name, email FROM employees WHERE department = 'Engineering' ORDER BY last_name;
WHERE is used to filter rows before grouping, while HAVING is used to filter groups after the GROUP BY operation. WHERE cannot be used with aggregate functions, but HAVING can.
SELECT * FROM employees WHERE salary > 50000;
SELECT department, AVG(salary) as avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 60000;
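The WHERE-before-grouping vs HAVING-after-grouping distinction can be verified end to end. Below is a minimal runnable sketch using Python's built-in sqlite3 module; the employees table and its values are made up for illustration.

```python
import sqlite3

# Illustrative schema and data (assumptions, not from a real database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Engineering", 90000), ("Bob", "Engineering", 40000),
     ("Cat", "Sales", 55000), ("Dan", "Sales", 52000)],
)

# WHERE filters individual rows before any grouping happens.
high_paid = conn.execute(
    "SELECT name FROM employees WHERE salary > 50000 ORDER BY name"
).fetchall()

# HAVING filters whole groups after GROUP BY has aggregated them.
rich_depts = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 60000"
).fetchall()

print(high_paid)   # rows with salary > 50000
print(rich_depts)  # departments whose average salary > 60000
conn.close()
```

Note that Bob (40000) is excluded by WHERE even though his department passes HAVING: the two clauses filter at different stages.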
JOINs are used to combine rows from two or more tables based on a related column. There are several types of JOINs in SQL:
SELECT e.name, d.department_name FROM employees e INNER JOIN departments d ON e.department_id = d.id;
SELECT e.name, d.department_name FROM employees e LEFT JOIN departments d ON e.department_id = d.id;
A Primary Key uniquely identifies each record in a table and cannot contain NULL values. A Foreign Key is a field that links to the Primary Key of another table, establishing relationships between tables.
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE,
employee_id INT,
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
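A quick way to see a foreign key doing its job is to try to violate it. This sketch uses SQLite, where enforcement must first be switched on with PRAGMA foreign_keys; the tables mirror the DDL above in simplified form.

```python
import sqlite3

# SQLite enforces foreign keys only after this PRAGMA is enabled.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "FOREIGN KEY (employee_id) REFERENCES employees(employee_id))"
)
conn.execute("INSERT INTO employees VALUES (1, 'Alice')")

# Referencing an existing primary key succeeds.
conn.execute("INSERT INTO orders VALUES (101, 1)")

# Referencing a non-existent employee violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999)")
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True

print(fk_violation)  # True: the orphan row was rejected
conn.close()
```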
Window functions perform calculations across a set of rows that are related to the current row, but unlike GROUP BY, they don't collapse rows. Each row retains its identity while still accessing aggregate information.
SELECT
employee_name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;
SELECT
department,
AVG(salary) as dept_avg_salary
FROM employees
GROUP BY department;
UNION and UNION ALL are both used to combine results from multiple SELECT queries, but they differ in how they handle duplicates and performance.
SELECT employee_name FROM employees_ny UNION SELECT employee_name FROM employees_sf;
SELECT employee_name FROM employees_ny UNION ALL SELECT employee_name FROM employees_sf;
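The duplicate-handling difference is easy to demonstrate concretely. In this sketch (illustrative office tables, made-up names) "Bob" appears in both tables, so UNION returns one copy and UNION ALL returns two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees_ny (employee_name TEXT)")
conn.execute("CREATE TABLE employees_sf (employee_name TEXT)")
conn.executemany("INSERT INTO employees_ny VALUES (?)", [("Alice",), ("Bob",)])
conn.executemany("INSERT INTO employees_sf VALUES (?)", [("Bob",), ("Cara",)])

# UNION deduplicates across the combined result set.
union_rows = conn.execute(
    "SELECT employee_name FROM employees_ny "
    "UNION SELECT employee_name FROM employees_sf"
).fetchall()

# UNION ALL keeps every row, duplicates included, and skips the dedup work.
union_all_rows = conn.execute(
    "SELECT employee_name FROM employees_ny "
    "UNION ALL SELECT employee_name FROM employees_sf"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 vs 4: 'Bob' once vs twice
conn.close()
```

Because UNION must deduplicate (typically via a sort or hash), UNION ALL is the faster choice whenever duplicates are impossible or acceptable.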
A Common Table Expression (CTE) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs make complex queries more readable and maintainable.
WITH high_earners AS (
SELECT employee_id, name, salary
FROM employees
WHERE salary > 100000
)
SELECT * FROM high_earners
ORDER BY salary DESC;
WITH RECURSIVE org_hierarchy AS (
-- Anchor member
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member
SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
FROM employees e
INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy;
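The recursive CTE above runs unchanged on SQLite (3.8.3+), so it can be exercised directly. This sketch builds a three-level toy hierarchy; the names and IDs are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, name TEXT, manager_id INTEGER)"
)
# A three-level reporting chain: CEO -> VP -> Engineer.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "VP", 1), (3, "Engineer", 2)],
)

rows = conn.execute("""
    WITH RECURSIVE org_hierarchy AS (
        -- Anchor member: the root of the hierarchy
        SELECT employee_id, name, manager_id, 1 AS level
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: attach each employee to their manager's row
        SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
        FROM employees e
        JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
    )
    SELECT name, level FROM org_hierarchy ORDER BY level
""").fetchall()

print(rows)  # each employee paired with its depth in the reporting chain
conn.close()
```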
Database normalization is the process of organizing data to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them.
CREATE TABLE employees (
employee_id INT,
name VARCHAR(100),
department VARCHAR(100),
dept_location VARCHAR(100),
skill1 VARCHAR(50),
skill2 VARCHAR(50),
skill3 VARCHAR(50)
);
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT
);
CREATE TABLE departments (
department_id INT PRIMARY KEY,
name VARCHAR(100),
location VARCHAR(100)
);
CREATE TABLE employee_skills (
employee_id INT,
skill VARCHAR(50),
PRIMARY KEY (employee_id, skill)
);
These three commands are used for removing data, but they work differently in terms of scope, speed, and reversibility.
-- Removes specific rows, can be rolled back
DELETE FROM employees WHERE department = 'HR';
-- Removes all rows but table structure remains
DELETE FROM employees;
-- Removes all rows, faster than DELETE; as DDL, MySQL cannot roll it back
-- (SQL Server and PostgreSQL can, inside a transaction)
TRUNCATE TABLE employees;
-- Removes entire table including structure
DROP TABLE employees;
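The reversibility difference is observable in code. This sketch uses SQLite (which has no TRUNCATE statement, so only DELETE and DROP are shown); the table and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "HR"), (2, "Engineering")])
conn.commit()

# DELETE runs inside a transaction, so an uncommitted delete can be undone.
conn.execute("DELETE FROM employees WHERE department = 'HR'")
conn.rollback()  # undo the uncommitted DELETE
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# DROP removes the table itself, structure and all.
conn.execute("DROP TABLE employees")
table_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'employees'"
).fetchone()[0]

print(rows_after_rollback, table_exists)  # 2 rows restored, 0 tables left
conn.close()
```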
A CROSS JOIN produces the Cartesian product of two tables, meaning it combines each row from the first table with every row from the second table.
-- Create sample tables
CREATE TABLE colors (color_name VARCHAR(20));
INSERT INTO colors VALUES ('Red'), ('Green'), ('Blue');
CREATE TABLE sizes (size_name VARCHAR(20));
INSERT INTO sizes VALUES ('Small'), ('Medium'), ('Large');
-- CROSS JOIN to get all combinations
SELECT color_name, size_name
FROM colors
CROSS JOIN sizes;
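The same colors-by-sizes example runs verbatim on SQLite, which makes the 3 × 3 = 9 Cartesian product easy to confirm:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (color_name TEXT)")
conn.execute("CREATE TABLE sizes (size_name TEXT)")
conn.executemany("INSERT INTO colors VALUES (?)",
                 [("Red",), ("Green",), ("Blue",)])
conn.executemany("INSERT INTO sizes VALUES (?)",
                 [("Small",), ("Medium",), ("Large",)])

# Every row of colors paired with every row of sizes.
combos = conn.execute(
    "SELECT color_name, size_name FROM colors CROSS JOIN sizes"
).fetchall()

print(len(combos))  # 3 colors x 3 sizes = 9 combinations
conn.close()
```

Because result size is the product of the two row counts, an accidental CROSS JOIN (e.g. a forgotten ON clause) on large tables can explode quickly.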
Aggregate functions perform calculations on multiple rows and return a single value. They are typically used with GROUP BY clauses.
SELECT
COUNT(*) as total_employees,
SUM(salary) as total_salary,
AVG(salary) as average_salary,
MAX(salary) as highest_salary,
MIN(salary) as lowest_salary,
STDDEV(salary) as salary_stddev,
VARIANCE(salary) as salary_variance
FROM employees;
SELECT
department,
COUNT(*) as emp_count,
AVG(salary) as avg_salary,
SUM(salary) as dept_budget
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
NULL represents missing, unknown, or inapplicable data. It's not the same as empty string ('') or zero (0).
-- Sample data
INSERT INTO employees (id, name, phone, salary) VALUES
(1, 'John Doe', NULL, 50000),     -- phone is unknown
(2, 'Jane Smith', '', 60000),     -- phone is empty string
(3, 'Bob Wilson', '555-1234', 0); -- phone exists, salary is 0
-- Different behaviors
SELECT * FROM employees WHERE phone IS NULL; -- Returns John only
SELECT * FROM employees WHERE phone = '';    -- Returns Jane only
SELECT * FROM employees WHERE salary = 0;    -- Returns Bob only
-- Use COALESCE to provide default values
SELECT name, COALESCE(phone, 'No phone') as contact FROM employees;
-- Use NULLIF to treat specific values as NULL
SELECT name, NULLIF(salary, 0) as actual_salary FROM employees;
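NULL's three-valued logic is worth seeing in action: even `phone = NULL` matches nothing, because comparisons with NULL evaluate to NULL rather than true. This sketch reuses the sample rows above on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, phone TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "John Doe", None, 50000),       # phone unknown (NULL)
     (2, "Jane Smith", "", 60000),       # phone is an empty string
     (3, "Bob Wilson", "555-1234", 0)],  # salary is zero, not missing
)

null_phone = [r[0] for r in conn.execute(
    "SELECT name FROM employees WHERE phone IS NULL")]
empty_phone = [r[0] for r in conn.execute(
    "SELECT name FROM employees WHERE phone = ''")]

# '= NULL' yields NULL for every row, so no rows match; IS NULL is required.
eq_null = conn.execute("SELECT name FROM employees WHERE phone = NULL").fetchall()

# COALESCE substitutes a default wherever the value is NULL.
contacts = dict(conn.execute(
    "SELECT name, COALESCE(phone, 'No phone') FROM employees"))

print(null_phone, empty_phone, eq_null, contacts["John Doe"])
conn.close()
```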
A correlated subquery is a subquery that references columns from the outer query. It executes once for each row processed by the outer query.
-- Find employees whose salary is above average in their department
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department
);
-- Find employees earning more than company average (executes once)
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Same query rewritten as JOIN
SELECT e.name, e.department, e.salary
FROM employees e
JOIN (
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
) dept_avg ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_salary;
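A useful sanity check is that the correlated subquery and its derived-table JOIN rewrite return the same rows. This sketch verifies that on a small illustrative dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 70000), ("Bob", "Sales", 50000),
     ("Cat", "Eng", 95000), ("Dan", "Eng", 85000)],
)

# Correlated form: the inner query re-runs per outer row, tied via e.department.
correlated = conn.execute("""
    SELECT e.name FROM employees e
    WHERE e.salary > (SELECT AVG(salary) FROM employees
                      WHERE department = e.department)
    ORDER BY e.name
""").fetchall()

# JOIN rewrite: compute each department average once, then join against it.
joined = conn.execute("""
    SELECT e.name FROM employees e
    JOIN (SELECT department, AVG(salary) AS avg_salary
          FROM employees GROUP BY department) d
      ON e.department = d.department
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
""").fetchall()

print(correlated == joined, correlated)  # same rows either way
conn.close()
```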
These are window functions used for ranking rows. They differ in how they handle ties and gaps in ranking.
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,
RANK() OVER (ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees
WHERE department = 'Sales';
Name    | Salary | ROW_NUMBER | RANK | DENSE_RANK
--------|--------|------------|------|-----------
Alice   | 100000 | 1          | 1    | 1
Bob     | 90000  | 2          | 2    | 2
Charlie | 90000  | 3          | 2    | 2
David   | 80000  | 4          | 4    | 3
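The tie behavior is reproducible on SQLite 3.25+ (which added window functions). In this sketch a name tiebreaker is added to the ROW_NUMBER ordering so the output is deterministic; the salaries mirror the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Alice", 100000), ("Bob", 90000),
                  ("Charlie", 90000), ("David", 80000)])

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()

for name, row_num, rnk, dense_rnk in rows:
    print(name, row_num, rnk, dense_rnk)
# David lands at ROW_NUMBER 4, RANK 4 (gap after the tie), DENSE_RANK 3 (no gap)
conn.close()
```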
FULL OUTER JOIN returns all rows from both tables, matching them where possible and filling with NULLs where there's no match.
-- Sample data
CREATE TABLE customers (id INT, name VARCHAR(50));
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL);
INSERT INTO orders VALUES (101, 1, 100), (102, 2, 200), (103, 4, 300);
-- FULL OUTER JOIN
SELECT
c.name as customer_name,
o.id as order_id,
o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
-- Using UNION of LEFT and RIGHT JOIN
SELECT c.name, o.id, o.amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
UNION
SELECT c.name, o.id, o.amount
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;
An execution plan shows how the database will execute a query. It helps identify performance bottlenecks and optimization opportunities.
-- MySQL
EXPLAIN SELECT * FROM employees WHERE department = 'Sales';
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM employees WHERE department = 'Sales';
-- SQL Server
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM employees WHERE department = 'Sales';
GO
SET SHOWPLAN_TEXT OFF;
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    10.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
A transaction is a logical unit of work that contains one or more SQL statements. The ACID properties (Atomicity, Consistency, Isolation, Durability) ensure reliable processing of database transactions.
BEGIN TRANSACTION;
-- Transfer $100 from account A to B
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'B';
-- Check for errors (SQL Server T-SQL)
IF @@ERROR != 0
ROLLBACK TRANSACTION;
ELSE
COMMIT TRANSACTION;
-- SQL Server
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- MySQL
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- PostgreSQL
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Clustered indexes determine the physical order of data in a table, while non-clustered indexes are separate structures that point to the data.
-- Clustered index (only one per table, SQL Server syntax)
CREATE CLUSTERED INDEX idx_employees_id ON employees(id);
-- Non-clustered index (multiple per table)
CREATE NONCLUSTERED INDEX idx_employees_email ON employees(email);
CREATE NONCLUSTERED INDEX idx_employees_dept_salary ON employees(department, salary);
String functions manipulate text data. Different databases have similar functions but sometimes with different names.
SELECT
name,
-- Length/Position functions
LENGTH(name) as name_length,
CHAR_LENGTH(name) as char_count, -- Same as LENGTH for single-byte
POSITION(' ' IN name) as space_position,
INSTR(name, ' ') as space_pos_mysql,
-- Case conversion
UPPER(name) as upper_name,
LOWER(name) as lower_name,
-- Substring/extraction
SUBSTRING(name, 1, 5) as first_5_chars,
LEFT(name, 3) as first_3_chars,
RIGHT(name, 3) as last_3_chars,
-- Trimming
TRIM(name) as trimmed_name,
LTRIM(name) as left_trimmed,
RTRIM(name) as right_trimmed,
-- Replacement
REPLACE(name, ' ', '_') as name_with_underscores
FROM employees;
-- Concatenation
SELECT CONCAT(first_name, ' ', last_name) as full_name FROM employees;
-- MySQL: CONCAT(), PostgreSQL: || operator, SQL Server: + operator
-- Pattern matching
SELECT name FROM employees WHERE name LIKE 'J%'; -- Starts with J
SELECT name FROM employees WHERE name LIKE '%son'; -- Ends with son
SELECT name FROM employees WHERE name LIKE '%oh%'; -- Contains oh
-- Padding
SELECT LPAD(salary::text, 10, '0') as padded_salary FROM employees; -- Left pad
SELECT RPAD(name, 20, '.') as padded_name FROM employees; -- Right pad
-- Soundex (phonetic matching)
SELECT name FROM employees WHERE SOUNDEX(name) = SOUNDEX('Jon');
Normalization organizes data to reduce redundancy, while denormalization intentionally adds redundancy to improve performance.
-- Separate tables for related data
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
order_item_id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
-- Combined table for faster reads
CREATE TABLE order_details (
order_id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
order_date DATE,
product_name VARCHAR(100),
quantity INT,
unit_price DECIMAL(10,2),
total_price DECIMAL(10,2) -- Redundant but pre-calculated
);
Both IN and EXISTS are used for subqueries, but they work differently in terms of performance and use cases.
-- Find employees in specific departments
SELECT name, department
FROM employees
WHERE department IN ('Sales', 'Marketing', 'Engineering');
-- Using subquery with IN
SELECT name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
-- Find customers who have placed orders
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Using NOT EXISTS to find customers without orders
SELECT customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
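Both EXISTS queries above can be run as-is on SQLite. This sketch (with invented customers and orders) shows that EXISTS and NOT EXISTS split the customer set cleanly into "has orders" and "has none".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Charlie")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 1), (102, 2)])

# EXISTS stops at the first matching order per customer.
with_orders = [r[0] for r in conn.execute("""
    SELECT customer_name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY customer_name
""")]

# NOT EXISTS finds customers with no orders at all.
without_orders = [r[0] for r in conn.execute("""
    SELECT customer_name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
""")]

print(with_orders, without_orders)
conn.close()
```

NOT EXISTS is also the safer anti-join: unlike NOT IN, it behaves correctly when the subquery can return NULLs.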
Date and time functions allow manipulation and extraction of temporal data. Different databases have similar but sometimes different syntax.
-- Current date/time
SELECT
CURRENT_DATE as today_date, -- 2025-12-02
CURRENT_TIMESTAMP as now, -- 2025-12-02 14:30:45
NOW() as mysql_now, -- MySQL specific
GETDATE() as sqlserver_now, -- SQL Server specific
SYSDATE as oracle_now; -- Oracle specific
-- Date extraction
SELECT
EXTRACT(YEAR FROM order_date) as order_year,
EXTRACT(MONTH FROM order_date) as order_month,
EXTRACT(DAY FROM order_date) as order_day,
DATE_PART('dow', order_date) as day_of_week, -- 0=Sunday
DAY(order_date) as day_mysql, -- MySQL
DATEPART(month, order_date) as month_sqlserver; -- SQL Server
-- Date arithmetic
SELECT
order_date,
order_date + INTERVAL '7 days' as due_date, -- PostgreSQL (MySQL: order_date + INTERVAL 7 DAY)
DATEADD(day, 7, order_date) as due_date_sqlserver, -- SQL Server
order_date - INTERVAL '1 month' as last_month,
DATEDIFF(CURRENT_DATE, order_date) as days_ago, -- MySQL (two-argument form, returns days)
DATE_DIFF(CURRENT_DATE, order_date, DAY) as days_diff; -- BigQuery
-- Date formatting
SELECT
TO_CHAR(order_date, 'YYYY-MM-DD') as iso_format,
TO_CHAR(order_date, 'Month DD, YYYY') as long_format,
DATE_FORMAT(order_date, '%W, %M %e, %Y') as mysql_format,
FORMAT(order_date, 'dd/MM/yyyy') as sqlserver_format;
-- Date validation
SELECT
order_date,
order_date::date as date_only, -- Cast to date
ISDATE(order_date_str) as is_valid_date; -- SQL Server
-- Find orders from last 30 days
SELECT * FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';
-- Group by month-year
SELECT
TO_CHAR(order_date, 'YYYY-MM') as month,
COUNT(*) as order_count,
SUM(amount) as total_amount
FROM orders
GROUP BY TO_CHAR(order_date, 'YYYY-MM')
ORDER BY month;
-- Calculate age
SELECT
name,
birth_date,
EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date)) as age
FROM employees;
Window functions perform calculations across a set of rows related to the current row. PARTITION BY divides rows into groups, ORDER BY determines order within partitions.
SELECT
employee_id,
name,
department,
salary,
-- Running total within department
SUM(salary) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
-- Rank within department
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) as dept_salary_rank,
-- Compare to department average
salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg,
-- Moving average (last 3 months)
AVG(sales) OVER (
PARTITION BY salesperson_id
ORDER BY month_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3mo
FROM employees;
-- Find top 3 earners in each department
WITH ranked_employees AS (
SELECT
name,
department,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) as rank
FROM employees
)
SELECT * FROM ranked_employees WHERE rank <= 3;
-- Calculate month-over-month growth
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) as prev_month_rev,
revenue - LAG(revenue, 1) OVER (ORDER BY month) as growth,
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0 /
LAG(revenue, 1) OVER (ORDER BY month) as growth_pct
FROM monthly_sales;
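The LAG-based month-over-month calculation above runs on SQLite 3.25+ too. A minimal sketch with invented monthly figures:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, revenue REAL)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)",
                 [("2025-01", 1000), ("2025-02", 1200), ("2025-03", 1100)])

rows = conn.execute("""
    SELECT month,
           revenue,
           -- LAG pulls the previous row's value within the window ordering
           revenue - LAG(revenue) OVER (ORDER BY month) AS growth
    FROM monthly_sales
    ORDER BY month
""").fetchall()

print(rows)  # first month has NULL growth; then +200, then -100
conn.close()
```

Note the first row's growth is NULL: LAG has no previous row to read, unless you pass a default as its third argument.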
Database locking prevents multiple transactions from modifying the same data simultaneously, ensuring data consistency.
-- SQL Server explicit locking
BEGIN TRANSACTION;
-- Get exclusive lock on table
SELECT * FROM employees WITH (TABLOCKX);
-- Or row-level lock
SELECT * FROM employees WITH (ROWLOCK, XLOCK) WHERE id = 1;
COMMIT;
-- MySQL locking
START TRANSACTION;
-- Lock tables explicitly
LOCK TABLES employees WRITE, departments READ;
-- Perform operations
UPDATE employees SET salary = salary * 1.1 WHERE department = 'Engineering';
-- Release locks
UNLOCK TABLES;
COMMIT;
-- PostgreSQL advisory locks
BEGIN;
SELECT pg_advisory_lock(123); -- Application-level lock
-- Critical section
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SELECT pg_advisory_unlock(123);
COMMIT;
Constraints enforce rules on data columns to maintain data integrity and accuracy.
CREATE TABLE employees (
-- NOT NULL: Column cannot contain NULL
id INT NOT NULL,
-- UNIQUE: All values must be different
email VARCHAR(100) UNIQUE,
-- PRIMARY KEY: Unique identifier for table
employee_id INT PRIMARY KEY,
-- FOREIGN KEY: Links to another table
department_id INT,
CONSTRAINT fk_department
FOREIGN KEY (department_id)
REFERENCES departments(id),
-- CHECK: Validates data based on condition
salary DECIMAL(10,2) CHECK (salary >= 0),
age INT CHECK (age >= 18 AND age <= 65),
-- DEFAULT: Sets default value if not specified
hire_date DATE DEFAULT CURRENT_DATE,
status VARCHAR(20) DEFAULT 'active'
);
-- Add NOT NULL constraint (MySQL syntax)
ALTER TABLE employees MODIFY COLUMN name VARCHAR(100) NOT NULL;
-- Add UNIQUE constraint
ALTER TABLE employees ADD CONSTRAINT uc_email UNIQUE (email);
-- Add CHECK constraint
ALTER TABLE employees ADD CONSTRAINT chk_salary
CHECK (salary >= 0);
-- Add FOREIGN KEY constraint
ALTER TABLE employees ADD CONSTRAINT fk_dept
FOREIGN KEY (department_id) REFERENCES departments(id);
-- Add DEFAULT value (PostgreSQL/MySQL 8 syntax)
ALTER TABLE employees ALTER COLUMN status
SET DEFAULT 'active';
The CASE statement provides conditional logic in SQL queries, similar to if-else statements in programming languages.
SELECT
name,
department,
CASE department
WHEN 'Sales' THEN 'Revenue Department'
WHEN 'Marketing' THEN 'Brand Department'
WHEN 'Engineering' THEN 'Tech Department'
ELSE 'Other Department'
END as dept_category,
CASE
WHEN salary >= 100000 THEN 'High'
WHEN salary >= 60000 THEN 'Medium'
WHEN salary >= 30000 THEN 'Low'
ELSE 'Very Low'
END as salary_grade
FROM employees;
SELECT
order_id,
amount,
CASE
WHEN amount > 1000 THEN 'Large Order'
WHEN amount > 500 THEN 'Medium Order'
WHEN amount > 100 THEN 'Small Order'
ELSE 'Mini Order'
END as order_size,
CASE
WHEN DATEDIFF(day, order_date, shipped_date) > 7 THEN 'Delayed'
WHEN DATEDIFF(day, order_date, shipped_date) <= 2 THEN 'Express'
ELSE 'Standard'
END as shipping_status
FROM orders;
-- Count by category
SELECT
COUNT(CASE WHEN department = 'Sales' THEN 1 END) as sales_count,
COUNT(CASE WHEN department = 'Engineering' THEN 1 END) as eng_count,
COUNT(CASE WHEN salary > 80000 THEN 1 END) as high_earners,
SUM(CASE WHEN status = 'active' THEN salary END) as active_payroll,
AVG(CASE WHEN department = 'Marketing' THEN salary END) as marketing_avg
FROM employees;
-- PIVOT-like functionality
SELECT
EXTRACT(YEAR FROM hire_date) as hire_year,
COUNT(CASE WHEN department = 'Sales' THEN 1 END) as sales,
COUNT(CASE WHEN department = 'Engineering' THEN 1 END) as engineering,
COUNT(CASE WHEN department = 'Marketing' THEN 1 END) as marketing
FROM employees
GROUP BY EXTRACT(YEAR FROM hire_date);
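The conditional-aggregation trick above relies on COUNT ignoring NULLs: a CASE with no ELSE yields NULL for non-matching rows, so each column counts only its own category. A minimal runnable check with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Ann", "Sales", 60000), ("Bob", "Sales", 90000),
                  ("Cat", "Engineering", 85000), ("Dan", "Marketing", 55000)])

# Each COUNT sees NULL for rows outside its CASE condition and skips them.
sales_count, eng_count, high_earners = conn.execute("""
    SELECT COUNT(CASE WHEN department = 'Sales' THEN 1 END),
           COUNT(CASE WHEN department = 'Engineering' THEN 1 END),
           COUNT(CASE WHEN salary > 80000 THEN 1 END)
    FROM employees
""").fetchone()

print(sales_count, eng_count, high_earners)  # 2 1 2
conn.close()
```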
Stored procedures and functions are database objects containing SQL statements, but they serve different purposes and have different capabilities.
-- Create stored procedure
CREATE OR REPLACE PROCEDURE update_salary(
p_employee_id INT,
p_percentage DECIMAL(5,2)
)
LANGUAGE plpgsql -- PostgreSQL
AS $$
BEGIN
UPDATE employees
SET salary = salary * (1 + p_percentage/100)
WHERE employee_id = p_employee_id;
IF NOT FOUND THEN
RAISE NOTICE 'Employee % not found', p_employee_id;
END IF;
COMMIT;
END;
$$;
-- Call stored procedure
CALL update_salary(123, 10.0); -- Give 10% raise
-- Create function that returns a value
CREATE OR REPLACE FUNCTION calculate_bonus(
p_salary DECIMAL,
p_performance_rating INT
)
RETURNS DECIMAL
LANGUAGE plpgsql
AS $$
DECLARE
v_bonus DECIMAL;
BEGIN
CASE p_performance_rating
WHEN 1 THEN v_bonus := p_salary * 0.20; -- 20%
WHEN 2 THEN v_bonus := p_salary * 0.15; -- 15%
WHEN 3 THEN v_bonus := p_salary * 0.10; -- 10%
ELSE v_bonus := p_salary * 0.05; -- 5%
END CASE;
RETURN v_bonus;
END;
$$;
-- Use function in query
SELECT
name,
salary,
calculate_bonus(salary, performance_rating) as bonus
FROM employees;
Query caching stores the results of SQL queries so subsequent identical queries can be served faster without re-execution.
-- Check query cache status (MySQL 5.7 and earlier; the query cache was removed in MySQL 8.0)
SHOW VARIABLES LIKE 'query_cache%';
-- Typical configuration in my.cnf
[mysqld]
query_cache_type = 1            # Enable cache
query_cache_size = 64M          # Cache size
query_cache_limit = 2M          # Max result size to cache
query_cache_min_res_unit = 4K   # Minimum allocation unit
-- Check cache performance
SHOW STATUS LIKE 'Qcache%';
-- Force query to use cache
SELECT SQL_CACHE * FROM employees WHERE department = 'Sales';
-- Force query to NOT use cache
SELECT SQL_NO_CACHE * FROM employees WHERE department = 'Sales';
-- PostgreSQL relies on shared buffers and OS cache
-- Check buffer cache hit ratio
SELECT
sum(blks_hit) * 100.0 / (sum(blks_hit) + sum(blks_read)) as cache_hit_ratio
FROM pg_stat_database;
-- Use pg_prewarm to load data into cache
CREATE EXTENSION pg_prewarm;
SELECT pg_prewarm('employees');
-- Materialized views as manual cache
CREATE MATERIALIZED VIEW mv_employee_stats AS
SELECT department, COUNT(*), AVG(salary)
FROM employees
GROUP BY department;
-- Refresh cache periodically
REFRESH MATERIALIZED VIEW mv_employee_stats;
INNER JOIN returns only matching rows from both tables, while OUTER JOIN returns all rows from one or both tables with NULLs for non-matching rows.
-- Customers table
customer_id | name
----------- | --------
1           | Alice
2           | Bob
3           | Charlie
-- Orders table
order_id | customer_id | amount
-------- | ----------- | ------
101      | 1           | 100
102      | 2           | 200
103      | 4           | 300    -- No matching customer
SELECT c.name, o.order_id, o.amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Result (2 rows):
-- Alice - 101 - 100
-- Bob - 102 - 200
SELECT c.name, o.order_id, o.amount
FROM customers c
LEFT OUTER JOIN orders o ON c.customer_id = o.customer_id;
-- Result (3 rows):
-- Alice - 101 - 100
-- Bob - 102 - 200
-- Charlie - NULL - NULL
SELECT c.name, o.order_id, o.amount
FROM customers c
RIGHT OUTER JOIN orders o ON c.customer_id = o.customer_id;
-- Result (3 rows):
-- Alice - 101 - 100
-- Bob - 102 - 200
-- NULL - 103 - 300
SELECT c.name, o.order_id, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
-- Result (4 rows):
-- Alice - 101 - 100
-- Bob - 102 - 200
-- Charlie - NULL - NULL
-- NULL - 103 - 300
Triggers are database objects that automatically execute in response to specific events (INSERT, UPDATE, DELETE) on tables.
-- PostgreSQL trigger example
CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.modified_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_employees_modified
BEFORE UPDATE ON employees
FOR EACH ROW
EXECUTE FUNCTION update_modified_column();
-- Audit trail trigger (MySQL syntax)
DELIMITER $$
CREATE TRIGGER audit_employee_changes
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
IF OLD.salary != NEW.salary THEN
INSERT INTO salary_audit (
employee_id,
old_salary,
new_salary,
changed_by,
change_date
) VALUES (
NEW.employee_id,
OLD.salary,
NEW.salary,
USER(),
NOW()
);
END IF;
END$$
DELIMITER ;
-- INSTEAD OF trigger (SQL Server): fires in place of the triggering statement
CREATE TRIGGER trg_instead_of_delete
ON employees
INSTEAD OF DELETE
AS
BEGIN
-- Archive instead of delete
INSERT INTO employees_archive
SELECT *, GETDATE(), SYSTEM_USER
FROM deleted;
-- Then delete from main table
DELETE e
FROM employees e
INNER JOIN deleted d ON e.employee_id = d.employee_id;
END;
Partitioning divides a large table into smaller, more manageable pieces while maintaining logical unity. Each partition can be stored and accessed separately.
-- Create partitioned table
CREATE TABLE sales (
sale_id SERIAL,
sale_date DATE NOT NULL,
amount DECIMAL(10,2),
region VARCHAR(50)
) PARTITION BY RANGE (sale_date);
-- Create partitions for each year
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE sales_2025 PARTITION OF sales
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
-- Create default partition
CREATE TABLE sales_default PARTITION OF sales DEFAULT;
-- Partition by region (MySQL LIST COLUMNS syntax)
CREATE TABLE customers (
customer_id INT,
name VARCHAR(100),
region VARCHAR(50),
email VARCHAR(100),
PRIMARY KEY (customer_id, region) -- MySQL requires the partition column in every unique key
)
PARTITION BY LIST COLUMNS(region) (
PARTITION p_north VALUES IN ('NY', 'NJ', 'CT'),
PARTITION p_south VALUES IN ('TX', 'FL', 'GA'),
PARTITION p_west VALUES IN ('CA', 'OR', 'WA')
);
-- Note: MySQL LIST partitioning has no catch-all partition; rows with an
-- unlisted region are rejected (MariaDB supports PARTITION p_other DEFAULT)
CTEs and temporary tables both help organize complex queries but differ in scope, performance, and use cases.
-- Simple CTE
WITH high_earners AS (
SELECT employee_id, name, salary
FROM employees
WHERE salary > 100000
),
department_stats AS (
SELECT
department,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department
)
SELECT
h.name,
h.salary,
d.department,
d.avg_salary
FROM high_earners h
JOIN employees e ON h.employee_id = e.employee_id
JOIN department_stats d ON e.department = d.department;
-- Create temporary table
CREATE TEMPORARY TABLE temp_high_earners AS
SELECT employee_id, name, salary
FROM employees
WHERE salary > 100000;
-- Add index for performance
CREATE INDEX idx_temp_earners ON temp_high_earners(employee_id);
-- Use in multiple queries
SELECT * FROM temp_high_earners;
-- Complex operation on temp table
UPDATE temp_high_earners
SET salary = salary * 1.1
WHERE employee_id IN (
SELECT employee_id FROM employees WHERE performance_rating = 'A'
);
-- Clean up (optional, auto-dropped at session end)
DROP TABLE temp_high_earners;
UNION combines rows from multiple queries vertically, while JOIN combines columns from multiple tables horizontally.
-- Combine employees from different cities
SELECT name, 'New York' as location FROM employees_ny
UNION ALL
SELECT name, 'San Francisco' as location FROM employees_sf
UNION ALL
SELECT name, 'Chicago' as location FROM employees_chi;
-- UNION vs UNION ALL
SELECT product_id FROM current_products
UNION     -- Removes duplicates
SELECT product_id FROM discontinued_products;
SELECT product_id FROM current_products
UNION ALL -- Keeps duplicates
SELECT product_id FROM popular_products;
-- Combine employee info with department info
SELECT
e.name,
e.salary,
d.department_name,
m.name as manager_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
LEFT JOIN employees m ON e.manager_id = m.employee_id;
-- Multiple JOIN types
SELECT
c.customer_name,
o.order_date,
p.product_name,
oi.quantity
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;
PIVOT rotates rows to columns (transposing), while UNPIVOT rotates columns to rows. Useful for reporting and data transformation.
-- Sample sales data
SELECT * FROM monthly_sales;
-- year | month | amount
-- 2024 | Jan | 1000
-- 2024 | Feb | 1500
-- 2024 | Mar | 1200
-- 2025 | Jan | 1100
-- 2025 | Feb | 1600
-- Pivot to show months as columns (SQL Server PIVOT syntax)
SELECT *
FROM monthly_sales
PIVOT (
SUM(amount)
FOR month IN ([Jan], [Feb], [Mar], [Apr])
) AS pivoted_sales;
-- Result:
-- year | Jan | Feb | Mar | Apr
-- 2024 | 1000 | 1500 | 1200 | NULL
-- 2025 | 1100 | 1600 | NULL | NULL
-- Using CASE statements
SELECT
year,
SUM(CASE WHEN month = 'Jan' THEN amount END) as Jan,
SUM(CASE WHEN month = 'Feb' THEN amount END) as Feb,
SUM(CASE WHEN month = 'Mar' THEN amount END) as Mar,
SUM(CASE WHEN month = 'Apr' THEN amount END) as Apr
FROM monthly_sales
GROUP BY year;
-- Sample pivoted data
CREATE TABLE sales_pivot (
year INT,
Q1 DECIMAL,
Q2 DECIMAL,
Q3 DECIMAL,
Q4 DECIMAL
);
-- Unpivot quarterly data
SELECT year, quarter, amount
FROM sales_pivot
UNPIVOT (
amount FOR quarter IN (Q1, Q2, Q3, Q4)
) AS unpivoted_sales;
-- Result:
-- year | quarter | amount
-- 2024 | Q1 | 1000
-- 2024 | Q2 | 1500
-- 2024 | Q3 | 1200
-- 2024 | Q4 | 1800
A covering index includes all columns needed for a query, allowing the database to satisfy the query entirely from the index without accessing the table.
-- Query needing multiple columns
SELECT employee_id, name, department, salary
FROM employees
WHERE department = 'Engineering'
AND hire_date > '2023-01-01';
-- Non-covering index (requires table lookup)
CREATE INDEX idx_dept_hire ON employees(department, hire_date);
-- Covering index with INCLUDE (SQL Server; PostgreSQL 11+ supports INCLUDE too)
CREATE INDEX idx_covering ON employees(department, hire_date)
INCLUDE (employee_id, name, salary);
-- Or include columns in index key
CREATE INDEX idx_covering_complete ON employees(
department,
hire_date,
employee_id,
name,
salary
);
-- Without covering index:
-- 1. Use idx_dept_hire to find matching rows
-- 2. For each match, read the table (heap/clustered index) to get employee_id, name, salary
-- 3. Return results
-- With covering index:
-- 1. Use idx_covering to find matching rows
-- 2. All needed columns are already in the index
-- 3. Return results directly from the index; no table access needed
CHECK OPTION ensures that data inserted or updated through a view satisfies the view's defining condition, maintaining data integrity.
-- Create view for active employees
CREATE VIEW active_employees AS
SELECT employee_id, name, department, salary, hire_date, status
FROM employees
WHERE status = 'active'
WITH CHECK OPTION;
-- This will succeed (status = 'active')
INSERT INTO active_employees (name, department, salary, status)
VALUES ('John Doe', 'Engineering', 80000, 'active');
-- This will FAIL due to CHECK OPTION
INSERT INTO active_employees (name, department, salary, status)
VALUES ('Jane Smith', 'Sales', 70000, 'inactive');
-- Error: new row violates check option for view "active_employees"
-- Update that maintains view condition (succeeds)
UPDATE active_employees
SET salary = 85000
WHERE employee_id = 123;
-- Update that violates view condition (fails)
UPDATE active_employees
SET status = 'inactive'
WHERE employee_id = 123;
-- Error: new row violates check option
-- Base view
CREATE VIEW ny_employees AS
SELECT * FROM employees WHERE location = 'NY'
WITH CHECK OPTION;
-- View based on another view
CREATE VIEW ny_engineers AS
SELECT * FROM ny_employees WHERE department = 'Engineering';
-- WITH CASCADED CHECK OPTION (default)
CREATE VIEW ny_engineers_cascaded AS
SELECT * FROM ny_employees WHERE department = 'Engineering'
WITH CASCADED CHECK OPTION;
-- Insert must satisfy ALL underlying view conditions
INSERT INTO ny_engineers_cascaded (location, department)
VALUES ('NY', 'Engineering'); -- Success
INSERT INTO ny_engineers_cascaded (location, department)
VALUES ('CA', 'Engineering'); -- Fails (location <> 'NY')
INSERT INTO ny_engineers_cascaded (location, department)
VALUES ('NY', 'Sales'); -- Fails (department <> 'Engineering')
-- WITH LOCAL CHECK OPTION
CREATE VIEW ny_engineers_local AS
SELECT * FROM ny_employees WHERE department = 'Engineering'
WITH LOCAL CHECK OPTION;
-- Insert is checked only against THIS view's condition; however, ny_employees
-- has its own CHECK OPTION, so location = 'NY' is still enforced and this fails
INSERT INTO ny_engineers_local (location, department)
VALUES ('CA', 'Engineering');
Aliases give temporary names to tables, columns, or expressions, improving query readability and enabling self-joins.
-- Basic column aliases
SELECT
first_name || ' ' || last_name AS full_name,
salary * 12 AS annual_salary,
department AS dept,
COUNT(*) AS employee_count
FROM employees
GROUP BY department;
-- Aliases with special characters or spaces
SELECT
name AS "Employee Name",
salary AS "Monthly Salary (USD)",
hire_date AS "Date Hired"
FROM employees;
-- Aliases in ORDER BY
SELECT name, salary * 1.1 AS new_salary
FROM employees
ORDER BY new_salary DESC;
-- Simple table aliases
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
-- Self-join using aliases (essential!)
SELECT
e.name AS employee_name,
m.name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
-- Multiple table aliases in complex queries
SELECT
c.name AS customer_name,
o.order_date,
p.name AS product_name,
oi.quantity,
s.name AS supplier_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
JOIN suppliers s ON p.supplier_id = s.supplier_id;
-- Subquery in FROM clause requires alias
SELECT dept_stats.department, dept_stats.avg_salary
FROM (
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS emp_count
FROM employees
GROUP BY department
) AS dept_stats
WHERE dept_stats.emp_count > 10;
-- CTE aliases
WITH department_summary AS (
SELECT
department,
AVG(salary) AS avg_sal,
MAX(salary) AS max_sal
FROM employees
GROUP BY department
)
SELECT * FROM department_summary
WHERE avg_sal > 60000;
Connection pooling maintains a cache of database connections that can be reused across requests, avoiding the overhead of repeatedly establishing and tearing down new connections.
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
config.setUsername("user");
config.setPassword("password");
config.setMaximumPoolSize(20); // Max connections in pool
config.setMinimumIdle(10); // Minimum idle connections
config.setConnectionTimeout(30000); // 30 seconds timeout
config.setIdleTimeout(600000); // 10 minutes idle timeout
config.setMaxLifetime(1800000); // 30 minutes max lifetime
config.setConnectionTestQuery("SELECT 1");
HikariDataSource dataSource = new HikariDataSource(config);
import psycopg2
from psycopg2 import pool
# Create connection pool
connection_pool = psycopg2.pool.SimpleConnectionPool(
5, 20, # minconn, maxconn
host="localhost",
database="mydb",
user="user",
password="password",
port=5432
)
# Get connection from pool
conn = connection_pool.getconn()
try:
cur = conn.cursor()
cur.execute("SELECT * FROM employees")
results = cur.fetchall()
finally:
# Return connection to pool (don't close!)
connection_pool.putconn(conn)
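The core mechanic behind both libraries is simple: pre-create connections, hand them out, and take them back. A minimal sketch (using sqlite3 and a thread-safe queue, purely for illustration) looks like this:

```python
import queue
import sqlite3

class SimplePool:
    """Minimal sketch of a connection pool backed by a thread-safe queue."""

    def __init__(self, size, db=":memory:"):
        self._q = queue.Queue()
        for _ in range(size):
            # Connections are created once up front, then reused
            self._q.put(sqlite3.connect(db, check_same_thread=False))

    def getconn(self):
        return self._q.get()   # blocks if every connection is checked out

    def putconn(self, conn):
        self._q.put(conn)      # return to the pool; never close

pool = SimplePool(2)
conn = pool.getconn()
try:
    result = conn.execute("SELECT 1").fetchone()[0]
finally:
    pool.putconn(conn)
```

Real pools (HikariCP, psycopg2's pool) add validation queries, timeouts, and lifetime limits on top of this same checkout/return cycle.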
Recursive queries use Common Table Expressions (CTEs) to query hierarchical or graph-like data, such as organizational charts or bills of materials.
-- Employees table with manager hierarchy
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT REFERENCES employees(employee_id)
);
-- Recursive CTE to get hierarchy
WITH RECURSIVE org_chart AS (
-- Anchor member: top-level employees (no manager)
SELECT
employee_id,
name,
manager_id,
1 as level,
name as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: join with previous level
SELECT
e.employee_id,
e.name,
e.manager_id,
oc.level + 1 as level,
oc.path || ' -> ' || e.name as path
FROM employees e
INNER JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT
employee_id,
name,
level,
path
FROM org_chart
ORDER BY path;
-- Parts explosion (component hierarchy)
CREATE TABLE bom (
assembly_id INT,
component_id INT,
quantity INT
);
WITH RECURSIVE parts_explosion AS (
-- Anchor: top-level assemblies
SELECT
assembly_id,
component_id,
quantity,
1 as level,
CAST(component_id AS VARCHAR) as path
FROM bom
WHERE assembly_id = 1000 -- Starting assembly
UNION ALL
-- Recursive: explode components
SELECT
b.assembly_id,
b.component_id,
pe.quantity * b.quantity as total_qty,
pe.level + 1 as level,
pe.path || '.' || CAST(b.component_id AS VARCHAR)
FROM bom b
INNER JOIN parts_explosion pe ON b.assembly_id = pe.component_id
)
SELECT * FROM parts_explosion
ORDER BY level, path;
-- Find all managers above an employee
WITH RECURSIVE manager_chain AS (
-- Start with specific employee
SELECT
employee_id,
name,
manager_id,
0 as level
FROM employees
WHERE employee_id = 123 -- Starting employee
UNION ALL
-- Move up to manager
SELECT
e.employee_id,
e.name,
e.manager_id,
mc.level + 1
FROM employees e
INNER JOIN manager_chain mc ON e.employee_id = mc.manager_id
)
SELECT * FROM manager_chain
ORDER BY level DESC;
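The org-chart pattern above runs unchanged on SQLite (which has supported WITH RECURSIVE since 3.8.3), so it can be tried end to end with Python's built-in sqlite3 module. The four sample employees here are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    manager_id INTEGER REFERENCES employees(employee_id)
);
INSERT INTO employees VALUES
    (1, 'Alice', NULL),   -- top of the hierarchy
    (2, 'Bob', 1),
    (3, 'Carol', 1),
    (4, 'Dave', 2);
""")

rows = conn.execute("""
WITH RECURSIVE org_chart AS (
    -- Anchor: top-level employees (no manager)
    SELECT employee_id, name, manager_id, 1 AS level, name AS path
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach direct reports of the previous level
    SELECT e.employee_id, e.name, e.manager_id,
           oc.level + 1, oc.path || ' -> ' || e.name
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT name, level, path FROM org_chart ORDER BY path
""").fetchall()
```

Sorting by `path` conveniently prints each manager directly above their reports.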
Sharding horizontally partitions data across multiple databases/servers to distribute load and enable scalability beyond single server limits.
-- Shard by customer_id ranges
-- Shard 1: customer_id 1-1000000
-- Shard 2: customer_id 1000001-2000000
-- Shard 3: customer_id 2000001-3000000
-- Application determines shard based on range
function get_shard_for_customer(customer_id) {
if (customer_id <= 1000000) return 'shard1';
if (customer_id <= 2000000) return 'shard2';
return 'shard3';
}
-- Query specific shard
SELECT * FROM shard1.customers WHERE customer_id = 500000;
SELECT * FROM shard2.orders WHERE customer_id = 1500000;
-- Shard by hash of customer_id
-- Number of shards: 4
-- Shard = customer_id % 4
-- Hash-based routing (simple hash mod shard count; a true consistent-hashing ring would also minimize remapping when shards are added)
function get_shard_for_customer(customer_id) {
const shard_count = 4;
const hash = md5(customer_id.toString());
const shard_index = parseInt(hash.substring(0, 8), 16) % shard_count;
return `shard${shard_index + 1}`;
}
-- Hash sharding distributes evenly
-- customer_id 123 → shard4 (123 % 4 = 3)
-- customer_id 456 → shard1 (456 % 4 = 0)
-- customer_id 789 → shard2 (789 % 4 = 1)
-- Maintain shard mapping in lookup table
CREATE TABLE shard_mapping (
customer_id INT PRIMARY KEY,
shard_id INT,
shard_host VARCHAR(100),
shard_database VARCHAR(50)
);
-- Look up shard before querying
SELECT shard_host, shard_database
FROM shard_mapping
WHERE customer_id = 12345;
-- Result: 'db-server-3', 'customers_db_3'
-- Application connects to correct shard
-- Connection string: jdbc:mysql://db-server-3/customers_db_3
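Both routing strategies above boil down to a pure function from key to shard name. A Python sketch of the two (the shard count and ranges mirror the examples; any real deployment would read these from configuration):

```python
import hashlib

SHARD_COUNT = 4

def range_shard(customer_id: int) -> str:
    # Range-based routing, mirroring the ranges sketched above
    if customer_id <= 1_000_000:
        return "shard1"
    if customer_id <= 2_000_000:
        return "shard2"
    return "shard3"

def hash_shard(customer_id: int) -> str:
    # Hash-based routing: hashing first evens out skewed key ranges
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    index = int(digest[:8], 16) % SHARD_COUNT
    return f"shard{index + 1}"
```

Range routing keeps related keys together (good for range scans) but can hot-spot; hash routing spreads load evenly but makes range queries fan out to every shard.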
Analytic functions compute aggregate values while retaining individual row details, unlike aggregate functions that collapse multiple rows into one.
-- Returns single row with totals
SELECT
department,
COUNT(*) as emp_count,
AVG(salary) as avg_salary,
SUM(salary) as total_salary
FROM employees
GROUP BY department;
-- Result (collapsed):
-- department | emp_count | avg_salary | total_salary
-- Sales | 25 | 65000 | 1625000
-- Engineering| 50 | 85000 | 4250000
-- Returns all rows with additional analytic columns
SELECT
name,
department,
salary,
-- Analytic functions
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
SUM(salary) OVER (PARTITION BY department) as dept_total_salary,
salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank,
PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) as percentile,
LAG(salary, 1) OVER (PARTITION BY department ORDER BY hire_date) as prev_salary
FROM employees;
-- Result (individual rows preserved):
-- name | department | salary | dept_avg | dept_total | diff | rank | ...
-- John | Sales | 70000 | 65000 | 1625000 | 5000 | 1 | ...
-- Jane | Sales | 60000 | 65000 | 1625000 | -5000| 2 | ...
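The collapse-versus-preserve distinction can be made concrete outside SQL. This sketch (sample rows invented for the demo) computes the same department average twice: once GROUP BY-style, producing one row per department, and once window-style, attaching the aggregate to every original row:

```python
from collections import defaultdict

employees = [
    ("John", "Sales", 70000),
    ("Jane", "Sales", 60000),
    ("Amy",  "Engineering", 90000),
]

# GROUP BY analogue: rows collapse to one per department
sums = defaultdict(lambda: [0.0, 0])
for _, dept, salary in employees:
    sums[dept][0] += salary
    sums[dept][1] += 1
dept_avg = {dept: total / n for dept, (total, n) in sums.items()}

# Window-function analogue: every row kept, aggregate attached to each
annotated = [
    (name, dept, salary, dept_avg[dept], salary - dept_avg[dept])
    for name, dept, salary in employees
]
```

`dept_avg` has two entries (collapsed), while `annotated` still has all three rows, each carrying its department's average and its deviation from it, exactly like the `dept_avg_salary` and `diff_from_avg` columns above.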
Connection multiplexing allows multiple logical client connections to share a single physical database connection, reducing connection overhead.
-- Traditional: each client has a dedicated connection
-- Client1 → Connection1 → Database
-- Client2 → Connection2 → Database
-- Client3 → Connection3 → Database
-- 3 physical connections, each with memory/CPU overhead

-- Multiplexed: multiple clients share connections
-- Client1 →\
-- Client2 →-→ Multiplexer → Single Connection → Database
-- Client3 →/
-- 3 logical connections, 1 physical connection
// Pseudo-code for connection multiplexer
class ConnectionMultiplexer {
private ConnectionPool physicalPool;
private Map<String, VirtualConnection> virtualConnections;
executeQuery(clientId, sql) {
// Get or create virtual connection for client
VirtualConnection vconn = virtualConnections.get(clientId);
if (vconn == null) {
vconn = new VirtualConnection(clientId);
virtualConnections.put(clientId, vconn);
}
// Borrow physical connection from pool
PhysicalConnection pconn = physicalPool.borrow();
try {
// Execute on physical connection
ResultSet rs = pconn.execute(sql);
// Map results back to virtual connection
return vconn.processResults(rs);
} finally {
// Return physical connection to pool
physicalPool.release(pconn);
}
}
}
SQL data types define what kind of data a column can hold. Choosing the right type affects storage, performance, and data integrity.
CREATE TABLE example_numbers (
-- Integer types
id INT PRIMARY KEY, -- 4 bytes, -2B to 2B
small_id SMALLINT, -- 2 bytes, -32K to 32K
big_id BIGINT, -- 8 bytes, huge range
-- Exact numeric (decimal)
price DECIMAL(10, 2), -- 10 total digits, 2 decimal places
precise NUMERIC(15, 5), -- Same as DECIMAL
-- Floating point (approximate)
weight FLOAT, -- 4 bytes, ~7 digits precision
measurement DOUBLE PRECISION, -- 8 bytes, ~15 digits precision
rating REAL, -- 4 bytes, ~6 digits precision
-- Specialized
serial_id SERIAL, -- Auto-incrementing integer (PostgreSQL)
money_amount MONEY -- Currency type (SQL Server/PostgreSQL)
);
CREATE TABLE example_strings (
-- Fixed length (padded with spaces)
country_code CHAR(2), -- Exactly 2 characters
ssn CHAR(9), -- Exactly 9 characters
-- Variable length
name VARCHAR(100), -- Max 100 characters
description TEXT, -- Very large text (unlimited in some DBs)
-- Unicode support
unicode_name NVARCHAR(100), -- SQL Server unicode
postgres_unicode VARCHAR(100), -- PostgreSQL (always unicode)
-- Specialized
email VARCHAR(255) CHECK (email LIKE '%@%'),
url VARCHAR(2083), -- Max URL length
json_data JSON, -- JSON document (PostgreSQL/MySQL 5.7+)
xml_data XML -- XML document
);
CREATE TABLE example_dates (
-- Date only
birth_date DATE, -- '2023-12-25'
-- Time only
meeting_time TIME, -- '14:30:00'
meeting_time_tz TIME WITH TIME ZONE, -- '14:30:00-05'
-- Date and time
created_at TIMESTAMP, -- '2023-12-25 14:30:00'
updated_at TIMESTAMPTZ, -- With timezone
-- Specialized
appointment TIMESTAMP(0), -- No fractional seconds
interval_field INTERVAL, -- Time interval '1 day 2 hours'
-- Database-specific
year_field YEAR, -- MySQL only
datetime_field DATETIME, -- MySQL/SQL Server
smalldatetime_field SMALLDATETIME -- SQL Server
);
Cursors allow row-by-row processing of result sets, but they are generally inefficient and should be avoided in favor of set-based operations.
DECLARE @employee_id INT;
DECLARE @employee_name VARCHAR(100);
DECLARE @salary DECIMAL(10,2);
-- Declare cursor
DECLARE employee_cursor CURSOR FOR
SELECT employee_id, name, salary
FROM employees
WHERE department = 'Sales';
-- Open cursor
OPEN employee_cursor;
-- Fetch first row
FETCH NEXT FROM employee_cursor INTO @employee_id, @employee_name, @salary;
-- Process rows
WHILE @@FETCH_STATUS = 0
BEGIN
-- Process each row
IF @salary < 50000
BEGIN
UPDATE employees
SET salary = salary * 1.10
WHERE employee_id = @employee_id;
PRINT 'Updated salary for ' + @employee_name;
END
-- Fetch next row
FETCH NEXT FROM employee_cursor INTO @employee_id, @employee_name, @salary;
END
-- Clean up
CLOSE employee_cursor;
DEALLOCATE employee_cursor;
-- Same operation, set-based (much faster)
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Sales' AND salary < 50000;

-- Print affected rows
PRINT 'Updated ' + CAST(@@ROWCOUNT AS VARCHAR) + ' employees';
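The set-based rewrite is easy to verify with sqlite3 (table and rows invented for the demo): one UPDATE statement does what the cursor loop did row by row, and the driver's `rowcount` plays the role of `@@ROWCOUNT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(employee_id INTEGER PRIMARY KEY, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Sales", 40000), (2, "Sales", 60000), (3, "HR", 30000)],
)

# One set-based statement replaces the whole cursor loop
cur = conn.execute(
    "UPDATE employees SET salary = salary * 1.10 "
    "WHERE department = 'Sales' AND salary < 50000"
)
updated = cur.rowcount          # analogue of @@ROWCOUNT
new_salary = conn.execute(
    "SELECT salary FROM employees WHERE employee_id = 1"
).fetchone()[0]
```

Only the underpaid Sales row is touched; the database applies the predicate and the update in a single pass instead of a fetch-test-update round trip per row.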
Parameter sniffing occurs when SQL Server creates an execution plan optimized for the first parameter values, which may be suboptimal for subsequent calls with different values.
-- Stored procedure with parameter
CREATE PROCEDURE GetOrdersByStatus
@status VARCHAR(20)
AS
BEGIN
SELECT * FROM orders
WHERE order_status = @status
ORDER BY order_date DESC;
END;
-- First execution (with 'Completed' - few rows)
EXEC GetOrdersByStatus @status = 'Completed';
-- Creates plan optimized for few rows (uses index seek)
-- Second execution (with 'Pending' - many rows)
EXEC GetOrdersByStatus @status = 'Pending';
-- Uses same plan (seek), but table scan would be better
-- Result: Poor performance for 'Pending' status
-- 1. Use OPTION (RECOMPILE) - recompile each time
CREATE PROCEDURE GetOrdersByStatus
@status VARCHAR(20)
AS
BEGIN
SELECT * FROM orders
WHERE order_status = @status
ORDER BY order_date DESC
OPTION (RECOMPILE); -- New plan each execution
END;
-- 2. Use OPTIMIZE FOR UNKNOWN
CREATE PROCEDURE GetOrdersByStatus
@status VARCHAR(20)
AS
BEGIN
SELECT * FROM orders
WHERE order_status = @status
ORDER BY order_date DESC
OPTION (OPTIMIZE FOR UNKNOWN); -- Use average statistics
END;
-- 3. Use OPTIMIZE FOR specific value
CREATE PROCEDURE GetOrdersByStatus
@status VARCHAR(20)
AS
BEGIN
SELECT * FROM orders
WHERE order_status = @status
ORDER BY order_date DESC
OPTION (OPTIMIZE FOR (@status = 'Pending')); -- Optimize for common case
END;
-- 4. Use local variables (disables sniffing)
CREATE PROCEDURE GetOrdersByStatus
@status VARCHAR(20)
AS
BEGIN
DECLARE @local_status VARCHAR(20) = @status;
SELECT * FROM orders
WHERE order_status = @local_status
ORDER BY order_date DESC;
END;
The ANY, SOME, and ALL operators compare a value to each value in a list or subquery, with different comparison semantics.
-- Find employees earning more than ANY manager
SELECT name, salary
FROM employees
WHERE salary > ANY (
SELECT salary
FROM employees
WHERE title = 'Manager'
);
-- Equivalent to: salary > MIN(manager_salaries)
-- Using SOME (synonym for ANY)
SELECT name, salary
FROM employees
WHERE salary > SOME (
SELECT salary
FROM employees
WHERE title = 'Manager'
);
-- ANY with IN-like behavior
SELECT product_name
FROM products
WHERE category_id = ANY (SELECT category_id FROM popular_categories);
-- Equivalent to: category_id IN (...)
-- Find employees earning more than ALL managers
SELECT name, salary
FROM employees
WHERE salary > ALL (
SELECT salary
FROM employees
WHERE title = 'Manager'
);
-- Equivalent to: salary > MAX(manager_salaries)
-- ALL with comparison
SELECT product_name, price
FROM products
WHERE price <= ALL (
SELECT price
FROM products
WHERE category_id = 5
);
-- Finds cheapest product in category 5
-- ALL with NOT
SELECT customer_name
FROM customers
WHERE customer_id != ALL (
SELECT customer_id
FROM orders
WHERE order_date > '2024-01-01'
);
-- Customers with no recent orders
-- Find departments where ALL employees earn > 50000
SELECT department
FROM employees e1
WHERE 50000 < ALL (
SELECT salary
FROM employees e2
WHERE e2.department = e1.department
)
GROUP BY department;
-- Using EXISTS alternative for ALL
SELECT department
FROM employees e1
WHERE NOT EXISTS (
SELECT 1
FROM employees e2
WHERE e2.department = e1.department
AND e2.salary <= 50000
)
GROUP BY department;
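The MIN/MAX equivalences noted in the comments can be checked directly with Python's `any()` and `all()` builtins, which have the same semantics (the salary list is illustrative; the equivalence assumes a non-empty, NULL-free subquery result):

```python
manager_salaries = [50000, 60000, 70000]

def gt_any(value, subquery):
    # value > ANY (...) : true if greater than at least one result
    return any(value > s for s in subquery)

def gt_all(value, subquery):
    # value > ALL (...) : true only if greater than every result
    return all(value > s for s in subquery)

# For a non-empty, NULL-free result set these collapse to MIN/MAX tests
check_any = gt_any(55000, manager_salaries) == (55000 > min(manager_salaries))
check_all = gt_all(75000, manager_salaries) == (75000 > max(manager_salaries))
not_above_all = gt_all(65000, manager_salaries)   # blocked by the 70000 manager
```

In real SQL, NULLs in the subquery complicate this picture (a NULL comparison is UNKNOWN, which can make ALL fail and ANY inconclusive), which is one reason the NOT EXISTS rewrite above is often preferred.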
Database snapshots provide read-only, static views of a database at a point in time, using a copy-on-write mechanism for storage efficiency.
-- Create database snapshot
CREATE DATABASE SalesDB_Snapshot_1200
ON (
NAME = SalesDB_Data,
FILENAME = 'D:\Snapshots\SalesDB_1200.ss'
)
AS SNAPSHOT OF SalesDB;
-- Multiple snapshots possible
CREATE DATABASE SalesDB_Snapshot_1400
ON (
NAME = SalesDB_Data,
FILENAME = 'D:\Snapshots\SalesDB_1400.ss'
)
AS SNAPSHOT OF SalesDB;
-- Query snapshot like regular database
SELECT * FROM SalesDB_Snapshot_1200.dbo.orders;
-- Restore database from snapshot
USE master;
RESTORE DATABASE SalesDB
FROM DATABASE_SNAPSHOT = 'SalesDB_Snapshot_1200';
-- Drop snapshot
DROP DATABASE SalesDB_Snapshot_1200;
-- Before modification:
-- Source page: [A B C D]
-- Snapshot: [empty]

-- After first modification (change C to X):
-- Source saves original page to snapshot
-- Source page: [A B X D] (changed)
-- Snapshot: [A B C D] (original saved)

-- Subsequent reads from snapshot:
-- If reading unchanged data (A, B, D): read from source
-- If reading changed data (C): read from snapshot

-- Storage efficiency: only changed pages are stored
-- If 1% of pages change, snapshot uses 1% of source size
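The copy-on-write scheme can be sketched in a few lines: a snapshot stores nothing until a page changes, at which point the original page is saved to it before the live copy is overwritten. The page contents here are toy data:

```python
class PageSnapshot:
    """Sketch of a copy-on-write snapshot over a list of database pages."""

    def __init__(self, source):
        self.source = source   # live pages, shared by reference
        self.saved = {}        # page_no -> original content, filled on first change

    def read(self, page_no):
        # Changed pages come from the snapshot's saved copies;
        # unchanged pages are read straight from the live source
        return self.saved.get(page_no, self.source[page_no])

def write_page(source, snapshots, page_no, data):
    # Before the first post-snapshot write, preserve the original page
    for snap in snapshots:
        if page_no not in snap.saved:
            snap.saved[page_no] = source[page_no]
    source[page_no] = data

pages = ["A", "B", "C", "D"]
snap = PageSnapshot(pages)
write_page(pages, [snap], 2, "X")   # change C to X
```

After the write, the snapshot still reads the original "C" while the live database sees "X", and only the one changed page consumes snapshot storage.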
SQL provides various mathematical and statistical functions for numeric data analysis and calculations.
SELECT
-- Basic arithmetic
10 + 5 as addition,
10 - 5 as subtraction,
10 * 5 as multiplication,
10 / 5 as division,
10 % 3 as modulus,
-- Power and roots
POWER(2, 3) as power, -- 2^3 = 8
SQRT(16) as square_root, -- 4
EXP(1) as exponential, -- e^1 ≈ 2.718
LOG(10, 100) as logarithm, -- log base 10 of 100 = 2 (base comes first in PostgreSQL/MySQL)
LN(EXP(1)) as natural_log, -- ln(e) = 1
-- Rounding
ROUND(123.4567, 2) as rounded, -- 123.46
CEILING(123.45) as ceiling, -- 124 (smallest integer ≥ value)
FLOOR(123.45) as floor, -- 123 (largest integer ≤ value)
TRUNC(123.456, 1) as truncate, -- 123.4 (PostgreSQL)
-- Sign and absolute
ABS(-123) as absolute, -- 123
SIGN(-123) as sign, -- -1
SIGN(0) as sign_zero, -- 0
SIGN(123) as sign_positive; -- 1
SELECT
-- Angles in radians
PI() as pi, -- 3.14159
RADIANS(180) as radians, -- π
DEGREES(PI()) as degrees, -- 180
-- Basic trig
SIN(PI()/2) as sine, -- 1
COS(PI()) as cosine, -- -1
TAN(PI()/4) as tangent, -- 1
-- Inverse trig
ASIN(1) as arcsine, -- π/2
ACOS(0) as arccosine, -- π/2
ATAN(1) as arctangent, -- π/4
-- Hyperbolic
SINH(1) as hyperbolic_sine,
COSH(0) as hyperbolic_cosine,
TANH(0.5) as hyperbolic_tangent;
-- Basic statistics
SELECT
AVG(salary) as mean_salary,
MEDIAN(salary) as median_salary, -- Some databases
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) as q1,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) as q3,
STDDEV(salary) as std_deviation,
VARIANCE(salary) as variance,
CORR(salary, commission) as correlation,
COVAR_POP(salary, commission) as covariance
FROM employees;
-- Window statistical functions
SELECT
name,
salary,
AVG(salary) OVER () as overall_mean,
salary - AVG(salary) OVER () as diff_from_mean,
(salary - AVG(salary) OVER ()) / STDDEV(salary) OVER () as z_score
FROM employees;
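The z-score computed by the last query can be reproduced in Python to see what the window functions are doing per row (salaries invented for the demo; note that STDDEV in most databases is the sample standard deviation, matching `statistics.stdev`):

```python
import statistics

salaries = [60000, 70000, 80000, 90000]

mean = statistics.mean(salaries)
# Most databases' STDDEV uses the sample (n - 1) divisor, like stdev()
std = statistics.stdev(salaries)

# Same formula as (salary - AVG(...) OVER ()) / STDDEV(...) OVER ()
z_scores = [(s - mean) / std for s in salaries]
```

As expected for z-scores, the values are centered on zero: rows above the mean get positive scores, rows below get negative ones.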
Failover automatically switches to a standby database when the primary fails, ensuring high availability and minimal downtime.
-- Configure database mirroring
ALTER DATABASE MyDB SET PARTNER = 'TCP://primary-server:5022';
ALTER DATABASE MyDB SET WITNESS = 'TCP://witness-server:5023';
-- On secondary server
ALTER DATABASE MyDB SET PARTNER = 'TCP://secondary-server:5022';
-- Check mirroring status
SELECT
database_name,
mirroring_state_desc,
mirroring_role_desc,
mirroring_safety_level_desc
FROM sys.database_mirroring
WHERE mirroring_state IS NOT NULL;
-- Manual failover
ALTER DATABASE MyDB SET PARTNER FAILOVER;
-- Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB

-- Primary pg_hba.conf
host replication repuser secondary-ip/32 md5

-- Secondary setup: create base backup
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U repuser -v -P

-- Secondary standby settings (recovery.conf; PostgreSQL 12+ uses postgresql.conf)
primary_conninfo = 'host=primary-host user=repuser password=secret'
promote_trigger_file = '/tmp/promote_to_primary'

-- Check replication status on primary
SELECT * FROM pg_stat_replication;

-- Promote secondary to primary (on secondary server)
touch /tmp/promote_to_primary
-- Or: pg_ctl promote -D /var/lib/postgresql/data
-- JDBC connection string with failover
jdbc:postgresql://primary:5432,secondary:5432/mydb
?targetServerType=primary
&loadBalanceHosts=true
&hostRecheckSeconds=10
-- Application logic
try {
// Connect to primary
executeQuery();
} catch (SQLException e) {
// Connection failed, retry with secondary
updateConnectionString();
executeQuery();
}
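The try/catch retry idea above generalizes to an ordered host list: try the primary first, fall through to each standby on connection failure. A minimal Python sketch (the `run_query` stub and host names are hypothetical stand-ins for real driver calls):

```python
def execute_with_failover(hosts, run_query):
    """Try each host in order, falling through to the next on connection failure."""
    last_error = None
    for host in hosts:
        try:
            return run_query(host)
        except ConnectionError as err:
            last_error = err    # host unreachable: try the next one
    raise last_error            # every host failed

def run_query(host):
    # Hypothetical query runner simulating a downed primary
    if host == "primary":
        raise ConnectionError("primary unreachable")
    return f"result from {host}"

result = execute_with_failover(["primary", "secondary"], run_query)
```

Production drivers (like pgJDBC's multi-host URL above) implement essentially this loop internally, plus host re-checking so traffic returns to the primary after recovery.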
Materialized view logs track changes to base tables, enabling fast incremental refreshes instead of complete rebuilds.
-- Create materialized view log on base table
CREATE MATERIALIZED VIEW LOG ON employees
WITH ROWID, SEQUENCE
(employee_id, department_id, salary, hire_date)
INCLUDING NEW VALUES;
-- Create fast refreshable materialized view
CREATE MATERIALIZED VIEW mv_employee_summary
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT
department_id,
COUNT(*) as emp_count,
AVG(salary) as avg_salary,
SUM(salary) as total_salary
FROM employees
GROUP BY department_id;
-- Manual refresh
BEGIN
DBMS_MVIEW.REFRESH('mv_employee_summary', 'F');
-- 'F' = Fast refresh (incremental)
-- 'C' = Complete refresh (rebuild)
-- '?' = Force refresh (try fast, fallback to complete)
END;
-- Check refresh status
SELECT mview_name, last_refresh_type, staleness
FROM user_mviews;
-- 1. Materialized view log captures changes
-- When the base table changes:
--   INSERT: logs new row with ROWID and values
--   UPDATE: logs old and new values
--   DELETE: logs deleted row ROWID

-- 2. Refresh uses the log to update the materialized view
-- Instead of: SELECT * FROM base_table (full scan)
-- It does: apply logged changes to the materialized view

-- Example: add 3 rows, update 2 rows, delete 1 row
-- Complete refresh: processes all 1,000,000 rows
-- Fast refresh: processes only the 6 changed rows

-- 3. Log is cleaned after refresh
DELETE FROM mlog$_employees WHERE ...;
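The log-then-apply cycle can be modeled in a few lines of Python. This sketch (names and numbers invented for the demo) maintains a per-department count/sum "materialized view" and refreshes it by replaying only the logged deltas, never rescanning a base table:

```python
from collections import defaultdict

mv = defaultdict(lambda: [0, 0.0])   # department -> [emp_count, total_salary]
change_log = []                       # pending (op, department, salary) entries

def log_change(op, department, salary):
    # Base-table DML only appends to the log; the view is refreshed later
    change_log.append((op, department, salary))

def fast_refresh():
    # Apply only the logged deltas, then purge the log (no full rebuild)
    applied = len(change_log)
    for op, department, salary in change_log:
        sign = 1 if op == "INSERT" else -1
        mv[department][0] += sign
        mv[department][1] += sign * salary
    change_log.clear()
    return applied

log_change("INSERT", "Sales", 100.0)
log_change("INSERT", "Sales", 200.0)
log_change("DELETE", "Sales", 100.0)
applied = fast_refresh()
```

Three logged changes collapse into the correct aggregate (one employee, 200.0 total), and the log is emptied afterwards, mirroring Oracle's fast refresh followed by log cleanup.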