Query Optimization is the process of changing a query so that SQL Server is able to return the same results more efficiently. This process also reduces the load on the database.
Below are two example queries that return exactly the same data.
First a sub-optimal query, followed by an optimized version.
Problem: List orders with customer name and number of items in the order.
SELECT FirstName = (SELECT FirstName FROM Customer C WHERE C.Id = O.CustomerId),
       LastName = (SELECT LastName FROM Customer C WHERE C.Id = O.CustomerId),
       OrderNumber,
       OrderDate,
       Items = (SELECT COUNT(Id) FROM OrderItem I WHERE I.OrderId = O.Id)
  FROM [Order] O
 ORDER BY OrderNumber
A query with three correlated subqueries pulling data from the Customer and OrderItem tables.
Next, an optimized query returning the exact same results.
SELECT FirstName,
       LastName,
       OrderNumber,
       OrderDate,
       COUNT(I.Id) AS Items
  FROM [Order] O
  JOIN Customer C ON C.Id = O.CustomerId
  JOIN OrderItem I ON I.OrderId = O.Id
 GROUP BY FirstName, LastName, OrderNumber, OrderDate
 ORDER BY OrderNumber
A query with two JOIN operations and an aggregate function. With proper indexes on the primary key and foreign key columns, this query will be very fast, even on very large tables.
Optimization is the process of writing effective SQL to get fast and accurate results.
Minor changes can often have a dramatic impact on query performance.
Optimization involves indexing, selection limits, matching records, and more.
The two fundamental causes of slow queries are 1) a poorly designed data model, and 2) missing indexes.
Below are some tips and techniques for improving query and database performance.
An index is a data structure that speeds up data retrieval from a database table.
Unique indexes are also used to enforce primary keys and prevent duplicate key values.
Using proper indexing allows for very fast SELECT and ORDER BY operations.
Indexing may slow down INSERT, UPDATE, and DELETE operations, because every affected index must be updated on each write.
When inserting a batch of a million records, it may be faster to drop the index first and re-create it after the load completes.
This diagram outlines the logic flow of adding indexes to a table.
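As a minimal runnable sketch of the effect (using Python's built-in sqlite3 module, with table and index names invented for the example), EXPLAIN QUERY PLAN shows the same lookup switching from a full table scan to an index search once the index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany("INSERT INTO Customer (LastName) VALUES (?)",
                 [("Smith",), ("Jones",), ("Brown",)])

query = "SELECT Id FROM Customer WHERE LastName = 'Smith'"

# Without an index on LastName, SQLite scans the whole table.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

conn.execute("CREATE INDEX IX_Customer_LastName ON Customer (LastName)")

# With the index, the same query becomes an index search.
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

print(before)  # the plan reports a table scan
print(after)   # the plan reports a search using the index
```

The exact plan text varies between SQLite versions, but the scan-to-index change is the point: the SELECT and ORDER BY speedups above come from exactly this switch.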
Instead of SELECT *, select only the columns you need:
SELECT column_1, column_2, ...
SELECT * may return unused columns, which wastes bandwidth and database resources.
When the table schema changes, SELECT * may automatically start returning new columns that the application does not expect.
The DISTINCT keyword removes duplicate rows from a query's results.
Ensuring there are no duplicates requires significant processing power.
Instead of using DISTINCT, you can often select more fields so that the results are naturally unique. Consider this query:
SELECT DISTINCT FirstName, LastName, Country FROM Customer
This query collapses customers who share a first name, last name, and country, which can produce inaccurate results.
Imagine several different customers named John Smith or Jane Smith from exactly the same country.
For larger sets of data, removing these duplicates also makes the query execute slowly.
SELECT FirstName, LastName, City, Country FROM Customer
By omitting DISTINCT and adding more fields to compare, genuine duplicates can still be identified, but the database no longer has to de-duplicate the result set, and customers who merely share a name and country are no longer merged into one row.
Coding SQL queries in a loop slows down the entire sequence.
Instead of writing a loop that runs an INSERT statement once per record, use a bulk insert or update.
This also spares the application from calling the database on every iteration of the loop.
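A minimal sketch of the difference, using Python's sqlite3 module (the table name is invented for the example): one executemany call hands all the rows to the database at once, replacing ten thousand separate INSERT statements:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderItem (Id INTEGER PRIMARY KEY, OrderId INTEGER, Quantity INTEGER)")

rows = [(i % 100 + 1, 1) for i in range(10_000)]

# Looped version (slow): one statement, and one database call, per record.
# for order_id, qty in rows:
#     conn.execute("INSERT INTO OrderItem (OrderId, Quantity) VALUES (?, ?)", (order_id, qty))

# Bulk version: a single call inserts every row.
conn.executemany("INSERT INTO OrderItem (OrderId, Quantity) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM OrderItem").fetchone()[0]
print(count)  # 10000
```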
Consider the query below:
SELECT Product.ProductName, Supplier.CompanyName FROM Product, Supplier WHERE Product.SupplierId = Supplier.Id
This type of query is an implicit (comma) join, which is logically a Cartesian join: all possible row combinations are formed first and then filtered by the WHERE clause. This can slow the database down, especially at large scale. To prevent this, an explicit
INNER JOIN can be used.
SELECT Product.ProductName, Supplier.CompanyName FROM Product INNER JOIN Supplier ON Product.SupplierId = Supplier.Id
Here, the database only generates the rows where the SupplierId values match. Some database management systems recognize
WHERE-style joins and run them as an
INNER JOIN, but others do not, so it is recommended to use explicit JOIN syntax.
The COUNT() function is often used to check whether a record exists in the database.
However, COUNT() scans all matching rows even if the record is found in the very first one, which makes it inefficient.
Instead, use the
EXISTS() function: as soon as a record is found, execution stops and the result is returned right away.
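A small sketch of the EXISTS form using Python's sqlite3 module (table and names invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany("INSERT INTO Customer (LastName) VALUES (?)",
                 [("Smith",), ("Jones",)])

# EXISTS can stop at the first matching row; COUNT(*) would tally them all.
found = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM Customer WHERE LastName = ?)", ("Smith",)
).fetchone()[0]
missing = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM Customer WHERE LastName = ?)", ("Garcia",)
).fetchone()[0]

print(found, missing)  # 1 0
```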
HAVING conditions are evaluated after the rows have been grouped and aggregated.
If the goal of the
HAVING clause is simply to filter rows, it is recommended to use the
WHERE clause instead for higher efficiency.
For instance, if you wish to see a date range, a
HAVING clause retrieves all the records and then filters them, while a
WHERE clause immediately limits the number of records pulled based on the condition.
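To make the distinction concrete, here is a sketch using sqlite3 (sample data invented): both queries return the same rows, but the WHERE version discards non-matching rows before grouping rather than after:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [Order] (Id INTEGER PRIMARY KEY, CustomerId INTEGER, OrderDate TEXT)")
conn.executemany("INSERT INTO [Order] (CustomerId, OrderDate) VALUES (?, ?)",
                 [(1, "2024-01-05"), (1, "2024-02-10"), (2, "2024-01-20")])

# Filter applied before grouping: only customer 1's rows are ever grouped.
with_where = conn.execute(
    "SELECT CustomerId, COUNT(*) FROM [Order] "
    "WHERE CustomerId = 1 GROUP BY CustomerId").fetchall()

# Filter applied after grouping: every customer is grouped, then filtered.
with_having = conn.execute(
    "SELECT CustomerId, COUNT(*) FROM [Order] "
    "GROUP BY CustomerId HAVING CustomerId = 1").fetchall()

print(with_where, with_having)  # [(1, 2)] [(1, 2)]
```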
A correlated subquery depends on the parent, or outer, query.
Because this type of query executes row by row, it decreases overall database performance.
SELECT ProductName,
       SupplierName = (SELECT CompanyName FROM Supplier WHERE Product.SupplierId = Supplier.Id)
  FROM Product
The query above runs the inner query once for each product row, which can cause performance problems. It can easily be rewritten with a JOIN instead.
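A sketch of that rewrite using sqlite3 (sample data invented), confirming that the JOIN returns the same rows as the correlated subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (Id INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT, SupplierId INTEGER);
    INSERT INTO Supplier VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Product VALUES (1, 'Anvil', 1), (2, 'Widget', 2);
""")

# Correlated subquery: the inner SELECT runs once per product row.
correlated = conn.execute(
    "SELECT ProductName, "
    "(SELECT CompanyName FROM Supplier WHERE Product.SupplierId = Supplier.Id) "
    "FROM Product ORDER BY ProductName").fetchall()

# JOIN: suppliers are matched in a single pass over both tables.
joined = conn.execute(
    "SELECT ProductName, CompanyName FROM Product "
    "JOIN Supplier ON Product.SupplierId = Supplier.Id "
    "ORDER BY ProductName").fetchall()

print(correlated == joined)  # True
```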
Wildcards create the widest search possible, which is also the most inefficient type of search.
When a pattern starts with a wildcard, the database must examine every record for a match anywhere in the string and cannot use an index on that column.
As much as possible, limit the use of wildcard searches.
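A quick sqlite3 sketch of the difference (sample data invented): a pattern anchored at the start only has to match the beginning of each value, and many databases can serve it from an index, while a leading wildcard must be checked against every position in every string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, City TEXT)")
conn.executemany("INSERT INTO Customer (City) VALUES (?)",
                 [("Milan",), ("Madrid",), ("Hamilton",)])

# Anchored pattern: values starting with 'M'.
prefix = [r[0] for r in conn.execute(
    "SELECT City FROM Customer WHERE City LIKE 'M%' ORDER BY City")]

# Leading wildcard: a match anywhere in the string; typically a full scan.
anywhere = [r[0] for r in conn.execute(
    "SELECT City FROM Customer WHERE City LIKE '%il%' ORDER BY City")]

print(prefix)    # ['Madrid', 'Milan']
print(anywhere)  # ['Hamilton', 'Milan']
```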
The less data retrieved, the faster the query will execute.
Rather than retrieving everything and filtering it in the application afterwards, apply the filters in the query itself; this limits the number of results the database has to return.
You can also use the
LIMIT clause (TOP or OFFSET ... FETCH in SQL Server) to cap the number of results returned.
Adding pagination to the application can also improve database performance.
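A minimal pagination sketch with sqlite3 (table contents invented): LIMIT caps the page size and OFFSET skips the earlier pages, so each request pulls only one page of rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT)")
conn.executemany("INSERT INTO Product (ProductName) VALUES (?)",
                 [(f"Product {i}",) for i in range(1, 26)])

page_size = 10
page = 2  # 1-based page number

# Fetch only the rows belonging to the requested page.
rows = conn.execute(
    "SELECT Id FROM Product ORDER BY Id LIMIT ? OFFSET ?",
    (page_size, (page - 1) * page_size)).fetchall()

print([r[0] for r in rows])  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```

Note that a stable ORDER BY is required; without it, rows may shift between pages.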