SQL count(*) performance

On Azure, index and table scans can perform poorly. One workaround is to add a seemingly ‘useless’ condition to the query that forces an index seek on the clustered index. Nonclustered indexes are worth considering for columns with a large number of distinct values, such as a combination of first and last names (if a clustered index is already in use for other columns).


Question:

My SQL table BookChapters contains more than 20 million rows. It has a clustered primary key (bookChapterID) but no other keys or indexes. The following query takes only milliseconds to run:

if (select count(*) from BookChapters) = 0
...

Changing it to either of the following, however, takes more than 10 minutes:

if (select count(*) from BookChapters) = 1
...

or

if (select count(*) from BookChapters) > 1
...

What is the reason for the delay, and how can I speed up select count(*)?



Solution 1:

Mikael Eriksson's answer below gives a good explanation of why the first query is fast. SQL Server optimizes it into an existence check along the lines of

if exists(select * from BookChapters)

so it looks for the presence of a single row instead of counting all the rows in the table.

The other two queries follow a different rule. For a query like SELECT COUNT(*), SQL Server uses the narrowest non-clustered index to count the rows. If the table has no non-clustered index, it has to scan the entire table.

Also, if your table has a clustered index, you can get the count even faster with the following query, taken from the site “Get Row Counts Fast!”:

-- SQL Server 2005/2008
SELECT OBJECT_NAME(i.id) AS [Table_Name], i.rowcnt AS [Row_Count]
FROM sys.sysindexes i WITH (NOLOCK)
WHERE i.indid IN (0,1)
ORDER BY i.rowcnt DESC;

-- SQL Server 2000
SELECT OBJECT_NAME(i.id) AS [Table_Name], i.rows AS [Row_Count]
FROM sysindexes i (NOLOCK)
WHERE i.indid IN (0,1)
ORDER BY i.rows DESC;

These queries use the sysindexes system table; more information is available for SQL Server 2000, SQL Server 2005, SQL Server 2008, and SQL Server 2012.

A linked article offers another solution for slow-running SELECT COUNT(*) queries. It demonstrates the technique Microsoft uses to quickly display the number of rows when you right-click a table and choose Properties.

select sum(spart.rows)
from sys.partitions spart
where spart.object_id = object_id('YourTable')
and spart.index_id < 2

You should find that this returns very quickly, no matter how many tables you have.
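As a sketch of how this could stand in for the slow checks from the question, the metadata count can be read into a variable and compared there. The dbo schema and the @rowCount variable are assumptions, and sys.partitions documents its rows column as approximate, so this only fits cases where a near-exact count is acceptable.

```sql
-- Assumes the table lives in the dbo schema; @rowCount is an illustrative name.
DECLARE @rowCount bigint;

SELECT @rowCount = SUM(p.rows)
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.BookChapters')
  AND p.index_id < 2;          -- 0 = heap, 1 = clustered index

IF @rowCount > 1
    PRINT 'BookChapters has more than one row (approximate count)';
```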

If you are still on SQL Server 2000, you can get the count from the sysindexes table instead.

select max(ROWS)
from sysindexes
where id = object_id('YourTable')

This number may be slightly off depending on how often SQL Server updates the sysindexes table, but it is usually correct or at least close enough.


Solution 2:


If you only need the number of rows, try this:

exec sp_spaceused [TABLE_NAME]
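For the table from the question, the call might look like this (the dbo schema is an assumption). The result set includes a rows column alongside the space figures.

```sql
-- Returns name, rows, reserved, data, index_size and unused for the table.
EXEC sp_spaceused N'dbo.BookChapters';
```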


Solution 3:


By inspecting the execution plans of your queries, you can gain insight into the processes taking place.

The query optimizer recognizes your first query,

if (select count(*) from BookChapters) = 0

as an existence check of the same kind as

if exists(select * from BookChapters)

(strictly its negation, since the count is 0 only when no row exists). SQL Server knows the outcome of the comparison as soon as it finds a single row, so it looks for the presence of one row instead of counting all the rows in the table.

For your other two queries, the optimizer is not smart enough to do the same and has to count the rows in the table before it can decide whether the expression is true or false.
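One possible workaround, not taken from the answers here, is to cap the number of rows that can be counted so the comparison can still be decided. A minimal sketch, assuming the table is in the dbo schema:

```sql
-- The derived table returns at most two rows, so COUNT(*) can only be 0, 1 or 2
-- and the scan can stop early instead of reading every row in the table.
IF (SELECT COUNT(*) FROM (SELECT TOP (2) 1 AS x FROM dbo.BookChapters) AS t) > 1
    PRINT 'More than one row';

IF (SELECT COUNT(*) FROM (SELECT TOP (2) 1 AS x FROM dbo.BookChapters) AS t) = 1
    PRINT 'Exactly one row';
```

This keeps the exact semantics of the = 1 and > 1 checks, which the metadata-based counts cannot guarantee.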


Solution 4:

Have you considered the query

select count(BookChapterId) from BookChapterTable

where BookChapterId has a non-clustered index? That should make it run much faster.
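A minimal sketch of that suggestion, using the BookChapters table from the question. The index name IX_BookChapters_bookChapterID and the dbo schema are made up for illustration, and building the index on a table with 20 million rows takes time and space of its own.

```sql
-- Hypothetical index name; a narrow index on the key column alone.
CREATE NONCLUSTERED INDEX IX_BookChapters_bookChapterID
    ON dbo.BookChapters (bookChapterID);

-- bookChapterID is the primary key and therefore NOT NULL,
-- so COUNT(bookChapterID) returns the same value as COUNT(*).
SELECT COUNT(bookChapterID) FROM dbo.BookChapters;
```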

Whether this helps really depends on how the table is used and how that compares with what a non-clustered index is suited for. Some points from MSDN:

Before you create nonclustered indexes, understand how your data will be accessed. Consider using nonclustered indexes for:

  • Columns that contain a large number of distinct values, such as a combination of last name and first name (if a clustered index is used for other columns). If there are very few distinct values, such as only 1 and 0, most queries will not use the index because a table scan is usually more efficient.
  • Queries that do not return large result sets.
  • Columns frequently involved in the search conditions of a query (WHERE clause) that return exact matches.
  • Decision-support-system applications for which joins and grouping are frequently required. Create multiple nonclustered indexes on the columns involved in join and grouping operations, and a clustered index on any foreign key columns.
  • Covering all the columns a given query uses from one table. This eliminates the need to access the table or clustered index altogether.
