Table of Contents
One of the most common and most deadly errors you can make: using length instead of count. You can repeat this multiple times, but you will always find someone who'll use it the way it shouldn't be used.
So, first just to make it clear:
#count - collection.count
- Counts number of elements using SQL query (SELECT COUNT(*) FROM...)
- #count result is not stored internally during object life cycle, which means, that each time we invoke this method, SQL query is performed again
- count is really fast comparing to length
2.1.2 :048 > collection = User.all; nil => nil 2.1.2 :049 > collection.count (0.7ms) SELECT COUNT(*) FROM `users` => 16053 2.1.2 :050 > collection.count => 16053
#length - collection.length
- Returns length of a collecion without performing additional queries... as long as collection is loaded
- When we have lazy loaded collection, length will load whole colletion into memory and then will return length of it
- Might use all of your memory when used in a bad way
- Really fast when having a eagerly loaded collection
2.1.2 :055 > collection = User.all; nil => nil 2.1.2 :056 > collection.length User Load (122.9ms) SELECT `users`.* FROM `users` => 16053 2.1.2 :057 > collection = User.all; nil => nil 2.1.2 :058 > collection.to_a; nil User Load (140.9ms) SELECT `users`.* FROM `users` => nil 2.1.2 :059 > collection.length => 16053 2.1.2 :060 > collection.length => 16053
#size - collection.size
- Combines abilities of both previous methods;
- If collection is loaded, will count it's elements (no additional query)
- If collection is not loaded, will perform additional query
2.1.2 :034 > collection = User.all; nil => nil 2.1.2 :035 > collection.count (0.3ms) SELECT COUNT(*) FROM `users` => 16053 2.1.2 :036 > collection.count (0.3ms) SELECT COUNT(*) FROM `users` => 16053 2.1.2 :037 > collection.size (0.2ms) SELECT COUNT(*) FROM `users` => 16053 2.1.2 :038 > collection.to_a; nil User Load (64.2ms) SELECT `users`.* FROM `users` => nil 2.1.2 :039 > collection.size => 16053
Why would you even care?
Well it might have a huge impact on your apps performance (and resource consumption). In general if you don't want to care at all and you want to delegate this responsibility to someone else, use #size. If you want to care, then play with it and understand how it works, otherwise you might end up doing something like this:
print "We have #{User.all.length} users!"
And this is the performance difference on my computer (with only 16k users):
user system total real count 0.010000 0.000000 0.010000 ( 0.002989) length 0.730000 0.060000 0.790000 ( 0.846671)
Nearly 1 second to perform such simple task. And this could have a serious impact on your web app! Keep that in mind.
September 16, 2014 — 03:26
I always use #size since I know this.
Thanks for the tip!
September 18, 2014 — 08:10
Good post. Thanks!
September 18, 2014 — 14:59
You should test SQL performance too.
When you do User.all.length AR fetches all records from database and then works with plain ruby array.
When you do User.all.count AR adds count(*) and delegates the count logic to SQL server. In 90% cases it will be faster, but if you do a complex query with joins, the count() function can raise unexpected performance issues.
September 18, 2014 — 17:16
It should be mentioned that `size` also integrates seamlessly with counter_cache columns. That is, it will not perform COUNT query if counter_cache column exists for a collection.
September 20, 2014 — 15:31
+1 for testing SQL performance. With large tables, of course loading the entire data set will be slower (or even impossible), but that does NOT mean count(*) will be fast – even if there is no where condition. Google for your specific MySQL version, but in general “naked” (SELECT COUNT(*) FROM table) will use the primary (clustered) index, which isn’t fast at all.
(MyISAM excels at this, because it stores the row count on the database header, so this operation is always constant time and extremely fast. But I see less and less installations using MyISAM nowadays)
September 22, 2014 — 15:43
I have recently seen all those three returning different results when used with `joins`: 100, nil and some kind of exception. Careful!
December 28, 2016 — 00:54
I just ran into a unique situation where all three of these methods returned different results and I had to go through all of them until I found the answer I wanted. I needed to find out how many unique searches have been formed with combination of keywords and location fields:
Search.select(:keywords, :location).distinct
Which generates the following SQL, as desired:
SELECT DISTINCT `searches`.`keywords`, `searches`.`location` FROM `searches`
For the purposes of this example let’s say we had performed 10 total searches and there were 3 unique combinations of keywords and locations.
Trying to get the total via #count returns 1:
Search.select(:keywords, :location).distinct
SELECT DISTINCT COUNT(DISTINCT keywords, location) FROM `searches`
Trying to get the total via #size returned 10:
Search.select(:keywords, :location).size
SELECT DISTINCT COUNT(DISTINCT `searches`.`id`) FROM `searches`
And finally #length, which was the correct answer of 3:
Search.select(:keywords, :location).length
SELECT DISTINCT `searches`.`keywords`, `searches`.`location` FROM `searches`