Table of Contents
- 1 Let's try to pick any similar first (without similarity level)
- 2 Mongo Aggregation Framework to the rescue
- 2.1 Step 1 - Excluding
- 2.2 Step 2 - All articles with at least one similar tag
- 2.3 Step 3 - Unwind by tags
- 2.4 Step 4 - Second matching
- 2.5 Step 5 - Grouping
- 2.6 Step 6 - Sorting
- 2.7 Step 7 - 10 first elements
- 3 Making it all work together
So, let's say we have an Article model with a tags array:
class Article
  include Mongoid::Document
  include Mongoid::Timestamps

  field :content, type: String, default: ''
  field :tags, type: Array, default: []
end
Let's try to pick any similar first (without similarity level)
We have an article that has some tags (%w{ ruby rails mongoid mongodb }), and we would like to get similar articles. Nothing special (yet):
current_article = Article.first
similar = Article.in(tags: current_article.tags)
Let's also exclude our base article (current_article) from the results, since we want similar articles, not similar-or-equal ones:
Article
  .ne(_id: current_article.id)
  .in(tags: current_article.tags)
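To make the semantics of this query concrete, here is a plain-Ruby sketch that mimics it on an in-memory array of hashes. No MongoDB or Mongoid is involved; the sample articles and the `similar_unranked` helper are hypothetical. With `$in`, an array field matches when at least one of its elements is in the given list, which the `&` intersection check stands in for.

```ruby
# Hypothetical in-memory stand-ins for Article documents.
ARTICLES = [
  { _id: 1, tags: %w[ruby rails mongoid mongodb] }, # plays current_article
  { _id: 2, tags: %w[python django] },
  { _id: 3, tags: %w[ruby] },
  { _id: 4, tags: %w[ruby rails mongoid] }
].freeze

# Mimics Article.ne(_id: article[:_id]).in(tags: article[:tags]):
# skip the article itself, keep any document sharing at least one tag.
def similar_unranked(article, articles)
  articles.select do |doc|
    doc[:_id] != article[:_id] && !(doc[:tags] & article[:tags]).empty?
  end
end

current_article = ARTICLES.first
p similar_unranked(current_article, ARTICLES).map { |doc| doc[:_id] }
# prints [3, 4]
```

Article 3 shares one tag and article 4 shares three, yet nothing ranks them; in this sketch the results simply follow input order.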
We could even refactor it a bit...
class Article
  include Mongoid::Document
  include Mongoid::Timestamps

  scope :exclude, -> article { ne(_id: article.id) }
  scope :similar_to, -> article { exclude(article).in(tags: article.tags) }

  field :content, type: String, default: ''
  field :tags, type: Array, default: []

  def similar
    @similar ||= self.class.similar_to self
  end
end

# Example usage:
current_article.similar #=> [Article, Article]
Seems pretty decent, but this won't give us the most similar articles. It will just return articles that share at least one tag with our current_article, without ranking them by how many tags they share. What should we do then?
Mongo Aggregation Framework to the rescue
To get such information, sorted properly, we need to perform the following steps:
- Don't include current_article in the result set
- Get all articles (except the current one) that have at least one tag in common with current_article (we did this earlier)
- Count how many matching tags occur in each article
- Sort articles by similarity
- Take first 10 articles
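The steps above map fairly directly onto aggregation stages. Below is a minimal sketch of the whole pipeline as plain Ruby hashes, built by a hypothetical `similar_pipeline` helper and numbered like the seven steps in the table of contents. The stages for steps 1 to 4 follow the ones shown in this article; the `$group` (counting matches with `$sum`), `$sort` and `$limit` stages are assumptions about how steps 5 to 7 might look, not necessarily the author's exact code. With Mongoid, such an array would typically be passed to `Article.collection.aggregate`.

```ruby
# Hypothetical helper: builds the aggregation pipeline as an array of stages.
# Steps 5-7 ($group, $sort, $limit) are an assumed sketch.
def similar_pipeline(article_id, tags, limit = 10)
  [
    # Step 1 - drop the current article
    { "$match" => { _id: { "$ne" => article_id } } },
    # Step 2 - keep only articles sharing at least one tag
    { "$match" => { tags: { "$in" => tags } } },
    # Step 3 - one document copy per tag
    { "$unwind" => "$tags" },
    # Step 4 - drop copies whose tag is not one we are looking for
    { "$match" => { tags: { "$in" => tags } } },
    # Step 5 - count matching tags per article (assumed shape)
    { "$group" => { _id: "$_id", matches: { "$sum" => 1 } } },
    # Step 6 - most similar first
    { "$sort" => { matches: -1 } },
    # Step 7 - keep the first `limit` results
    { "$limit" => limit }
  ]
end

pipeline = similar_pipeline(42, %w[ruby rails mongoid mongodb])
p pipeline.map(&:keys).flatten
# prints ["$match", "$match", "$unwind", "$match", "$group", "$sort", "$limit"]
```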
Step 1 - Excluding
# Mongoid
Article.where(id: { "$ne" => current_article.id })

# Mongo (this is still in Ruby, not in the Mongo shell!)
{ "$match" => { _id: { "$ne" => current_article.id } } }
Step 2 - All articles with at least one similar tag
# Mongoid
Article.in(tags: current_article.tags)

# Mongo (this is still in Ruby, not in the Mongo shell!)
{ "$match" => { tags: { "$in" => %w{ ruby rails mongoid mongodb } } } }
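Steps 1 and 2 are shown as two separate `$match` stages for clarity. Because top-level conditions in a query document are implicitly ANDed, they could also be expressed as a single stage. A small sketch, where the `combined_match` helper name is hypothetical:

```ruby
# Hypothetical helper: steps 1 and 2 merged into one $match stage.
# Both conditions must hold, since top-level query fields are ANDed.
def combined_match(article_id, tags)
  {
    "$match" => {
      _id: { "$ne" => article_id },
      tags: { "$in" => tags }
    }
  }
end

p combined_match(42, %w[ruby rails mongoid mongodb])
```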
Step 3 - Unwind by tags
If you're not familiar with $unwind, look here. This way, we get a copy of each article for every one of its tags.
{ "$unwind" => "$tags" }
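To see what `$unwind` does and what the remaining steps compute, here is a plain-Ruby emulation over in-memory hashes. The sample data is hypothetical, and `unwind_tags` and `rank_similar` are illustrative helpers, not MongoDB or Mongoid APIs. Note that unwinding produces a copy for every tag, including tags that are not in our list, which is why the copies have to be filtered again before counting.

```ruby
# Emulates { "$unwind" => "$tags" }: one copy of the document per tag,
# with the array field replaced by a single tag value.
def unwind_tags(docs)
  docs.flat_map { |doc| doc[:tags].map { |tag| doc.merge(tags: tag) } }
end

# Emulates steps 3-7 on documents that already passed steps 1 and 2.
def rank_similar(docs, tags, limit = 10)
  # Step 3 - one copy per tag
  copies = unwind_tags(docs)
  # Step 4 - drop copies whose tag is not in our list
  matching = copies.select { |copy| tags.include?(copy[:tags]) }
  # Step 5 - count the remaining copies per article
  grouped = matching.group_by { |copy| copy[:_id] }
  counted = grouped.map { |id, group| { _id: id, matches: group.size } }
  # Step 6 - most similar first; Step 7 - keep the first `limit`
  counted.sort_by { |row| -row[:matches] }.first(limit)
end

# Hypothetical articles left after steps 1 and 2.
candidates = [
  { _id: 3, tags: %w[ruby python] },
  { _id: 4, tags: %w[ruby rails mongoid] }
]

p unwind_tags(candidates).map { |copy| copy[:tags] }
# prints ["ruby", "python", "ruby", "rails", "mongoid"]
p rank_similar(candidates, %w[ruby rails mongoid mongodb])
```

The "python" copy of article 3 shows why step 4 is needed: without it, that copy would be counted as a match in step 5.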
March 24, 2015 — 11:52
How about getting the sum of the two first elements of an array (in every document) without using $unwind?