Closer to Code

import re
from urlparse import urlparse, urljoin, urlunparse

def expand_url(home, url):
    if re.match(r"^\w+\://", url):
        return url
    else:
        parts = home.split('/')
        if len(parts) > 2:
            if re.match(r"^/", url):
                return "%s//%s%s" % (parts[0], parts[2], url)
            else:
                url = url.split('/')
                if url[0] == '.':
                    del(url[0])
                proto = parts.pop(0)
                return "%s//%s" % (proto, "/".join(parts[1:-1] + url))
        else:
            return False

import posixpath
from urlparse import urlparse, urljoin, urlunparse

def expand_url(home, url):
    join = urljoin(home, url)
    url2 = urlparse(join)
    path = posixpath.normpath(url2[2])
    return urlunparse(
        (url2.scheme, url2.netloc, path, url2.params, url2.query, url2.fragment)
    )
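The snippets above import from the Python 2 `urlparse` module. As a rough sketch only, here is how the same idea might look on Python 3, where those functions live in `urllib.parse`. The guard for an empty path is my own addition (`posixpath.normpath("")` returns `"."`), and the example URLs are made up for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlparse, urlunparse


def expand_url(home, url):
    """Resolve `url` against the page address `home` and tidy the path."""
    parts = urlparse(urljoin(home, url))
    # normpath("") would return ".", so an empty path is left untouched
    # (this guard is an assumption, not part of the original snippet).
    path = posixpath.normpath(parts.path) if parts.path else parts.path
    return urlunparse(parts._replace(path=path))


if __name__ == "__main__":
    home = "http://example.com/a/b/c.html"  # hypothetical page address
    for link in ("../d.html", "./x.html", "/img/logo.png", "https://other.org/p"):
        print(link, "->", expand_url(home, link))
```

One thing to keep in mind: `posixpath.normpath` also drops a trailing slash, so `/a/b/` becomes `/a/b`. If the trailing slash matters for the sites you crawl, it has to be restored separately.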

Where is my memory?

Recently, while browsing a large dataset from MongoDB using Padrino and Thin, the Ruby process started leaking memory. After each request it grew by approximately 2-5 MB.

I started debugging by putting the following line in my action, to see how much memory usage increased per request:

puts 'RAM USAGE: ' + `pmap #{Process.pid} | tail -1`[10,40].strip

Results:

RAM USAGE: 796156K
RAM USAGE: 798284K
RAM USAGE: 798824K
RAM USAGE: 799088K
RAM USAGE: 799900K
RAM USAGE: 799900K
RAM USAGE: 812044K
RAM USAGE: 816152K
RAM USAGE: 816292K
RAM USAGE: 816836K
RAM USAGE: 818956K
RAM USAGE: 819088K
RAM USAGE: 830572K
RAM USAGE: 884604K
RAM USAGE: 887648K
RAM USAGE: 892800K
RAM USAGE: 897160K
RAM USAGE: 906960K

As you can see, it grows rapidly. In htop things look even worse:

88,4 MB
93,9 MB
97,5 MB
99,2 MB
109,4 MB
113,4 MB
122,7 MB
127,1 MB
...
1,2 GB!

It was definitely too much! Memory consumption reached its limit and everything slowed down.

I knew that it had something to do with this line:

@analyses = Analysis.finished.page(params[:page] ||= 1).per(10)

Kaminari?

At first I suspected Kaminari and its pagination engine; however, it is just a more complex layer on top of some scopes. To check this, I removed Kaminari:

@analyses = Analysis.finished.skip(((params[:page] ||= 1).to_i - 1) * 10).limit(10)

Unfortunately nothing improved, and memory consumption kept growing at the same rate. Interestingly, when I dropped all MongoDB indexes:

db.collection1.dropIndexes()
db.collection2.dropIndexes()
...
db.collectionN.dropIndexes()

memory usage grew much slower than before. So WTF?

Identity map!

Finally I discovered the damn source of my problem. It was the identity map in Mongoid. What is an identity map?

The identity map in Mongoid is a current aid to assist with excessive database queries in relations, and is necessary for eager loading to work. (...) When a document is now loaded from the database, it is automatically added to the identity map by its class and id. Subsequent requests for that document by its id will not hit the database, but rather pull the document back from the identity map itself. Its primary function in this capacity is to aid in cutting down queries for belongs_to relations when iterating over the parents.
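To make the idea concrete, here is a minimal sketch of an identity map and of what happens when it is never cleared between requests. This is not Mongoid's implementation; it is written in Python like the URL snippet earlier on this page, and every name in it (`IdentityMap`, `find`, `simulate`) is hypothetical.

```python
class IdentityMap:
    """Cache of loaded documents keyed by (class name, id). Illustrative only."""

    def __init__(self):
        self._docs = {}

    def get(self, klass, doc_id):
        return self._docs.get((klass, doc_id))

    def set(self, klass, doc_id, doc):
        self._docs[(klass, doc_id)] = doc

    def clear(self):
        self._docs.clear()

    def __len__(self):
        return len(self._docs)


def find(identity_map, klass, doc_id, load_from_db):
    """Return the cached document, or load it once and remember it."""
    doc = identity_map.get(klass, doc_id)
    if doc is None:
        doc = load_from_db(doc_id)  # stands in for a real database query
        identity_map.set(klass, doc_id, doc)
    return doc


def simulate(requests, per_page, clear_each_request):
    """Serve `requests` pages of `per_page` documents each.

    Returns the number of cached documents at the end of every request.
    """
    identity_map = IdentityMap()
    sizes = []
    for page in range(requests):
        for offset in range(per_page):
            doc_id = page * per_page + offset
            find(identity_map, "Analysis", doc_id, lambda i: {"_id": i})
        sizes.append(len(identity_map))
        if clear_each_request:
            identity_map.clear()
    return sizes


if __name__ == "__main__":
    print(simulate(3, 10, clear_each_request=False))  # map keeps growing
    print(simulate(3, 10, clear_each_request=True))   # map stays bounded
```

Without the `clear()` call every page of results stays referenced by the map, so the garbage collector can never free those documents. That is the kind of growth pattern a per-request clearing middleware is meant to prevent.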

It seems the identity map was never cleared (or it has a memory leak bug in it). Adding:

use Rack::Mongoid::Middleware::IdentityMap

didn't help at all, so I just turned the identity map off:

mongo.identity_map_enabled = false

and everything went back to normal. Interestingly, the identity map in ActiveRecord is turned off by default in Rails because it is known to cause similar problems.


Relative and absolute urls expanding in Python