
[Dev.] Facebook's memory allocator jemalloc ( http://dev.paran.com/ )


 

The key summary for busy readers

jemalloc, popularized by Facebook, is, alongside Google's tcmalloc, one of the rising memory allocators (malloc implementations) these days. Both can deliver performance gains in the tens of percent just by adding a single line before execution, without modifying the existing binary. Definitely test them and start using them.

 

Introduction

Earlier this year, a post by Facebook's Jason Evans, titled "Facebook gained a speed boost by switching its memory allocator to jemalloc", was making the rounds on developers' Twitter timelines. It felt as if the role of "the company that gets people talking, whatever it does" had passed from Google to Facebook. So what is this jemalloc they're using?

 

Malloc

Whatever a program is trying to do, one of its first tasks is to grab memory from the system. You need a canvas before you can paint. That's why malloc, which allocates memory, is among the calls C and C++ programmers use most heavily. Building an efficient malloc is a challenge the masters are still pursuing today. Because malloc can be called tens or hundreds of millions of times during a program's run, switching to a faster malloc, without touching the source, can speed up the entire program.

 

The renewed importance of malloc

malloc has always been important, but in modern server programs running on multicore, multithreaded environments, it's becoming even more important for the following reasons.

Speed - These days a single program runs many threads, and those threads are spread across multiple CPUs. Handing out memory efficiently in this situation isn't easy. Once many threads are involved, the performance of existing malloc libraries begins to crumble. For instance, glibc's default malloc on Linux falls to about 60% of its peak performance once you run 8 or more threads. Naturally, the program using it suffers a sharp performance drop as well. Countermeasures are needed.

Space efficiency - malloc is like staking out areas on a canvas to draw on. If you grab regions haphazardly, holes get punched all over the place. After a long period of use, there may still be plenty of total area left to paint, but large contiguous free blocks gradually disappear, so you end up with space you can't actually use (fragmentation). For long-running server programs, this is especially damaging. You've probably seen a system that still has plenty of memory left and yet dies for lack of memory - this is exactly that case. So a malloc that preserves large contiguous areas well, even after long periods of allocations and frees, has become even more important.

In reality, speed and space efficiency are like trying to chase two rabbits at once - hard to achieve simultaneously. But thanks to the great efforts of software engineers, mallocs that can chase both are gradually emerging. The jemalloc I'll introduce today is exactly one of those examples.

 

Notable mallocs

jemalloc didn't just fall out of the sky. It follows the lineage of malloc development history. Setting aside the very ancient history, the most-used mallocs of recent years are the following.

dlmalloc - a malloc made by Doug Lea. Not fast, and since it was made in earlier days, multicore and multithreading weren't taken into account. But many subsequent mallocs are based on it. By the way, Doug Lea is a master of Java concurrency. Java Concurrency in Practice (2006), which he co-authored with Brian Goetz and others, is still very strongly recommended for anyone doing server-side programming in Java. (A Korean translation is also available.)

ptmalloc - the malloc included in glibc. In other words, it's effectively the Linux standard. Built on dlmalloc with multicore and multithread concepts considered. The arena concept of jemalloc, which I'll explain later, was actually first introduced in ptmalloc2. It's not the fastest malloc, but it shows average performance for general-purpose use, so it's still adopted as the default in Linux glibc. It's a stiff, by-the-book honor student.

tcmalloc - a malloc made by Google's Sanjay Ghemawat. It thrilled many people and prompted the reaction "When Google makes one, even malloc is different." The name itself stands for thread-caching malloc, so it was designed with threads very much in mind, and it's much faster than ptmalloc. As a bonus, when you use this malloc, you also get a variety of Google's program analysis and tuning tools. They're excellent. By the way, he also co-created the Google File System, MapReduce, and BigTable. He's one of the people who built Google's infrastructure. He's closer to a monster than a person.

 

Now then, let's really take a look at jemalloc.

-> For the full content, see the original ^^  I'm introducing you to a good place. ha
     http://dev.paran.com

This English version was translated by Claude.

Written by 친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
