The main purpose of this work is to present the results of benchmarking several leading in-memory NoSQL databases with a tool called YCSB.
We selected three popular in-memory database management systems, Redis (both standalone and its cloud version, Azure Redis Cache), Tarantool and CouchBase, plus one caching system, Memcached. Memcached is not a database management system and has no persistence, but we included it because it is also widely used as fast storage. Our “firing field” was a group of four virtual machines in the Microsoft Azure cloud. The virtual machines are located close to each other, in a single datacenter, which reduces the impact of network overhead on latency measurements. Images of these VMs can be downloaded via the links: one, two, three and four (login: nosql, password: qwerty). The pair of VMs named nosql-1 and nosql-2 is set up for benchmarking Tarantool and CouchBase, and the pair named nosql-3 and nosql-4 is set up for Redis, Azure Redis Cache and Memcached. The databases and tests are already installed and configured on these images.
Our virtual machines were basic A3 instances with 4 cores, 7 GB of RAM and a 120 GB disk.
Data persistence is enabled by the append-only file option in Redis and by the write-ahead log option in Tarantool. This paper compares only similar configurations of different databases. This means that we do not compare, for example, Redis with append-only files enabled against Tarantool with write-ahead logs disabled.
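The like-with-like rule can be expressed as a small filter. The following Python sketch is only an illustration: the configuration list and the boolean persistence flag are our own hypothetical labels, not part of YCSB or of any database API, and CouchBase is left out because its persistence settings are not discussed here.

```python
from itertools import combinations

# Hypothetical labels for the configurations under test. The flag stands for
# "append-only file enabled" (Redis) or "write-ahead log enabled" (Tarantool).
CONFIGS = [
    ("Redis", True),
    ("Redis", False),
    ("Tarantool", True),
    ("Tarantool", False),
    ("Memcached", False),  # Memcached has no persistence at all
]


def comparable_pairs(configs):
    """Return pairs of different databases whose persistence setting matches."""
    return [
        (a, b)
        for a, b in combinations(configs, 2)
        if a[0] != b[0] and a[1] == b[1]
    ]


for (db_a, persistent), (db_b, _) in comparable_pairs(CONFIGS):
    mode = "persistent" if persistent else "non-persistent"
    print(f"{db_a} vs {db_b} ({mode})")
```

Under these assumptions, Redis with append-only files is paired only with Tarantool with write-ahead logs, while Memcached is paired only with the non-persistent configurations.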
The Yahoo! Cloud Serving Benchmark (YCSB) is a powerful utility for measuring the performance of a wide range of NoSQL databases, including in-memory and on-disk solutions. YCSB is an industry standard for measuring the performance of NoSQL solutions, which is why we use it. We rely on the Redis and Tarantool drivers that are included in YCSB and on a Memcached driver that we wrote on top of the spymemcached library. The source of this YCSB branch can be seen here.
YCSB provides a few core workload types, which are stored as configuration files in a dedicated directory. There are six major workload types, named by letters from A to F.
Workload A is an update-heavy workload with a 50/50 mix of reads and writes. An application example is a session store recording recent actions. Workload B is primarily a read workload with a 95/5 read/write mix. Application example: photo tagging, where adding a tag is an update but most operations read tags. Workload C is 100% read. Application example: a user profile cache, where profiles are constructed elsewhere (e.g., Hadoop). In Workload D, new records are inserted and the most recently inserted records are the most popular. Application example: user status updates, where people want to read the latest. In Workload E, short ranges of records are queried instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id). In Workload F, the client reads a record, modifies it and writes back the changes. Application example: a user database, where user records are read and modified by the user or to record user activity.
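To make the read/write ratios concrete, here is a minimal Python sketch that samples operations according to the mixes stated above for workloads A, B and C. The dictionary keys follow the property names used in YCSB's core workload files (readproportion, updateproportion), and we assume that "writes" corresponds to updates there. The sampling function itself is our own illustration, not YCSB code, and workloads D, E and F are omitted because their exact ratios are not given in the text.

```python
import random

# Operation mixes for the workloads whose ratios are stated in the text.
WORKLOAD_MIX = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "C": {"readproportion": 1.00},
}


def sample_operations(mix, n, seed=0):
    """Draw n operation names at random, weighted by the given mix."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


ops = sample_operations(WORKLOAD_MIX["B"], 10_000)
read_share = ops.count("readproportion") / len(ops)
print(f"Workload B read share: {read_share:.3f}")
```

With enough samples the observed read share of workload B settles close to 0.95, which is what "primarily a read workload" means in practice.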
In each of these configuration files we changed two parameters: recordcount to 2000000 and operationcount to 5000000. YCSB is a multithreaded test tool, and we ran it with 8, 16, 32, 64, 128 and 256 threads.
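The resulting test matrix can be sketched as a list of YCSB command lines. The Python snippet below only builds the commands and does not run them. It is an approximation under stated assumptions: the paths bin/ycsb and workloads/workloada to workloads/workloadf follow the layout of the stock YCSB distribution, the binding name redis is just an example, and the -p overrides are used here as an alternative to editing the configuration files, which is what we actually did.

```python
import shlex

THREADS = [8, 16, 32, 64, 128, 256]
WORKLOADS = ["a", "b", "c", "d", "e", "f"]
# The two parameters changed for every workload.
OVERRIDES = {"recordcount": 2_000_000, "operationcount": 5_000_000}


def ycsb_command(phase, binding, workload, threads):
    """Build one YCSB invocation as an argument list (paths are assumptions)."""
    args = ["bin/ycsb", phase, binding, "-P", f"workloads/workload{workload}"]
    for key, value in OVERRIDES.items():
        args += ["-p", f"{key}={value}"]
    args += ["-threads", str(threads)]
    return args


# One "run" command per workload and thread count: 6 x 6 = 36 commands.
commands = [ycsb_command("run", "redis", w, t) for w in WORKLOADS for t in THREADS]
for cmd in commands[:3]:
    print(shlex.join(cmd))
```

A full benchmark would also need a "load" phase before each "run" phase so that the 2000000 records exist before the 5000000 operations are issued.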
Below we show and describe several sets of plots that we drew in R. The sources of the plot scripts can be downloaded here.
Tarantool, with both hash and tree indices, performs best on all investigated workloads. It uses a lock-free in-memory engine that does not rely on mutexes or other concurrency primitives and uses cooperative multitasking instead. From these graphs we can conclude that high throughput is one of the strengths of the Tarantool database.
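Why cooperative multitasking removes the need for mutexes can be shown with a toy scheduler. The Python sketch below illustrates the general idea only and is not Tarantool's implementation; the round-robin scheduler, the worker function and the shared counter are all hypothetical. Each task gives up control only at an explicit yield, so a read-modify-write sequence placed between two yields can never be interrupted by another task.

```python
from collections import deque

state = {"counter": 0}  # shared data, deliberately unprotected by any lock


def worker(increments):
    """A cooperative task: it can be switched out only at the yield."""
    for _ in range(increments):
        current = state["counter"]      # read
        state["counter"] = current + 1  # write; nothing can run in between
        yield                           # explicit switch point


def run_round_robin(tasks):
    """Run generator-based tasks in turn, all on a single thread."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(task)


run_round_robin([worker(1000) for _ in range(8)])
print(state["counter"])  # 8000: no increment is lost, although no lock is used
```

With preemptive threads the same read-modify-write would need a lock, because a thread could be suspended between the read and the write.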
We described YCSB and presented the results of comparing four popular databases, but the most significant idea of this paper is how to choose the right solution for a given workload. Looking at the plots in this article, it is simple to find the most suitable solution for your workload type, number of database clients and expectations.
The links to our VM images, to YCSB with the Memcached module and to the R scripts are published so that you can run your own tests and verify our results, or obtain results for instances with different hardware and software configurations.
In all the tests we ran, Tarantool showed the best results for requests per second and, in many tests, for latency as well, on every type of workload examined. Therefore, we conclude that for most typical projects Tarantool is a better fit than popular solutions such as Redis, CouchBase or Memcached. This is the basis of our decision to use Tarantool for our projects here at my.com.
Dmitriy Kalugin-Balashov