Carrier DB alpha: the world's most efficient key-value database

big database energy

Carrier DB 2018-09-07

Download for macOS

Download for Linux

Subscribe to receive announcements, release notes, feature updates, progress reports, and dance invites.


Why Carrier DB?

Carrier DB is the most memory efficient, secure, production oriented key-value database ever.

Memory efficiency is an often ignored aspect of data storage. When you have multiple terabytes of expensive RAM (or the opposite case: limited gigabytes of expensive mobile RAM), your budget cries every time you waste resources on unnecessary data structure metadata.

Carrier DB is built considering data efficiency and security first. Using custom memory-efficient data structures, Carrier DB can store average-sized data 2x to 10x more compactly than other in-memory databases.

Carrier DB serves goals other systems ignore, including but not limited to:

  • non-blocking operations
    • Retrieve the middle 200,000 elements of a 3 million long list? No problem. Carrier DB remains responsive to other requests the entire time.
    • Check the intersection of a dozen sets each with 1 million items? Carrier DB will remain responsive the entire time.
    • Save the union of 40 sets each with 2 million items into a new set? Carrier DB remains responsive.
  • security first
    • create multiple security domains by listening on multiple IP addresses
    • enable multiple TLS certificates per server using security domains
    • create multiple access restrictions per security domain
    • retrieve independent network statistics for each listening IP address
    • configure which listening IPs support which server protocol (legacy memcached or legacy redis protocols)
  • memory efficiency
    • store 1 billion values with only 5 GB to 10 GB internal metadata overhead
      • for comparison: memcached wastes about 100 GB in overhead per billion values (around 100 MB overhead per million values stored). redis wastes anywhere from 50 GB to 200 GB per billion values stored depending on which inefficiently written data structure(s) get used.
    • cache efficiency
      • a major benefit of Carrier DB's minimal-overhead data structures is increased CPU cache efficiency. Even with terabytes of RAM, your CPU caches are still just a few dozen KB to a couple MB in size. Every byte saved from needless data structure overhead increases your system performance by keeping more of your data in high speed hardware caches.
  • scale-up/scale-down efficiency and security
    • designed for both massively distributed GB-scale edge computing (IoT, 5G nodes, micro-POPs, CPE) and centralized TB-scale global data management, Carrier DB reduces your hardware costs through fast, compact memory storage and adds built-in security throughout your data architecture.
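
The overhead figures quoted under memory efficiency are easier to compare per value. The sketch below only redoes that arithmetic; it assumes "GB" means 10**9 bytes, and the dictionary labels are ours, not product terminology.

```python
# Per-value metadata overhead implied by the totals quoted in the list above.
# Assumption: 1 GB = 10**9 bytes.
GB = 10**9
VALUES = 10**9  # one billion stored values


def overhead_bytes_per_value(total_overhead_gb, values=VALUES):
    """Convert a total overhead in GB into average bytes of overhead per value."""
    return total_overhead_gb * GB / values


quoted_overhead_gb = {
    "Carrier DB (low)": 5,
    "Carrier DB (high)": 10,
    "memcached": 100,
    "redis (low)": 50,
    "redis (high)": 200,
}

per_value = {
    name: overhead_bytes_per_value(gb) for name, gb in quoted_overhead_gb.items()
}

for name, overhead in per_value.items():
    print(f"{name:18s} ~{overhead:.0f} bytes of overhead per value")
```

At these figures, every value stored in the other systems carries tens to hundreds of bytes of bookkeeping before any of your data is counted.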

What does Carrier DB support?

legacy memcached protocol

Carrier DB is an extension of our amazing Carrier Cache memcached replacement. Carrier DB supports all memcached commands from Carrier Cache in addition to dozens more features outlined below.

legacy redis protocol

In addition to all legacy memcached commands, Carrier DB also supports redis commands. We are currently increasing the number of supported built-in data structures.

As of right now Carrier DB supports strings, lists, sets, HLLs, and maps/dicts/hashes, with sorted sets next on the implementation schedule.

fully multi-threaded server

Carrier DB is a highly multi-threaded server. Client requests are handled with multiple thread pools for accepting incoming commands, processing commands, then sending replies back to clients. Carrier DB can efficiently grow to hundreds to thousands of cores plus terabytes of memory all from a single process.

non-blocking atomic data structure operations

Carrier DB combines support for complex data structure types with a non-blocking massively multi-threaded architecture so your server doesn't collapse due to large queries.

non-blocking transport encryption and decryption

Encryption is a must-have feature these days. Any servers not giving you the option of connecting over TLS are being professionally irresponsible and opening up your company to increased liability.

Unfortunately, encryption has unavoidable computational costs. Encryption will never be as lightweight as insecure plain text connections, but Carrier DB has designed a high performance fully non-blocking encryption and decryption system to minimize encryption impacts on server performance.

Carrier DB uses independent decryption and encryption thread pools for maximum concurrency to optimize multi-threaded CPU utilization. Carrier DB concurrently decrypts requests and encrypts replies giving you the most flexibility to use encrypted clients at the lowest overhead possible.

most efficient memory usage

When the world first started moving to 64-bit platforms a big concern was pointer overhead. The world went from 4 byte pointers to 8 byte pointers overnight. Many data structures doubled in size, and all those extra wasted bytes clog up CPU caches, waste memory, and end up being noticeable after a while.

Carrier DB has conquered the tyranny of 64-bit platforms.

Carrier DB stores all data in various succinct data structures we've created solely for the purpose of low overhead in-memory data storage capable of growing to dozens of terabytes of RAM.

In fact, Carrier DB implements the smallest data structures possible. No other system comes close to our memory efficiency.

multiple network virtual hosting

Carrier DB enables serving data from multiple IP addresses. Each IP address can be configured with independent options for connecting clients.

Each Carrier DB virtual network can:

  • bind to an IP address
  • enable per-IP TLS encryption with RSA or elliptic curve key and cert
  • specify whether to receive legacy memcached or legacy redis protocol
  • set per-network access levels
    • lock all clients to read-only mode
    • allow clients to run the stats command
    • allow clients full admin access
    • disable all data access and only allow stats—you can expose only the statistics interface to external monitoring services without risking any reading or modification of stored data
    • coming soon: per-virtual-network namespaces

Current Status

Carrier DB alpha releases are usable, but as alpha suggests, may not be reliable for all uses yet. Join our mailing list for notification about feature updates and new releases.

Feature Progress

Carrier DB alpha releases are stable(ish). The alpha status denotes not all edge cases have been explored, maybe not all combinations of feature interactions have been explored or checked for error conditions, and maybe some verbose printing still lingers in release builds.

Also, docs and how-tos are still under construction. Alpha releases expect you to have familiarity with legacy redis commands and perform some cross-site recon because Carrier DB is an extension of Carrier Cache, with docs over at Carrier Cache Tech Specs.

Basically, if you use features and they work, you can rely on them for your current alpha release. In future alpha releases anything can change, including but not limited to: commands, configs, arguments, etc. If you experience unexpected Carrier DB output, crashes, or unnecessary difficulty, feel free to let us know about any problems so we can prioritize your fixes.

Preview

Here's a quick preview of what's still cooking for future releases:

  • concurrent persistence without snapshots
    • no forking ever again. fork those forkers who think fork is a realistic way to save data in production environments.
    • Carrier DB's upcoming persistence framework updates incrementally requiring no dirty legacy redis "snapshot" operations.
    • The upcoming persistence framework can also be reloaded with fully threaded throughput.
    • Tired of waiting over 24 hours for legacy redis to restore hundreds of GB from disk? Carrier DB can perform reloads using all your cores at the same time.
  • clustering without compromise
    • All Carrier DB operations are designed to be self-organizing so clients themselves don't need to figure out which servers hold which keys.
    • Multi-key operations can be handled across nodes by any instance with the same isolation guarantees as single node deployments.
  • self-managing failover
    • Carrier DB replication will include automated monitoring, failover, and failover announcement capability so all clients stay updated as data moves around. Unlike with legacy redis, you won't need to run yet-another-untested-unstable-poorly-designed-distributed-system to manage failover of database replicas.
  • ultra-extreme memory efficiency
    • Carrier DB already has world-leading memory efficiency, but it can get even better.
    • You'll be able to pick between ultra-extreme, zero-duplication data storage and regular data storage modes depending on which meets your memory and performance requirements.

Features Currently

Here's a concise list of current Carrier DB features both released and in progress.

Completed

Most of these features are unique to Carrier DB and have little to no equivalent in other databases when compared on memory efficiency and security.

  • legacy memcached protocol; all commands
  • legacy redis protocol; growing commands
  • multiple network support
  • TLS support including RSA and fast elliptic curve keys
  • multi-threaded non-blocking TLS encryption and decryption
  • optimized low overhead string storage, including:
    • cas
    • countutf8codepoints
    • incrby accepts signed integers, unsigned integers and floating point all in one command.
  • optimized low overhead set storage:
    • Also implements ssubset to check subset properties and sequal to check if two sets are exactly equal
  • optimized low overhead list storage:
    • lrange implemented as a non-blocking command
  • optimized low overhead HyperLogLog storage:
    • hlladd implemented as a non-blocking command
  • optimized low overhead map/dict/hash/namespace storage:
    • can set expiration times on sub-keys inside namespaces
    • namespaces can nest to an unlimited depth
    • deleting a namespace recursively cleans all sub-keys including sub-namespaces
    • namespaces are inescapable security boundaries
  • extreme memory efficiency throughout
  • multi-threaded non-blocking data access
  • JSON stats reporting
  • JSON client info reporting

In Progress, Unreleased

  • coming soon: sorted sets
  • geo types
  • client blocking operations (brpop, blpop, etc)
  • pubsub operations
  • background LRU maint
  • coming soon: concurrent persistence
  • coming soon: replication+failover
  • clustering
  • more admin flexibility
  • improved feature parity with other DBs

If your favorite command is currently missing, let us know what you need. We prioritize features based on popularity.

Specs for Carrier DB 2018-09-07

Carrier DB 2018-09-07 implements the most memory efficient generic set operations ever created by human or machine hands.

For backwards compatibility, Carrier DB set operations can be accessed using legacy redis protocol commands.

Feature Notes
legacy redis protocol [sets]
command deviations
sadd, srem, sismember, scard, spop [count], srandmember [count], sinter, sinterstore, sunion, sunionstore, sdiff, sdiffstore, smembers

Carrier DB supports all legacy redis set operations with the following changes:

  • smove can accept an unlimited number of elements to move instead of being limited to only one element
  • smembers is just an alias for get since, in Carrier DB, get can act on any type.
  • Carrier DB sorts membership results prior to client reply for more consistent data management.

ssubset

ssubset allows you to check if the first set type key is a subset of the second set type key directly in the server.

Returns 1 if the first key is a subset of the second key, otherwise 0.

sequal

sequal allows you to compare an unlimited number of sets for equality directly in the server.

Server-side compares are much more efficient than client-side checks when dealing with large sets.

Returns 1 if all sets have the same elements, otherwise 0.

Testing for equality is logically equivalent to getting true results for both ssubset A B and ssubset B A.

sequal is more efficient than double ssubset since sequal doesn't need to read all sets twice, plus sequal can compare an unbounded number of sets with just one command.
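
The relationship between the two commands can be modeled in a few lines. This is a minimal sketch of the return values described above using Python's built-in sets; it is not Carrier DB's implementation, and the treatment of fewer than two sets is our assumption.

```python
def ssubset(first, second):
    """Return 1 if every element of `first` is also in `second`, otherwise 0."""
    return 1 if first <= second else 0


def sequal(*sets):
    """Return 1 if all given sets hold exactly the same elements, otherwise 0."""
    if len(sets) < 2:
        return 1  # assumption: zero or one set is trivially equal
    reference = sets[0]
    # One pass over the remaining sets; no set is read twice.
    return 1 if all(candidate == reference for candidate in sets[1:]) else 0


a = {"x", "y"}
b = {"y", "x"}
c = {"x", "y", "z"}

# Equality is the same as being a subset in both directions.
both_directions = ssubset(a, b) and ssubset(b, a)
print(sequal(a, b), both_directions)

# a is a strict subset of c, so the sets are not equal.
print(ssubset(a, c), ssubset(c, a), sequal(a, b, c))
```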

Specs for Carrier DB 2018-08-24

Feature Notes
legacy memcached protocol

enable by adding protocol dmp to a network config block

For server setup, see: Carrier Cache Tech Specs.

legacy redis protocol

enable by adding protocol drp to a network config block

For server setup, see: Carrier Cache Tech Specs.

legacy redis protocol [namespaces]

Note: namespaces are unique to Carrier DB and have no redis equivalent.

command deviations
ns | namespace

In Carrier DB, maps/hashes/dicts are fully formed namespaces.

A namespace is a key-value map where each key can hold a value of any type.

In legacy redis language, this means you can have hashes with sub-hashes and sub-lists inside of them.

All key-value maps in Carrier DB are namespaces:

  • You can set individual expires timeouts on any value inside any nested namespace.
  • You can use any operation on any value inside a namespace.
    • for example, legacy redis has a very, very limited number of "hash" commands.
    • In legacy redis, even though hash values are just strings, you can't perform all string operations on them. In Carrier DB, you can perform any operation on any value at any level of any namespace without needing customized one-off implementations for sub-containers.

Other useful features of namespaces:

  • automatic prefix compression
    • Instead of storing a billion keys of the form prefix:subtype:subvalue:subsubvalue which would eat 30 GB RAM, you can use ns prefix subtype subvalue subsubvalue and only store those values once for all final-level keys.
  • automatic multi-CPU socket NUMA domain locality
    • The first key in the namespace depth determines a single storage location for all sub-keys (including sub-namespaces).
    • On multi-processor machines, namespaces bind your data to a NUMA domain to reduce data access latency.
    • In cluster deployments, namespaces provide a way to guarantee data will be co-located on the same node for faster operations on related keys.
  • automatic cleanup
    • If you delete a namespace, all nested keys and values of all types are automatically recursively deleted too.
    • These self-cleaning namespace deletes enable easy management of user data. Put each user in a unique namespace, then one command can wipe all their nested live data from your DB when needed.
  • security boundary
    • Once a client enters a namespace, they can't escape to a higher level namespace unless they are on an enableadmin network (see nsreset).
    • Clients can always select or create deeper namespaces, but they can't go up the namespace hierarchy to parents or even peers once a namespace is entered.

For backwards compatibility with legacy redis, hash commands still work too. Legacy redis hash commands are mapped to enable reading from one deeper namespace than you have selected without entering the namespace itself.

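These usages are equivalent:

  • entering one namespace at a time
    • ns users
    • ns unregistered
    • set usersAreFrom unknown
  • using a hash command from one level up
    • ns users
    • hset unregistered usersAreFrom unknown
  • entering a nested namespace with one command
    • ns users unregistered
    • set usersAreFrom unknown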

ns accepts multiple arguments to automatically enter a nested namespace.

If you are in a namespace and you run ns again, you enter a child namespace of your current namespace.

If your namespace can't logically exist because one of your requested elements is already a non-namespace type, you'll get an error when you attempt a data operation. Namespace validity is only checked when data access is attempted, not when ns commands are executed.
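
The namespace rules above can be modeled with a nested dictionary. This is a toy sketch, not the server's storage engine: the Client class, NamespaceError, and the helper functions are hypothetical names, and a plain dict stands in for the database.

```python
class NamespaceError(Exception):
    """Raised when a selected namespace path runs through a non-namespace value."""


class Client:
    """Toy client that tracks its selected namespace path over a shared nested dict."""

    def __init__(self, store):
        self.store = store
        self.path = []

    def ns(self, *names):
        # ns only records the path; validity is checked on data access.
        # Calling ns again appends, so it enters a child of the current namespace.
        self.path.extend(names)

    def _resolve(self, extra=()):
        node = self.store
        for name in (*self.path, *extra):
            node = node.setdefault(name, {})
            if not isinstance(node, dict):
                raise NamespaceError(f"{name!r} is not a namespace")
        return node

    def set(self, key, value):
        self._resolve()[key] = value

    def hset(self, namespace, key, value):
        # Hash commands act one namespace deeper without entering it.
        self._resolve((namespace,))[key] = value


def one_level_at_a_time():
    store = {}
    client = Client(store)
    client.ns("users")
    client.ns("unregistered")
    client.set("usersAreFrom", "unknown")
    return store


def hash_from_one_level_up():
    store = {}
    client = Client(store)
    client.ns("users")
    client.hset("unregistered", "usersAreFrom", "unknown")
    return store


def nested_in_one_command():
    store = {}
    client = Client(store)
    client.ns("users", "unregistered")
    client.set("usersAreFrom", "unknown")
    return store


print(one_level_at_a_time() == hash_from_one_level_up() == nested_in_one_command())
```

Note the model has no way to shorten its path, which mirrors the rule that clients can only move deeper.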

hs | hyperspace

hyperspaces are namespaces using the last namespace level to provide data co-location.

Why do we need hyperspace and namespace?

Consider the namespace:

  • ns [tld] [domain] [access time]

You would end up with inefficient storage and performance because all your data will be co-located at one location under the single namespace [tld].

hyperspace fixes this common inefficiency by allowing co-location placement based on the last (and hopefully most unique/specific) namespace element.

Consider the hyperspace:

  • hs [tld] [domain] [access time]

Now your data will be co-located across nodes/domains/threads by [access time] instead of by [tld], which should provide better performance since you will have more [access time] values than [tld] values.
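
To see why the choice of co-location element matters, here is a small illustration. The placement rule (CRC32 of the co-location element modulo a shard count) is purely an assumption for the demo; Carrier DB's real placement function is not documented here, and the sample paths are made up.

```python
import zlib


def shard_for(colocation_element, shards):
    """Map a co-location element to a shard (assumed rule: CRC32 modulo shard count)."""
    return zlib.crc32(colocation_element.encode("utf-8")) % shards


SHARDS = 8

# Hypothetical paths of the form [tld] [domain] [access time].
paths = [("com", "example", f"2018-09-07T12:{minute:02d}") for minute in range(60)]

# namespace co-locates on the first element, hyperspace on the last.
ns_shards = {shard_for(path[0], SHARDS) for path in paths}
hs_shards = {shard_for(path[-1], SHARDS) for path in paths}

print("ns uses", len(ns_shards), "shard(s); hs uses", len(hs_shards), "shard(s)")
```

With ns every path shares the element "com", so everything lands in one place; with hs the varying access times spread across locations.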

cs | customspace

If namespace co-locates data based on the first name entry and hyperspace co-locates based on the last entry, what if we want to co-locate data based on a different namespace position?

customspace allows you to pick any namespace entry as a co-location key.

customspace is the general form of namespace selection.

You can implement namespace as customspace 0 to co-locate based on the first name provided.

You can implement hyperspace as customspace -1 to co-locate on the last name provided.

These pairs show equivalency of customspace:

  • ns [tld] [domain] [access time]
    • cs 0 [tld] [domain] [access time]
  • hs [tld] [domain] [access time]
    • cs 2 [tld] [domain] [access time]
  • hs [tld] [domain] [access time]
    • cs -1 [tld] [domain] [access time]
  • ns [tld] [domain] [access time]
    • cs -3 [tld] [domain] [access time]
  • hs [tld] [domain] [access time]
    • cs 9999 [tld] [domain] [access time]
  • ns [tld] [domain] [access time]
    • cs -100 [tld] [domain] [access time]

If you request values too big or too small, your customspace offset will be clamped to the start or end namespace depending on which direction you overran.
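
The position handling described above (zero-based from the front, negative from the back, clamped on overrun) can be sketched as a single function. colocation_index is a hypothetical name; the behavior is inferred from the equivalent pairs listed for customspace.

```python
def colocation_index(position, depth):
    """Resolve a customspace position to a 0-based index into the namespace path."""
    if depth < 1:
        raise ValueError("namespace path must have at least one element")
    if position < 0:
        position += depth  # -1 refers to the last element
    # Clamp overruns to the first or last element.
    return max(0, min(position, depth - 1))


path = ["tld", "domain", "access time"]

for position in (0, 2, -1, -3, 9999, -100):
    index = colocation_index(position, len(path))
    print(f"cs {position:>5} co-locates on [{path[index]}]")
```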

nsreset

Since Carrier DB namespaces are also security boundaries, a client needs admin rights to "escape" from a namespace.

If your network configuration has enableadmin set, you can use nsreset to jump back to the top namespace.

You can run either:

  • nsreset
    • reset your client to the top-level namespace
  • nsreset [ns1] [sub-ns2] ...
    • reset your client to a different namespace starting at the top level

hsreset

Same as nsreset except you are reset into a hyperspace.

csreset

Same as nsreset except you are reset into a customspace.

Takes the co-location position as its first argument, before the new namespace path:

  • csreset 0 ...
    • equivalent to nsreset
  • csreset -1 ...
    • equivalent to hsreset

whereami

namespace selection persists after you enter a namespace for the duration of your client connection.

Unless you are on an enableadmin network, you can't escape to a higher level namespace (though you can always create deeper level namespace maps).

If you need a reminder which namespace you're in, run whereami to get your current namespace depth along with the index of which namespace entry is being used for co-locating your data across nodes/domains/threads.

keys

Returns a list of all key names under your current namespace.

If run at the top level, this returns all keys on your server.

Equivalent to being outside of a namespace and running hkeys [namespace].

Order of result is not specified.

vals

Returns a list of all values under your current namespace.

If run at the top level, this returns all values on your server.

NOTE: vals returns all nested values for all types, including lists of any size and further nested namespaces of any depth and all their keys and values.

Equivalent to being outside of a namespace and running hvals [namespace].

Order of result is not specified. May not match keys result order.

getall | keysandvals

Returns alternating key, value list.

See caveats listed under vals so you don't explode your server.

Equivalent to hgetall used from one level outside a namespace.

keysandexpires

Returns alternating key, expires list.

NOTE: keysandexpires iterates all child namespaces and returns their expirations too.

Also available: keysandexpires nice to return readable date time strings.

keysandexpires nice currently has second resolution, but resolution may be increased in future releases.

keysandtypes

Returns alternating key, type list.

NOTE: keysandtypes iterates all child namespaces and returns their types too.

NOTE: Namespaces return nested values indicating they are namespaces.

legacy redis protocol [string]
command deviations
cas

Carrier DB supports a compare-and-set cas command under legacy redis protocol even though redis has refused to add a compare-and-set operation for ten years.

Usage:

  • cas set compare [key] [new value] [old value]

Example of successful cas operation:

  • set hello world
  • cas set compare hello everybody world → return value is 1 on success
  • Now key hello has value everybody because world matched the current value of hello.
  • If you run cas set compare hello everybody world again, the command will return 0 and the value of key hello will not change because your compare value does not match the current value.
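
The cas example above maps onto a small model. This sketch uses a plain dict and is single-threaded, so it shows the return values only, not the atomicity the server provides; treating a missing key as a failed compare is our assumption.

```python
def cas_set_compare(store, key, new_value, old_value):
    """Set `key` to `new_value` only if its current value equals `old_value`.

    Returns 1 on success and 0 when the compare value does not match.
    Assumption: a missing key never matches.
    """
    if store.get(key) != old_value:
        return 0
    store[key] = new_value
    return 1


store = {}
store["hello"] = "world"                                         # set hello world

first = cas_set_compare(store, "hello", "everybody", "world")    # compare matches
second = cas_set_compare(store, "hello", "everybody", "world")   # compare is now stale

print(first, second, store["hello"])
```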

set

accepts legacy redis syntax options of:

  • set key value [EX seconds|PX milliseconds] [NX|XX]

Carrier DB does not implement the obsolete setex or psetex commands because they are just aliases for set key value EX sec and set key value PX ms.

Carrier DB does implement setnx because its return value is different than set key value NX (thanks for having a consistent design, redis).

Additionally we match memcached commands by having:

  • set key value NX aliased to add key value
    • insert if key does not exist
  • set key value XX aliased to replace key value
    • insert if key does exist
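
The NX and XX options and their memcached-style aliases can be modeled as follows. This is a sketch over a plain dict: set_value is a hypothetical helper, expiration (EX/PX) is left out, and the boolean return is for illustration rather than the wire-level reply.

```python
def set_value(store, key, value, nx=False, xx=False):
    """Store `value` under `key`, honoring NX (only if absent) and XX (only if present).

    Returns True if the value was written, False otherwise.
    """
    if nx and xx:
        raise ValueError("NX and XX are mutually exclusive")
    exists = key in store
    if nx and exists:
        return False
    if xx and not exists:
        return False
    store[key] = value
    return True


def add(store, key, value):
    """memcached-style add: same as set key value NX."""
    return set_value(store, key, value, nx=True)


def replace(store, key, value):
    """memcached-style replace: same as set key value XX."""
    return set_value(store, key, value, xx=True)


store = {}
print(replace(store, "color", "red"))   # key is absent, so nothing is written
print(add(store, "color", "red"))       # key is absent, so the value is written
print(add(store, "color", "blue"))      # key exists, so the value is kept
print(replace(store, "color", "blue"))  # key exists, so the value is overwritten
```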

get

Carrier DB converts all types to return values when using get (instead of only accepting strings).

get some-list returns the entire list content.

get some-hll returns the count of the HLL.

get some-map returns the key-value pairs of the namespace.

mset, msetnx

Works as expected, even across cluster nodes.

If you encounter failures, send semi-reproducible test cases to concerns@carrierdb.cloud.

append, prepend

Carrier DB supports the prepend operation from memcached as well. legacy redis does not provide the operation.

strlen | countbytes, countutf8codepoints

strlen is also aliased to the more descriptive countbytes.

Carrier DB provides the countutf8codepoints command to count variable length UTF8 code points.

A single UTF8 code point may not be a full printed character. Due to combiners, zero width spaces, and zero width joiners (which is how emojis get "customized"), a single printed UTF8 character can use anywhere from one to eight code points, and each code point is between one and four bytes long.
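
The difference between bytes, code points, and printed characters is easy to check. The sketch below counts code points the standard way for valid UTF-8 (every byte that is not a continuation byte starts a code point); the function names mirror the commands, but this is an illustration and not the server's code.

```python
def countbytes(data):
    """Number of bytes in the stored value."""
    return len(data)


def countutf8codepoints(data):
    """Number of code points in valid UTF-8: count bytes that are not 10xxxxxx."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


samples = {
    "plain e": "e",
    "e + combining accent": "e\u0301",  # one printed character
    "family emoji": "\U0001F468\u200D\U0001F469\u200D\U0001F467",  # joined with ZWJ
}

counts = {}
for label, text in samples.items():
    encoded = text.encode("utf-8")
    counts[label] = (countbytes(encoded), countutf8codepoints(encoded))
    print(f"{label}: {counts[label][0]} bytes, {counts[label][1]} code points")
```

The last two samples each print as a single character even though they hold several code points.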

del | delete

also aliased to delete

incrby, decrby, incrbyfloat

Carrier DB does not implement the obsolete incr or decr commands.

Carrier DB has an intelligent type-based input system for number parsing, so incrbyfloat is just an alias for incrby. Both incrby and decrby will properly increment and convert values as large as 64-bit signed integers, 64-bit unsigned integers, and 64-bit doubles.
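
One way to picture a single command accepting integers and floating point is to parse the argument as an integer first and fall back to a float. This is only a guess at the idea, not Carrier DB's parser: it ignores the 64-bit range limits, uses Python's number formatting, and assumes a missing key starts at 0.

```python
def parse_number(text):
    """Parse as an integer when possible, otherwise as a floating point value."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def format_number(number):
    """Render the stored value; floats keep their decimal point."""
    return repr(number) if isinstance(number, float) else str(number)


def incrby(store, key, delta_text):
    """Add a signed integer or floating point delta to the stored number."""
    result = parse_number(store.get(key, "0")) + parse_number(delta_text)
    store[key] = format_number(result)
    return store[key]


def decrby(store, key, delta_text):
    """Subtract a signed integer or floating point delta from the stored number."""
    result = parse_number(store.get(key, "0")) - parse_number(delta_text)
    store[key] = format_number(result)
    return store[key]


incrbyfloat = incrby  # same code path, as described above

store = {"n": "10"}
print(incrby(store, "n", "5"))
print(incrby(store, "n", "-3"))
print(incrbyfloat(store, "n", "0.5"))
```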

decrbyfloor

decrbyfloor matches memcached's decr command. memcached's increment and decrement operations are always on unsigned 64-bit integers, so attempting to decrement below zero stores 0 as your result (hence, the floor of the decrement is zero).
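
The floor-at-zero behavior is a saturating subtraction. A minimal sketch over a plain dict, assuming the stored value is already an unsigned integer in text form:

```python
U64_MAX = 2**64 - 1


def decrbyfloor(store, key, amount):
    """Decrement an unsigned 64-bit counter, stopping at zero instead of going negative."""
    if not 0 <= amount <= U64_MAX:
        raise ValueError("amount must fit in an unsigned 64-bit integer")
    current = int(store[key])
    result = max(0, current - amount)  # the floor of the decrement is zero
    store[key] = str(result)
    return result


store = {"visits": "10"}
print(decrbyfloor(store, "visits", 3))
print(decrbyfloor(store, "visits", 100))
```
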
legacy redis protocol [list]
command deviations
rpush, lpush, rpushx, lpushx, llen, rpop, lpop, lindex, ltrim, rpoplpush

Works as expected.

If you encounter failures, send semi-reproducible test cases to concerns@carrierdb.cloud.

lrem, linsert, sort

Not implemented. These commands either don't seem useful or are needlessly complex.

If you think these list primitives should be included, let us know.

lrange

In Carrier DB, lrange is a non-blocking command.

If you have a list with 3 million elements and you want to use lrange to return the middle million elements, Carrier DB remains responsive to all other clients during your retrieval.
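
For the example above, the lrange arguments for "the middle million" can be computed like this. The sketch assumes Carrier DB keeps the legacy redis convention of zero-based indexes with an inclusive stop; middle_range is a hypothetical helper.

```python
def middle_range(length, count):
    """Return inclusive (start, stop) indexes selecting the middle `count` elements."""
    if not 0 < count <= length:
        raise ValueError("count must be between 1 and the list length")
    start = (length - count) // 2
    stop = start + count - 1  # lrange's stop index is inclusive
    return start, stop


start, stop = middle_range(3_000_000, 1_000_000)
print(f"lrange some-list {start} {stop}")

# Sanity check on a small list: an inclusive stop means slicing to stop + 1 in Python.
items = list(range(10))
small_start, small_stop = middle_range(len(items), 4)
print(items[small_start:small_stop + 1])
```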

legacy redis protocol [hyperloglog]
command deviations
pfadd | hlladd

pfadd (also aliased to hlladd) is a non-blocking command. If a client adds a million elements using hlladd, Carrier DB will remain responsive without blocking other clients.

pfcount | hllcount

also aliased to hllcount

pfmerge | hllmerge

pfmerge (also aliased to hllmerge) returns the count of the merged HLL target. This deviates from the legacy redis return value where merging HLLs just returns a useless OK status.

legacy redis protocol [hash, dict, map]

Note: redis hash commands are just protocol adapters. All maps in Carrier DB are fully formed namespaces supporting nested types.

command deviations
hset, hsetnx, hmset, hget, hmget, hdel, hkeys, hvals, hgetall

Each legacy hash command reads values inside a namespace without entering the namespace.

These two command paths are equivalent:

  • old hash version
    • hset a-map somekey somevalue
    • hget a-map somekey
  • namespace version
    • ns a-map
    • set somekey somevalue
    • get somekey
    • nsreset
    • hget a-map somekey

legacy redis protocol [server / admin]

command deviations
stats

Tired of legacy redis janky busted hand-written monitoring output where each line needs an independent parser because formatting is completely arbitrary? Have we got the solution for you: JSON statistics!

For details of Carrier Stats format, see Stats For The Modern Age.

If you are testing stats with redis-cli, the output of stats will be ugly because janky legacy redis-cli doesn't support printing newlines in results. You'll want to view pretty formatted output with this shell command:

  • printf "$(redis-cli -p PORT stats visual)"

For retrieval efficiency, individual subsections can be selected per stats request to avoid generating sections you don't need.

For example, you probably don't need to generate the license and startup sections for real time monitoring requests. The selectable sections are:

  • license
  • process
  • cpu
  • memory
  • startup
  • os
  • log
  • keyspace
  • network

self stats

json output

returns statistics for your client connection.

also accepts the form self stats visual

You Have Reached The End Of The Page

Here at End Of The Page, we want to thank you for reading our page.

If you have any questions about Carrier DB, let us know and let's make the world a little bit better one program at a time.

For updates about new Carrier DB features and releases, sign up for our mailing list below.

This page is the official source for all Carrier DB updates, releases, and feature notes. Check back often.


Subscribe to receive announcements, release notes, feature updates, progress reports, and more!

YOU CONSENT TO THE PLACEMENT OF COOKIES ON ANY SURFACE NEAR YOU FOR THE PURPOSE OF EASILY CONDUCTING CONSUMPTION OF AFOREMENTIONED COOKIES

NO WARRANTY WHATSOEVER, EXPRESS, STATUTORY, OR IMPLIED EXCEPT FOR AN EVERLASTING GUARANTEE OF BIG DATABASE ENERGY