Veeam repositories, whether Windows or Linux based, run a software component responsible for receiving and storing data as it is processed by the proxies. One of the most important parameters when sizing a repository is its expected memory consumption. Here is some information to help configure it properly.
What a Veeam repository stores in memory
A Veeam repository is responsible for collecting the saved blocks coming from the proxies and storing them on a disk target, local or remote. To speed up IO operations on disk, a repository leverages several technologies involving memory. Obviously, there is always a tradeoff between disk IO and memory IO, so any IO saved on the storage has to be compensated by some additional IO in memory; but since memory is many times faster than disk, this is a good tradeoff.
At the first level, a repository uses memory to store incoming blocks. This queue collects all the blocks coming from the proxies, caches them in memory and, after some optimizations, flushes them to disk. This reduces random IO on the backup files to a minimum, while serializing write operations as much as possible. The amount of memory consumed by the queue is simple to calculate: it uses up to 2 GB of memory per active job.
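As a back-of-the-envelope illustration, the queue side of the sizing is just a multiplication (a minimal sketch: the 2 GB figure is the per-job cap described above, while the function name and structure are purely illustrative):

```python
# Worst-case write-queue memory: up to 2 GB per active job (see above).
# The helper name and structure are illustrative, not Veeam's API.
QUEUE_GB_PER_JOB = 2

def queue_memory_gb(active_jobs: int) -> int:
    """Worst-case memory used by the incoming-block queues, in GB."""
    return QUEUE_GB_PER_JOB * active_jobs

print(queue_memory_gb(3))  # 3 concurrent jobs -> up to 6 GB
```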
But this is not the only memory consumed by the repository: Veeam backup files contain deduplicated information about the saved blocks. As in any deduplicated storage, metadata is stored along with the file itself to keep track of the stored blocks (remember, in Veeam metadata is compared within the same file; it is not a global deduplication system).
When a new block needs to be written into the repository, for example during an incremental backup, the Veeam datamover component running on the repository (also referred to as the target datamover) has to read this metadata, compare the hashes of the stored blocks with those of the incoming blocks arriving from a proxy, and decide whether the block has to be stored because it is new, or whether only the metadata needs to be updated because the block was already stored by a previous write operation. This is especially important in a scale-out design, where multiple proxies write data into the same backup file: blocks coming from different proxies might be duplicates, so the right point in the chain to compare them is the target datamover itself.
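Conceptually, the per-block decision made by the target datamover looks something like the following sketch. This is not Veeam's actual implementation: the hash choice, names and data structures are invented purely for illustration.

```python
import hashlib
import io

# Conceptual sketch of the per-block dedup decision on the target side.
# Hash algorithm, names and structures are illustrative only.
def process_block(block: bytes, metadata: dict, backup_file) -> None:
    digest = hashlib.sha256(block).hexdigest()
    if digest in metadata:
        # Already stored (possibly by another proxy in a scale-out
        # setup): only the metadata is updated, no data is written.
        metadata[digest]["refs"] += 1
    else:
        # New block: append it to the backup file and record where it
        # landed, so later writes can deduplicate against it.
        metadata[digest] = {"offset": backup_file.tell(), "refs": 1}
        backup_file.write(block)

metadata: dict = {}
vbk = io.BytesIO()
for blk in [b"A" * 1024, b"B" * 1024, b"A" * 1024]:  # third is a duplicate
    process_block(blk, metadata, vbk)
print(len(metadata), vbk.tell())  # 2 unique blocks, 2048 bytes written
```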
To improve performance, the target datamover dynamically loads this metadata into memory. Before Veeam Backup & Replication v8 Update 2, the cache was used only to accelerate writes; since Update 2 it is also used to accelerate read operations, and there are differences in the way the cache is populated and used. Let's first see what the cache contains: metadata is obviously far smaller than the data it refers to, but it still consumes some amount of memory. The amount of memory consumed by metadata depends on the block size selected for deduplication:
When both deduplication and encryption are enabled, these are the consumption values:
| VBK size | Optimization | VBK block size | Memory consumption for VBK metadata |
|---|---|---|---|
| 1 TB | WAN target | 256 KB | 700 MB |
| 1 TB | LAN target | 512 KB | 350 MB |
| 1 TB | Local target | 1024 KB | 175 MB |
| 1 TB | Local target 16+ TB | 8192 KB | 22 MB |
Based on this table, you can easily calculate the amount of memory consumed on a repository from the size of the backup files it has to deal with. You can also see why the 8 MB block size is preferable for large backup sets: a 30 TB backup file would consume 5,2 GB of memory at the default 1 MB block size, while with the large block the consumption is just 660 MB. The tradeoff is a worse deduplication ratio because of the larger block size. This is also the recommended block size for deduplication appliances: in that case, the reason is that there is no point consuming a large amount of memory on the gateway server when the real deduplication happens on another machine, the deduplication appliance itself.
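To make the sizing concrete, here is a small calculator built from the table above (the per-TB figures come straight from the table; the dictionary keys and the function are just illustrative):

```python
# MB of metadata per TB of VBK, per the table above (deduplication
# and encryption enabled). Keys are illustrative labels.
METADATA_MB_PER_TB = {
    "WAN target (256 KB)":        700,
    "LAN target (512 KB)":        350,
    "Local target (1 MB)":        175,
    "Local target 16+ TB (8 MB)":  22,
}

def metadata_memory_mb(vbk_size_tb: float, optimization: str) -> float:
    """Estimated metadata cache for a single backup file, in MB."""
    return vbk_size_tb * METADATA_MB_PER_TB[optimization]

# The 30 TB example from the text:
print(metadata_memory_mb(30, "Local target (1 MB)"))         # 5250 MB, about 5,2 GB
print(metadata_memory_mb(30, "Local target 16+ TB (8 MB)"))  # 660 MB
```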
EDIT: after publishing the article, Veeam developers found a couple of inaccuracies in the cache description. In order to avoid having people reading wrong statements, I’ve temporarily removed the second section; it will be back soon. Sorry for the inconvenience.
Great piece of information!
I still have some difficulty bringing this all together after reading. Could you maybe provide us with an example calculation? I bet the questions I have would be answered by such an example.
Hi, I know the explanation is pretty deep because we wanted people to understand how the cache works and what implications it has, but sizing the memory is quite simple: sum up the size of the entire backup chain of the job, and apply the numbers from the table.
Say a chain is going to be 10 TB in total, for example: the maximum amount of memory will be 2 GB for the block queue plus 1,75 GB for the metadata cache (10 TB × 175 MB per TB, at the default 1 MB block size). Total consumption will be 3,75 GB as a worst case.
thanks!
So after installing v8 Update 2, there's no cut-off at 1,5 GB for the write cache? I guess not, according to your example.
Correct. As stated in the release notes of Update 2, expect increased memory consumption.