System design calculations

Back-of-the-envelope estimation cheat sheet

This interactive article is a refresher on the key components of the estimation step in the system design process.

Details

Components:

  1. Parameters for basic and advanced calculations
  2. Design recommendations based on the calculation results
  3. Scientific calculator links that simplify unit conversions
    • bytes -> GB
    • RPM * KB * year -> GB
    • etc
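The conversions listed above can also be sketched in plain Python. This is a minimal sketch, not a replacement for the calculator links: it assumes decimal (SI) units and a 365-day year, and the names `bytes_to_gb` and `yearly_volume_gb` are hypothetical helpers introduced only for illustration.

```python
# Assumption: decimal (SI) units, i.e. 1 KB = 10**3 bytes and 1 GB = 10**9 bytes.
KB = 10**3
GB = 10**9
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 (365-day year assumed)


def bytes_to_gb(n_bytes: float) -> float:
    """bytes -> GB"""
    return n_bytes / GB


def yearly_volume_gb(rpm: float, size_kb: float, years: float = 1.0) -> float:
    """RPM * KB * year -> GB"""
    total_bytes = rpm * size_kb * KB * MINUTES_PER_YEAR * years
    return bytes_to_gb(total_bytes)


print(bytes_to_gb(5_000_000_000))             # 5.0
print(f"{yearly_volume_gb(1_000, 2):.1f}")    # 1051.2
```

With binary units (1 KB = 1024 bytes) the results differ by a few percent, which rarely matters for a back-of-the-envelope estimate.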

Numbat

Numbat is a scientific calculator with a command-line interface, simple unit conversion, and a web version.

The alternative is to calculate by hand using scientific notation or powers of 2.
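To illustrate the by-hand approach, here is a small sketch that splits a number into scientific notation and finds the closest power of 2. The helper names `scientific` and `nearest_power_of_two` are hypothetical, and the input (200K entities per minute at an assumed 1,000 bytes each) is only an example.

```python
import math


def scientific(x: float) -> tuple[float, int]:
    """Split x > 0 into (mantissa, exponent) so that x = mantissa * 10**exponent."""
    exponent = math.floor(math.log10(x))
    return x / 10**exponent, exponent


def nearest_power_of_two(x: float) -> int:
    """Return n such that 2**n is the power of two closest to x on a log scale."""
    return round(math.log2(x))


# By hand: (2 * 10**5 entities/min) * (10**3 bytes) = 2 * 10**8 bytes/min
print(scientific(200_000 * 1_000))        # (2.0, 8)

# Handy approximation: 10**6 is close to 2**20, 10**9 is close to 2**30
print(nearest_power_of_two(1_000_000))    # 20
```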

Support the Numbat project with a GitHub star.

Questions

Please use discussions for any questions and feedback.

Entity size

Average entity size is the basis for bitrate and capacity calculations. To get it, sum the average sizes of all fields/columns.
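As a sketch of the summation described above, the example below adds up assumed average field sizes. The field names and byte sizes are hypothetical and do not come from any real schema.

```python
# Hypothetical fields of a `video` entity with assumed average sizes in bytes.
AVERAGE_FIELD_SIZES = {
    "id": 8,
    "room_id": 8,
    "title": 60,
    "description": 400,
    "url": 120,
    "created_at": 8,
}


def average_entity_size(field_sizes: dict[str, int]) -> int:
    """Sum the average size of every field/column, in bytes."""
    return sum(field_sizes.values())


print(average_entity_size(AVERAGE_FIELD_SIZES))  # 604
```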

Warning

Be careful when your service integrates into an existing system.
For example, adding video to an Airbnb room page (a `video` entity inside a `room` entity).

Do not go straight to estimation. Describe the system boundaries first.

Questions to ask:
  • What share of all pages is expected to have videos?
  • Which services do I need to extend so that my entity can be searched, shown, and edited?
  • Can I add columns/fields to the existing database?

User activity

Load profile: how many entities are read and written per minute.
These are the parameters for estimating network usage.

100K single-read RPM + 10K list-read RPM (list length 10) = 200K entities per minute
Outputs: network usage (GB/s), annual costs (TB)
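The load profile above can be turned into network figures with a few lines of Python. This is a sketch under stated assumptions: an average entity size of 1,000 bytes, decimal units, a 365-day year, and hypothetical function names. It reads the annual output in TB as the yearly traffic volume.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_YEAR = 60 * 24 * 365  # 365-day year assumed


def entities_per_minute(single_read_rpm: int, list_read_rpm: int, list_len: int) -> int:
    """Each list request returns `list_len` entities."""
    return single_read_rpm + list_read_rpm * list_len


def network_usage_gb_per_s(entities_pm: int, entity_size_bytes: int) -> float:
    return entities_pm * entity_size_bytes / SECONDS_PER_MINUTE / 10**9


def annual_traffic_tb(entities_pm: int, entity_size_bytes: int) -> float:
    return entities_pm * entity_size_bytes * MINUTES_PER_YEAR / 10**12


epm = entities_per_minute(100_000, 10_000, 10)
print(epm)                                          # 200000
print(f"{network_usage_gb_per_s(epm, 1_000):.4f}")  # 0.0033
print(f"{annual_traffic_tb(epm, 1_000):.2f}")       # 105.12
```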
Details and recommendations

Advanced

These calculations are not important for a system design interview or for average projects.

Number of concurrent connections.
Real-time or low-latency applications can hit a bottleneck in the number of WebSocket or database connections.
The question to ask is "How many users are doing this action at the same moment?"
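One common way to approximate the answer is Little's law: concurrency is roughly the arrival rate multiplied by the average time each connection stays open. The article does not prescribe this method, so treat the sketch below as an assumption; the function name and input values are hypothetical.

```python
def concurrent_connections(requests_per_minute: float, avg_duration_s: float) -> float:
    """Little's law: concurrency = arrival rate (per second) * average duration (s)."""
    return requests_per_minute / 60 * avg_duration_s


# Assumed load: 60K requests per minute, each holding a connection for 2 seconds.
print(concurrent_connections(60_000, 2.0))  # 2000.0
```

For long-lived connections such as web-sockets, the average duration is the session length, so the result can be far larger than the per-second request rate.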

Peak activity factor.
User activity has big peaks, especially in retail and games.
Peaks can be handled with cloud resources or temporary hardware.

Protocol factor.
Protocols and data formats (HTTP, JSON, gRPC, SQL, etc.) add extra size to your data and network usage.
GZIP or other compression can compensate for that, but not completely.
If bitrate is important, multiply it by 2.
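The peak and protocol factors described above act as simple multipliers on the average payload bitrate. In the sketch below, the protocol factor of 2 follows the rule of thumb above, while the peak factor of 3 and the function name are assumptions chosen for illustration.

```python
def peak_wire_bitrate(
    avg_payload_mb_per_s: float,
    peak_factor: float = 3.0,      # assumed value; measure it for your own traffic
    protocol_factor: float = 2.0,  # rule of thumb for protocol and format overhead
) -> float:
    """Bitrate to provision for: average payload * peak factor * protocol factor."""
    return avg_payload_mb_per_s * peak_factor * protocol_factor


print(peak_wire_bitrate(3.5))  # 21.0
```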

Storage capacity

Key parameters for estimating storage costs.

Outputs: raw data capacity, total occupied space
Details and recommendations
  • Data storage time.

    Many systems do not need to store data forever.
    Alternatively, older data can be moved to "cold storage".

  • Indexes and metadata factor.

    A database isn't magic. Besides your data, it stores indexes, entities that are not yet deleted, pages, blocks, trees, and log data.

    Take at least x2 for a simple key-value case.

  • Replication factor.

    A database is reliable, but it sometimes fails, and failure recovery takes time.

    Uptime requirements of 98%+ call for a x2 replication factor (one main and one replica).
    Use x3 for 99.9%+.

  • Sharding

    A 500+ GB database barely fits on a single database instance and requires sharding.
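The storage parameters above can be combined into one rough calculation. This is a sketch, not the article's calculator: the write rate, entity size, and function names are hypothetical, decimal units and a 365-day year are assumed, and the shard count treats 500 GB as a per-instance limit applied before replication (each replica is assumed to hold a full copy of its shard).

```python
import math

MINUTES_PER_YEAR = 60 * 24 * 365  # 365-day year assumed
GB = 10**9                        # decimal units assumed


def raw_capacity_gb(write_rpm: float, entity_size_bytes: float, storage_years: float) -> float:
    """Raw data written over the whole storage time."""
    return write_rpm * entity_size_bytes * MINUTES_PER_YEAR * storage_years / GB


def total_occupied_gb(raw_gb: float, index_factor: float = 2, replication_factor: float = 3) -> float:
    """Raw data inflated by indexes/metadata and by the number of replicas."""
    return raw_gb * index_factor * replication_factor


def shard_count(raw_gb: float, index_factor: float = 2, shard_limit_gb: float = 500) -> int:
    """Shards needed so that each instance stays under the assumed size limit."""
    return math.ceil(raw_gb * index_factor / shard_limit_gb)


raw = raw_capacity_gb(write_rpm=1_000, entity_size_bytes=1_000, storage_years=1)
print(f"{raw:.1f}")                      # 525.6
print(f"{total_occupied_gb(raw):.1f}")   # 3153.6
print(shard_count(raw))                  # 3
```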