Note: This document links directly to relevant areas found in the system design topics to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.
Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
query
- 50 bytestitle
- 20 bytessnippet
- 200 bytesHandy conversion guide:
Outline a high level design with all important components.
Dive into details for each core component.
Popular queries can be served from a Memory Cache such as Redis or Memcached to reduce read latency and to avoid overloading the Reverse Index Service and Document Service. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
Since the cache has limited capacity, we'll use a least recently used (LRU) approach to expire older entries.
The cache can use a doubly-linked list: new items will be added to the head while items to expire will be removed from the tail. We'll use a hash table for fast lookups to each linked list node.
Clarify with your interviewer how much code you are expected to write.
Query API Server implementation:
class QueryApi(object):
def __init__(self, memory_cache, reverse_index_service):
self.memory_cache = memory_cache
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
"""Remove markup, break text into terms, deal with typos,
normalize capitalization, convert to use boolean operations.
"""
...
def process_query(self, query):
query = self.parse_query(query)
results = self.memory_cache.get(query)
if results is None:
results = self.reverse_index_service.process_search(query)
self.memory_cache.set(query, results)
return results
Node implementation:
class Node(object):
def __init__(self, query, results):
self.query = query
self.results = results
LinkedList implementation:
class LinkedList(object):
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
...
def append_to_front(self, node):
...
def remove_from_tail(self):
...
Cache implementation:
class Cache(object):
def __init__(self, MAX_SIZE):
self.MAX_SIZE = MAX_SIZE
self.size = 0
self.lookup = {} # key: query, value: node
self.linked_list = LinkedList()
def get(self, query)
"""Get the stored query result from the cache.
Accessing a node updates its position to the front of the LRU list.
"""
node = self.lookup[query]
if node is None:
return None
self.linked_list.move_to_front(node)
return node.results
def set(self, results, query):
"""Set the result for the given query key in the cache.
When updating an entry, updates its position to the front of the LRU list.
If the entry is new and the cache is at capacity, removes the oldest entry
before the new entry is added.
"""
node = self.lookup[query]
if node is not None:
# Key exists in cache, update the value
node.results = results
self.linked_list.move_to_front(node)
else:
# Key does not exist in cache
if self.size == self.MAX_SIZE:
# Remove the oldest entry from the linked list and lookup
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
# Add the new key and value
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
The cache should be updated when:
The most straightforward way to handle these cases is to simply set a max time that a cached entry can stay in the cache before it is updated, usually referred to as time to live (TTL).
Refer to When to update the cache for tradeoffs and alternatives. The approach above describes cache-aside.
Identify and address bottlenecks, given the constraints.
Important: Do not simply jump right into the final design from the initial design!
State you would 1) Benchmark/Load Test, 2) Profile for bottlenecks 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS as a sample on how to iteratively scale the initial design.
It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers? CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics for main talking points, tradeoffs, and alternatives:
To handle the heavy request load and the large amount of memory needed, we'll scale horizontally. We have three main options on how to store the data on our Memory Cache cluster:
machine = hash(query)
. We'll likely want to use consistent hashing.Additional topics to dive into, depending on the problem scope and time remaining.
Refer to the security section.
See Latency numbers every programmer should know.