Segment Trees (for Concert Tickets)

By Eric Burden | November 1, 2022

One of the benefits of engaging in online communities, at least communities of other coders, is that it gives me a ton of opportunities to learn something new. Since my family and I recently moved to the Upstate area of South Carolina, in addition to great weather and plenty of places to hike, I managed to find a (relatively) new meetup group: Code and Data Science Group of South Carolina. If you’re anywhere near the area, I’d encourage you to come join us for a meetup!

Anyway, the organizer has been posing LeetCode puzzles in the group Slack, and I’m a sucker for some algorithms practice. Recently, this little gem was posted:

A concert hall has n rows numbered from 0 to n - 1, each with m seats, numbered from 0 to m - 1. You need to design a ticketing system that can allocate seats in the following cases:

If a group of k spectators can sit together in a row.

If every member of a group of k spectators can get a seat. They may or may not sit together.

Note that the spectators are very picky. Hence:

They will book seats only if each member of their group can get a seat with row number less than or equal to maxRow. maxRow can vary from group to group.

In case there are multiple rows to choose from, the row with the smallest number is chosen. If there are multiple seats to choose in the same row, the seat with the smallest number is chosen.

Implement the BookMyShow class:

BookMyShow(int n, int m) Initializes the object with n as number of rows and m as number of seats per row.

int[] gather(int k, int maxRow) Returns an array of length 2 denoting the row and seat number (respectively) of the first seat being allocated to the k members of the group, who must sit together. In other words, it returns the smallest possible r and c such that all [c, c + k - 1] seats are valid and empty in row r, and r <= maxRow. Returns [] in case it is not possible to allocate seats to the group.

boolean scatter(int k, int maxRow) Returns true if all k members of the group can be allocated seats in rows 0 to maxRow, who may or may not sit together. If the seats can be allocated, it allocates k seats to the group with the smallest row numbers, and the smallest possible seat numbers in each row. Otherwise, returns false.

Example 1:

Input
["BookMyShow", "gather", "gather", "scatter", "scatter"]
[[2, 5], [4, 0], [2, 0], [5, 1], [5, 1]]
Output
[null, [0, 0], [], true, false]
Explanation:
BookMyShow bms = new BookMyShow(2, 5); // There are 2 rows with 5 seats each 
bms.gather(4, 0); // return [0, 0]
                  // The group books seats [0, 3] of row 0. 
bms.gather(2, 0); // return []
                  // There is only 1 seat left in row 0,
                  // so it is not possible to book 2 consecutive seats. 
bms.scatter(5, 1); // return True
                   // The group books seat 4 of row 0 and seats [0, 3] of row 1. 
bms.scatter(5, 1); // return False
                   // There is only one seat left in the hall.
Constraints:

1 <= n <= 5 * $10^4$

1 <= m, k <= $10^9$

0 <= maxRow <= n - 1

At most 5 * $10^4$ calls in total will be made to gather and scatter.

I, of course, rather glibly assumed that this challenge really couldn’t be that difficult, since it’s just a matter of keeping up with a square matrix of seats and which ones are filled. That was, until I took a closer look at the constraints… Turns out, I needed a much more efficient approach than tracking each seat manually.

Segment Trees

I (not terribly) quickly realized that I didn’t need to keep up with each seat, just how many open seats were in each row (given that all seating started at the first open seat in each row). After a bit of professional Googling, I found that a Segment Tree was the right data structure to efficiently find the first row that could accommodate k more guests (for the gather() operation) as well as find the total number of open seats in the first maxRow rows (for the scatter() operation). Unfortunately, most of the explanations I found were a bit tough for me to follow, which is the inspiration for this blog post. At the most basic level, a Segment Tree is nothing more than a binary tree where each parent ’node’ contains the result of some operation on the child nodes (sum, max, min, mean, etc) AND the tree itself is represented as a flat array where a parent/child relationship is made by a mathematical operation on an index.

That simplification is pretty dense, though, and might be better demonstrated than described. So, given this binary tree where each parent node contains the sum of its child nodes:

           _-  15 -_
          /         \ 
     _- 14 -_        1
    /        \        \
   9          5         1
 /   \      /   \     /   \
5     4    3     2   1     0

You can represent that tree as a list of numbers as shown (with some parent/child relationships shown):

                 _____     ______________
                /___  \   /___________   \
               /    \  \ /            \   \
values:  [15, 14, 1, 9, 5, 1, x, 5, 4, 3,  2,  1,  0,  x,  x]
           \__/  / \______/  / \______________________/   /
            \___/   \_______/   \________________________/
            
indices: [ 0,  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

It is important to note that, as we will represent, the length of the ‘flattened’ array should be $(2^x \times 2) - 1$, where $2^x$ is the first power of 2 greater than or equal to the length of the number of ’leaf’ values being tracked. As show above, a Segment Tree built on top of the array [5, 4, 3, 2, 1, 0] would contain a total of 15 values (some of which may be missing) because the first power of 2 greater than or equal to 6 (length of the input) is 8, and $(8 * 2) - 1 = 15$.

Missing values in the Segment Tree are needed to ensure that the relationships between parent and child indices follows a consistent pattern. In this case, that pattern can be represented as:

$$\begin{eqnarray} &index_{left\, child} &= index_{parent} \times 2 + 1 \\ &index_{right\, child} &= index_{parent} \times 2 + 2 \end{eqnarray}$$

You can check the indices for the relationships indicated above to see that this relationship holds.

An Aside on Ranges

The implementation for our Segment Tree is going to rely on comparing index ranges in a couple of different ways, which isn’t totally supported out of the ‘Box’ in Rust (see what I did there?). To accomplish that, I’ll be using adding the following trait behaviors to the standard library RangeInclusive:

use std::ops::RangeInclusive;

/// Some extra methods on `RangeInclusive`.
trait RangeOps<T> {
    fn encloses(&self, other: &RangeInclusive<T>) -> bool;
    fn overlaps(&self, other: &RangeInclusive<T>) -> bool;
    fn split(&self) -> (RangeInclusive<T>, RangeInclusive<T>);
    fn is_single(&self) -> bool;
}

impl RangeOps<usize> for RangeInclusive<usize> {
    /// Check to see whether `self` fully encloses `other` range.
    fn encloses(&self, other: &RangeInclusive<usize>) -> bool {
        self.start() <= other.start() && self.end() >= other.end()
    }

    /// Check to see if `self` overlaps with `other` range.
    fn overlaps(&self, other: &RangeInclusive<usize>) -> bool {
        self.start() <= other.end() && self.end() >= other.start()
    }

    /// Split `self` range into two ranges of equal size. This wouldn't really
    /// work in general, but all our ranges we split later on will have an
    /// even length. There are better implementations for a more general
    /// approach to this.
    fn split(&self) -> (RangeInclusive<usize>, RangeInclusive<usize>) {
        let mid = (self.start() + self.end()) / 2;
        (*self.start()..=mid, (mid + 1)..=*self.end())
    }

    /// Returns `true` if the range contains a single value, otherwise `false`.
    fn is_single(&self) -> bool {
        self.start() == self.end()
    }
}

These abstractions will make the code in the next sections easier to read and understand.

Building the Tree

Many of the explanations for Segment Trees I found referenced a common set of operations in C. I, however, am not terribly familiar with C, and references to variables that are not passed in or defined kind of throws me off. I’ll be translating this C code to much more verbose Rust. It helps me, okay? For each part, I’ll include the C version I found in a number of different articles as a reference and comparison.

The C Version

// Method to build a segment tree
void build(int node, int start, int end)
{
    if(start == end)
    {
        // Leaf node will have a single element
        tree[node] = A[start];
    }
    else
    {
        int mid = (start + end) / 2;
        // Recurse on the left child
        build(2*node, start, mid);
        // Recurse on the right child
        build(2*node+1, mid+1, end);
        // Internal node will have the sum of both of its children
        tree[node] = tree[2*node] + tree[2*node+1];
    }
}

The Rust Version

We’re going to keep up with two different aggregations in our SegmentTree class. sum_tree will aggregate values such that sum_tree[parent] = sum_tree[left_child] + sum_tree[right_child] and will be used to determine the sum for given ranges of values. max_tree will aggregate values such that max_tree[parent] = max(max_tree[left_child], max_tree[right_child]) and will be used to determine the first value less than or equal to an input value. You’ll also notice that I’m not recursing to build the trees here, mostly because this was straightforward to do in loops.

struct SegmentTree {
    offset: usize,        // The offset of leaf nodes from arr[0]
    sum_tree: Vec<usize>, // Sum-aggregated segment tree
    max_tree: Vec<usize>, // Max-aggregated segment tree
}

impl SegmentTree {
    /// This constructor builds segment trees on top of a array `[m; n]`, or
    /// `n` values that all equal `m`. This is because our puzzle deals with
    /// equal-length rows of seats and we're storing the number of open seats
    /// per row.
    fn new(n: usize, m: usize) -> Self {
        // Determine how large the segment trees need to be to fit all
        // the parent and child nodes.
        let leaves = n.next_power_of_two();
        let capacity = (2 * leaves) - 1;
        let offset = leaves - 1;

        // Create empty vectors for both segment trees. I'm using `0` to fill
        // empty spaces in tree vectors, which can be confusing when `0` is a
        // legitimate value as well. This would be better as `None` in a vector
        // with type `Vec<Option<usize>>`, but that would complicate the code
        // operating on these vectors in a way that might make it more difficult
        // to follow the logic.
        let mut sum_tree = vec![0; capacity];
        let mut max_tree = vec![0; capacity];

        // Fill the 'leaf' slots in both segment trees with input values. In this
        // case, they're all `m`.
        for idx in offset..(offset + n) {
            sum_tree[idx] = m;
            max_tree[idx] = m;
        }

        // Fill the 'parent' slots in both trees using the relevant aggregation
        // for each tree.
        for idx in (0..offset).rev() {
            let left_idx  = (idx * 2) + 1;
            let right_idx = (idx * 2) + 2;

            sum_tree[idx] = sum_tree[left_idx] + sum_tree[right_idx];
            max_tree[idx] = std::cmp::max(max_tree[left_idx], max_tree[right_idx]);
        }

        SegmentTree { offset, sum_tree, max_tree }
    }
}

Getters and Setters

We’ll need to be able to get and set leaf values in the SegmentTree. Since all the original values are shifted to the back half of the ‘flattened’ array, we need to adjust for those indices. We’ll also need to update the parent nodes that depend on any updated leaf value. I don’t have a C reference for the getter, but it is pretty straightforward.

The C Version

void update(int node, int start, int end, int idx, int val)
{
    if(start == end)
    {
        // Leaf node
        A[idx] += val;
        tree[node] += val;
    }
    else
    {
        int mid = (start + end) / 2;
        if(start <= idx and idx <= mid)
        {
            // If idx is in the left child, recurse on the left child
            update(2*node, start, mid, idx, val);
        }
        else
        {
            // if idx is in the right child, recurse on the right child
            update(2*node+1, mid+1, end, idx, val);
        }
        // Internal node will have the sum of both of its children
        tree[node] = tree[2*node] + tree[2*node+1];
    }
}

The Rust Version

impl SegmentTree {

  // ...
  
    /// Set a 'leaf' value and update all parent nodes of that leaf. This
    /// operation runs in O(log n) time.
    fn set_value(&mut self, idx: usize, value: usize) {
        // Convert the given `idx` into the offset index in the trees.
        let mut idx = idx + self.offset;

        // Set the value in both trees
        self.sum_tree[idx] = value;
        self.max_tree[idx] = value;

        // Walk up the parents of the updated node and re-calculate their
        // values in both segment trees.
        while idx > 0 {
            idx = (idx - 1) / 2;
            let left  = (idx * 2) + 1;
            let right = (idx * 2) + 2;

            self.sum_tree[idx] = self.sum_tree[left] + self.sum_tree[right];
            self.max_tree[idx] = std::cmp::max(self.max_tree[left], self.max_tree[right]);
        }
    }

    // Fetch a 'leaf' value. Since we're manually keeping the segment tree leaf
    // values in sync, we only need to fetch from one segment tree.
    fn get_value(&self, idx: usize) -> usize {
        self.sum_tree[self.offset + idx]
    }
    
    // ...
}

Searches

Now for the “fun” part! We need two different types of queries to work on the original puzzle, but the C implementation I found only addresses one type. That’s okay, it can be extended to fit the bill.

The C Version

int query(int node, int start, int end, int l, int r)
{
    if(r < start or end < l)
    {
        // range represented by a node is completely outside the given range
        return 0;
    }
    if(l <= start and end <= r)
    {
        // range represented by a node is completely inside the given range
        return tree[node];
    }
    // range represented by a node is partially inside and partially outside the given range
    int mid = (start + end) / 2;
    int p1 = query(2*node, start, mid, l, r);
    int p2 = query(2*node+1, mid+1, end, l, r);
    return (p1 + p2);
}

The Rust Version

The sum_to_limit() function below is a ‘value’ query, identifying and returning the aggregated value for the range from 0 to limit (in our case, sum_tree is aggregated as the sum of the child nodes). The first_gte() function is an ‘index’ query, identifying and returning the first index that meets our criteria (in this case, with a value at least equal to value and with an index no greater than limit). I won’t delve too deeply here, but the code below represents specific examples of these two kinds of queries, and a clever coder could make them more generic (and thus more generally useful).

impl SegmentTree {

    // ...
    
    /// Calculate the sum of the 'leaf' nodes in the inclusive range from 0
    /// to `limit`.
    fn sum_to_limit(&self, limit: usize) -> usize {
        // Recursive function to search through the `sum_tree` for the sum of
        // a given range of values.
        // 
        // # Arguments
        // - tree: A reference to `sum_tree`
        // - node: The true index in `sum_tree` of the node to check.
        // - node_rng: The inclusive range of 'leaf' nodes subordinate to
        //   `node`. For the root node, this will be the range of all 'leaf'
        //   indices.
        // - query_rng: The inclusive range of 'leaf' nodes to get the sum for.
        fn recurse(
          tree: &[usize], 
          node: usize, 
          node_rng: Range<usize>, 
          query_rng: &Range<usize>
        ) -> usize {
            // Base Case: If the node being examined doesn't include any 'leaf'
            // values in the query range, then this node contributes nothing
            // to the final sum.
            if !query_rng.overlaps(&node_rng) { return 0; }

            // Base Case: If the node being examined only includes 'leaf' values
            // that fall within the query range, then the total for this entire
            // node should be included in the final sum.
            if query_rng.encloses(&node_rng) { return tree[node]; }

            // The remaining code only runs if `node_rng` and `query_rng` partially
            // overlap, searching for the sum total of the _parts_ of `query_rng`
            // that overlap.
            
            // Get indices for left and right child nodes and split `node_rng`
            // into the ranges for the left and right child nodes.
            let left_node  = (node * 2) + 1;
            let right_node = (node * 2) + 2;
            let (left_rng, right_rng) = node_rng.split();

            // Get the contributions to the final sum from the left and right
            // child nodes, then add them together and return the total.
            let left_value  = recurse(tree, left_node, left_rng, query_rng);
            let right_value = recurse(tree, right_node, right_rng, query_rng);
            left_value + right_value
        }

        // Start the search on the root node.
        let root = 0;
        let node_rng = 0..=self.offset;
        let query_rng = 0..=limit;
        recurse(&self.sum_tree, root, node_rng, &query_rng)
    }

    /// Search for and return the first 'leaf' index in the range from 0 to 
    /// `limit` where the value is greater than or equal to `value`.
    fn first_gte(&self, value: usize, limit: usize) -> Option<usize> {
        // Recursive function to search through the `max_tree` for the first
        // index whose value is at least equal to `value`.
        // 
        // # Arguments
        // - tree: A reference to `sum_tree`
        // - value: The maximum value that can be held in the index we're
        //   searching for.
        // - node: The true index in `sum_tree` of the node to check.
        // - node_rng: The inclusive range of 'leaf' nodes subordinate to
        //   `node`. For the root node, this will be the range of all 'leaf'
        //   indices.
        // - query_rng: The inclusive range of 'leaf' nodes to get the sum for.
        fn recurse(
          tree: &[usize], 
          value: usize, 
          node: usize, 
          node_rng: Range<usize>, 
          query_rng: &Range<usize>
        ) -> Option<usize> {
            // Base Case: If the node being examined doesn't include any 'leaf'
            // values in the query range, then this node cannot be the one we're
            // searching for.
            if !query_rng.overlaps(&node_rng) { return None; }

            // Base Case: If the node being examined has a value less than `value`,
            // then neither this node nor its children can be the one we're searching
            // for.
            if tree[node] < value { return None; }

            // Base Case: If the node being examined represents a range of only
            // one value (aka, it's a leaf), then it must be the node we're
            // searching for. The previous if clause means that this leaf value
            // must be at least equal to `value`.
            if node_rng.is_single() { return Some(node); }
            
            // The remaining code only runs if `node_rng` and `query_rng` partially
            // overlap, searching for the first index in the overlapping part of
            // the ranges.

            // Get indices for left and right child nodes and split `node_rng`
            // into the ranges for the left and right child nodes.
            let left_node  = (node * 2) + 1;
            let right_node = (node * 2) + 2;
            let (left_rng, right_rng) = node_rng.split();

            // Since we're looking for the _first_ leaf with value at least equal
            // to `value`, we search down the left child node first. If the desired
            // node is not found, then we search down the right node and return
            // the result.
            match recurse(tree, value, left_node, left_rng, query_rng) {
                Some(n) => Some(n),
                None => recurse(tree, value, right_node, right_rng, query_rng),
            }
        }

        // Start the search on the root node.
        let root = 0;
        let node_rng = 0..=self.offset;
        let query_rng = 0..=limit;
        
        // Since the index returned is for the `max_tree`, we need to adjust it
        // to be a 'leaf' index.
        recurse(&self.max_tree, value, root, node_rng, &query_rng).map(|i| i - self.offset)
    }
}

Solving the Puzzle

With an (admittedly specialized) SegmentTree in place, I’m ready to actually solve the puzzle. From the description, there are three methods that BookMyShow needs to support:

new(i32, i32) -> BookMyShow takes the number of rows and the number of seats in each row and returns a BookMyShow struct. Most of the construction is handled by the call to SegmentTree::new(), but note that I’m also keeping up with the number of total seats in each row (row_size) and the number of rows that have been completely filled (filled_rows, 0 to start).
gather(&mut self, k: i32, max_row: i32) -> Vec<i32> seats k guests in the first row with at least k seats, but not past the max_row row. If these guests can be seated, then the segment tree needs to be updated and the function returns a vector with two values, the row number and the seat number of the first seat filled by that group. If no such row is available, then an empty vector is returned. Yes, an Option<[usize; 2]> would make more sense as a return value.
scatter(&mut self, k: i32, max_row: i32) -> bool seats k guests in any available seat in any row up to max_row so long as there are enough open seats. If these guests are seated, the segment tree is updated and the function returns true. Otherwise, it returns false.

struct BookMyShow {
    row_size: usize,
    filled_rows: usize,
    tree: SegmentTree,
}

impl BookMyShow {
    fn new(n: i32, m: i32) -> Self {
        let n = n as usize;
        let m = m as usize;
        let tree = SegmentTree::new(n, m);

        BookMyShow { row_size: m, filled_rows: 0, tree }
    }

    /// Helper function that updates a value in the segment tree and 
    /// increases the number of filled rows any time the number of 
    /// guests seated on a particular row leaves no more open seats
    /// on that row.
    /// I'm passing `open_seats` here to avoid needing to look up the
    /// number of open seats in `row` again (another O(log n) operation).
    fn seat(&mut self, row: usize, open_seats: usize, guests: usize) {
        self.tree.set_value(row, open_seats - guests);
        if open_seats == guests { self.filled_rows += 1; }
    }

    fn gather(&mut self, k: i32, max_row: i32) -> Vec<i32> {
        let guests = k as usize;
        let limit = max_row as usize;

        if self.row_size < guests { return vec![]; }

        match self.tree.first_gte(guests, limit) {
            Some(row) => {
                let open_seats = self.tree.get_value(row);
                self.seat(row, open_seats, guests);
                
                // Total seats - open seats (before seating) gives the number 
                // of taken seats in a row, wich non-coincidentally is the same 
                // as the index of the first seat a guest was seated in.
                vec![row as i32, (self.row_size - open_seats) as i32]
            }
            None => vec![]
        }
    }

    fn scatter(&mut self, k: i32, max_row: i32) -> bool {
        let mut guests = k as usize;
        let limit = max_row as usize;

        // If there aren't enough seats in the theatre, no need to do
        // the lookup.
        if self.row_size * (limit + 1) < guests { return false; }

        // If the number of available seats in rows 0 - `limit` is insufficient
        // to seat all the guests, then they can't be seated.
        let open_seats = self.tree.sum_to_limit(limit);
        if open_seats < guests { return false; }

        // Here, I'm filling up rows from 0 - `limit` (as many as needed, may
        // not be all). The number of `filled_rows` is, non-coincidentally, the
        // index of the first row with available seating.
        while guests > 0 {
            let first_row_with_seats = self.filled_rows;
            let open_seats_in_row = self.tree.get_value(first_row_with_seats);
            
            // Either seat as many guests as the row will hold or all the
            // remaining guests, if there is room.
            let guests_to_seat = guests.min(open_seats_in_row);
            self.seat(first_row_with_seats, open_seats_in_row, guests_to_seat);
            
            // Remove the seated guests from the total number of guests remaining.
            guests -= guests_to_seat;
        }

        true
    }
  }
}

Afterthoughts

I do a fair number of coding puzzles, as you might be able to tell from my series of posts from the past two years of Advent of Code. When working on these puzzles, I love finding a new data structure or algorithm to sink my teeth into and learn about, because it almost always stretches my brain to think about coding in new ways. When I do find something new (to me), I tend to bump my head against it for a while, until I’ve got a solution that works at least a bit, before I take to Google to see how folks smarter than me have approached this problem before. It’s an absolute joy when I can find well-written, explanatory documentation, blog posts, or tutorials that help me understand this new thing.Unfortunately, I couldn’t find anything that quite got me there when I went looking for the Segment Tree used in this challenge. So, in the spirit of “paying it forward”, I decided the internet needed this blog post. If you’re here, and there’s something that I didn’t explain well (i.e., it’s not clear to you what’s going on), then please feel free to comment and ask. You can comment here or find me on Twitter at @ericburden. My hope is that this post spreads the joy of a new discovery to someone else. Happy coding!

P.S.: If you’re the sort of person who writes coding blog posts, and you use single letters as variable names without clearly explaining what they represent, then I hope you sneeze at time when it will be moderately embarrassing. You deserve it.

Segment Trees

An Aside on Ranges

Building the Tree

The C Version

The Rust Version

Getters and Setters

The C Version

The Rust Version

Searches

The C Version

The Rust Version

Solving the Puzzle

Afterthoughts

Search

Categories

Tags