Advent of Code 2020 - Day 21

By Eric Burden | December 22, 2020

In the spirit of the holidays (and programming), I’ll be posting my solutions to the Advent of Code 2020 puzzles here, at least one day after they’re posted (no spoilers!). I’ll be implementing the solutions in R because, well, that’s what I like! What I won’t be doing is posting any of the actual answers, just the reasoning behind them.

Also, as a general convention, whenever the puzzle has downloadable input, I’m saving it in a file named input.txt.

Day 21 - Allergen Assessment

Find the problem description HERE.

Part One - Inferring Ingredients

According to today’s puzzle, we are either allergic to most everything or we are a very young child who needs to avoid exposure to allergens in case of unknown interactions. Probably the first one, right? Either way, it would be helpful if we spoke the language, but at least we’re pretty good at identifying patterns! Essentially, we’re looking for ingredients that do not appear in every recipe containing a particular allergen, for example, if there are two recipes that contain ‘dairy’ and one of them contains nhms and the other doesn’t, then you know nhms is not your dairy allergen.

# Setup ------------------------------------------------------------------------

test_input <- c(
  "mxmxvkd kfcds sqjhc nhms (contains dairy, fish)",
  "trh fvjkl sbzzf mxmxvkd (contains dairy)",
  "sqjhc fvjkl (contains soy)",
  "sqjhc mxmxvkd sbzzf (contains fish)"
real_input <- readLines('input.txt')


# Functions --------------------------------------------------------------------

# Helper function, given a line number `line_no` and a list where the first
# element is a space-separated vector of ingredients and the second element
# is a space-separated vector of allergens `split_line`, returns a data frame
# with one row for each combination of `line_no` (as 'recipe'), ingredient, 
# and allergen.
line_to_df <- function(line_no, split_line) {
  parts <- strsplit(split_line, ' ')
    recipe = line_no, 
    ingredient = parts[[1]], 
    allergen = parts[[2]], 
    stringsAsFactors = F

# Helper function, given a list of lines from the puzzle input, returns a 
# data frame containing one row per line number (recipe), ingredient, and
# allergen
parse_input <- function(input) {
  line_count <- length(input)
  only_words <- gsub('[\\(\\),]', '', input)
  split_lines <- strsplit(only_words, 'contains ')
  dfs <- mapply(line_to_df, 1:line_count, split_lines, SIMPLIFY = F), dfs)

# Processing -------------------------------------------------------------------

recipe_list <- parse_input(real_input)  # Parse the input

# For each type of allergen, return a data frame consisting of that allergen
# and all ingredients that appear in all recipes containing that allergen.
possible_allergens <- recipe_list %>% 
  group_by(allergen, recipe) %>% 
  summarise(ingredients = list(ingredient)) %>% 
  summarise(ingredient = reduce(ingredients, intersect))

# Remove all rows containing ingredients that definitely contain an allergen
# (though we're not sure which one), then count the number of ingredient
# and recipe combinations remaining
answer1 <- recipe_list %>% 
  filter(!(ingredient %in% possible_allergens$ingredient)) %>% 
  distinct(recipe, ingredient) %>% 

Hey, we used a few more pipes! Also, notice that summarise(x = list(y)) %>% summarise(z = reduce(x, intersect)) pattern. This is how we create a column containing all the ingredient that appear in every recipe for each allergen. This is a pretty uncommon pattern (I found a couple of StackOverflow questions that either get this wrong or make it really complicated), so I encourage you to try it out and see how it works.

Part Two - Dietary Detective

Once we remove the ingredients we know won’t cause a reaction, we can figure out which ingredients contain which allergens. Maybe we’ve just got one really severe allergy after all. If you recall Day 16, Part 2, this is practically the same thing.

# Setup ------------------------------------------------------------------------


# Processing -------------------------------------------------------------------

# Setup, count the number of ingredients that contain an allergen and create an
# empty data frame to add allergen/ingredient combinations to as we confirm
num_allergen_ingredients <- length(unique(possible_allergens$ingredient))
confirmed_allergens <- data.frame(
  allergen = character(0), 
  ingredient = character(0)

# Until the number of confirmed allergens is as large as the number of allergens...
while (nrow(confirmed_allergens) < num_allergen_ingredients) {
  # Remove all the previously confirmed allergens from the table of possible
  # allergens, then identify which ingredients are now associated with a 
  # single allergen
  newly_confirmed_allergens <- possible_allergens %>% 
    filter(!(allergen %in% confirmed_allergens$allergen)) %>% 
    group_by(ingredient) %>% 
    mutate(n = n()) %>% 
    filter(n == 1) %>% 
  # Add the newly confirmed allergens to the ongoing list of confirmed 
  # allergens
  confirmed_allergens <- bind_rows(

# Sort the table of confirmed allergen/ingredient combinations by allergen name,
# then collapse the ingredient names with commas between
answer2 <- confirmed_allergens %>% 
  arrange(allergen) %>% 
  pull(ingredient) %>% 
  paste(collapse = ',')

Thankfully, we’ve got a pretty good pattern to follow for this kind of problem.


For once, I’m glad to be catching up a bit on AoC challenges, which means I’ve tackled Days 16 and 21 less than five days apart, so it was still pretty fresh on my mind. These types of ‘data manipulation’ heavy challenges are a nice change because they’re a bit more comfortable for me and offer a nice ‘break’ from some of the more ‘fast implementation’ problems that I don’t think about very often. The flip side, though, is I don’t feel like I learn as much from them, but I find the mix of problem types in AoC to be pretty great.

If you found a different solution (or spotted a mistake in one of mine), please drop me a line!

comments powered by Disqus