Fork me on GitHub

Project Notes

#371 Grouping ChangeLog Edits

Using ruby to group data in 10-minute buckets; cassidoo’s interview question of the week (2025-10-06).

Notes

The interview question of the week (2025-10-06):

You’re building a tool that tracks component edits and groups them into a changelog. Given an array of edit actions, each with a timestamp and a component name, return an array of grouped changelog entries. Edits to the same component within a 10-minute window should be merged into one changelog entry, showing the component name and the range of timestamps affected.

Example:

const edits = [
  { timestamp: "2025-10-06T08:00:00Z", component: "Header" },
  { timestamp: "2025-10-06T08:05:00Z", component: "Header" },
  { timestamp: "2025-10-06T08:20:00Z", component: "Header" },
  { timestamp: "2025-10-06T08:07:00Z", component: "Footer" },
  { timestamp: "2025-10-06T08:15:00Z", component: "Footer" },
];

> groupChangelogEdits(edits)
> [
    {
        "component": "Footer",
        "start": "2025-10-06T08:07:00Z",
        "end": "2025-10-06T08:15:00Z"
    },
    {
        "component": "Header",
        "start": "2025-10-06T08:00:00Z",
        "end": "2025-10-06T08:05:00Z"
    },
    {
        "component": "Header",
        "start": "2025-10-06T08:20:00Z",
        "end": "2025-10-06T08:20:00Z"
    }
]

Thinking about the problem

I appears quite a simple problem, but grouping “edits to the same component within a 10-minute window” presents a number of possibilities. In the real world, there’s probably a conversation required to drill down on the essential requirement. Here are some possibilities:

The most straight-forward and naïve approach: changes are grouped with the earliest change possible i.e. we just run through the changes chronologically. This can however create some unrealistic groupings. Consider the following example:

  • changes at the following times:
    • 8:05:00, 8:06:00, 8:14:00, 8:16:00, 8:16:10, 8:16:30
  • The naïve approach would yield two groups:
    • (8:05:00, 8:06:00, 8:14:00) and
    • (8:16:00, 8:16:10, 8:16:30)
  • However, if we are trying to group changes that happened “in the same burst of activity”, the change at 8:14:00 is more closely associated with the second group, i.e. we should group as follows:
    • (8:05:00, 8:06:00) and
    • (8:14:00, 8:16:00, 8:16:10, 8:16:30)

In other words, a second approach would group changes to minimise the distance (time) between grouped elements.

However, both of these approaches produce groupings for the different components that are not going to be aligned. In a real-world scenario, it may important to be able to align and compare changes across components within the same time buckets. This would suggest an approach that would align the 10-minute buckets to a constant clock time. While this may arise IRL, the first example grouping 08:07:00 to 08:15:00 demonstrates this is not the case in this question.

Initial Approach

I’ll keep it simple: the naïve approach that just starts trying to bucket for every successive change.

The core implementation looks a bit busy:

  • it first groups the data by component
  • then we sort and process the components alphabetically (as it seems to have been done in the example)
  • make sure the timestamps are sorted before grouping
  • it groups by looking ahead to the next log entry. This avoids having to deal with a tail-end-charlie a the end of the loop
  def initial_solution
    result = []
    logs_by_component = input.group_by { |entry| entry['component'] }
    logs_by_component.keys.sort.each do |key|
      log_times = logs_by_component[key].map { |entry| Time.parse(entry['timestamp']) }.sort
      current_log_start = nil
      log_times.size.times do |i|
        current_log_start ||= log_times[i]
        if log_times[i + 1].nil? || (log_times[i + 1] - current_log_start) > 10 * 60
          result << { 'component' => key, 'start' => current_log_start.utc.iso8601, 'end' => log_times[i].utc.iso8601 }
          current_log_start = nil
        end
      end
    end
    result
  end

Running with the example data set provided:

$ ./example.rb ./data_eg1.json
Using algorithm: initial_solution
Input array: [
  {
    "timestamp": "2025-10-06T08:00:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:05:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:20:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:07:00Z",
    "component": "Footer"
  },
  {
    "timestamp": "2025-10-06T08:15:00Z",
    "component": "Footer"
  }
]
Result: [
  {
    "component": "Footer",
    "start": "2025-10-06T08:07:00Z",
    "end": "2025-10-06T08:15:00Z"
  },
  {
    "component": "Header",
    "start": "2025-10-06T08:00:00Z",
    "end": "2025-10-06T08:05:00Z"
  },
  {
    "component": "Header",
    "start": "2025-10-06T08:20:00Z",
    "end": "2025-10-06T08:20:00Z"
  }
]

And a second result using the example of clustered timings:

$ ./example.rb data_eg2.json
Using algorithm: initial_solution
Input array: [
  {
    "timestamp": "2025-10-06T08:05:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:06:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:14:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:16:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:16:10Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:16:30Z",
    "component": "Header"
  }
]
Result: [
  {
    "component": "Header",
    "start": "2025-10-06T08:05:00Z",
    "end": "2025-10-06T08:14:00Z"
  },
  {
    "component": "Header",
    "start": "2025-10-06T08:16:00Z",
    "end": "2025-10-06T08:16:30Z"
  }
]

Example Code

Final code is in example.rb:

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'json'
require 'time'

class ChangeLogGrouper
  attr_reader :input
  attr_reader :logging

  def initialize(input, logging = false)
    @input = input
    @logging = logging
  end

  def initial_solution
    result = []
    logs_by_component = input.group_by { |entry| entry['component'] }
    logs_by_component.keys.sort.each do |key|
      log_times = logs_by_component[key].map { |entry| Time.parse(entry['timestamp']) }.sort
      current_log_start = nil
      log_times.size.times do |i|
        current_log_start ||= log_times[i]
        if log_times[i + 1].nil? || (log_times[i + 1] - current_log_start) > 10 * 60
          result << { 'component' => key, 'start' => current_log_start.utc.iso8601, 'end' => log_times[i].utc.iso8601 }
          current_log_start = nil
        end
      end
    end
    result
  end
end

if __FILE__==$PROGRAM_NAME
  (puts "Usage: ruby #{$0} (json-file) (algorithm)"; exit) unless ARGV.length > 0
  json_file_name = ARGV[0]
  algorithm = ARGV[1] || 'initial_solution'
  begin
    input = JSON.parse(File.read(json_file_name))
  rescue
    puts "Error: Invalid JSON file - #{json_file_name}"
    exit
  end
  puts "Using algorithm: #{algorithm}"
  calculator = ChangeLogGrouper.new(input, true)
  puts "Input array: #{JSON.pretty_generate(calculator.input)}"
  puts "Result: #{JSON.pretty_generate(calculator.send(algorithm))}"
end

With tests in test_example.rb:

$ ./test_example.rb
Run options: --seed 38607

# Running:

..

Finished in 0.000544s, 3676.4706 runs/s, 3676.4706 assertions/s.

2 runs, 2 assertions, 0 failures, 0 errors, 0 skips

Credits and References

About LCK#371 rubycassidoo

This page is a web-friendly rendering of my project notes shared in the LittleCodingKata GitHub repository.

Project Source on GitHub Return to the LittleCodingKata Catalog
About LittleCodingKata

LittleCodingKata is my collection of programming exercises, research and code toys broadly spanning things that relate to programming and software development (languages, frameworks and tools).

These range from the trivial to the complex and serious. Many are inspired by existing work and I'll note credits and references where applicable. The focus is quite scattered, as I variously work on things new and important in the moment, or go back to revisit things from the past.

This is primarily a personal collection for my own edification and learning, but anyone who stumbles by is welcome to borrow, steal or reference the work here. And if you spot errors or issues I'd really appreciate some feedback - create an issue, send me an email or even send a pull-request.

Follow the Blog follow projects and notes as they are published in your favourite feed reader