#371 Grouping ChangeLog Edits

Using Ruby to group data in 10-minute buckets; cassidoo’s interview question of the week (2025-10-06).

Notes

The interview question of the week (2025-10-06):

You’re building a tool that tracks component edits and groups them into a changelog. Given an array of edit actions, each with a timestamp and a component name, return an array of grouped changelog entries. Edits to the same component within a 10-minute window should be merged into one changelog entry, showing the component name and the range of timestamps affected.

Example:

const edits = [
  { timestamp: "2025-10-06T08:00:00Z", component: "Header" },
  { timestamp: "2025-10-06T08:05:00Z", component: "Header" },
  { timestamp: "2025-10-06T08:20:00Z", component: "Header" },
  { timestamp: "2025-10-06T08:07:00Z", component: "Footer" },
  { timestamp: "2025-10-06T08:15:00Z", component: "Footer" },
];

> groupChangelogEdits(edits)
> [
    {
        "component": "Footer",
        "start": "2025-10-06T08:07:00Z",
        "end": "2025-10-06T08:15:00Z"
    },
    {
        "component": "Header",
        "start": "2025-10-06T08:00:00Z",
        "end": "2025-10-06T08:05:00Z"
    },
    {
        "component": "Header",
        "start": "2025-10-06T08:20:00Z",
        "end": "2025-10-06T08:20:00Z"
    }
]

Thinking about the problem

It appears to be quite a simple problem, but grouping “edits to the same component within a 10-minute window” is open to several interpretations. In the real world, there’s probably a conversation required to drill down on the essential requirement. Here are some possibilities:

The most straightforward and naïve approach: each change is grouped with the earliest change possible, i.e. we just run through the changes chronologically and open a new group whenever a change falls more than 10 minutes after the start of the current group. This can, however, create some unrealistic groupings. Consider the following example:

changes at the following times:
- 8:05:00, 8:06:00, 8:14:00, 8:16:00, 8:16:10, 8:16:30
The naïve approach would yield two groups:
- (8:05:00, 8:06:00, 8:14:00) and
- (8:16:00, 8:16:10, 8:16:30)
However, if we are trying to group changes that happened “in the same burst of activity”, the change at 8:14:00 is more closely associated with the second group, i.e. we should group as follows:
- (8:05:00, 8:06:00) and
- (8:14:00, 8:16:00, 8:16:10, 8:16:30)

In other words, a second approach would group changes to minimise the distance (time) between grouped elements.
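One way to approximate this second approach is gap-based ("session") grouping: start a new group whenever the gap to the previous change exceeds some threshold. The sketch below is only an illustration of that idea, not the solution used later in these notes. The `group_by_gap` name and the 5-minute `max_gap` default are my own assumptions, and it makes no attempt to cap a group at 10 minutes.

```ruby
require 'time'

# Hypothetical helper (not part of example.rb): split a single component's
# timestamps into groups wherever the gap to the previous edit exceeds
# max_gap seconds. The 5-minute default is an assumed threshold.
def group_by_gap(timestamps, max_gap: 5 * 60)
  times = timestamps.map { |ts| Time.parse(ts) }.sort
  times.slice_when { |earlier, later| later - earlier > max_gap }
       .map { |group| [group.first.utc.iso8601, group.last.utc.iso8601] }
end

burst = %w[
  2025-10-06T08:05:00Z 2025-10-06T08:06:00Z 2025-10-06T08:14:00Z
  2025-10-06T08:16:00Z 2025-10-06T08:16:10Z 2025-10-06T08:16:30Z
]
group_by_gap(burst).each { |first, last| puts "#{first} .. #{last}" }
```

With the timings above, the only gap larger than 5 minutes is between 8:06:00 and 8:14:00, so this yields the (8:05:00 to 8:06:00) and (8:14:00 to 8:16:30) groups. A long, steady trickle of edits would still end up in one group wider than 10 minutes, so a real implementation would need an extra rule for that.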

However, both of these approaches produce groupings that are not aligned across the different components. In a real-world scenario, it may be important to be able to align and compare changes across components within the same time buckets. This would suggest an approach that aligns the 10-minute buckets to constant clock times (08:00, 08:10, 08:20 and so on). While this may arise IRL, the question's example rules it out: the Footer entry spans 08:07:00 to 08:15:00, which crosses the 08:10:00 boundary.
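To make the clock-aligned idea concrete, here is a minimal sketch that floors each timestamp to the start of its 10-minute bucket. The `bucket_start` helper and its `bucket_seconds` parameter are hypothetical names for illustration only; nothing like this appears in the final solution.

```ruby
require 'time'

# Hypothetical helper (not part of example.rb): floor a timestamp to the
# start of its clock-aligned bucket, returned as an ISO 8601 UTC string.
def bucket_start(timestamp, bucket_seconds: 10 * 60)
  seconds = Time.parse(timestamp).to_i
  Time.at(seconds - (seconds % bucket_seconds)).utc.iso8601
end

footer_edits = %w[2025-10-06T08:07:00Z 2025-10-06T08:15:00Z]
footer_edits.group_by { |ts| bucket_start(ts) }.each do |bucket, edits|
  puts "#{bucket}: #{edits.join(', ')}"
end
```

The two Footer edits land in different buckets here (08:00 and 08:10), whereas the expected output in the question merges them into a single entry.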

Initial Approach

I’ll keep it simple and use the naïve approach: walk through the changes chronologically, starting a new bucket whenever a change falls outside the current one.

The core implementation looks a bit busy:

- it first groups the data by component
- it then sorts and processes the components alphabetically (as seems to have been done in the example)
- it makes sure the timestamps are sorted before grouping
- it groups by looking ahead to the next log entry, which avoids having to deal with a tail-end Charlie at the end of the loop

  def initial_solution
    result = []
    logs_by_component = input.group_by { |entry| entry['component'] }
    logs_by_component.keys.sort.each do |key|
      log_times = logs_by_component[key].map { |entry| Time.parse(entry['timestamp']) }.sort
      current_log_start = nil
      log_times.size.times do |i|
        current_log_start ||= log_times[i]
        if log_times[i + 1].nil? || (log_times[i + 1] - current_log_start) > 10 * 60
          result << { 'component' => key, 'start' => current_log_start.utc.iso8601, 'end' => log_times[i].utc.iso8601 }
          current_log_start = nil
        end
      end
    end
    result
  end
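One detail worth spelling out is the `> 10 * 60` comparison: the window is measured from the first edit in the group, and an edit exactly 10 minutes after that start is still merged. The standalone sketch below restates the same rule for a single component so the boundary can be checked in isolation. The `windows` function is a hypothetical restatement for illustration, not code from example.rb.

```ruby
require 'time'

# Hypothetical restatement of the windowing rule for one component:
# an edit joins the current group while it is no more than `window`
# seconds after that group's first edit.
def windows(timestamps, window: 10 * 60)
  times = timestamps.map { |ts| Time.parse(ts) }.sort
  groups = times.each_with_object([]) do |time, acc|
    if acc.empty? || time - acc.last.first > window
      acc << [time, time]   # open a new group starting at this edit
    else
      acc.last[1] = time    # extend the end of the current group
    end
  end
  groups.map { |first, last| [first.utc.iso8601, last.utc.iso8601] }
end

puts windows(%w[2025-10-06T08:00:00Z 2025-10-06T08:10:00Z]).length
puts windows(%w[2025-10-06T08:00:00Z 2025-10-06T08:10:01Z]).length
```

The first call gives one group (exactly 10 minutes apart, so merged) and the second gives two (one second over the window). Whether the boundary should be inclusive is another point the question leaves open.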

Running with the example data set provided:

$ ./example.rb ./data_eg1.json
Using algorithm: initial_solution
Input array: [
  {
    "timestamp": "2025-10-06T08:00:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:05:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:20:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:07:00Z",
    "component": "Footer"
  },
  {
    "timestamp": "2025-10-06T08:15:00Z",
    "component": "Footer"
  }
]
Result: [
  {
    "component": "Footer",
    "start": "2025-10-06T08:07:00Z",
    "end": "2025-10-06T08:15:00Z"
  },
  {
    "component": "Header",
    "start": "2025-10-06T08:00:00Z",
    "end": "2025-10-06T08:05:00Z"
  },
  {
    "component": "Header",
    "start": "2025-10-06T08:20:00Z",
    "end": "2025-10-06T08:20:00Z"
  }
]

And a second result using the example of clustered timings:

$ ./example.rb data_eg2.json
Using algorithm: initial_solution
Input array: [
  {
    "timestamp": "2025-10-06T08:05:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:06:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:14:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:16:00Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:16:10Z",
    "component": "Header"
  },
  {
    "timestamp": "2025-10-06T08:16:30Z",
    "component": "Header"
  }
]
Result: [
  {
    "component": "Header",
    "start": "2025-10-06T08:05:00Z",
    "end": "2025-10-06T08:14:00Z"
  },
  {
    "component": "Header",
    "start": "2025-10-06T08:16:00Z",
    "end": "2025-10-06T08:16:30Z"
  }
]

Example Code

Final code is in example.rb:

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'json'
require 'time'

class ChangeLogGrouper
  attr_reader :input
  attr_reader :logging

  def initialize(input, logging = false)
    @input = input
    @logging = logging
  end

  def initial_solution
    result = []
    logs_by_component = input.group_by { |entry| entry['component'] }
    logs_by_component.keys.sort.each do |key|
      log_times = logs_by_component[key].map { |entry| Time.parse(entry['timestamp']) }.sort
      current_log_start = nil
      log_times.size.times do |i|
        current_log_start ||= log_times[i]
        if log_times[i + 1].nil? || (log_times[i + 1] - current_log_start) > 10 * 60
          result << { 'component' => key, 'start' => current_log_start.utc.iso8601, 'end' => log_times[i].utc.iso8601 }
          current_log_start = nil
        end
      end
    end
    result
  end
end

if __FILE__==$PROGRAM_NAME
  (puts "Usage: ruby #{$0} (json-file) (algorithm)"; exit) unless ARGV.length > 0
  json_file_name = ARGV[0]
  algorithm = ARGV[1] || 'initial_solution'
  begin
    input = JSON.parse(File.read(json_file_name))
  rescue
    puts "Error: Invalid JSON file - #{json_file_name}"
    exit
  end
  puts "Using algorithm: #{algorithm}"
  calculator = ChangeLogGrouper.new(input, true)
  puts "Input array: #{JSON.pretty_generate(calculator.input)}"
  puts "Result: #{JSON.pretty_generate(calculator.send(algorithm))}"
end

With tests in test_example.rb:

$ ./test_example.rb
Run options: --seed 38607

# Running:

..

Finished in 0.000544s, 3676.4706 runs/s, 3676.4706 assertions/s.

2 runs, 2 assertions, 0 failures, 0 errors, 0 skips

Credits and References