CSV with Ruby

All about reading and writing CSV with Ruby, including large file handling.

Notes

CSV support is provided by the CSV standard library (require 'csv'). Some gems provide additional CSV processing features, but in most cases the standard library is all you need.
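For reference, here is a minimal round trip with the standard library: build a CSV string, then parse it back with headers. The sample field values are made up for illustration. The point is that quoting and escaping of commas and quotes is handled for you.

```ruby
require 'csv'

# Build a CSV string from arrays of fields; CSV.generate handles quoting.
csv_text = CSV.generate do |csv|
  csv << ['name', 'comment']
  csv << ['alice', 'says "hi", then leaves']
end

# Parse it back; headers: true returns a CSV::Table of CSV::Row objects.
table = CSV.parse(csv_text, headers: true)
puts table.first['comment'] # prints: says "hi", then leaves
```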

Writing Large Files

The large_file_write.rb example demonstrates 2 ways of writing large files:

  • using CSV.generate to build the complete CSV content as a single string in memory, which is then written to the file
  • using CSV.generate_line to generate and write the file one line at a time

Using CSV.generate, the whole file is held in memory before it is written, so peak memory use is relatively high:

$ ./large_file_write.rb generate
writing 2000000 rows using generate...
Max memory usage: 91.15625Mb
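A minimal sketch of the CSV.generate approach, assuming rows shaped like the sample output shown in the reading section (a col1/col2 header, then "row N", N). This is not the code from large_file_write.rb. The helper name write_with_generate and the file name are hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: builds the entire CSV document as one String in
# memory, then writes it out in a single call. Memory grows with row_count.
def write_with_generate(path, row_count)
  data = CSV.generate do |csv|
    csv << %w[col1 col2]
    row_count.times { |i| csv << ["row #{i}", i] }
  end
  File.write(path, data)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'large_file.csv') # hypothetical file name
  write_with_generate(path, 1_000)
  puts File.foreach(path).count # prints: 1001
end
```

Because data holds every generated line until File.write runs, peak memory scales with the size of the output.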

Using CSV.generate_line, each row is written as soon as it is generated, so peak memory use stays minimal:

$ ./large_file_write.rb generate_line
writing 2000000 rows using generate_line...
Max memory usage: 32.765625Mb
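A minimal sketch of the line-by-line approach, using the same assumed row shape. Again, this is illustrative rather than the code from large_file_write.rb, and the helper name write_with_generate_line is hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: formats one row at a time with CSV.generate_line and
# writes it straight to the open file, so only one line is built at a time.
def write_with_generate_line(path, row_count)
  File.open(path, 'w') do |file|
    file.write(CSV.generate_line(%w[col1 col2]))
    row_count.times { |i| file.write(CSV.generate_line(["row #{i}", i])) }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'large_file.csv') # hypothetical file name
  write_with_generate_line(path, 1_000)
  puts File.foreach(path).count # prints: 1001
end
```

CSV.open(path, 'w') { |csv| csv << row } also writes each row through to the file as it goes, so it should give a similar memory profile.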

Reading Large Files

The large_file_read.rb example demonstrates 2 ways of reading large files:

  • using CSV.read to load the entire CSV file into memory for processing
  • using CSV.foreach to process the CSV as an I/O stream, line by line

Note: run large_file_write.rb first to generate the CSV file to be read.

Using CSV.read, every row is parsed into memory before processing begins, so memory use is excessive:

$ ./large_file_read.rb read
reading using read...
#<CSV::Row "col1":"row 1999999" "col2":"1999999">
Max memory usage: 1493.1875Mb
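A minimal sketch of the CSV.read approach. It assumes headers: true, which is consistent with the CSV::Row output above, and returns the last row. The helper name last_row_with_read and the sample file contents are hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: CSV.read parses the whole file up front into a
# CSV::Table, so every row is held in memory before iteration starts.
def last_row_with_read(path)
  table = CSV.read(path, headers: true)
  last = nil
  table.each { |row| last = row } # iterates rows (default :col_or_row mode)
  last
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'large_file.csv') # hypothetical file name
  File.write(path, "col1,col2\nrow 0,0\nrow 1,1\nrow 2,2\n")
  puts last_row_with_read(path)['col1'] # prints: row 2
end
```

This is convenient when the data fits comfortably in memory and you need all rows at once, for example to sort them or access them randomly.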

Using CSV.foreach, rows are parsed and yielded one at a time, so peak memory use stays minimal:

$ ./large_file_read.rb foreach
reading using foreach...
#<CSV::Row "col1":"row 1999999" "col2":"1999999">
Max memory usage: 37.703125Mb
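A minimal sketch of the streaming approach with CSV.foreach, under the same assumptions (headers: true, return the last row). The helper name last_row_with_foreach is hypothetical.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: CSV.foreach reads and yields one CSV::Row at a time,
# so only the current row (plus anything the block keeps) is in memory.
def last_row_with_foreach(path)
  last = nil
  CSV.foreach(path, headers: true) { |row| last = row }
  last
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'large_file.csv') # hypothetical file name
  File.write(path, "col1,col2\nrow 0,0\nrow 1,1\nrow 2,2\n")
  puts last_row_with_foreach(path)['col1'] # prints: row 2
end
```

Memory stays flat only while the block avoids accumulating rows. Collecting every row into an array inside the block would bring back the CSV.read memory profile.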

About LCK#286 ruby

This page is a web-friendly rendering of my project notes shared in the LittleCodingKata GitHub repository.

LittleCodingKata is my collection of programming exercises, research and code toys broadly spanning things that relate to programming and software development (languages, frameworks and tools).

These range from the trivial to the complex and serious. Many are inspired by existing work and I'll note credits and references where applicable. The focus is quite scattered, as I variously work on things new and important in the moment, or go back to revisit things from the past.

This is primarily a personal collection for my own edification and learning, but anyone who stumbles by is welcome to borrow, steal or reference the work here. And if you spot errors or issues, I'd really appreciate some feedback: create an issue, send me an email or even send a pull request.