Fork me on GitHub

Project Notes

#125 Optimization

Looking at how C compilers optimize loops.

Notes

C compilers usually supprot a range of loop optimizations, including:

  • Loop Unroll
  • Loop Unswitching
  • Loop Interchange
  • Detection of memcpy, memset idioms
  • Delete side-effect free loops
  • Loop Distribution
  • Loop Vectorization

Examples

example1.c is based on a question on stackoverflow. It creates an array, loops and does a pointless operation on each element, then discards the array:

int main(void) {
  double *a;
  a = calloc(SIZE, sizeof *a);
  for (size_t i = 0; i < SIZE; ++i) {
    a[i] = a[i];
  }
  free(a);
}

The question is: would this be so obviously pointless and without side-effect as to be optimized away?

With clang-1000.11.45.5, it turns out that yes, the compiler is smart enough to figure this is a “side-effect free loop”, as long as optimization is enabled.

  • with no optimizations enabled (gcc -std=c17 -O0 -S example1.c), the generated code includes the loop.
  • with level 1 optimizations enabled (gcc -std=c17 -O1 -S example1.c), the generated code excludes the loop.

Optimizing the loop away may not always be desirable, especially in embedded situations where a NOP loop might be there for timing reasons. example2.c demonstrates one way of forcing the loop to remain - declaring the affected variable as volatile. The loop remains in the compiled code even with optimization turned up to level 3.

Running the Examples

Easy to do by hand, but here’s a Makefle to do it instead:

$ make
gcc -Wall -std=c17 -S -O0 -o example1-O0.s example1.c
gcc -Wall -std=c17 -S -O1 -o example1-O1.s example1.c
gcc -Wall -std=c17 -S -O1 -o example2-O1.s example2.c
tail -n+1 example1-O0.s example1-O1.s example2-O1.s
==> example1-O0.s <==
  .section  __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 13
  .globl  _main                   ## -- Begin function main
  .p2align  4, 0x90
_main:                                  ## @main
  .cfi_startproc
## %bb.0:
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset %rbp, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register %rbp
  subq  $32, %rsp
  movl  $100, %eax
  movl  %eax, %edi
  movl  $8, %eax
  movl  %eax, %esi
  movl  $0, -4(%rbp)
  callq _calloc
  movq  %rax, -16(%rbp)
  movq  $0, -24(%rbp)
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
  cmpq  $100, -24(%rbp)
  jae LBB0_4
## %bb.2:                               ##   in Loop: Header=BB0_1 Depth=1
  movq  -16(%rbp), %rax
  movq  -24(%rbp), %rcx
  movsd (%rax,%rcx,8), %xmm0    ## xmm0 = mem[0],zero
  movq  -16(%rbp), %rax
  movq  -24(%rbp), %rcx
  movsd %xmm0, (%rax,%rcx,8)
## %bb.3:                               ##   in Loop: Header=BB0_1 Depth=1
  movq  -24(%rbp), %rax
  addq  $1, %rax
  movq  %rax, -24(%rbp)
  jmp LBB0_1
LBB0_4:
  movq  -16(%rbp), %rax
  movq  %rax, %rdi
  callq _free
  movl  -4(%rbp), %eax
  addq  $32, %rsp
  popq  %rbp
  retq
  .cfi_endproc
                                        ## -- End function

.subsections_via_symbols

==> example1-O1.s <==
  .section  __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 13
  .globl  _main                   ## -- Begin function main
  .p2align  4, 0x90
_main:                                  ## @main
  .cfi_startproc
## %bb.0:
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset %rbp, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register %rbp
  xorl  %eax, %eax
  popq  %rbp
  retq
  .cfi_endproc
                                        ## -- End function

.subsections_via_symbols

==> example2-O1.s <==
  .section  __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 13
  .globl  _main                   ## -- Begin function main
  .p2align  4, 0x90
_main:                                  ## @main
  .cfi_startproc
## %bb.0:
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset %rbp, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register %rbp
  movl  $100, %edi
  movl  $8, %esi
  callq _calloc
  movq  $-800, %rcx             ## imm = 0xFCE0
  .p2align  4, 0x90
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
  movsd 800(%rax,%rcx), %xmm0   ## xmm0 = mem[0],zero
  movsd %xmm0, 800(%rax,%rcx)
  addq  $8, %rcx
  jne LBB0_1
## %bb.2:
  movq  %rax, %rdi
  callq _free
  xorl  %eax, %eax
  popq  %rbp
  retq
  .cfi_endproc
                                        ## -- End function

.subsections_via_symbols

Credits and References

About LCK#125 c
Project Source on GitHub Return to the Project Catalog

LittleCodingKata is my collection of programming exercises, research and code toys broadly spanning things that relate to programming and software development (languages, frameworks and tools).

These range from the trivial to the complex and serious. Many are inspired by existing work and I'll note credits and references where applicable. The focus is quite scattered, as I variously work on things new and important in the moment, or go back to revisit things from the past.

This is primarily a personal collection for my own edification and learning, but anyone who stumbles by is welcome to borrow, steal or reference the work here. And if you spot errors or issues I'd really appreciate some feedback - create an issue, send me an email or even send a pull-request.

LittleArduinoProjects LittleModelArt More on my blog