1000 Forms of Bunnies victor's tech art blog.

Assembly Insight - Part 1

Just came across this gem while browsing cat videos on YouTube. After watching it, I decided to run through the example myself on my Mac to see how a C program works under the hood. I also wanted to be a bit geeky this time: write the raw C program in Vim, then compile and run it from the command line. Everything in the terminal.

Compile and Run C Program

First, here is the C program for testing, the same test case used in the video. Save it as fib.c.

#include <stdio.h>
int main(void)
{
    int x, y, z;
    x = 0;
    y = 1;
    do
    {
        printf("%d\n", x);
        z = x + y;
        x = y;
        y = z;
    } while (x < 255);
}

Second, compile the .c file with gcc by typing gcc fib.c. GCC is the GNU Compiler Collection, a compiler system produced by the GNU Project that supports various programming languages. We can also use clang by typing clang fib.c. Clang is a compiler front end for the programming languages C, C++, Objective-C, Objective-C++, OpenMP, OpenCL, and CUDA, designed to act as a drop-in replacement for GCC. (On a Mac with Apple's command line tools, gcc is typically just a wrapper around clang, so both commands usually end up running the same compiler.)

This compiles the program and generates a file named a.out. To run the executable, type ./a.out.

a.out is also the name of a file format used in older versions of Unix-like operating systems for executables, object code, and, in later systems, shared libraries. The name is an abbreviated form of “assembler output”. Today it survives mostly as the default output filename; on macOS the file itself is in Mach-O format. We can also use clang -o fib fib.c to generate a file named fib instead of the default a.out :)

Assembly Insight

Then use the command otool -tv fib to look at the assembly code of our compiled C program. otool is an object file displaying tool; -t prints the text section and -v disassembles it to make it readable.

And we get:

fib:
(__TEXT,__text) section
_main:
0000000100000f30        pushq   %rbp
0000000100000f31        movq    %rsp, %rbp
0000000100000f34        subq    $0x20, %rsp
0000000100000f38        movl    $0x0, -0x4(%rbp)
# clears the stack slot that holds main's return value (0)

0000000100000f3f        movl    $0x0, -0x8(%rbp) # x = 0;
# movl (move long) stores the value $0x0 at offset -0x8 from %rbp, which is where x lives
# %rbp is the stack base (frame) pointer
0000000100000f46        movl    $0x1, -0xc(%rbp) # y = 1;

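# start of loop
0000000100000f4d        leaq    0x5a(%rip), %rdi
# address of the format string "%d\n" goes into %rdi (1st argument)
0000000100000f54        movl    -0x8(%rbp), %esi
# x goes into %esi (2nd argument)
0000000100000f57        movb    $0x0, %al
# %al = 0: no vector registers are used by this variadic call
0000000100000f59        callq   0x100000f8c # Call printf()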

0000000100000f5e        movl    -0x8(%rbp), %esi
# x is moved into %esi register
0000000100000f61        addl    -0xc(%rbp), %esi
# y is added into %esi register
0000000100000f64        movl    %esi, -0x10(%rbp) # z = x + y;
# %esi is moved into z
0000000100000f67        movl    -0xc(%rbp), %esi
0000000100000f6a        movl    %esi, -0x8(%rbp) # x = y;
# y is moved into x
0000000100000f6d        movl    -0x10(%rbp), %esi
0000000100000f70        movl    %esi, -0xc(%rbp) # y = z;
# z is moved into y
0000000100000f73        movl    %eax, -0x14(%rbp)
# %eax contains the return value of the printf call
0000000100000f76        cmpl    $0xff, -0x8(%rbp) # while(x < 255);
# cmpl (compare long) compare x with 255($0xff)
0000000100000f7d        jl      0x100000f4d
# jl (jump if less): jump back to the start of the loop at 0x100000f4d only if x < 255
0000000100000f83        movl    -0x4(%rbp), %eax
# load main's return value (0) into %eax
0000000100000f86        addq    $0x20, %rsp
0000000100000f8a        popq    %rbp
0000000100000f8b        retq
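
To make the mapping from the listing back to C a bit more concrete, here is a rough sketch that rewrites the loop the way the unoptimized code executes it: every value passes through a single scratch variable standing in for %esi. The names fib_trace, out, cap and limit are made up for this illustration, and the values are collected into an array instead of being printed so they are easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mirrors the disassembled loop one instruction at a
 * time, with `esi` playing the role of the %esi register. Stores the values
 * the original program would print into out[] and returns how many. */
size_t fib_trace(int limit, int *out, size_t cap)
{
    int x = 0;   /* movl $0x0, -0x8(%rbp) */
    int y = 1;   /* movl $0x1, -0xc(%rbp) */
    int z;       /* lives at -0x10(%rbp)  */
    int esi;     /* stand-in for %esi     */
    size_t n = 0;

    do {
        assert(n < cap);   /* caller must supply a large enough buffer */
        out[n++] = x;      /* printf("%d\n", x) in the original */

        esi = x;           /* movl -0x8(%rbp), %esi  */
        esi += y;          /* addl -0xc(%rbp), %esi  */
        z = esi;           /* movl %esi, -0x10(%rbp) */
        esi = y;           /* movl -0xc(%rbp), %esi  */
        x = esi;           /* movl %esi, -0x8(%rbp)  */
        esi = z;           /* movl -0x10(%rbp), %esi */
        y = esi;           /* movl %esi, -0xc(%rbp)  */
    } while (x < limit);   /* cmpl $0xff, -0x8(%rbp) followed by jl */

    return n;
}
```

Calling fib_trace(255, buf, 32) with an int buf[32] fills buf with the same sequence fib.c prints, from 0 up to 233.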

Assembly Syntax Explanation

movl vs movq

(Source) When using GAS (GNU Assembler) mnemonics in its default AT&T syntax for an Intel CPU:

  • movq moves a quadword (64 bits) from source to destination.
  • movl moves a long (32 bits) from source to destination.

Note that, in the Intel x86 world, a word is 16 bits, a doubleword is 32 bits (two 16-bit words), and a quadword is 64 bits (four 16-bit words).
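
The sizes above can be double-checked from C with the fixed-width integer types. This is only a small sketch: the x86_* type aliases and the two function names are my own, not anything standard.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative aliases (not standard names) for the x86 data sizes. */
typedef uint8_t  x86_byte;
typedef uint16_t x86_word;
typedef uint32_t x86_dword;   /* what the `l` suffix moves */
typedef uint64_t x86_qword;   /* what the `q` suffix moves */

/* Number of bits occupied by an object of the given size in bytes. */
size_t bits_in(size_t bytes)
{
    return bytes * CHAR_BIT;
}

/* Checks the relationships described in the text at run time. */
void check_x86_sizes(void)
{
    assert(bits_in(sizeof(x86_word))  == 16);
    assert(bits_in(sizeof(x86_dword)) == 2 * bits_in(sizeof(x86_word)));
    assert(bits_in(sizeof(x86_qword)) == 4 * bits_in(sizeof(x86_word)));
}
```

If check_x86_sizes() returns without tripping an assertion, a doubleword really is two words and a quadword four.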

GAS syntax uses suffixes on the instruction mnemonics (q, l, w, b) to indicate how much data is being moved.
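
What the suffix means for a store to memory can be modelled in C with memcpy. This is a sketch under my own naming (store_long, store_quad and count_zero_bytes are hypothetical helpers, not real instructions or library calls): an l store touches exactly 4 bytes, a q store touches 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of `movl $imm, (mem)`: writes exactly 4 bytes. */
void store_long(unsigned char *mem, uint32_t value)
{
    memcpy(mem, &value, sizeof value);
}

/* Hypothetical model of `movq %reg, (mem)`: writes exactly 8 bytes. */
void store_quad(unsigned char *mem, uint64_t value)
{
    memcpy(mem, &value, sizeof value);
}

/* Counts how many of the first n bytes of mem are zero. */
size_t count_zero_bytes(const unsigned char *mem, size_t n)
{
    assert(mem != NULL);
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (mem[i] == 0)
            zeros++;
    }
    return zeros;
}
```

Starting from an 8-byte buffer filled with 0xFF, store_long(buf, 0) leaves 4 zero bytes behind and store_quad(buf, 0) leaves 8. That is why the 4-byte int variables in the listing are all written with movl.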

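The Intel syntax doesn’t use suffixes, but instead uses the register name to deduce the amount of data, since each register has a known size. When no register is involved, for example when storing an immediate value to memory, Intel syntax spells out the size with a keyword such as dword ptr. So, in Intel syntax, both instructions would use the same mnemonic: mov.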

Also note that, in GAS syntax, the source operand appears first, and the destination operand appears second. This is the reverse order of the Intel syntax, where the destination operand appears first, and the source operand appears second.
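
The operand order matters most when reading comparisons, because cmpl followed by jl reads "backwards" in this syntax. Here is a small C model of that pair; jl_taken_att and loop_continues are names I made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the pair
 *     cmpl  first, second
 *     jl    target
 * Returns 1 when the jump would be taken. The comparison is signed and
 * reads "second < first": the operand written last is the left-hand side. */
int jl_taken_att(int32_t first, int32_t second)
{
    return second < first;
}

/* The loop condition from the listing: cmpl $0xff, x then jl loop_start. */
int loop_continues(int32_t x)
{
    int taken = jl_taken_att(0xff, x);
    assert(taken == (x < 255));   /* matches while (x < 255) in fib.c */
    return taken;
}
```

So cmpl $0xff, -0x8(%rbp) followed by jl means "jump if x < 255", not "jump if 255 < x": loop_continues(233) is 1 and loop_continues(377) is 0.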

TBC
