Setting up an assembler

The time comes in every programmer’s life when they look back on their career and they stop and they think, “It’s time to learn some assembler.”

This time has come for me.

First I need to choose an assembler, and there are a few available. I went with nasm for no reason other than needing to pick something. If the assembler bug bites hard, I’ll step back and spend some time comparing the different options.

All my assembly here will be run on a Linux-based x86_64 machine.

Greetings earthling

Let’s start with a basic Hello World.

First, install nasm. It’s likely available in every distribution’s package manager. Create a folder somewhere for the project, then create and edit a file called greetings.asm.

We’ll start with a comment:

; Here we do amazing things, like say hello.

Now, onto the code.

Data

An executable has several sections inside it. Compilers for most other languages handle these for us; in assembler we need to set them up ourselves.

The first section is the data section. Here we set up any initialised data, such as fixed strings, that the program will use while it runs. The section is declared like this:

section .data

We want to store some text in this section that we can use to offer some form of salutation to the kind person who runs our program. To do this we use the pseudo-instruction db (define byte). A pseudo-instruction is not an instruction that gets assembled into machine code; it is a directive telling the assembler to do something, in this case to store some bytes at this position in the executable.

  greetings db "Greetings earthlings!", 10

The first word, greetings, is a label: a name for the address in memory where this text will sit. We can use it later wherever we need a pointer to the text.

The text we are storing is “Greetings earthlings!”, followed by the byte 10, which is the ASCII line feed (newline) character.
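To see exactly which bytes that db line lays down, here is a rough C equivalent, purely for comparison. It is a sketch: the helper names greetings_len and greetings_byte are made up for this example, and the one real difference from db is that C appends a terminating NUL byte to string literals, which db does not.

```c
#include <stddef.h>

/* Roughly what  greetings db "Greetings earthlings!", 10  lays down.
   '\n' is byte 10, the line feed. Unlike db, a C string literal also
   gets a trailing NUL (0) byte, so greetings_len subtracts it. */
static const char greetings[] = "Greetings earthlings!\n";

/* Hypothetical helper: number of bytes db would emit (excludes C's NUL). */
size_t greetings_len(void)
{
    return sizeof greetings - 1;
}

/* Hypothetical helper: the byte at offset i from the greetings label. */
unsigned char greetings_byte(size_t i)
{
    return (unsigned char)greetings[i];
}
```

greetings_len() gives 22 and greetings_byte(21) gives 10. In NASM itself, the usual way to avoid counting by hand is a line such as greetings_len equ $ - greetings placed straight after the db: $ is the current address, so the subtraction yields the length.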

Text

The next section we need is .text, the main part, which contains the code to run.

section .text

The first thing we have to do is declare a global symbol. This makes it visible to the linker, which uses it to know what part of our code to run first:

  global _start

Then we need to insert the label into our code:

_start:

You can put labels throughout your code. A label names a position, which is useful when you want to jump to that position or refer to its address. _start is the only label you absolutely have to provide, because it is the entry point ld looks for by default. (ld’s -e option can select a different entry symbol, but _start is the convention.)

So how do we output text to stdout? We need to make a system call. System calls are requests asking the operating system to do something on our behalf. In our case we want it to write some text to standard out for us.

Each syscall is identified by a number. You can find the list of available syscalls, together with their numbers, in unistd_64.h, most likely found under /usr/include/asm (on Debian-based systems, try /usr/include/x86_64-linux-gnu/asm). We want to write some data, so this line is of interest:

#define __NR_write 1

We want the write syscall, which is number 1.
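Rather than reading the header by eye, we can let a C compiler look the numbers up from the same file. A minimal sketch, assuming an x86_64 Linux machine with the kernel headers installed; the helper names are made up for illustration. The exit syscall, which we will need at the end, lives in the same header.

```c
#include <asm/unistd.h>  /* on x86_64 this includes asm/unistd_64.h */

/* Hypothetical helpers returning the numbers we will load into rax. */
long write_syscall_number(void) { return __NR_write; }  /* 1 on x86_64 */
long exit_syscall_number(void)  { return __NR_exit; }   /* 60 on x86_64 */
```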

At this point it may be worth running man syscall to get the lowdown on syscalls, in particular the “Architecture calling conventions” section. Making a syscall works slightly differently depending on your architecture. For us on x86_64, this is the row we need:

Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                  call #  val  val2
───────────────────────────────────────────────────────────────────
x86-64      syscall               rax     rax  rdx  -        5

This tells us that we need to pass the system call number in the rax register, and that the result comes back in rax too. There are a number of different registers on x86_64. A register is a tiny storage location inside the CPU that it can access very quickly. If accessing main memory were a summer trip around the world, accessing a register would be popping over to your neighbours’ to feed their cat. So we try to pass as much data around as possible via registers.

So, we know we need to store the syscall number (1) in the rax register to make our syscall. Let’s write a proper line of code!

  mov rax, 1

This line says ‘move the value 1 into the rax register’.

We want to pass more information to the write syscall.

Have a look at the man syscall page again. This shows us how to pass more parameters to syscalls.

Arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
x86-64        rdi   rsi   rdx   r10   r8    r9    -

The first parameter goes in the rdi register, second in rsi and so on.
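To make the two tables concrete, here is a sketch of a three-argument syscall written with GCC/Clang extended inline assembly on x86_64. The constraint letters pin each value to the register from the tables: "a" is rax, "D" is rdi, "S" is rsi and "d" is rdx. The syscall instruction itself overwrites rcx and r11, so those are declared as clobbered. The name raw_syscall3 is our own invention, not a library function.

```c
/* Hypothetical helper: make a 3-argument Linux syscall on x86_64.
   n goes in rax, a1 in rdi, a2 in rsi, a3 in rdx; the result comes back in rax. */
long raw_syscall3(long n, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                          /* rax: return value */
                      : "a"(n), "D"(a1), "S"(a2), "d"(a3)  /* rax, rdi, rsi, rdx */
                      : "rcx", "r11",                      /* overwritten by syscall */
                        "memory");                         /* kernel may read our buffers */
    return ret;
}
```

For example, raw_syscall3(1, 1, (long)"hi\n", 3) writes hi to stdout and returns 3. On failure the kernel returns a negative errno value in rax instead: writing to the invalid descriptor -1, for instance, gives -9 (-EBADF).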

What is the first parameter to write? Syscalls are documented in section 2 of the manual, so we can look up more details by running man 2 write. Here we see the following C function declaration:

ssize_t write(int fd, const void *buf, size_t count);

Seeing the C definition is as good as it gets for us assembler programmers, it seems!

We can see the first parameter is the file descriptor. We know that the fd of stdout is 1, so we put 1 in rdi, the first parameter register from the table above.

  mov rdi, 1

The next parameter is a pointer to the buffer to write. We set this pointer up in the data section and gave it the label greetings. We can use that here.

  mov rsi, greetings

The final parameter is the number of bytes we want to write. We can count this: “Greetings earthlings!” is 21 characters, and the line feed makes it 22 bytes.

  mov rdx, 22

Yes! All our parameters are set up. Let’s make the syscall:

  syscall
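For comparison, glibc exposes the same mechanism through its syscall() wrapper, documented on the man syscall page we looked at earlier. The arguments go in the same order they land in rax, rdi, rsi and rdx. A sketch of our write call in C; write_greeting is a made-up name. Note that, unlike the raw instruction, the wrapper returns -1 and sets errno on failure.

```c
#define _GNU_SOURCE         /* for the syscall() declaration, per man syscall */
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_write */
#include <unistd.h>         /* syscall() */

static const char greetings[] = "Greetings earthlings!\n";

/* Hypothetical helper: the same write our assembly makes.
   SYS_write -> rax, fd -> rdi, greetings -> rsi, 22 -> rdx. */
long write_greeting(int fd)
{
    /* sizeof - 1 drops C's trailing NUL, leaving the 22 bytes db stores */
    return syscall(SYS_write, (long)fd, greetings, sizeof greetings - 1);
}
```

Calling write_greeting(1) prints the greeting and returns 22, the number of bytes written.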

Exit

We have to make one more syscall to tell the OS that we are now safely and happily exiting. This is done with the exit syscall, defined as:

#define __NR_exit 60

and man 2 exit:

    noreturn void _exit(int status);

Typically a status of 0 means we are exiting successfully. So we know what to do:

  mov rax, 60
  mov rdi, 0
  syscall
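Because exit ends the process, it is hard to watch from inside. One way to check it, sketched here in C, is to fork a child, have the child make the raw exit syscall with a given status, and let the parent read that status back with waitpid. child_exit_status is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_exit (60 on x86_64) */
#include <sys/types.h>     /* pid_t */
#include <sys/wait.h>      /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>        /* fork, syscall */

/* Hypothetical helper: returns the status a child passed to the exit
   syscall, or -1 if something went wrong. */
int child_exit_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                          /* fork failed */
    if (pid == 0)
        syscall(SYS_exit, (long)status);    /* rax = 60, rdi = status; does not return */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);            /* low 8 bits of the status */
}
```

child_exit_status(0) should give 0 and child_exit_status(42) should give 42. Like our assembly, the raw exit syscall skips the C library’s cleanup (atexit handlers, flushing stdio buffers), so the child leaves nothing behind for the parent to see twice.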

The full code

; Here we do amazing things, like say hello.
section .data
  greetings db "Greetings earthlings!", 10

section .text
  global _start

_start:
  mov rax, 1          ; write system call
  mov rdi, 1          ; fd for stdout
  mov rsi, greetings  ; pointer to our text to write
  mov rdx, 22         ; byte count: 21 characters + line feed
  syscall

  mov rax, 60         ; exit system call
  mov rdi, 0          ; exit code 0
  syscall

Compiling

We need to build this thing. Building comes in two stages: assembling and linking. nasm handles the assembling for us.

Run this to assemble:

nasm -f elf64 -g -F stabs greetings.asm

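This creates the object file greetings.o. Here -f elf64 selects the 64-bit ELF object format, and -g -F stabs includes debugging information in the stabs format. An object file is not quite executable; we need to link it. For this we can use ld: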

ld greetings.o -o greetings

This should produce a beautiful executable called greetings that we can run:

╰─$ ./greetings
Greetings earthlings!

To make things a tiny bit easier, we can put the build instructions into a Makefile (each recipe line must start with a tab character, not spaces):

all: greetings

greetings: greetings.o
	ld greetings.o -o greetings

greetings.o: greetings.asm
	nasm -f elf64 -g -F stabs greetings.asm

Now, instead of running two commands to build your changes, you can just run make.

This assembly stuff seems pretty easy so far…