6. Linker

In a multi-file program, each source file is assembled individually into its own object file. The linker combines these object files to form the final executable.

Figure 2. Role of the Linker

While combining the object files together, the linker performs the following operations.

  1. Symbol Resolution
  2. Relocation

We will look into these operations, in detail, in this section.

6.1. Symbol Resolution

In a single-file program, the assembler replaces all references to labels with their corresponding addresses while producing the object file. In a multi-file program, however, the assembler cannot know the address of a label defined in another file, so it marks such references as "unresolved". When the object files are passed to the linker, the linker determines the values for these references from the other object files and patches the code with the correct values.

The sum of array example is split into two files, to demonstrate the symbol resolution performed by the linker. The two files will be assembled and their symbol tables examined to show the presence of unresolved references.

The file sum-sub.s contains the sum subroutine, and the file main.s invokes the subroutine with the required arguments. The source of the files is shown below.

Listing 4. main.s - Subroutine Invocation

        .text
        b start                 @ Skip over the data
arr:    .byte 10, 20, 25        @ Read-only array of bytes
eoa:                            @ Address of end of array + 1

        .align
start:
        ldr   r0, =arr          @ r0 = &arr
        ldr   r1, =eoa          @ r1 = &eoa

        bl    sum               @ Invoke the sum subroutine

stop:   b stop

Listing 5. sum-sub.s - Subroutine Definition

        @ Args
        @ r0: Start address of array
        @ r1: End address of array
        @
        @ Result
        @ r3: Sum of Array

        .global sum

sum:    mov   r3, #0            @ r3 = 0
loop:   ldrb  r2, [r0], #1      @ r2 = *r0++    ; Get array element
        add   r3, r2, r3        @ r3 += r2      ; Calculate sum
        cmp   r0, r1            @ if (r0 != r1) ; Check if hit end-of-array
        bne   loop              @    goto loop  ; Loop
        mov   pc, lr            @ pc = lr       ; Return when done

A word on the .global directive is in order. In C, all variables declared outside functions are visible to other files unless they are explicitly declared static. In assembly it is the other way around: all labels are static, that is, local to the file, unless they are explicitly made visible to other files using the .global directive.

The files are assembled, and the symbol tables are dumped using the nm command.

$ arm-none-eabi-as -o main.o main.s
$ arm-none-eabi-as -o sum-sub.o sum-sub.s
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop
00000000 T sum

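For now, focus on the letter in the second column, which specifies the symbol type. A t indicates that the symbol is defined in the text section. A U indicates that the symbol is undefined, which is why it has no address in the first column. For defined symbols, an uppercase letter indicates that the symbol is .global, and a lowercase letter indicates that it is local to the file.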
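To make the column layout concrete, here is a minimal Python sketch that splits nm output lines into their fields and interprets the type letter. It is only an illustration: the function names parse_nm_line and describe are made up for this example, and only the letters that appear in this section (t, T and U) are handled.

```python
# Toy parser for the nm output shown above; parse_nm_line and describe
# are hypothetical helper names, not part of any toolchain.

def parse_nm_line(line):
    """Split one nm line into (address, type_letter, name)."""
    fields = line.split()
    if len(fields) == 2:
        # Undefined symbols have no address column.
        type_letter, name = fields
        return None, type_letter, name
    address, type_letter, name = fields
    return int(address, 16), type_letter, name

def describe(type_letter):
    """Describe a type letter; only the cases used in this section are covered."""
    if type_letter == "U":
        return "undefined"
    scope = "global" if type_letter.isupper() else "local"
    section = {"t": "text"}.get(type_letter.lower(), "other")
    return f"{scope}, defined in {section} section"

# Symbol table of main.o, copied from the listing above.
main_o = """\
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
"""

for line in main_o.splitlines():
    address, type_letter, name = parse_nm_line(line)
    print(name, "->", describe(type_letter))
```

Running the sketch on the symbol table of main.o reports sum as undefined and every other symbol as local to the text section.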

It is evident that the symbol sum is defined in sum-sub.o and is not resolved yet in main.o. When the linker is invoked the symbol references will be resolved, and the executable will be produced.
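The resolution step can be sketched as a small Python model. This is a simplified illustration and not how ld is actually implemented: the data layout and the resolve function are assumptions made for this example, and the symbol values are taken from the nm output above.

```python
# Toy model of symbol resolution. Each object lists the symbols it defines
# as name -> (address, is_global), and the names it references but does
# not define. The structure is hypothetical, chosen for illustration only.
objects = {
    "main.o": {
        "defined": {"arr": (0x04, False), "eoa": (0x07, False),
                    "start": (0x08, False), "stop": (0x18, False)},
        "undefined": ["sum"],
    },
    "sum-sub.o": {
        "defined": {"sum": (0x00, True), "loop": (0x04, False)},
        "undefined": [],
    },
}

def resolve(objects):
    """Map each (object, undefined symbol) to the object that defines it."""
    # Only .global symbols are visible outside the file that defines them.
    exported = {}
    for obj_name, obj in objects.items():
        for sym, (_address, is_global) in obj["defined"].items():
            if is_global:
                exported[sym] = obj_name

    resolved = {}
    for obj_name, obj in objects.items():
        for sym in obj["undefined"]:
            if sym not in exported:
                raise LookupError(f"undefined reference to '{sym}' in {obj_name}")
            resolved[(obj_name, sym)] = exported[sym]
    return resolved

print(resolve(objects))
```

In this model, the reference to sum in main.o resolves to sum-sub.o. If sum were not marked .global, it would not be exported and the lookup would fail, which mirrors the role of the .global directive described earlier.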

6.2. Relocation

Relocation is the process of changing addresses already assigned to labels. This will also involve patching up all label references to reflect the newly assigned address. Primarily, relocation is performed for the following two reasons:

  1. Section Merging
  2. Section Placement

To understand the process of relocation, an understanding of the concept of sections is essential.

Code and data have different run-time requirements. For example, code can be placed in read-only memory, whereas data might require read-write memory. It is therefore convenient if code and data are not interleaved. For this purpose, programs are divided into sections. Most programs have at least two sections, .text for code and .data for data. The assembler directives .text and .data are used to switch back and forth between the two sections.

It helps to imagine each section as a bucket. When the assembler hits a section directive, it puts the code/data following the directive into the selected bucket. Thus the code/data that belongs to a particular section appears in contiguous locations, even if it was interleaved with other sections in the source file. The following figure shows how the assembler re-arranges data into sections.
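The bucket idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function collect_sections and the sample statements are invented for illustration, only the .text and .data directives are recognised, and the model starts in .text because that is the section GNU as assembles into by default.

```python
# Toy model of the "bucket" view of sections; not a real assembler.

def collect_sections(lines):
    """Sort statements into per-section buckets, preserving their order."""
    buckets = {".text": [], ".data": []}
    current = ".text"  # assumption: assembly starts in the .text section
    for line in lines:
        statement = line.strip()
        if statement in buckets:
            current = statement          # a section directive switches buckets
        elif statement:
            buckets[current].append(statement)
    return buckets

# Hypothetical source file that switches back and forth between sections.
program = [
    ".data",
    "count: .word 0",
    ".text",
    "ldr r0, =count",
    ".data",
    "flag: .byte 1",
    ".text",
    "ldr r1, =flag",
]

for section, statements in collect_sections(program).items():
    print(section, statements)
```

Although the source interleaves code and data, each bucket ends up holding its own statements contiguously and in their original order.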

Figure 3. Sections

Now that we have an understanding of sections, let us look into the primary reasons for which relocation is performed.

6.2.1. Section Merging

When dealing with multi-file programs, sections with the same name (for example, .text) might appear in each file. The linker is responsible for merging the sections from the input files into the sections of the output file. By default, the sections with the same name from each file are placed contiguously, and the label references are patched to reflect the new addresses.

The effects of section merging can be seen by comparing the symbol tables of the object files with that of the corresponding executable file. The multi-file sum of array program can be used to illustrate section merging. The symbol tables of the object files main.o and sum-sub.o and the symbol table of the executable file sum.elf are shown below.

$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop ❶
00000000 T sum
$ arm-none-eabi-ld -Ttext=0x0 -o sum.elf main.o sum-sub.o
$ arm-none-eabi-nm sum.elf
...
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
00000028 t loop ❷
00000024 T sum

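The loop symbol has address 0x4 in sum-sub.o ❶, and 0x28 in sum.elf ❷, since the .text section of sum-sub.o is placed right after the .text section of main.o. The .text section of sum-sub.o thus starts at 0x24, which is where the sum symbol (offset 0x0 in sum-sub.o) ends up, and loop lands at 0x24 + 0x4 = 0x28.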
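The address arithmetic behind merging can be reproduced with a short Python sketch. It is a simplified model rather than the linker's real algorithm. The symbol offsets come from the nm output above, while the section sizes are assumptions: 0x24 for main.o is inferred from where sum lands in sum.elf, and 0x18 for sum-sub.o is counted from the six 4-byte instructions in Listing 5. The function merge_text is a made-up name, and the model ignores the case of two files defining local labels with the same name.

```python
# Toy model of section merging: place each input .text section right
# after the previous one and shift its symbols by the running offset.

def merge_text(inputs, base=0):
    """inputs: list of (file name, .text size, {symbol: offset in file}).

    Returns {symbol: address in the merged .text section}.
    """
    merged = {}
    offset = base
    for _name, size, symbols in inputs:
        for symbol, symbol_offset in symbols.items():
            merged[symbol] = offset + symbol_offset
        offset += size  # the next file's section starts where this one ends
    return merged

inputs = [
    # Size 0x24 is inferred from the address of sum in sum.elf.
    ("main.o", 0x24, {"arr": 0x04, "eoa": 0x07, "start": 0x08, "stop": 0x18}),
    # Size 0x18 assumes six 4-byte instructions, as in Listing 5.
    ("sum-sub.o", 0x18, {"sum": 0x00, "loop": 0x04}),
]

for symbol, address in sorted(merge_text(inputs).items(), key=lambda kv: kv[1]):
    print(f"{address:08x} {symbol}")
```

The symbols of main.o keep their addresses because its section is placed first, while sum and loop are shifted by the size of the .text section of main.o.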

6.2.2. Section Placement

When a program is assembled, each section is assumed to start at address 0, so labels are assigned values relative to the start of their section. When the final executable is created, the section is placed at some address X, and all references to the labels defined within the section are incremented by X so that they point to the new location.

The placement of each section at a particular location in memory, and the patching of all references to the labels in the section, is done by the linker.

The effects of section placement can be seen by looking at the symbol table of the object file and the corresponding executable file. The single file sum of array program can be used to illustrate section placement. To make things clearer, we will place the .text section at address 0x100.

$ arm-none-eabi-as -o sum.o sum.s
$ arm-none-eabi-nm -n sum.o
00000000 t entry ❶
00000004 t arr
00000007 t eoa
00000008 t start
00000014 t loop
00000024 t stop
$ arm-none-eabi-ld -Ttext=0x100 -o sum.elf sum.o ❷
$ arm-none-eabi-nm -n sum.elf
00000100 t entry ❸
00000104 t arr
00000107 t eoa
00000108 t start
00000114 t loop
00000124 t stop
...

❶ The addresses for labels are assigned starting from 0 within a section.

❷ When the executable is created, the linker is instructed to place the .text section at address 0x100.

❸ The addresses for labels in the .text section are re-assigned starting from 0x100, and all label references are patched to reflect this.
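Placement and patching can be modelled together in a small Python sketch. It is an illustration only: the function place_section is a made-up name, and the two literal-pool slots at offsets 0x28 and 0x2c are hypothetical locations chosen to show patching, not values read from sum.o. The label offsets are taken from the nm output above.

```python
# Toy model of section placement: move a section to a base address and
# patch every stored word that holds the address of a label.

def place_section(symbols, words, relocs, base):
    """symbols: {label: offset}; words: {location: stored value};
    relocs: locations whose stored value is a label address.

    Returns the placed symbols and the placed, patched words.
    """
    placed_symbols = {name: offset + base for name, offset in symbols.items()}
    placed_words = {}
    for location, value in words.items():
        # Only words that refer to labels are incremented by the base.
        patched = value + base if location in relocs else value
        placed_words[location + base] = patched
    return placed_symbols, placed_words

# Label offsets from the nm output of sum.o.
sum_o = {"entry": 0x00, "arr": 0x04, "eoa": 0x07,
         "start": 0x08, "loop": 0x14, "stop": 0x24}

# Hypothetical slots holding &arr and &eoa; locations are assumptions.
words = {0x28: 0x04, 0x2c: 0x07}
relocs = {0x28, 0x2c}

placed_symbols, placed_words = place_section(sum_o, words, relocs, 0x100)
for name, address in sorted(placed_symbols.items(), key=lambda kv: kv[1]):
    print(f"{address:08x} t {name}")
for location, value in sorted(placed_words.items()):
    print(f"word at {location:#x} now holds {value:#x}")
```

With a base of 0x100, the label addresses match the nm output of sum.elf, and the stored addresses of arr and eoa become 0x104 and 0x107.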

The process of section merging and placement is shown in the following figure.

Figure 4. Section Merging and Placement