In a multi-file program, each source file is assembled individually into an object file. The linker combines these object files to form the final executable.
While combining the object files, the linker performs two operations: symbol resolution and relocation.
We will look into these operations in detail in this section.
In a single-file program, the assembler replaces all references to labels with their corresponding addresses while producing the object file. But in a multi-file program, if there are any references to labels defined in another file, the assembler marks these references as "unresolved". When these object files are passed to the linker, the linker determines the values for these references from the other object files, and patches the code with the correct values.
The sum of array example is split into two files, to demonstrate the symbol resolution performed by the linker. The two files will be assembled and their symbol tables examined to show the presence of unresolved references.
The file sum-sub.s contains the sum subroutine, and the file main.s invokes the subroutine with the required arguments. The source of the files is shown below.
Listing 4. main.s - Subroutine Invocation
.text
b start @ Skip over the data
arr: .byte 10, 20, 25 @ Read-only array of bytes
eoa: @ Address of end of array + 1
.align
start:
ldr r0, =arr @ r0 = &arr
ldr r1, =eoa @ r1 = &eoa
bl sum @ Invoke the sum subroutine
stop: b stop
Listing 5. sum-sub.s - Subroutine Definition
@ Args
@ r0: Start address of array
@ r1: End address of array
@
@ Result
@ r3: Sum of Array
.global sum
sum: mov r3, #0 @ r3 = 0
loop: ldrb r2, [r0], #1 @ r2 = *r0++ ; Get array element
add r3, r2, r3 @ r3 += r2 ; Calculate sum
cmp r0, r1 @ if (r0 != r1) ; Check if hit end-of-array
bne loop @ goto loop ; Loop
mov pc, lr @ pc = lr ; Return when done
A word on the .global directive is in order. In C, all variables declared outside functions are visible to other files unless explicitly declared static. In assembly, all labels are static, that is, local to the file, unless they are explicitly made visible to other files using the .global directive.
The files are assembled, and the symbol tables are dumped using the nm command.
$ arm-none-eabi-as -o main.o main.s
$ arm-none-eabi-as -o sum-sub.o sum-sub.s
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop
00000000 T sum
For now, focus on the letter in the second column, which specifies the symbol type. A t indicates that the symbol is defined in the text section. A U indicates that the symbol is undefined. For defined symbols, an uppercase letter indicates that the symbol is .global, and a lowercase letter indicates that it is local to the file.
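As a rough illustration of how the type letter can be read, the following Python sketch classifies single lines of nm output. The function name classify is hypothetical, and only the letters t, d, b and U are handled; nm defines many more symbol types than this sketch covers.

```python
def classify(nm_line):
    """Return (symbol name, description) for one line of nm output."""
    fields = nm_line.split()
    # Undefined symbols have no address column, so index from the end.
    letter, name = fields[-2], fields[-1]
    if letter == "U":
        return name, "undefined"
    # Assumption: only these three section letters are of interest here.
    section = {"t": "text", "d": "data", "b": "bss"}.get(letter.lower(), "other")
    binding = "global" if letter.isupper() else "local"
    return name, f"{binding}, defined in {section}"


# Lines copied from the nm output of main.o and sum-sub.o.
for line in ["00000004 t arr", "         U sum", "00000000 T sum"]:
    print(classify(line))
```

Run on the lines above, the sketch reports arr as local, sum as undefined in main.o, and sum as global in sum-sub.o.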
It is evident that the symbol sum is defined in sum-sub.o and is not yet resolved in main.o. When the linker is invoked, the symbol references will be resolved, and the executable will be produced.
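The matching step can be modelled in a few lines of Python. This is only a sketch of the idea, not how the GNU linker is implemented: the function resolve and the dictionary layout are assumptions made for illustration, while the symbol names, type letters and addresses are taken from the nm output above.

```python
# Symbol tables from the nm output: name -> (type letter, address).
main_o = {"arr": ("t", 0x4), "eoa": ("t", 0x7), "start": ("t", 0x8),
          "stop": ("t", 0x18), "sum": ("U", None)}
sum_sub_o = {"loop": ("t", 0x4), "sum": ("T", 0x0)}


def resolve(objects):
    """Map each (object, undefined symbol) to the object that defines it."""
    # Only global (uppercase) definitions are visible to other files.
    exported = {}
    for obj_name, table in objects.items():
        for sym, (letter, _addr) in table.items():
            if letter != "U" and letter.isupper():
                exported[sym] = obj_name

    resolved = {}
    for obj_name, table in objects.items():
        for sym, (letter, _addr) in table.items():
            if letter == "U":
                if sym not in exported:
                    raise LookupError(f"undefined reference to '{sym}'")
                resolved[(obj_name, sym)] = exported[sym]
    return resolved


objects = {"main.o": main_o, "sum-sub.o": sum_sub_o}
print(resolve(objects))  # {('main.o', 'sum'): 'sum-sub.o'}
```

In this model, the local label loop in sum-sub.o is never exported, so a reference to loop from another file would fail to resolve.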
Relocation is the process of changing addresses already assigned to labels. This also involves patching up all label references to reflect the newly assigned addresses. Primarily, relocation is performed for two reasons: section merging and section placement.
To understand the process of relocation, an understanding of the concept of sections is essential.
Code and data have different run-time requirements. For example, code can be placed in read-only memory, while data might require read-write memory. It is therefore convenient if code and data are not interleaved. For this purpose, programs are divided into sections. Most programs have at least two sections, .text for code and .data for data. The assembler directives .text and .data are used to switch back and forth between the two sections.
It helps to imagine each section as a bucket. When the assembler hits a section directive, it puts the code/data following the directive in the selected bucket. Thus the code/data that belongs to a particular section appears in contiguous locations. The following figures show how the assembler re-arranges data into sections.
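The bucket picture can also be sketched in Python. The function assemble_into_sections and the sample source lines (including the label result) are hypothetical and serve only to show the sorting; the sketch also assumes that assembly starts in the .text section.

```python
def assemble_into_sections(lines):
    """Collect source lines into per-section buckets, in order of appearance."""
    buckets = {".text": [], ".data": []}
    current = ".text"               # assumption: assembly starts in .text
    for line in lines:
        stripped = line.strip()
        if stripped in buckets:     # a section directive selects the bucket
            current = stripped
        elif stripped:
            buckets[current].append(stripped)
    return buckets


# Hypothetical source that switches back and forth between sections.
source = [
    ".data",
    "arr: .byte 10, 20, 25",
    ".text",
    "ldr r0, =arr",
    ".data",
    "result: .byte 0",
    ".text",
    "stop: b stop",
]
buckets = assemble_into_sections(source)
for name, contents in buckets.items():
    print(name, contents)
```

Although code and data are interleaved in the source, each bucket ends up holding its own lines contiguously and in their original order.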
Now that we have an understanding of sections, let us look into the primary reasons for which relocation is performed.
When dealing with multi-file programs, sections with the same name (for example, .text) might appear in each file. The linker is responsible for merging sections from the input files into sections of the output file. By default, the sections with the same name from each file are placed contiguously, and the label references are patched to reflect the new addresses.
The effects of section merging can be seen by looking at the symbol tables of the object files and of the corresponding executable file. The multi-file sum of array program can be used to illustrate section merging. The symbol tables of the object files main.o and sum-sub.o and the symbol table of the executable file sum.elf are shown below.
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop ❶
00000000 T sum
$ arm-none-eabi-ld -Ttext=0x0 -o sum.elf main.o sum-sub.o
$ arm-none-eabi-nm sum.elf
...
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
00000028 t loop ❷
00000024 T sum
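The address arithmetic behind this merge can be reproduced with a short Python sketch. The function merge_text_sections is hypothetical. The size of the .text section of main.o (0x24) is not printed by nm; it is inferred here from the address of sum in sum.elf. The size given for sum-sub.o (0x18) assumes six 4-byte instructions, as in Listing 5.

```python
def merge_text_sections(objects, base=0x0):
    """Place each object's .text after the previous one; return final addresses."""
    final = {}
    offset = base
    for _name, size, symbols in objects:
        for sym, addr in symbols.items():
            final[sym] = offset + addr   # section-relative address + section start
        offset += size                   # the next section follows contiguously
    return final


objects = [
    # (object file, assumed .text size, symbols from nm)
    ("main.o", 0x24, {"arr": 0x4, "eoa": 0x7, "start": 0x8, "stop": 0x18}),
    ("sum-sub.o", 0x18, {"sum": 0x0, "loop": 0x4}),
]
final = merge_text_sections(objects)
print(hex(final["sum"]), hex(final["loop"]))  # 0x24 0x28
```

The labels of main.o keep their addresses because its section is placed first, while the labels of sum-sub.o move up by the size of the section placed before them.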
When a program is assembled, each section is assumed to start at address 0, so labels are assigned values relative to the start of their section. When the final executable is created, the section is placed at some address X, and all references to the labels defined within the section are incremented by X so that they point to the new location.
The linker both places each section at a particular location in memory and patches all references to the labels in that section.
The effects of section placement can be seen by looking at the symbol table of the object file and the corresponding executable file. The single file sum of array program can be used to illustrate section placement. To make things clearer, we will place the .text section at address 0x100.
$ arm-none-eabi-as -o sum.o sum.s
$ arm-none-eabi-nm -n sum.o
00000000 t entry ❶
00000004 t arr
00000007 t eoa
00000008 t start
00000014 t loop
00000024 t stop
$ arm-none-eabi-ld -Ttext=0x100 -o sum.elf sum.o ❷
$ arm-none-eabi-nm -n sum.elf
00000100 t entry ❸
00000104 t arr
00000107 t eoa
00000108 t start
00000114 t loop
00000124 t stop
...
❶ The addresses for labels are assigned starting from 0 within a section.
❷ When the executable is created, the linker is instructed to place the text section at address 0x100.
❸ The addresses for labels in the .text section are re-assigned starting from 0x100, and all label references will be patched to reflect this.
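A simplified Python model of this step is given below. The function place_section, the list words and the list relocs are assumptions made for illustration: words stands for two hypothetical data words that hold the section-relative addresses of arr and eoa, and relocs lists the positions of the words that need patching. A real object file records this information in relocation entries, which are not modelled here.

```python
def place_section(symbols, words, relocs, base):
    """Move a section to `base`: shift label addresses and patch stored references."""
    placed = {name: addr + base for name, addr in symbols.items()}
    patched = list(words)
    for index in relocs:            # positions of words that hold a label address
        patched[index] += base
    return placed, patched


# Label addresses from the nm output of sum.o (section assumed to start at 0).
sum_o = {"entry": 0x0, "arr": 0x4, "eoa": 0x7,
         "start": 0x8, "loop": 0x14, "stop": 0x24}

# Hypothetical words holding the addresses of arr and eoa before placement.
placed, words = place_section(sum_o, [0x4, 0x7], relocs=[0, 1], base=0x100)
print(hex(placed["entry"]), [hex(w) for w in words])  # 0x100 ['0x104', '0x107']
```

Both the symbol table and the stored references move by the same amount, so the references still point at the labels after placement.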
The process of section merging and placement is shown in the following figure.