How does the linker resolve global symbols defined in multiple locations?

At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file. Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols.
For the following example program, buf, bufp0, main, and swap are strong symbols and bufp1 is a weak symbol.

/* main.c */
void swap();
int buf[2] = {1, 2};

int main()
{
    swap();
    return 0;
}

/* swap.c */
extern int buf[];

int *bufp0 = &buf[0];
int *bufp1;

void swap()
{
    int temp;

    bufp1 = &buf[1];
    temp = *bufp0;
    *bufp0 = *bufp1;
    *bufp1 = temp;
}

Given these notions of strong and weak symbols, Unix linkers use the following rules to deal with multiply defined symbols:
Rule 1: multiple strong symbols with the same name are not allowed.
Rule 2: given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.
Rule 3: given multiple weak symbols with the same name, choose any one of them.
For example, suppose we try to compile and link the following two C modules:

/* foo1.c */
int main()
{
    return 0;
}

/* bar1.c */
int main()
{
    return 0;
}

In this case, the linker generates an error message because the strong symbol main is defined more than once (Rule 1):

$ gcc foo1.c bar1.c
/tmp/cca015022.o: In function 'main':
/tmp/cca015022.o(.text+0x0): multiple definition of 'main'
/tmp/cca015021.o(.text+0x0): first defined here

Similarly, the linker generates an error message for the following modules because the strong symbol x is defined twice (Rule 1):

/* foo2.c */
int x = 15213;

int main()
{
    return 0;
}

/* bar2.c */
int x = 15213;

void f()
{
}

However, if x is left uninitialized in one module, the linker quietly chooses the strong symbol defined in the other module (Rule 2), as the following program shows:

/* foo3.c */
#include <stdio.h>
void f(void);

int x = 15213;

int main()
{
    f();
    printf("x = %d\n", x);
    return 0;
}

/* bar3.c */
int x;

void f()
{
    x = 15212;
}

At runtime, the function f() changes the value of x from 15213 to 15212, which may come as an unwelcome surprise to the author of main! Note that the linker normally gives no indication that it has detected multiple definitions of x.

$ gcc -o gfg foo3.c bar3.c
$ ./gfg
x = 15212

If x has two weak definitions, the same thing happens (Rule 3):

/* a.c */
#include <stdio.h>
void b(void);

int x;

int main()
{
    x = 2016;
    b();
    printf("x = %d\n", x);
    return 0;
}

/* b.c */
#include <stdio.h>

int x;

void b()
{
    x = 2017;
}

Applying Rules 2 and 3 can introduce insidious runtime bugs that are baffling to the unwary programmer, especially when the duplicate symbol definitions have different types.
Example: x is defined as an int in one module and as a double in another.

/* a.c */
#include <stdio.h>
void b(void);

int x = 2016;
int y = 2017;

int main()
{
    b();
    printf("x = 0x%x y = 0x%x \n", x, y);
    return 0;
}

/* b.c */
double x;

void b()
{
    x = -0.0;
}

Execution:

$ gcc a.c b.c -o geeksforgeeks
$ ./geeksforgeeks
x = 0x0 y = 0x80000000

This is a subtle and nasty bug, especially because it occurs silently, with no warning from the compilation system, and because it typically shows up much later in the program's execution, far from where the error occurred. In large systems with hundreds of modules, such bugs are very difficult to track down, especially because many programmers do not understand how linkers work. If in doubt, invoke the linker with a flag such as GCC's -fno-common, which triggers an error when it encounters multiply defined global symbols. GCC 10 and later enable -fno-common by default, so reproducing the examples above with a recent GCC requires passing -fcommon explicitly.

References

[1] Sahil Rajput. How Linkers Resolve Global Symbols Defined at Multiple Places? [EB/OL]. https://www.geeksforgeeks.org/how-linkers-resolve-multiply-defined-global-symbols/, 2019-01-04.

Keywords: C

Added by pbjpb on Fri, 04 Mar 2022 05:45:21 +0200