There are two different concepts: which "section" a variable goes into, and its "visibility".
For comparison, I've added a .bss section variable:
char global_int = 11;
char nondata_int;
int
main(int argc, char *argv[])
{
}
Compiling with cc -S produces:
.file "fix1.c"
.text
.globl global_int
.data
.type global_int, @object
.size global_int, 1
global_int:
.byte 11
.comm nondata_int,1,1
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",@progbits
Note the .data directive, which puts the global_int variable in the data section, and the .comm directive, which puts nondata_int into the .bss section.
Also, note the .globl directive, which gives a symbol global visibility (i.e. it can be seen by other .o files). A .comm symbol is global without needing it.
Loosely, .data and/or .bss are the sections that the variables are put into, and global [.globl] is the visibility. If you did:
static int foobar = 63;
Then, foobar would go into the .data section but be local. In the nm output below, instead of D, it would be d to indicate local/static visibility. Other .o files would not be able to see this [or link to it].
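As a rough sketch of the section/visibility distinction, the file below puts the three kinds of variable side by side. The helper sum_of_globals is a made-up name, added only so that foobar is referenced, and the nm letters in the comments assume an unoptimized build on an ELF target.

```c
/* External linkage, initialized: goes into .data, nm letter 'D'. */
char global_int = 11;

/* Internal linkage, initialized: also .data, but nm letter 'd',
 * so other .o files cannot link against it. */
static int foobar = 63;

/* No initializer: a 'C' (common) symbol when the compiler runs with
 * -fcommon (GCC's default before GCC 10), otherwise a 'B' (.bss) symbol. */
char nondata_int;

/* Hypothetical helper: references all three variables so that the
 * static one is not discarded as unused. */
int sum_of_globals(void)
{
    return global_int + foobar + nondata_int;
}
```

Compiling this with cc -c and running nm on the resulting .o should show both global_int and foobar in the data section, differing only in the case of the letter.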
An nm of the .o file produces:
0000000000000000 D global_int
0000000000000000 T main
0000000000000001 C nondata_int
And, an nm -g of the final executable produces:
000000000040401d B __bss_start
0000000000404018 D __data_start
0000000000404018 W data_start
0000000000401050 T _dl_relocate_static_pie
0000000000402008 R __dso_handle
000000000040401d D _edata
0000000000404020 B _end
0000000000401198 T _fini
000000000040401c D global_int
w __gmon_start__
0000000000401000 T _init
0000000000402000 R _IO_stdin_used
0000000000401190 T __libc_csu_fini
0000000000401120 T __libc_csu_init
U __libc_start_main@@GLIBC_2.2.5
0000000000401106 T main
000000000040401e B nondata_int
0000000000401020 T _start
0000000000404020 D __TMC_END__
UPDATE:
"Thanks for this answer. Regarding 'And, .comm to put nondata_int into the .bss section': could you please explain that a bit? I don't see any reference to .bss, so how are those two related?"
Sure. There's probably a more rigorous explanation, but loosely, when you do:
char nondata_int;
You are defining a "common" symbol [C calls this a tentative definition; the historical origin is Fortran's COMMON].
When linking [to create the final executable], if no other .o [or .a] has defined it with an initial value, it will be put into the .bss section as a B symbol.
But, if another .o has defined it (e.g. define_it.c):
char nondata_int = 43;
There, define_it.o will put it in the .data section as a D symbol.
Then, when you link the two:
gcc -o executable fix1.o define_it.o
Then, in executable, it will go to the .data section as a D symbol.
So, .o files have/use .comm [the assembler directive] and the C [common] symbol type.
Executables have only .data and .bss. So, given the .o files, a common symbol goes to [is promoted to] .bss if it has never been initialized, and to .data if any .o has initialized it.
Loosely, .comm/C is a suggestion, and .data and .bss are a "commitment".
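The suggestion-versus-commitment idea has a close analogue inside a single source file, where the C standard calls an uninitialized file-scope declaration a "tentative definition". The sketch below only illustrates that analogue and is not what the linker itself does across .o files; shared_value and read_shared_value are made-up names.

```c
/* Tentative definition: no initializer, so nothing is committed yet. */
int shared_value;

/* A second tentative definition of the same object is still legal C. */
int shared_value;

/* The one real definition: its initializer decides the final value. */
int shared_value = 43;

/* Hypothetical accessor used to observe the merged result. */
int read_shared_value(void)
{
    return shared_value;
}
```

All three declarations name one object, and because one of them carries an initializer, the object ends up as initialized data rather than zero-filled storage.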
This is a nicety of sorts. Technically, in fix1.c, if we knew beforehand that we were going to be linked with define_it.o, we would [probably] want to do:
extern char nondata_int;
Then, in fix1.o, the symbol would be marked as "undefined" (i.e. nm would show U).
But, then, if fix1.o were not linked to anything that defined the symbol, the linker would complain about an undefined symbol.
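A minimal sketch of the extern arrangement follows. To keep it self-contained, both halves sit in one file; in practice the declaration would live in fix1.c and the definition in define_it.c. The accessor read_nondata is a made-up name, and char is used for both halves so that the types agree.

```c
/* What fix1.c would contain: a declaration only, no storage reserved.
 * Compiled on its own, fix1.o would list this symbol as 'U'. */
extern char nondata_int;

/* Hypothetical accessor: uses the symbol, so the linker must find
 * a definition for it somewhere. */
char read_nondata(void)
{
    return nondata_int;
}

/* What define_it.c would contain: the single real definition.
 * It appears here only so the sketch compiles as one file. */
char nondata_int = 43;
```

Removing the last line and linking the file by itself should reproduce the undefined-symbol complaint described above.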
The common symbol allows us to have multiple .o files that each do:
char nondata_int;
They all produce C symbols. The linker combines them all to produce a single symbol.
So, again, common C symbols say:
"I want a global named X, and I want it to be the same X as found in any other .o files, but don't complain about the symbol being multiply defined. If one [and only one] of those .o files gives it an initialized value, I'd like to benefit from that value."
Historically ...
IIRC [and I could be wrong about this], common was added [to the linker] to support Fortran COMMON declarations/variables.
That is, all Fortran .o files just declared a symbol as common [its concept of global], but the Fortran linker was expected to combine them.
Classic/old Fortran could only specify a variable as COMMON (i.e. in C, equivalent to int val;) but Fortran did not have global initializers (i.e. it did not have extern int val; or int val = 1;).
This common was useful for C, so, at some point it was added.
In the good old days (tm), the common linker type did not exist, and one had to have an explicit extern in all but one .o file, with one [and only one] .o that defined the variable. That .o could define it with a value (e.g. int val = 1;) or without (e.g. int val;), but all other .o files had to use extern int val;