The basis of code execution starts with stack.
Stacks
{High Mem Address}
- Stack [-W-] Handled by Compiler
- Dynamic Data[-W-] (Heap- malloc/calloc/new variables) Handled by Coder
- Static Data [-W-] (Including global variables in C) Init at process start
- Literals [R--] (strings that does not change) Init at process start
- Instructions (text) [R-X] Init at process start
Legends: R- ReadOnly, W-Writeable, E-Executable
Each architecture has certain registers, like %eax, %ebx. Some of them are general purpose registers, others are special registers (like %esp, %ebp).
IA32: Call Stack grows down to low address, Dynamic data moves reverse of stack.
Push Putting more things to stack, address move lower. (%esp=%esp-4)
pushl src=> `) Fetch value at src, Decrement %esp by 4, Place value at src at address given by %esp
Pop As we take off stack, address move higher (%esp=%esp+4)
popl dst => Load value at %esp to dst, Increment %esp by 4.
Why 4? Push long has size 4 Bytes.
Exercise: At start, %eax=0x40, %ebx=0x10, %esp=0x1000
pushl %eax -> %esp=0x996. %esp contains 0x40
popl %ebx -> %ebx placed with 0x40 %esp=0x1000.
Procedure Calls and Return address/value
Caller
save registers
call
clean up arg
restore registers
find return value
Callee
setup return value
destroy local var
restore registers
return
Convention:
Caller Save: caller saves temp value in its stack frame before calling. Convention: %eax, %edx, %ecx saved by caller. %eax also used to get return value, so caller has to save it if it was using before making call.
Callee Save: Callee saves temp values in its stack frame before using those registers. Convention: %ebx, %esi, %edi callee saves if it wants to use them.
Special: %esp, %ebp: Callee save and restored upon exit.
Procedure Call Linkage: The convention where to leave/find things
call address: 1) Push return address to stack, 2) Jump to address
ret : 1) Pop return address from stack, 2) Jump to address
Read above diagram, Left to Right, as call flow progress from calling function to return.
By convention, return values are placed ix %eax. Caller must make sure to save this register before calling callee. Can only put 4 bytes as size of %eax is 4B.
Stack Frame contains local variables, function arguments, return information and any other temp space. Space for stack frame allocated when procedure invoked and deleted on return.
%esp: stack start pointer. %ebp: Frame pointer.
{HighMem}[stack frame 1] [stack frame 2]..{LowMemory} Each [], [%ebp...%esp].
Exercise:
fun1( ) { fun2( ); ... fun2( ); }
fun2( ) { fun3( ); }
fun3( ) { …fun4();.. }
fun4() { … }
Call Chain: fun1 -> fun2 -> fun3 -> fun4
-> fun2 -> fun3 -> fun4
So total 7 times stack frame needs to be instantiated.
Linux Stack Frame
Demonstration:
#include "stdio.h"
int func1(char a)
{ int l; return 10;}
main()
{func1('1');}
# gcc -c elf.c -o elf.o
# objdump -d a.out
0000000000400474:
400474: 55 push %rbp #Savebase of stack frame
400475: 48 89 e5 mov %rsp,%rbp #Setup new base pointer
400478: 89 f8 mov %edi,%eax
40047a: 88 45 fc mov %al,-0x4(%rbp)
40047d: c9 leaveq
40047e: c3 retq
000000000040047f
:
40047f: 55 push %rbp
400480: 48 89 e5 mov %rsp,%rbp
400483: bf 31 00 00 00 mov $0x31,%edi
400488: e8 e7 ff ff ff callq 400474
40048d: c9 leaveq
40048e: c3 retq
40048f: 90 nop
leaveq is special combined instruction for [movl %ebp, %ebx] [pop %ebp]. [*for the moment ignore r in rbp]. If we skip movl instruction from leaveq, we would corrupt frame pointer and jump to wrong address.
*ebx=32-bit (IA-32), rbx =64 bit(x86_64)
-0x4(%rbp),%edx: Go back 4 Bytes in %rbp and place that value in %edx.
When function is called, arguments are pushed in reverse order, i.e. 2nd arg first, then 1st arg.
What happens in x86_64?
stack is still required on 64-bit if function has many local variables, arrays/structs, uses address of (&) for local variable address, calls function that has more than 6 arguments,
Array
Arrays are contiguous allocation of memory, no bounds checked.
Array of Type and size N is represented as: Type name_of_arr[N];
It is contiguously allocated region of N spaces each of size=sizeof(Type).
E.g. int arr[5]. int: arr[5], *(arr+2) etc. int*: arr, arr+1, &arr[1], arr+i
arr[-1]: valid instruction, however is not guaranteed to be present in memory address space.
typedef int pent_arr[5];
pent_arr a = {1,2,3,4,5}; pent_arr b={6,7,8,9,10}; pent_arr c={11,12,13,14,15};
Nested Arrays: Also known as, 2-D arrays.
pent_arr aa[2]= { {1,2,3,4,5}, {6,7,8,9,10} };
int arr[10][20];
MultiLevel Arrays: Not really 2-D array. The three sub-arrays a, b, c need not be contiguous in memory.
int *multilevel[3] = {a, b, c}; //array of 3 pointers, each pointing to three 5-element arrays. These 3 pointers need not be pointing to a,b,c in that order.
multilevel[0][5], multilevel[0][-1]: Can be accessed but not guaranteed to fetch valid value.
The access syntax look similar (arr[a][b]] in C code, but actually access differs:
Nested Array: arr[a][b] : Memory[ arr + 20*a + 4*b]
Multi-Level Array: arr[a][b] = Memory [ Memory [multilevel + 4*a] + 4*b]
Computation:Two way approach to access multi-level array:
First get pointer to row address Memory [multilevel + 4*a]
then access element within that array Memory [ Memory [multilevel + 4*a] + 4*b]
Structure
Contiguous memory region, contain different types of members accessible by names.
struct s_name { int member1; int member2[10]; };
struct s_name instance; struct s_name *ptr=&instance
instance.member1 is same as (*ptr).member1 is same as ptr->member
Assembly: %eax=val %edx=ptr movl %eax, (%edx) #Memory[ptr]=val
Access array member:
ptr->member2[index] //direct access
int *get(struct s_name *ptr, int index) //get pointer to member array at index element.
{ return &ptr->member2[index]; }
Alignment: If Primitive data type requires N-bytes, then address should be multiple of N.
Why: Physical memory accessed by aligned chunks of 4/8-Byte. Inefficient to access elements residing at address crossing word boundary.
1-Byte (char) address (in binary) no restriction.
2-Byte (short) address (in binary) should end in 0 (div by 2)
4-Byte (int) address (in binary) should end in 00 (div by 4)
Control Flow:
Jump(conditional and unconditional) and call/return change control flow [program state]
Processor also reacts to change in system state [e.g. User hit ctrl+c, data arrived from network adapter, instruction of div by 0, system timer expired]
Exception is transfer of control to OS in response to some event. This changes processor state. [div by 0, sigfault, Ctrol+c, I/O completed, page fault]. OS has small code to handle exception. After exception handled, any of below can happen:
- re-execute current instruction,
- resume execution from next instruction
- abort process that caused exception
Asynchronous Exception: Caused by events external to processor indicated by processor's interrupt pin. E.g. I/O INT(Ctrl+C, packet on network, click mouse),
Synchronous Exception: Caused by event occurring as a result of executing some instruction.
e.g.
Traps -Intentional [syscall, breakpoints] return control to next instruction
e.g. open (filename, options) int 0x80 transfer to OS -> open file -> return
Faults- Unintentional [page fault]. Either re-execute faulting instruction or abort
Recoverable Faults: Page Fault
UnRecoverable Faults: Seg Fault, Int div by 0.
e.g. page fault -> create page and load to memory -> return and mem access is retried
Aborts- Unintentional and unrecoverable [e.g. parity error, machine check]. Aborts current program
e.g. int a[10000]; Invalid memory access, signal SIGSEGV and process aborts.
Q: How system finds which exception handler to execute?
A: Interrupt Vector maps exception to the code handling exception. This is table indexed on exception, each event has unique exception number.
Process
fork() copies current process
execve() replaces process's code and address space with the code for different program
Two process program: pid_t pid=fork(); if(pid==0) child else parent and pid contains child's pid..
Two different programs: pid_t pid=fork(); if(pid==0) execve else parent code.
fork is called once, returns twice. Can't say first parent or child executes.
Parent and child start same state, but after fork() have private copy of state(cariable, call stack, fds etc.)
Type "ls -al" on bash:
int execve(char *executablefilename, char *argv[], char *envp[]) //execv() return only in case of error. It overwrites code, data and stack.
int wait(int *child_status)
suspend current process until one of child terminates. return value is pid of terminating child process.
If child_status!=NULL, it points to status indicating cause of child termination. special macro to interpret this status.
waitpid() can be used to wait for specific child termination.
int exit(int status) //status=0 normal exit, nonzero abnormal exit.
Creates zonbie process, Parent should be alive to get child exit status (called reaping) and then kernel clears zombie.
If parent terminates without reaping child, child is reaped by Init process (pid=1) of kernel.
call after fork() atexit(cleanup); // if something needs to be executed at cleanup at exit
Represent virtual memory address: movl (%ecx), %eax, (%ecx)
A virtual address can be mapped to physical memory or disk
References
Lean C with assembly
Coursera.org: The Hardware Software Interface [Excellent Prof, superb material]
Each architecture has certain registers, like %eax, %ebx. Some of them are general purpose registers, others are special registers (like %esp, %ebp).
IA32: Call Stack grows down to low address, Dynamic data moves reverse of stack.
Push Putting more things to stack, address move lower. (%esp=%esp-4)
pushl src=> `) Fetch value at src, Decrement %esp by 4, Place value at src at address given by %esp
Pop As we take off stack, address move higher (%esp=%esp+4)
popl dst => Load value at %esp to dst, Increment %esp by 4.
Why 4? Push long has size 4 Bytes.
Exercise: At start, %eax=0x40, %ebx=0x10, %esp=0x1000
pushl %eax -> %esp=0x996. %esp contains 0x40
popl %ebx -> %ebx placed with 0x40 %esp=0x1000.
Procedure Calls and Return address/value
Caller
save registers
call
clean up arg
restore registers
find return value
Callee
setup return value
destroy local var
restore registers
return
Convention:
Caller Save: caller saves temp value in its stack frame before calling. Convention: %eax, %edx, %ecx saved by caller. %eax also used to get return value, so caller has to save it if it was using before making call.
Callee Save: Callee saves temp values in its stack frame before using those registers. Convention: %ebx, %esi, %edi callee saves if it wants to use them.
Special: %esp, %ebp: Callee save and restored upon exit.
Procedure Call Linkage: The convention where to leave/find things
call address: 1) Push return address to stack, 2) Jump to address
ret : 1) Pop return address from stack, 2) Jump to address
Read above diagram, Left to Right, as call flow progress from calling function to return.
By convention, return values are placed ix %eax. Caller must make sure to save this register before calling callee. Can only put 4 bytes as size of %eax is 4B.
Stack Frame contains local variables, function arguments, return information and any other temp space. Space for stack frame allocated when procedure invoked and deleted on return.
%esp: stack start pointer. %ebp: Frame pointer.
{HighMem}[stack frame 1] [stack frame 2]..{LowMemory} Each [], [%ebp...%esp].
Exercise:
fun1( ) { fun2( ); ... fun2( ); }
fun2( ) { fun3( ); }
fun3( ) { …fun4();.. }
fun4() { … }
Call Chain: fun1 -> fun2 -> fun3 -> fun4
-> fun2 -> fun3 -> fun4
So total 7 times stack frame needs to be instantiated.
Linux Stack Frame
Demonstration:
#include "stdio.h"
int func1(char a)
{ int l; return 10;}
main()
{func1('1');}
# gcc -c elf.c -o elf.o
# objdump -d a.out
0000000000400474
400474: 55 push %rbp #Savebase of stack frame
400475: 48 89 e5 mov %rsp,%rbp #Setup new base pointer
400478: 89 f8 mov %edi,%eax
40047a: 88 45 fc mov %al,-0x4(%rbp)
40047d: c9 leaveq
40047e: c3 retq
000000000040047f
40047f: 55 push %rbp
400480: 48 89 e5 mov %rsp,%rbp
400483: bf 31 00 00 00 mov $0x31,%edi
400488: e8 e7 ff ff ff callq 400474
40048d: c9 leaveq
40048e: c3 retq
40048f: 90 nop
#The middle value is relative offset to get to the address.
E.g. e7 ff ff ff (Little Endian) is added to next instruction (0x40048d) to get address of func1 i.e. 0x400474(=0x40048d+0xFFFFFFE7).
*ebx=32-bit (IA-32), rbx =64 bit(x86_64)
-0x4(%rbp),%edx: Go back 4 Bytes in %rbp and place that value in %edx.
When function is called, arguments are pushed in reverse order, i.e. 2nd arg first, then 1st arg.
What happens in x86_64?
- Double registers (16 General Purpose registers, each 8 Byte), so can use less stack.
- First 6 args stored in registers, local variables can also go in registers
- callq stores 64-bit return address on stack, decrement to %rsp is by 8, not by 4.
- No frame pointer(%ebp/rbp). All references for stack frame wrt %rsp. %ebp/rbp are general purpose registers on this.
- Functions can access memory upto 128 Bytes+%rsp.
- Registers still caller/callee saved.
stack is still required on 64-bit if function has many local variables, arrays/structs, uses address of (&) for local variable address, calls function that has more than 6 arguments,
Array
Arrays are contiguous allocation of memory, no bounds checked.
Array of Type and size N is represented as: Type name_of_arr[N];
It is contiguously allocated region of N spaces each of size=sizeof(Type).
arr[-1]: valid instruction, however is not guaranteed to be present in memory address space.
typedef int pent_arr[5];
pent_arr a = {1,2,3,4,5}; pent_arr b={6,7,8,9,10}; pent_arr c={11,12,13,14,15};
Nested Arrays: Also known as, 2-D arrays.
pent_arr aa[2]= { {1,2,3,4,5}, {6,7,8,9,10} };
int arr[10][20];
MultiLevel Arrays: Not really 2-D array. The three sub-arrays a, b, c need not be contiguous in memory.
int *multilevel[3] = {a, b, c}; //array of 3 pointers, each pointing to three 5-element arrays. These 3 pointers need not be pointing to a,b,c in that order.
multilevel[0][5], multilevel[0][-1]: Can be accessed but not guaranteed to fetch valid value.
The access syntax look similar (arr[a][b]] in C code, but actually access differs:
Nested Array: arr[a][b] : Memory[ arr + 20*a + 4*b]
Multi-Level Array: arr[a][b] = Memory [ Memory [multilevel + 4*a] + 4*b]
Computation:Two way approach to access multi-level array:
First get pointer to row address Memory [multilevel + 4*a]
then access element within that array Memory [ Memory [multilevel + 4*a] + 4*b]
Structure
Contiguous memory region, contain different types of members accessible by names.
struct s_name { int member1; int member2[10]; };
struct s_name instance; struct s_name *ptr=&instance
instance.member1 is same as (*ptr).member1 is same as ptr->member
Assembly: %eax=val %edx=ptr movl %eax, (%edx) #Memory[ptr]=val
Access array member:
ptr->member2[index] //direct access
int *get(struct s_name *ptr, int index) //get pointer to member array at index element.
{ return &ptr->member2[index]; }
Alignment: If Primitive data type requires N-bytes, then address should be multiple of N.
Why: Physical memory accessed by aligned chunks of 4/8-Byte. Inefficient to access elements residing at address crossing word boundary.
1-Byte (char) address (in binary) no restriction.
2-Byte (short) address (in binary) should end in 0 (div by 2)
4-Byte (int) address (in binary) should end in 00 (div by 4)
8-Byte (double) address (in binary) should end in 000 (div by 4) . On Linux, end in 00 as it treats double as 4-Byte primitive data type.
12-Bytes (long double): lowest 2 bits be 00.
structure needs to be aligned to N-byte. N is largest alignment of any member. Initial address and length must be multiple of N. Win/x86: N=8, Linux N=4 (as it treats double as 4-Byte primitive data type).
Array of Struct: struct s_name { int member; . . }s_instance[10];
k=sizeof(struct s_name); each s_instance[i] starts at s_instance+i*k
Next add member offset within that to access member.
s_instance[index].member (s_instance+index)->member
Virtual memory tricky as it spans 2 Pages.
Control Flow:
Jump(conditional and unconditional) and call/return change control flow [program state]
Processor also reacts to change in system state [e.g. User hit ctrl+c, data arrived from network adapter, instruction of div by 0, system timer expired]
Exception is transfer of control to OS in response to some event. This changes processor state. [div by 0, sigfault, Ctrol+c, I/O completed, page fault]. OS has small code to handle exception. After exception handled, any of below can happen:
- re-execute current instruction,
- resume execution from next instruction
- abort process that caused exception
Asynchronous Exception: Caused by events external to processor indicated by processor's interrupt pin. E.g. I/O INT(Ctrl+C, packet on network, click mouse),
Synchronous Exception: Caused by event occurring as a result of executing some instruction.
e.g.
Traps -Intentional [syscall, breakpoints] return control to next instruction
e.g. open (filename, options) int 0x80 transfer to OS -> open file -> return
Faults- Unintentional [page fault]. Either re-execute faulting instruction or abort
Recoverable Faults: Page Fault
UnRecoverable Faults: Seg Fault, Int div by 0.
e.g. page fault -> create page and load to memory -> return and mem access is retried
Aborts- Unintentional and unrecoverable [e.g. parity error, machine check]. Aborts current program
e.g. int a[10000]; Invalid memory access, signal SIGSEGV and process aborts.
Q: How system finds which exception handler to execute?
A: Interrupt Vector maps exception to the code handling exception. This is table indexed on exception, each event has unique exception number.
Process
fork() copies current process
execve() replaces process's code and address space with the code for different program
Two process program: pid_t pid=fork(); if(pid==0) child else parent and pid contains child's pid..
Two different programs: pid_t pid=fork(); if(pid==0) execve else parent code.
fork is called once, returns twice. Can't say first parent or child executes.
Parent and child start same state, but after fork() have private copy of state(cariable, call stack, fds etc.)
Type "ls -al" on bash:
int execve(char *executablefilename, char *argv[], char *envp[]) //execv() return only in case of error. It overwrites code, data and stack.
int wait(int *child_status)
suspend current process until one of child terminates. return value is pid of terminating child process.
If child_status!=NULL, it points to status indicating cause of child termination. special macro to interpret this status.
waitpid() can be used to wait for specific child termination.
int exit(int status) //status=0 normal exit, nonzero abnormal exit.
Creates zonbie process, Parent should be alive to get child exit status (called reaping) and then kernel clears zombie.
If parent terminates without reaping child, child is reaped by Init process (pid=1) of kernel.
call after fork() atexit(cleanup); // if something needs to be executed at cleanup at exit
Represent virtual memory address: movl (%ecx), %eax, (%ecx)
A virtual address can be mapped to physical memory or disk
References
Lean C with assembly
Coursera.org: The Hardware Software Interface [Excellent Prof, superb material]
No comments:
Post a Comment