CS计算机代考程序代写 prolog Java assembly data structure compiler 2020 – Winter Term 2
2020 – Winter Term 2
Unit 1e – Procedures and Stack
1
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
▪ Reading: Companion: 2.8; Textbook: 3.7, 3.12
▪ Learning Goals
▪ explain when local variables are allocated and freed
▪ distinguish a procedure’s return address from its return argument
▪ describe why activation frames are allocated on the stack and not on the heap
▪ explain how to use the stack pointer to access local variables and arguments
▪ given an arbitrary C procedure, describe the format of its stack activation frame
▪ explain the role of each of the caller and callee prologues and epilogues
▪ explain the tradeoffs involved in deciding whether to pass arguments on the stack or in registers
▪ describe the necessary conditions for not saving the return address on the stack and the benefit of not doing so
▪ write assembly code for procedure call arguments passed on the stack or in registers, with and without a return value
▪ write assembly code for a procedure with and without local variables, with arguments pass on the stack or in registers, with and without a return value
▪ write assembly code to access a local scalar, local “static” array, local “dynamic” array, local “static” struct, and local “dynamic” struct
▪ describe how a buffer-overflow, stack-smash attack occurs
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 2
void b (int a0, int a1) {
int l0 = 0;
int l1 = 1;
…
}
▪ Can l0, l1, a0, a1 be allocated statically?
A. Yes,always
B. Yes, but only if b doesn’t call itself directly
C. Yes, but only if foo doesn’t call any functions
D. No, none of these can be allocated statically at all
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 3
▪ How many different versions of n are there? void a (int n) {
if (n == 0) return;
a(n – 1);
printf(“%d
”, n);
}
▪ What if there is no apparent recursion? void b (int n) {
int l = n * n;
c(l – 1);
}▪What if c( ) calls b( )?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 4
▪Scope
▪Local variables are only accessible within declaring procedure ▪Each execution has its own private copy
▪Lifetime
▪Allocated when procedure starts
▪“Freed” when procedure returns (in most languages)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 5
▪Activation: execution of a procedure
▪Starts with procedure is called, ends when it returns
▪There can be many activations of same procedure alive at once
▪Activation Frame
▪memory that stores activation’s state ▪Includes local variables and arguments
▪Should we allocate activation frames from the heap? ▪Call malloc() to create frame on procedure call?
▪Call free() on procedure return?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 6
▪Order of frame allocation and deallocation is special ▪freed in reverse order of allocation
▪Simple allocation for frames:
▪Reserve big chunk of memory for all frames
▪Initial address known
▪Simple, cheap allocation: add or subtract from a pointer
▪Questions
▪What data structure is this like?
▪What restriction do we place on lifetime of local variables?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 7
r5
▪Stack of activation frames
▪Stored in memory, grows up from bottom
▪Stack pointer
▪General purpose register (we will use r5)
▪Stores base address (address of first byte) of current frame
▪Current frame is the “top” of the stack
▪First activation is the “bottom” or “base” of the stack
▪Local variables of initial function (e.g., main)
Stack Top (current frame)
Code
Static data
Heap
Stack
Stack Bottom (first frame)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
8
r5
▪Value of the stack pointer is dynamic
▪Local variables and arguments
▪Size of each frame is (usually) static ▪Offset from stack pointer is (usually) static
▪Each frame is like a struct
▪Base of frame is in r5 (stack pointer)
▪Each variable in procedure is a member of the struct
Stack Top (current frame)
Code
Static data
Heap
Stack
Stack Bottom (first frame)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
9
▪Local variables ▪Arguments
▪Some architectures use registers for arguments
▪Return address (ra)
▪Register r6 works for one call, but not for multiple calls
▪Saved only if needed (i.e., only if function calls other functions)
▪Other saved registers
▪Called function may change register values ▪Values that must be kept after call are saved
▪Also, if function needs more registers than available
saved registers…
local variables…
return address
arguments…
saved registers…
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 10
▪Order of storage
▪Decided by the compiler
▪Based on order convenient for stack creation
▪Static offset to any member from its base (frame pointer) ▪There will need to be some convention for interoperability
▪Example:
void b (int a0, int a1) {
int i0 = 0; int i1 = 1; c();
}
r[5]+0x00: i0 0x04: i1
0x08: return address 0x0c: a0
0x10: a1
saved registers…
local variables…
return address
arguments…
saved registers…
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
11
▪Frame is stored and accessed similarly to a struct ▪Base is in r5 (by convention)
▪Offset is known statically (order by convention)
▪Example: int i0 = 0; int i1 = 1;
ld $0, r0
str0,(r5) #i0=0 ld $1, r0
str0,4(r5) #i1=1
r[5]+0x00: i0 0x04: i1
0x08: return address
0x0c: a0
0x10: a1
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
12
▪What would be the code for the following? A. ld 16(r5), r0
i1 = a1;
r[5]+0x00: i0 0x04: i1
0x08: return address
0x0c: a0
0x10: a1
st r0, 4(r5)
B. ld 10(r5), r0 st r0, 0(r5)
C. ld 16(r5), r0 st r0, 0(r5)
D. ld 10(r5),r0 st r0, 4(r5)
E. Follow the white rabbit, Neo
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
13
▪What is the value of l in foo when it is active?
void goo() { int x = 3; } void foo() { int l; }
goo(); foo();
A. 0
B. 3
C. 42
D. NULL
E. no way to know
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
14
▪What is wrong with this?
int *foo() {
int l;
return &l; }
void bar() {
int *l = malloc(100);
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
15
▪Latest version of C allows local arrays to have dynamic size: void foo(int n) {
int a[n]; int b;
…
b = 0;
}
▪ What code does the compiler generate for the last statement?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 16
▪The allocation of the stack is relatively simple: ▪Decrement the stack pointer to allocate ▪Increment the stack pointer to free
▪Whose responsibility is it to allocate in the stack? ▪Caller doesn’t [need to] know the size of callee’s stack ▪However, caller needs to copy the arguments
▪Division of labour:
▪Caller allocates space for arguments ▪Callee allocates space for the rest
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 17
▪Code that executes just before procedure starts ▪Part in caller before call
▪Part in callee at beginning of call
▪Generated by the compiler for procedure call
▪Allocates activation frame and changes stack pointer ▪Subtract frame size from stack pointer r5
▪Possibly saves register values ▪Assigns values to allocated arguments
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 18
▪Code that executes when procedure ends ▪Part in callee just before return
▪Part in caller just after returning
▪Deallocates activation frame and restores stack pointer ▪Add frame size to stack pointer r5
▪Possibly restores some saved register values
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 19
▪Caller prologue (before call) ▪Allocate stack space for arguments ▪Save actual argument values to stack
▪Callee prologue (start of function code)
▪Allocate stack space for return address and local variables ▪Save return address to stack, if needed
▪Callee epilogue (end of function code)
▪Load return address from stack if stored there
▪Deallocate stack space of return address and local variables
▪Caller epilogue (after return from call) ▪Deallocate stack space of arguments
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 20
void foo() { b(0, 1);
}
void b(int a0, int a1) {
int i0 = a0; int i1 = a1; c();
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 21
foo: deca r5 # allocate foo’s stack void foo() {
st r6, 0(r5) # save ra on stack
}
ld $-8, r0 b(0, 1);
add r0, r5 ld $0, r0 st r0, 0(r5) ld $1, r0
# size of b’s arguments # allocate b’s arguments
# a0 = 0
void b(int a0, int a1) { st r0, 4(r5) # a1 = 1
Caller Prologue
Call Caller Epilogue
}
inca r5 # free foo’s stack j 0(r6) # return
gpc $6, r6 # set return address int i0 = a0;
j b # call function b
inldt i$81, r=0 a1;# size of b’s arguments
add r0, r5 # free b’s arguments
c();
ld 0(r5), r6 # load return address from stack
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 22
void foo() {
b(0, 1);
}
void b(int a0, int a1) {
int i0 = a0; int i1 = a1; c();
}
b: ld $-12, r0 void foo() {
# size of b’s frame Callee Prologue # allocate b’s frame
# store return address in stack
add r0, r5
b(0, 1); st r6, 8(r5)
ld 12(r5), r0
}
voidld b(16i(nr5t), ar0, int a1) {
# i1 = a1
# set return address
# call c()
# load return address from stack # size of b’s frame
# free b’s frame
# return
}
Callee Epilogue
st r0, 0(r5)
# i0 = a0
st r0, 4(r5)
int i0 = a0; gpc $6, r6
j c
int i1 = a1;
ld 8(r5), r6
c();
ld $12, r0
add r0, r5 j 0(r6)
Callee’s Body
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
23
void foo() {
b(0, 1);
}
void b(int a0, int a1) {
int i0 = a0; int i1 = a1; c();
}
▪Every program thread starts with a hidden procedure ▪Name varies, usually start or entrypoint, sometimes crt0
▪The start procedure:
▪Allocates memory for stack
▪Initializes the stack pointer
▪Calls the thread’s first procedure (e.g., main())
start: ld $stackBottom, r5
inca r5
gpc $6, r6
j main
halt
.pos 0x1000
stackTop:
.long 0x0
.long 0x0
…
stackBottom: .long 0x0
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
24
▪ What is the value of r5 in three(), assuming we begin with a call to foo()?
void three() {
int i;
int j; int k;
}
void two() {
int i; int j; three();
void one() {
int i;
two(); }
void foo() { // r5 = 2000 one();
}
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
25
▪Number and size of arguments is statically determined
▪Value of arguments is dynamically determined
▪Compiler may choose to put arguments on the stack
▪Caller pushes argument values onto stack, callee reads arguments
from stack
▪Sometimes compiler chooses to use registers for arguments
▪Often used in Intel-based programs
▪Caller places argument values in pre-determined registers (convention), callee reads arguments from these registers
▪Why is this done? When is it a good idea? When is this not a good idea?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 26
▪Return value
▪In C and Java, procedures/methods can return only a single value ▪C compilers use a designated register (e.g., r0) for this
▪Return address does not always need to be saved to the stack
▪When is this possible?
▪Local variables sometimes don’t need to be in the stack
▪Why? When is this possible?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 27
int s;
void foo() {
s = add(1, 2);
}
void add(int a, int b) {
return a + b;
}
foo: deca r5
st r6, (r5)
ld $-8, r0 add r0, r5 ld $1, r0 st r0, 0(r5) ld $2, r0 st r0, 4(r5) gpc $6, r6
j add
ld $8, r1 add r1, r5 ld $s, r1 st r0, (r1) ld (r5), r6 inca r5
j (r6)
foo: deca r5
st r6, (r5)
ld $1, r0
ld $2, r1
gpc $6, r6 j add
ld $s, r1
st r0, (r1)
ld (r5), r6
inca r5
j (r6)
add:
Prologue Epilogue
add: ld ld
0(r5), r0
4(r5), r1
r1, r0
Argument loading
add
j (r6)
add r1, r0 j (r6)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
28
▪Stack is managed by code generated by compiler
▪Grows from bottom up, allocated by subtracting stack pointer
▪Local variables and arguments accessed with static offset from stack pointer
▪Caller prologue
▪Allocates space on stack (or registers) for arguments, stores their values
▪Callee prologue
▪Allocates space on stack for local variables and saved registers
▪Callee epilogue
▪Deallocates stack frame, restores saved registers
▪Caller epilogue
▪Deallocates arguments on stack, get return value from r0
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 29
proc:
deca r5
st r6, (r5) ld 8(r5), r1 beq r1, L0 dec r1
ld 4(r5), r2 deca r5
deca r5
st r2, (r5) st r1, 4(r5) gpc $6, r6
j proc
inca r5
inca r5
ld 4(r5), r2
ld 8(r5), r1 dec r1
ld (r2,r1,4), r1 add r1, r0
br L1
L0: ld $0, r0 L1: ld (r5), r6
inca r5 j (r6)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
30
▪Can you spot the bug?
int main(int argc, char **argv) { u.is_admin = 0;
printf(“Hello, %s
”, argv[1]); strcpy(u.username, argv[1]); if(u.is_admin) {
struct user {
char username[16]; char is_admin;
} u;
printf(“Hello admin: %c!!
”, u.is_admin);
system(“/bin/sh”); } else {
printf(“Bye %s!
”, u.username);
}
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 31
▪What about here?
void getInput (void) { char buf[1000]; for(int i=0; ; i++) {
char ch = getchar(); if(ch == ‘
’) break; buf[i] = ch;
}
printf(“Read %s.
”, buf); }
int main (void) {
puts(“Starting.”);
getInput();
puts(“Done.”);
}
CPSC 213 2018 WT1 © 2018 Jonatan Schroeder
32
▪Program code is just bytes in memory
▪Program data is just bytes in memory
▪This means we can include data that can be interpreted as code
▪Injected code can do anything
▪Most attackers want an interactive shell – inject
“shellcode”
▪If the input is longer than 1000 bytes, the loop will write into memory beyond the end of buf
▪If an attacker can provide the input, it can determine what values are written into memory beyond the end of buf
buf[0] buf[1] buf[2] buf[3] buf[4] buf[5] buf[6] buf[7] … buf[999] ra
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
33
▪The return address is
▪stored in memory on the stack
▪the address of instruction that executes after return ▪changing it changes the program’s control flow
▪Attacker’s goal
▪introduce code into the program from outside ▪trick program into running it
▪Changing the return address
▪allows attacker to change control flow
▪if it points to data, then that data becomes the program
buf[0] buf[1] buf[2] buf[3] buf[4] buf[5] buf[6] buf[7] … buf[999] ra
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
34
▪Hard parts
▪determining location of return address in attack string ▪determining address to change return address to
▪Making it easier
▪approximate return address value
▪start string with nop instructions (padding) ▪ aka “nop slide”, “nop sled” or “nop ramp”
▪finally include the code for the worm ▪Insert many copies of return address
▪ One of them may hit the right spot
▪Works if return address guess within nop slide
nop nop nop nop nop nop nop nop … nop nop nop nop Get shell … 0x1234 0x1234 0x1234 0x1234 0x1234
buf[0] buf[1] buf[2] buf[3] buf[4] buf[5] buf[6] buf[7] … buf[993] buf[994] buf[995] buf[996] buf[997] buf[998] buf[999] ra
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
35
▪This attack exploited buffer overflow in IIS (Microsoft’s Web Server)
▪Infected 359,000 machines within14 hours
▪Estimated cost: $2.6B
▪Displayed string: “HELLO! Welcome to http://www.worm.com! Hacked by Chinese!”
▪Was to launch DOS attack on whitehouse.gov, but detected; White House changed IP address to avoid attack
GET /default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u9090%u6858%ucbd 3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003% u8b00%u531b%u53ff%u0078%u0000%u00=a
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 36
▪Buffer boundary check ▪What if stack grew down?
▪Active frame at the highest addresses
▪Modern protections
▪Non-executable stack
▪Stack Canaries
▪ASLR: Randomized memory (and stack) addresses
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 37
▪CPU instruction that signals OS to do something
▪Typically something that regular processes don’t have permission
to do
▪Examples: read/write terminal, read/write file, execute programs ▪Details: CPSC 313
▪Like function calls
▪Arguments passed in registers r0, r1, r2
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 38
▪New requirement: system call
▪ Instruction includes which system call to use
▪ Remaining arguments passed by registers r0, r1, r2 ▪ Return value saved in r0
▪New instruction:
▪sys $0: read(fd, buffer, size) – read data from fd (0 = stdin)
▪ returns number of read bytes or -1 on error
▪sys $1: write(fd, buffer, size) – write data to fd (1 = stdout)
▪ returns number of written bytes or -1 on error ▪sys $2: exec(buffer, size) – execute program
▪ returns 0 if successful or -1 on error
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 39
Name
Semantics
Assembly
Machine
system call
system call #n
sys $n
f1nn
.pos 0x1000
ld $1, r0
ld $str, r1 ld $12, r2 sys $1
halt
.pos 0x2000 str:
.long 0x68656c6c # hell .long 0x6f20776f # o wo .long 0x726c640a # rld
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 40
▪In assignment 7, you will write a real exploit
▪You will use a program that has a vulnerability
▪You will also write malicious code
▪You will have the vulnerable program execute your code
▪Attacker input must include the code
▪You will enter machine code as data in input string
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 41
▪Meets every Tuesday at 5pm on Discord ▪https://ubcctf.github.io/ (invite there)
▪Can email Robert Xiao for more info: brx@cs.ubc.ca
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 42
▪Recursive functions have “tail recursion” if the recursion is the last operation of the function:
void foo(int a, int b) { if (b == 0)
return a; else
return foo(a+1, b-1);
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 43
▪Careful, this is not tail recursion: void fact(int n) {
if (n == 0) return 1;
return n * fact(n-1); }
▪But this is:
void fact(int n, int acc) {
if (n == 0) return acc;
return fact(n-1, n*acc); }
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 44
▪A tail recursion can be converted to a loop: void fact(int n, int acc) {
if (n == 0) return acc;
return fact(n-1, n*acc); }
void fact_it(int n, int acc) {
while(1) {
if (n == 0) return acc;
n = n-1; acc = n*acc;
}
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 45
sum:
deca r5
st r6, (r5)
ld 0(r5), r1 # r1=n ld 4(r5), r1 # r1=n
ld 4(r5), r0 # r0=acc
ld 8(r5), r0 # r0=acc
void sum(int n, int acc) { if (n == 0) return acc; return sum(n-1, n+acc);
}
L1:
beq r1, L0
add r1, r0
dec r1
deca r5
deca r5
# if n==0 return #r0=n+acc #r1=n-1
st r1, 0(r5) # arg0 = n-1
st r0, 4(r5) # arg1 = n+acc
gpc $6, r6
br L1 j sum
inca r5
# call sum
inca r5 L0:
ld (r5), r6 # restore r.a. inca r5
j (r6) # return
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
46
sum: ld 0(r5), r1 # r1=n
ld 4(r5), r0 # r0=acc
L1: beq r1, L0 # if n==0 return
add r1, r0 dec r1
br L1
L0: j (r6)
# r0 = n+acc # r1 = n-1
# return
void sum(int n, int acc) { if (n == 0) return acc; return sum(n-1, n+acc);
}
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
47
▪Activation frame for caller replaced with callee’s, function call replaced with regular jump or branch
▪Less use of stack
▪Fewer memory accesses
▪Fewer calls to indirect jumps (important for pipelining, see CPSC 313) ▪Less instructions overall
▪Can be done with tail call to other functions ▪Adjustments may need to be made to activation frame size
▪Optimization may be done by the compiler itself ▪Only works if it’s an actual tail call
▪No operation performed after the call
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 48
▪ Global variables:
▪ address known statically, instruction uses address explicitly
▪ Reference variables (pointers): ▪ variable stores address of value ▪ may be allocated dynamically
▪ Arrays
▪ elements named by numeric index
▪ address of element is base plus index times size of element ▪ base and index can be static or dynamic; size is static
▪Instance variables
▪ Offset to variable from start of object/struct known statically ▪ Start of object/struct may be dynamic
▪ Local variables and arguments
▪ Offset to variable from start of activation frame known statically ▪ Address of stack frame (stack pointer) is dynamic
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 49