From aab647a0cc135a0510ded3b65b812dfd110321a5 Mon Sep 17 00:00:00 2001
From: Andreas Baumann
Date: Sun, 14 Jan 2024 19:40:25 +0100
Subject: updated links and documentation

---
 doc/developer_ibm_com_articles_l_gas_nasm.txt | 544 ++++++++++++++++++++++++++
 1 file changed, 544 insertions(+)
 create mode 100644 doc/developer_ibm_com_articles_l_gas_nasm.txt

(limited to 'doc/developer_ibm_com_articles_l_gas_nasm.txt')

diff --git a/doc/developer_ibm_com_articles_l_gas_nasm.txt b/doc/developer_ibm_com_articles_l_gas_nasm.txt
new file mode 100644
index 0000000..8d0850d
--- /dev/null
+++ b/doc/developer_ibm_com_articles_l_gas_nasm.txt
@@ -0,0 +1,544 @@
+Linux assemblers: A comparison of GAS and NASM
+A side-by-side look at GNU Assembler (GAS) and Netwide Assembler (NASM)
+By Ram Narayan
+Published October 17, 2007
+
+Introduction
+Unlike other languages, assembly programming involves understanding the
+processor architecture of the machine that is being programmed. Assembly
+programs are not at all portable, are often cumbersome to maintain and
+understand, and can run to a large number of lines of code. But with
+these limitations comes the advantage of speed and size of the runtime binary
+that executes on that machine.
+
+Though much information is already available on assembly level programming on
+Linux, this article aims to more specifically show the differences between
+syntaxes in a way that will help you more easily convert from one flavor of
+assembly to another. The article evolved from my own quest to improve at
+this conversion.
+
+This article uses a series of program examples. Each program illustrates some
+feature and is followed by a discussion and comparison of the syntaxes.
+Although it’s not possible to cover every difference that exists between
+NASM and GAS, I do try to cover the main points and provide a foundation for
+further investigation.
+And for those already familiar with both NASM and GAS,
+you might still find something useful here, such as macros.
+
+This article assumes you have at least a basic understanding of assembly
+terminology and have programmed with an assembler using Intel® syntax,
+perhaps using NASM on Linux or Windows. This article does not teach how to
+type code into an editor or how to assemble and link. You should be familiar
+with the Linux operating system (any Linux distribution will do; I used Red
+Hat and Slackware) and basic GNU tools such as gcc and ld, and you should be
+programming on an x86 machine.
+
+Now I’ll describe what this article does and does not cover.
+
+Building the examples
+
+Assembling:
+GAS:
+as -o program.o program.s
+
+NASM:
+nasm -f elf -o program.o program.asm
+
+Linking (common to both kinds of assembler):
+ld -o program program.o
+
+Linking when an external C library is to be used:
+ld --dynamic-linker /lib/ld-linux.so.2 -lc -o program program.o
+
+This article covers:
+
+Basic syntactical differences between NASM and GAS
+Common assembly level constructs such as variables, loops, labels, and macros
+A bit about calling external C routines and using functions
+Assembly mnemonic differences and usage
+Memory addressing methods
+
+This article does not cover:
+
+The processor instruction set
+Various forms of macros and other constructs particular to an assembler
+Assembler directives peculiar to either NASM or GAS
+Features that are not commonly used or are found only in one assembler but not
+in the other
+
+For more information, refer to the official assembler manuals, as those are
+the most complete sources of information.
+
+Basic structure
+Listing 1 shows a very simple program that simply exits with an exit code of
+2. This little program describes the basic structure of an assembly program
+for both GAS and NASM.
+
+NASM:
+
+; Text segment begins
+section .text
+    global _start
+
+; Program entry point
+_start:
+    ; Put the code number for system call
+    mov eax, 1
+    ; Return value
+    mov ebx, 2
+    ; Call the OS
+    int 80h
+
+GAS:
+
+# Text segment begins
+.section .text
+    .globl _start
+
+# Program entry point
+_start:
+    # Put the code number for system call
+    movl $1, %eax
+    /* Return value */
+    movl $2, %ebx
+    # Call the OS
+    int $0x80
+
+Listing 1. A program that exits with an exit code of 2
+
+Now for a bit of explanation.
+
+One of the biggest differences between NASM and GAS is the syntax. GAS uses
+the AT&T syntax, a relatively archaic syntax that is specific to GAS and some
+older assemblers, whereas NASM uses the Intel syntax, supported by a majority
+of assemblers such as TASM and MASM. (Modern versions of GAS do support a
+directive called .intel_syntax, which allows the use of Intel syntax with GAS.)
+
+The following are some of the major differences summarized from the GAS manual:
+
+AT&T and Intel syntax use the opposite order for source and destination
+operands. For example:
+
+Intel: mov eax, 4
+AT&T: movl $4, %eax
+
+In AT&T syntax, immediate operands are preceded by $; in Intel syntax,
+immediate operands are not. For example:
+
+Intel: push 4
+AT&T: pushl $4
+
+In AT&T syntax, register operands are preceded by %; in Intel syntax, they are
+not.
+
+In AT&T syntax, the size of memory operands is determined from the last
+character of the opcode name. Opcode suffixes of b, w, and l specify byte
+(8-bit), word (16-bit), and long (32-bit) memory references. Intel syntax
+accomplishes this by prefixing memory operands (not the opcodes themselves)
+with byte ptr, word ptr, and dword ptr. Thus:
+
+Intel: mov al, byte ptr foo
+AT&T: movb foo, %al
+
+Immediate form long jumps and calls are lcall/ljmp $section, $offset in AT&T
+syntax; the Intel syntax is call/jmp far section:offset.
+The far return instruction is lret $stack-adjust in AT&T syntax, whereas Intel
+uses ret far stack-adjust.
+
+In both assemblers, the names of registers remain the same, but the syntax
+for using them is different, as is the syntax for addressing modes. In
+addition, assembler directives in GAS begin with a “.”, but not in NASM.
+
+The .text section holds the code that the processor executes. The global
+(also .globl or .global in GAS) keyword is used to make a symbol visible to
+the linker and available to other linking object modules. On the NASM side of
+Listing 1, global _start marks the symbol _start as a visible identifier so
+the linker knows where to jump into the program and begin execution. As with
+NASM, GAS looks for this _start label as the default entry point of a program.
+A label always ends with a colon in both GAS and NASM.
+
+Interrupts are a way to inform the OS that its services are required. The int
+instruction at the end of Listing 1 does this job in our program. Both GAS and
+NASM use the same mnemonic for interrupts. GAS uses the 0x prefix to specify a
+hex number, whereas NASM uses the h suffix. Because immediate operands are
+prefixed with $ in GAS, 80 hex is $0x80.
+
+int $0x80 (int 80h in NASM) is used to invoke Linux and request a service. The
+service code is present in the EAX register. A value of 1 (for the Linux exit
+system call) is stored in EAX to request that the program exit. Register EBX
+contains the exit code (2, in our case), a number that is returned to the OS.
+(You can track this number by typing echo $? at the command prompt.)
+
+Finally, a word about comments. GAS supports C style (/* */), C++ style
+(//), and shell style (#) comments. NASM supports single-line comments that
+begin with the “;” character.
+
+Variables and accessing memory
+This section begins with an example program that finds the largest of three
+numbers.
+
+NASM:
+
+; Data section begins
+section .data
+    var1 dd 40
+    var2 dd 20
+    var3 dd 30
+
+section .text
+    global _start
+
+_start:
+    ; Move the contents of variables
+    mov ecx, [var1]
+    cmp ecx, [var2]
+    jg check_third_var
+    mov ecx, [var2]
+
+check_third_var:
+    cmp ecx, [var3]
+    jg _exit
+    mov ecx, [var3]
+
+_exit:
+    mov eax, 1
+    mov ebx, ecx
+    int 80h
+
+GAS:
+
+// Data section begins
+.section .data
+var1:
+    .int 40
+var2:
+    .int 20
+var3:
+    .int 30
+
+.section .text
+    .globl _start
+
+_start:
+    # move the contents of variables
+    movl (var1), %ecx
+    cmpl (var2), %ecx
+    jg check_third_var
+    movl (var2), %ecx
+
+check_third_var:
+    cmpl (var3), %ecx
+    jg _exit
+    movl (var3), %ecx
+
+_exit:
+    movl $1, %eax
+    movl %ecx, %ebx
+    int $0x80
+
+Listing 2. A program that finds the maximum of three numbers
+
+You can see several differences above in the declaration of memory variables.
+NASM uses the dd, dw, and db directives to declare 32-, 16-, and 8-bit
+numbers, respectively, whereas GAS uses .long (or .int), .short (or .word),
+and .byte for the same purpose. GAS has other directives too, such as .ascii,
+.asciz, and .string. In GAS, you declare variables just like other labels
+(using a colon), but in NASM you simply type a variable name (without the
+colon) before the memory allocation directive (dd, dw, etc.), followed by the
+value of the variable.
+
+The first mov instruction in Listing 2 illustrates the memory indirect
+addressing mode. NASM uses square brackets to dereference the value at the
+address pointed to by a memory location: [var1]. GAS uses parentheses to
+dereference the same value: (var1). The use of other addressing modes is
+covered later in this article.
+
+Using macros
+Listing 3 illustrates the concepts of this section; it accepts the user’s
+name as input and returns a greeting.
+
+NASM:
+
+section .data
+    prompt_str db 'Enter your name: '
+    ; $ is the location counter
+    STR_SIZE equ $ - prompt_str
+
+    greet_str db 'Hello '
+    GSTR_SIZE equ $ - greet_str
+
+section .bss
+    ; Reserve 32 bytes of memory
+    buff resb 32
+
+; A macro with two parameters
+; Implements the write system call
+%macro write 2
+    mov eax, 4
+    mov ebx, 1
+    mov ecx, %1
+    mov edx, %2
+    int 80h
+%endmacro
+
+; Implements the read system call
+%macro read 2
+    mov eax, 3
+    mov ebx, 0
+    mov ecx, %1
+    mov edx, %2
+    int 80h
+%endmacro
+
+section .text
+    global _start
+
+_start:
+    write prompt_str, STR_SIZE
+    read buff, 32
+    ; Read returns the length in eax
+    push eax
+    ; Print the hello text
+    write greet_str, GSTR_SIZE
+    pop edx
+    ; edx = length returned by read
+    write buff, edx
+
+_exit:
+    mov eax, 1
+    mov ebx, 0
+    int 80h
+
+GAS:
+
+.section .data
+prompt_str:
+    .ascii "Enter Your Name: "
+pstr_end:
+    .set STR_SIZE, pstr_end - prompt_str
+
+greet_str:
+    .ascii "Hello "
+gstr_end:
+    .set GSTR_SIZE, gstr_end - greet_str
+
+.section .bss
+    // Reserve 32 bytes of memory
+    .lcomm buff, 32
+
+// A macro with two parameters
+// implements the write system call
+.macro write str, str_size
+    movl $4, %eax
+    movl $1, %ebx
+    movl \str, %ecx
+    movl \str_size, %edx
+    int $0x80
+.endm
+
+// Implements the read system call
+.macro read buff, buff_size
+    movl $3, %eax
+    movl $0, %ebx
+    movl \buff, %ecx
+    movl \buff_size, %edx
+    int $0x80
+.endm
+
+.section .text
+    .globl _start
+
+_start:
+    write $prompt_str, $STR_SIZE
+    read $buff, $32
+    // Read returns the length in eax
+    pushl %eax
+    // Print the hello text
+    write $greet_str, $GSTR_SIZE
+    popl %edx
+    // edx = length returned by read
+    write $buff, %edx
+
+_exit:
+    movl $1, %eax
+    movl $0, %ebx
+    int $0x80
+
+Listing 3.
+A program to read a string and display a greeting to the user
+
+The heading for this section promises a discussion of macros, and both NASM
+and GAS certainly support them. But before we get into macros, a few other
+features are worth comparing.
+
+Listing 3 illustrates the concept of uninitialized memory, defined using the
+.bss section directive. BSS stands for “block storage segment”
+(originally, “block started by symbol”), and the memory reserved in the
+BSS section is initialized to zero during the start of the program. Objects in
+the BSS section have only a name and a size, and no value. Unlike variables in
+the data segment, variables declared in the BSS section don’t take up space
+in the object file.
+
+NASM uses the resb, resw, and resd keywords to allocate byte, word, and dword
+space in the BSS section. GAS, on the other hand, uses the .lcomm keyword to
+allocate byte-level space. Notice the way the variable name is declared in
+both versions of the program. In NASM the variable name precedes the resb (or
+resw or resd) keyword, followed by the amount of space to be reserved, whereas
+in GAS the variable name follows the .lcomm keyword, which is then followed by
+a comma and then the amount of space to be reserved. This shows the difference:
+
+NASM: varname resb size
+
+GAS: .lcomm varname, size
+
+Listing 3 also introduces the concept of a location counter. NASM provides
+two special tokens for it: $ is the current assembly position and $$ is the
+start of the current section. GAS offers the special symbol . (dot) for the
+current address, but the GAS side of Listing 3 instead uses labels to
+calculate the next storage location (data, instruction, etc.).
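+
+As a point of comparison, the GAS manual describes the special symbol . (dot)
+as the address currently being assembled, so a size can also be computed
+without an end label. The following is a minimal sketch in the article's
+32-bit GAS style and is not part of the original listings; the names msg and
+MSG_SIZE are made up for illustration:
+
+```asm
+# Hypothetical example: msg and MSG_SIZE are illustrative names.
+.section .data
+msg:
+    .ascii "Hello\n"
+    # "." is the current address, so this is the length of msg in bytes
+    .set MSG_SIZE, . - msg
+
+.section .text
+    .globl _start
+_start:
+    movl $4, %eax          # write system call
+    movl $1, %ebx          # file descriptor 1 (stdout)
+    movl $msg, %ecx        # address of the string
+    movl $MSG_SIZE, %edx   # length taken from the location counter
+    int $0x80
+    movl $1, %eax          # exit system call
+    movl $0, %ebx          # exit code 0
+    int $0x80
+```
+
+On a 64-bit host this sketch would presumably need as --32 and
+ld -m elf_i386 in place of the plain commands shown earlier; that is an
+assumption about the build environment, not something the article covers.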
+
+For example, to calculate the length of a string, you would use the following
+idiom in NASM:
+
+prompt_str db 'Enter your name: '
+STR_SIZE equ $ - prompt_str ; $ is the location counter
+
+The $ gives the current value of the location counter, and subtracting the
+value of the label (all variable names are labels) from this location counter
+gives the number of bytes present between the declaration of the label and the
+current location. The equ directive is used to set the value of the variable
+STR_SIZE to the expression following it. A similar idiom in GAS looks like
+this:
+
+prompt_str:
+    .ascii "Enter Your Name: "
+pstr_end:
+    .set STR_SIZE, pstr_end - prompt_str
+
+The end label (pstr_end) gives the next location address, and subtracting the
+starting label address gives the size. Also note the use of .set to initialize
+the value of the variable STR_SIZE to the expression following the comma. A
+corresponding .equ can also be used. There is no alternative to GAS’s set
+directive in NASM.
+
+As I mentioned, Listing 3 uses macros. Different macro techniques
+exist in NASM and GAS, including single-line macros and macro overloading, but
+I only deal with the basic type here. A common use of macros in assembly is
+clarity. Instead of typing the same piece of code again and again, you can
+create reusable macros that both avoid this repetition and enhance the look
+and readability of the code by reducing clutter.
+
+NASM users might be familiar with declaring macros using the %macro
+directive and ending them with an %endmacro directive. A %macro directive
+is followed by the macro name. After the macro name comes a count, the number
+of macro arguments the macro is supposed to have. In NASM, macro arguments are
+numbered sequentially starting with 1. That is, the first argument to a macro
+is %1, the second is %2, the third is %3, and so on.
+For example:
+
+%macro macroname 2
+    mov eax, %1
+    mov ebx, %2
+%endmacro
+
+This creates a macro with two arguments, the first being %1 and the second
+being %2. Thus, a call to the above macro would look something like this:
+
+macroname 5, 6
+
+Macros can also be created without arguments, in which case the count is 0.
+
+Now let’s take a look at how GAS uses macros. GAS provides the .macro and
+.endm directives to create macros. A .macro directive is followed by a macro
+name, which may or may not have arguments. In GAS, macro arguments are given
+by name. For example:
+
+.macro macroname arg1, arg2
+    movl \arg1, %eax
+    movl \arg2, %ebx
+.endm
+
+A backslash precedes the name of each argument of the macro when the name is
+actually used inside a macro. If this is not done, the assembler treats the
+names as labels rather than as arguments, and the build fails with an error.
+
+Functions, external routines, and the stack
+The example program for this section implements a selection sort on an array
+of integers.
+
+NASM:
+
+section .data
+    array db 89, 10, 67, 1, 4, 27, 12, 34, 86, 3
+    ARRAY_SIZE equ $ - array
+    array_fmt db " %d", 0
+    usort_str db "unsorted array:", 0
+    sort_str db "sorted array:", 0
+    newline db 10, 0
+
+section .text
+    extern puts
+    global _start
+
+_start:
+    push usort_str
+    call puts
+    add esp, 4
+
+    push ARRAY_SIZE
+    push array
+    push array_fmt
+    call print_array10
+    add esp, 12
+
+    push ARRAY_SIZE
+    push array
+    call sort_routine20
+    ; Adjust the stack pointer
+    add esp, 8
+
+    push sort_str
+    call puts
+    add esp, 4
+
+    push ARRAY_SIZE
+    push array
+    push array_fmt
+    call print_array10
+    add esp, 12
+    jmp _exit
+
+    extern printf
+
+print_array10:
+    push ebp
+    mov ebp, esp
+    sub esp, 4
+    mov edx, [ebp + 8]
+    mov ebx, [ebp + 12]
+    mov ecx, [ebp + 16]
+    mov esi, 0
+
+push_loop:
+    mov [ebp - 4], ecx
+    mov edx, [ebp + 8]
+    xor eax, eax
+    mov al, byte [ebx + esi]
+    push eax
+    push edx
+    call printf
+    add esp, 8
+    mov ecx, [ebp - 4]
+    inc esi
+    loop push_loop
+
+    push newline
+    call printf
+    add esp, 4
+    mov esp, ebp
+    pop ebp
+    ret
+
+sort_routine20:
+    push ebp
+    mov ebp, esp
+    ; Allocate a word of space in stack
+    sub esp, 4
+    ; Get the address of the array
+    mov ebx, [ebp + 8]
+    ; Store array size
+    mov ecx, [ebp + 12]
+    dec ecx
+    ; Prepare for outer loop here
+    xor esi, esi
+
+outer_loop:
+    ; This stores the min index
+    mov [ebp - 4], esi
+    mov edi, esi
+    inc edi
+
+inner_loop:
+    cmp edi, ARRAY_SIZE
+    jge swap_vars
+    xor al, al
+    mov edx, [ebp - 4]
+    mov al, byte [ebx + edx]
+    cmp byte [ebx + edi], al
+    jge check_next
+    mov [ebp - 4], edi
+
+check_next:
+    inc edi
+    jmp inner_loop
+
+swap_vars:
+    mov edi, [ebp - 4]
+    mov dl, byte [ebx + edi]
+    mov al, byte [ebx + esi]
+    mov byte [ebx + esi], dl
+    mov byte [ebx + edi], al
+    inc esi
+    loop outer_loop
+
+    mov esp, ebp
+    pop ebp
+    ret
+
+_exit:
+    mov eax, 1
+    mov ebx, 0
+    int 80h
+
+GAS:
+
+.section .data
+array:
+    .byte 89, 10, 67, 1, 4, 27, 12, 34, 86, 3
+array_end:
+    .equ ARRAY_SIZE, array_end - array
+array_fmt:
+    .asciz " %d"
+usort_str:
+    .asciz "unsorted array:"
+sort_str:
+    .asciz "sorted array:"
+newline:
+    .asciz "\n"
+
+.section .text
+    .globl _start
+
+_start:
+    pushl $usort_str
+    call puts
+    addl $4, %esp
+
+    pushl $ARRAY_SIZE
+    pushl $array
+    pushl $array_fmt
+    call print_array10
+    addl $12, %esp
+
+    pushl $ARRAY_SIZE
+    pushl $array
+    call sort_routine20
+    # Adjust the stack pointer
+    addl $8, %esp
+
+    pushl $sort_str
+    call puts
+    addl $4, %esp
+
+    pushl $ARRAY_SIZE
+    pushl $array
+    pushl $array_fmt
+    call print_array10
+    addl $12, %esp
+    jmp _exit
+
+print_array10:
+    pushl %ebp
+    movl %esp, %ebp
+    subl $4, %esp
+    movl 8(%ebp), %edx
+    movl 12(%ebp), %ebx
+    movl 16(%ebp), %ecx
+    movl $0, %esi
+
+push_loop:
+    movl %ecx, -4(%ebp)
+    movl 8(%ebp), %edx
+    xorl %eax, %eax
+    movb (%ebx, %esi, 1), %al
+    pushl %eax
+    pushl %edx
+    call printf
+    addl $8, %esp
+    movl -4(%ebp), %ecx
+    incl %esi
+    loop push_loop
+
+    pushl $newline
+    call printf
+    addl $4, %esp
+    movl %ebp, %esp
+    popl %ebp
+    ret
+
+sort_routine20:
+    pushl %ebp
+    movl %esp, %ebp
+    # Allocate a word of space in stack
+    subl $4, %esp
+    # Get the address of the array
+    movl 8(%ebp), %ebx
+    # Store array size
+    movl 12(%ebp), %ecx
+    decl %ecx
+    # Prepare for outer loop here
+    xorl %esi, %esi
+
+outer_loop:
+    # This stores the min index
+    movl %esi, -4(%ebp)
+    movl %esi, %edi
+    incl %edi
+
+inner_loop:
+    cmpl $ARRAY_SIZE, %edi
+    jge swap_vars
+    xorb %al, %al
+    movl -4(%ebp), %edx
+    movb (%ebx, %edx, 1), %al
+    cmpb %al, (%ebx, %edi, 1)
+    jge check_next
+    movl %edi, -4(%ebp)
+
+check_next:
+    incl %edi
+    jmp inner_loop
+
+swap_vars:
+    movl -4(%ebp), %edi
+    movb (%ebx, %edi, 1), %dl
+    movb (%ebx, %esi, 1), %al
+    movb %dl, (%ebx, %esi, 1)
+    movb %al, (%ebx, %edi, 1)
+    incl %esi
+    loop outer_loop
+
+    movl %ebp, %esp
+    popl %ebp
+    ret
+
+_exit:
+    movl $1, %eax
+    movl $0, %ebx
+    int $0x80
+
+Listing 4. Implementation of selection sort on an integer array
+
+Listing 4 might look overwhelming at first, but in fact it’s very simple.
+The listing introduces the concept of functions, various memory addressing
+schemes, the stack, and the use of a library function. The program sorts an
+array of 10 numbers and uses the external C library functions puts and printf
+to print out the entire contents of the unsorted and sorted array. For
+modularity and to introduce the concept of functions, the sort routine itself
+is implemented as a separate procedure along with the array print routine.
+Let’s deal with them one by one.
+
+After the data declarations, the program execution begins with a call to
+puts. The puts function displays a string on the console. Its only
+argument is the address of the string to be displayed, which is passed on to
+it by pushing the address of the string onto the stack.
+
+In NASM, any label that is not part of our program and needs to be resolved
+during link time must be declared beforehand, which is the function of the
+extern keyword. GAS doesn’t have such requirements. After this, the
+address of the string usort_str is pushed onto the stack. In NASM, a
+memory variable such as usort_str represents the address of the memory
+location itself, and thus a call such as push usort_str actually pushes the
+address on top of the stack. In GAS, on the other hand, the variable usort_str
+must be prefixed with $, so that it is treated as an immediate address. If
+it’s not prefixed with $, the actual bytes represented by the memory
+variable are pushed onto the stack instead of the address.
+
+Since pushing a variable essentially moves the stack pointer by a dword, the
+stack pointer is adjusted after the call by adding 4 (the size of a dword) to
+it.
+
+Three arguments are now pushed onto the stack, and the print_array10 function
+is called.
+Functions are declared the same way in both NASM and GAS.
+They are nothing but labels, which are invoked using the call instruction.
+
+On entry to a function, ESP points to the top of the stack, which holds the
+return address, and esp + 4 holds the first argument. Once the function has
+pushed EBP and copied ESP into it, the return address is at ebp + 4 and the
+first argument is at ebp + 8. All subsequent arguments are accessed by adding
+the size of a dword variable (that is, ebp + 12, ebp + 16, and so on).
+
+Once inside a function, a local stack frame is created by copying esp to
+ebp. You can also allocate space for local variables, as is done in the
+program with sub esp, 4. You do this by subtracting the number of bytes
+required from esp. A value of ebp - 4 then refers to a space of 4 bytes
+allocated for a local variable, and this can continue as long as there is
+enough space in the stack to accommodate your local variables.
+
+Listing 4 illustrates the base indirect addressing mode, so called
+because you start with a base address and add an offset to it to arrive at a
+final address. On the NASM side of the listing, [ebp + 8] is one such example,
+as is [ebp - 4]. In GAS, the addressing is a bit more terse:
+8(%ebp) and -4(%ebp), respectively.
+
+In the print_array10 routine, you can see another kind of addressing mode
+being used after the push_loop label. The line is represented in
+NASM and GAS, respectively, like so:
+
+NASM: mov al, byte [ebx + esi]
+
+GAS: movb (%ebx, %esi, 1), %al
+
+This addressing mode is the base indexed addressing mode. Here, there are
+three entities: one is the base address, the second is the index register, and
+the third is the multiplier. Because the address alone does not say how many
+bytes are to be accessed, a method is needed to tell the assembler the amount
+of memory addressed. NASM uses the byte operator to tell
+the assembler that a byte of data is to be moved.
+In GAS the same problem is solved by the b, w, or l suffix in the mnemonic
+(for example, movb); the multiplier only scales the index register. The
+syntax of GAS can seem somewhat complex when first encountered.
+
+The general form of base indexed addressing in GAS is as follows:
+
+%segment:ADDRESS(, index, multiplier)
+
+or
+
+%segment:(offset, index, multiplier)
+
+or
+
+%segment:ADDRESS(base, index, multiplier)
+
+The final address is calculated using this formula:
+
+ADDRESS or offset + base + index * multiplier.
+
+Thus, to step through an array of bytes, a multiplier of 1 is used; for
+words, 2; and for dwords, 4. Of course, NASM uses a simpler syntax. Thus, the
+above in NASM would be represented like so:
+
+Segment:[ADDRESS or offset + base + index * multiplier]
+
+A prefix of byte, word, or dword is used before this memory address to access
+1, 2, or 4 bytes of memory, respectively.
+
+Leftovers
+
+NASM:
+
+section .data
+    ; Command table to store at most
+    ; 10 command line arguments
+    cmd_tbl:
+        %rep 10
+            dd 0
+        %endrep
+
+section .text
+    global _start
+
+_start:
+    ; Set up the stack frame
+    mov ebp, esp
+    ; Top of stack contains the
+    ; number of command line arguments.
+    ; The default value is 1
+    mov ecx, [ebp]
+    ; Exit if arguments are more than 10
+    cmp ecx, 10
+    jg _exit
+    mov esi, 1
+    mov edi, 0
+
+; Store the command line arguments
+; in the command table
+store_loop:
+    mov eax, [ebp + esi * 4]
+    mov [cmd_tbl + edi * 4], eax
+    inc esi
+    inc edi
+    loop store_loop
+    mov ecx, edi
+    mov esi, 0
+
+    extern puts
+
+print_loop:
+    ; Make some local space
+    sub esp, 4
+    ; puts function corrupts ecx
+    mov [ebp - 4], ecx
+    mov eax, [cmd_tbl + esi * 4]
+    push eax
+    call puts
+    add esp, 4
+    mov ecx, [ebp - 4]
+    inc esi
+    loop print_loop
+    jmp _exit
+
+_exit:
+    mov eax, 1
+    mov ebx, 0
+    int 80h
+
+GAS:
+
+.section .data
+    // Command table to store at most
+    // 10 command line arguments
+    cmd_tbl:
+        .rept 10
+            .long 0
+        .endr
+
+.section .text
+    .globl _start
+
+_start:
+    // Set up the stack frame
+    movl %esp, %ebp
+    // Top of stack contains the
+    // number of command line arguments.
+    // The default value is 1
+    movl (%ebp), %ecx
+    // Exit if arguments are more than 10
+    cmpl $10, %ecx
+    jg _exit
+    movl $1, %esi
+    movl $0, %edi
+
+// Store the command line arguments
+// in the command table
+store_loop:
+    movl (%ebp, %esi, 4), %eax
+    movl %eax, cmd_tbl( , %edi, 4)
+    incl %esi
+    incl %edi
+    loop store_loop
+    movl %edi, %ecx
+    movl $0, %esi
+
+print_loop:
+    // Make some local space
+    subl $4, %esp
+    // puts function corrupts ecx
+    movl %ecx, -4(%ebp)
+    movl cmd_tbl( , %esi, 4), %eax
+    pushl %eax
+    call puts
+    addl $4, %esp
+    movl -4(%ebp), %ecx
+    incl %esi
+    loop print_loop
+    jmp _exit
+
+_exit:
+    movl $1, %eax
+    movl $0, %ebx
+    int $0x80
+
+Listing 5. A program that reads command line arguments, stores them in memory,
+and prints them
+
+Listing 5 shows a construct that repeats instructions in assembly. Naturally
+enough, it’s called the repeat construct. In GAS, the repeat construct is
+started using the .rept directive. This directive has to be closed
+using an .endr directive. .rept is followed by a count in GAS that
+specifies the number of times the expression enclosed inside the .rept/.endr
+construct is to be repeated.
+Any instruction placed inside this construct is
+equivalent to writing that instruction count number of times, each on a
+separate line.
+
+For example, for a count of 3:
+
+.rept 3
+    movl $2, %eax
+.endr
+
+This is equivalent to:
+
+movl $2, %eax
+movl $2, %eax
+movl $2, %eax
+
+In NASM, a similar construct is used at the preprocessor level. It begins with
+the %rep directive and ends with %endrep. The %rep directive is followed by an
+expression (unlike in GAS where the .rept directive is followed by a count):
+
+%rep <expression>
+    nop
+%endrep
+
+There is also an alternative in NASM, the times directive. Like %rep, it is
+followed by an expression, but it works at the assembler level rather than in
+the preprocessor. For example, the above %rep construct is equivalent to this:
+
+times <expression> nop
+
+And this:
+
+%rep 3
+    mov eax, 2
+%endrep
+
+is equivalent to this:
+
+times 3 mov eax, 2
+
+and both are equivalent to this:
+
+mov eax, 2
+mov eax, 2
+mov eax, 2
+
+In Listing 5, the .rept (or %rep) directive is used to create a memory data
+area for 10 double words. The command line arguments are then accessed one by
+one from the stack and stored in the memory area until the command table gets
+full.
+
+As for command line arguments, they are accessed similarly with both
+assemblers. At program start, the top of the stack (the location ESP points
+to) stores the number of command line arguments supplied to a program, which
+is 1 by default (for no command line arguments). esp + 4 stores a pointer to
+the first command line argument, which is always the name of the program that
+was invoked from the command line. esp + 8, esp + 12, and so on store pointers
+to subsequent command line arguments.
+
+Also watch the way the memory command table is being accessed on both sides in
+Listing 5. Here, memory indirect addressing mode is used to access
+the command table along with an offset in ESI (and EDI) and a multiplier.
+Thus, [cmd_tbl + esi * 4] in NASM is equal to cmd_tbl(, %esi, 4) in GAS.
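+
+To tie the repeat construct and scaled-index addressing together, here is a
+small GAS sketch that fills a table of dwords and reads one entry back. It is
+not taken from the original listings; the names tbl, TBL_LEN, and fill_loop
+and the values stored are assumptions chosen purely for illustration:
+
+```asm
+# Hypothetical example: tbl, TBL_LEN, and fill_loop are illustrative names.
+    .set TBL_LEN, 4
+
+.section .data
+tbl:
+    .rept TBL_LEN              # reserve TBL_LEN zero-initialized dwords
+    .long 0
+    .endr
+
+.section .text
+    .globl _start
+_start:
+    movl $0, %esi              # index into the table
+fill_loop:
+    movl %esi, %eax
+    addl $10, %eax
+    movl %eax, tbl(, %esi, 4)  # tbl[esi] = esi + 10 (NASM: [tbl + esi * 4])
+    incl %esi
+    cmpl $TBL_LEN, %esi
+    jl fill_loop               # repeat while esi < TBL_LEN
+
+    movl $2, %esi
+    movl tbl(, %esi, 4), %ebx  # ebx = tbl[2], used as the exit code
+    movl $1, %eax              # exit system call
+    int $0x80
+```
+
+If this is assembled and linked as a 32-bit program, echo $? after running it
+should report 12, the value stored in the third table entry. The multiplier of
+4 matches the size of each .long element; it does not set the operand size,
+which comes from the l suffix on movl.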
+
+Conclusion
+Even though the differences between these two assemblers are substantial,
+it’s not that difficult to convert from one form to another. You might find
+that the AT&T syntax seems at first difficult to understand, but once
+mastered, it’s as simple as the Intel syntax.
+
-- 
cgit v1.2.3-54-g00ecf