miniany/doc/blog.jeff.over.bz_assembly_compilers_jit_2017_01_15_x86-assembler.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679

   #[1]256LOL RSS Feed

[2][headshot-circle.png]

   256 LINES OR LESS
   (jeff overbey's blog)
     * [3]22 Apr 2018 Building a Go Doctor Refactoring
     * [4]09 Sep 2017 Lexical Analysis
     * [5]01 Jun 2017 On Performance Improvements
     * [6]30 Mar 2017 Executing Dynamically Generated Machine Code: The
       Start of a JIT
     * [7]15 Feb 2017 Finding Machine Language Encodings
     * [8]15 Jan 2017 An x86 Assembler in 256 LOC
     * [9]15 Dec 2016 256 Lines or Less (the joverblog?)

   [10]RSS Feed o [11]My Web Site
   [sidebar-button.png]
   [empty.jpg]

An x86 Assembler in 256 LOC

   15 Jan 2017

   For the first "real" post in this blog, we'll build an x86 assembler in
   less than 256 lines of C code. Obviously, it won't implement every x86
   instruction, but it will implement a surprisingly useful subset: data
   movement, control flow, integer arithmetic, bitwise operations, and
   function calls. We won't be able to run the generated machine code yet
   (that's coming in a later blog post), but we'll be in a good position
   to do so.

   I'll assume you're already familiar with x86 assembly language
   (hopefully the table below will serve as a brief refresher), although I
   won't assume you know about their machine language encodings. I'll also
   assume that you're familiar with hexadecimal representation and
   arithmetic (e.g., 9 + 1 = A and 10 - 1 = F).

1. Which instructions will we support?

   By the time we finish, we'll have an assembler that supports all of the
   following x86 instructions (yes, I'm serious):
   Instruction    Example    Description of the Example
   nop   nop   No operation (do nothing)
           -- Data Movement --
   mov register, immediate   mov eax, 0F00Dh   Place the value F00D
   (hexadecimal) in EAX
   mov register, register   mov eax, ebx   Copy the value from the EBX
   register into EAX
   mov register, [register]   mov eax, [ebx]   Treat EBX as pointer; load
   32-bit value from memory into EAX
   mov [register], register   mov [eax], ebx   Treat EAX as pointer; store
   32-bit value from EBX in memory
           -- Arithmetic --
   add register, register   add eax, ebx   EAX = EAX + EBX
   cdq   cdq   Sign-extend EAX into EDX in preparation for idiv
   dec register   dec eax   EAX = EAX - 1
   div register   div ebx   Unsigned division: EDX:EAX � EBX,
   setting EAX = quotient, EDX = remainder
   idiv register   idiv ebx   Signed division: EDX:EAX � EBX,
   setting EAX = quotient, EDX = remainder
   imul register   imul ebx   Signed multiplication: EDX:EAX = EAX � EBX
   inc register   inc eax   EAX = EAX + 1
   neg register   neg eax   EAX = -EAX
   mul register   mul ebx   Unsigned multiplication: EDX:EAX = EAX � EBX
   sub register, register   sub eax, ebx   EAX = EAX - EBX
           -- Bitwise Operations --
   and register, register   and eax, ebx   EAX = EAX & EBX
   not register   not eax   EAX = ~EAX
   or register, register   or eax, ebx   EAX = EAX | EBX
   sar register, immediate   sar eax, 2   Shift EAX right by 2 bits
   (sign-fill)
   sar register, cl   sar eax, cl   Shift EAX right by CL bits (sign-fill)
   shl register, immediate   shl eax, 2   Shift EAX left by 2 bits
   shl register, cl   shl eax, cl   Shift EAX left by number of bits in CL
   shr register, immediate   shr eax, 2   Shift EAX right by 2 bits
   (zero-fill)
   shr register, cl   shr eax, cl   Shift EAX right by CL bits (zero-fill)
   xor register, register   xor eax, ebx   EAX = EAX ^ EBX
           -- Comparison --
   cmp register, register   cmp eax, ebx   Compare EAX to EBX, setting
   flags for conditional jump
           -- Control Flow --
   jmp bytes   jmp -10   Jump -10 bytes, i.e., move EIP backward by 10
   bytes
   ja bytes   ja -10   Jump if above (>, unsigned)
   jae bytes   jae -10   Jump if above or equal (>=, unsigned)
   jb bytes   jb -10   Jump if below (<, unsigned)
   jbe bytes   jbe -10   Jump if below or equal (<=, unsigned)
   je bytes   je -10   Jump if equal
   jg bytes   jg -10   Jump if greater (>, signed)
   jge bytes   jge -10   Jump if greater or equal (>=, signed)
   jl bytes   jl -10   Jump if less (<, signed)
   jle bytes   jle -10   Jump if less or equal (<=, signed)
   jne bytes   jne -10   Jump if not equal
           -- Function Calls --
   call register   call eax   Call function at pointer stored in EAX
   push register   push eax   Push value of EAX onto the stack
   pop register   pop eax   Pop a value from the stack into EAX
   ret immediate   ret 4   Return from function, removing 4 bytes of stack
   arguments

2. The API: x86asm.h

   The header file, x86asm.h, defines the API that we intend for clients
   to use. It provides
     * an enumeration of the x86's 32-bit registers (reg32_t), and
     * one function for each instruction form we can assemble.

   Here's the header in its entirety. (There's more explanation in the
   next section, but it will be helpful to read through the header file
   first.)
// x86 Subset Assembler - API
//-----------------------------------------------------------------------------
// Copyright (C) 2017 Jeffrey L. Overbey.  Use of this source code is governed
// by a BSD-style license posted at http://blog.jeff.over.bz/license/

#ifndef X86ASM_H
#define X86ASM_H

#include <stdint.h> // uint8_t, unint32_t

typedef enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI } reg32_t;

uint8_t *nop(uint8_t *buf);

uint8_t *mov_immediate(reg32_t dest, int32_t value, uint8_t *buf);
uint8_t * mov_from_ptr(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *   mov_to_ptr(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *          mov(reg32_t dest, reg32_t src, uint8_t *buf);

uint8_t *  add(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *  sub(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *  and(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *   or(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *  xor(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *  cmp(reg32_t dest, reg32_t src, uint8_t *buf);
uint8_t *  inc(reg32_t reg, uint8_t *buf);
uint8_t *  dec(reg32_t reg, uint8_t *buf);
uint8_t *  not(reg32_t reg, uint8_t *buf);
uint8_t *  neg(reg32_t reg, uint8_t *buf);
uint8_t *  mul(reg32_t reg, uint8_t *buf);
uint8_t * imul(reg32_t reg, uint8_t *buf);
uint8_t * div_(reg32_t reg, uint8_t *buf);
uint8_t * idiv(reg32_t reg, uint8_t *buf);
uint8_t *  cdq(uint8_t *buf);

uint8_t *   shl(reg32_t reg, uint8_t bits, uint8_t *buf);
uint8_t *shl_cl(reg32_t reg, uint8_t *buf);
uint8_t *   shr(reg32_t reg, uint8_t bits, uint8_t *buf);
uint8_t *shr_cl(reg32_t reg, uint8_t *buf);
uint8_t *   sar(reg32_t reg, uint8_t bits, uint8_t *buf);
uint8_t *sar_cl(reg32_t reg, uint8_t *buf);

uint8_t *push(reg32_t reg, uint8_t *buf);
uint8_t * pop(reg32_t reg, uint8_t *buf);
uint8_t *call(reg32_t reg, uint8_t *buf);
uint8_t * ret(uint16_t bytes, uint8_t *buf);

uint8_t * jmp(int32_t relative_bytes, uint8_t *buf);
uint8_t *  jb(int32_t relative_bytes, uint8_t *buf);
uint8_t * jae(int32_t relative_bytes, uint8_t *buf);
uint8_t *  je(int32_t relative_bytes, uint8_t *buf);
uint8_t * jne(int32_t relative_bytes, uint8_t *buf);
uint8_t * jbe(int32_t relative_bytes, uint8_t *buf);
uint8_t *  ja(int32_t relative_bytes, uint8_t *buf);
uint8_t *  jl(int32_t relative_bytes, uint8_t *buf);
uint8_t * jge(int32_t relative_bytes, uint8_t *buf);
uint8_t * jle(int32_t relative_bytes, uint8_t *buf);
uint8_t *  jg(int32_t relative_bytes, uint8_t *buf);

#endif

3. The demo program: demo.c

   Before delving into the implementation of the assembler, it's probably
   helpful to show how this API is used.

   Each function in our API takes a uint8_t pointer buf, writes the
   byte(s) of machine code for a single assembly language instruction to
   memory starting at that address, then returns a pointer to the next
   byte after the instruction that was just assembled.

   For example, the instruction mov eax, 12345678h is assembled into five
   bytes of machine code: b8 78 56 34 12. Calling mov_immediate(EAX,
   0x12345678, buf) stores these five bytes into memory at the location
   pointed to by buf, then it returns buf+5, which is presumably where
   you'll want to store the next instruction.

   For example, suppose you want to assemble the following
   three-instruction program.
mov eax, 120h
add eax, ecx
shl eax, 4

   The following program illustrates how to assemble this sequence of
   three instructions, then write the byte values of the resulting machine
   code to standard output:
#include "x86asm.h"
#include <stdio.h>

int main() {
        uint8_t bytes[64];
        uint8_t *cur = bytes;
        cur = mov_immediate(EAX, 0x120, cur);  // mov eax, 120h
        cur = add(EAX, ECX, cur);              // add eax, ecx
        cur = shl(EAX, 4, cur);                // shl eax, 4

        for (uint8_t *p = bytes; p < cur; p++) {
                printf("%02x ", *p);
        }
        printf("\n");
        return 0;
}

   When you run this, the output is:
b8 20 01 00 00 03 c1 c1 e0 04

4. The implementation: x86asm.c

   Now, we'll start implementing this API. For each instruction, I'll
   describe its machine language encoding, and then the C function that
   implements it.

   The definitive, official reference for the x86 instruction set and its
   machine language encoding is Volume 2 of the [12]Intel� 64 and IA-32
   Architectures Software Developer Manuals. Unfortunately, Intel's
   documentation is not easy to read, so for this small assembler, it will
   be sufficient to simply describe the encodings by example.

No operation - nop

   The nop instruction assembles to a single byte of machine code: 90h.
uint8_t *nop(uint8_t *buf) {
        *buf++ = 0x90;
        return buf;
}

Increment and decrement - inc, dec

   The inc instruction adds 1 to a value in a register; dec subtracts 1.
   Recall from the header file above (x86asm.h) that we defined an
   enumeration with all of the x86's 32-bit registers.
typedef enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI } reg32_t;

   There's a reason we listed the registers in this specific order: when
   instructions take register operands, the encodings tend to follow this
   same order. Notice the pattern in the encodings of the inc and dec
   instructions:
   Instruction    Encoding (hex)          Instruction    Encoding (hex)
   inc eax        40                      dec eax        48
   inc ecx        41                      dec ecx        49
   inc edx        42                      dec edx        4A
   ...                                    ...
   inc edi        47                      dec edi        4F

   Since our reg32_t enum assigns an integer value to each register name
   (EAX=0, ECX=1, EDX=2, etc.), this means we can encode inc register by
   simply adding the register number to hexadecimal 40.
uint8_t *inc(reg32_t reg, uint8_t *buf) {
        *buf++ = 0x40 + reg;
        return buf;
}

uint8_t *dec(reg32_t reg, uint8_t *buf) {
        *buf++ = 0x48 + reg;
        return buf;
}

   (It's more conventional to describe encodings in terms of which bits in
   the encoding represent the operand register. For example, see Volume 2,
   Appendix B of the Intel documentation referenced above. From that
   perspective, it might make more sense to build encodings using bitwise
   operations. However, I'm writing this blog post from the perspective of
   "look at the pattern and implement it;" adding values seems more
   intuitive and produces the same result.)

Move immediate value to register - mov reg, imm

   The following table shows the encodings for mov reg, 1 and
   mov reg, 12345678h. Notice the pattern?
   Instruction    Encoding (hex)             Instruction        Encoding (hex)
   mov eax, 1     B8 01 00 00 00          mov eax, 12345678h    B8 78 56 34 12
   mov ecx, 1     B9 01 00 00 00          mov ecx, 12345678h    B9 78 56 34 12
   mov edx, 1     BA 01 00 00 00          mov edx, 12345678h    BA 78 56 34 12
   ...                                    ...
   mov edi, 1     BF 01 00 00 00          mov edi, 12345678h    BF 78 56 34 12

   While the inc and dec instructions had 1-byte encodings, the encoding
   here is always 5 bytes. The first byte of the encoding is B8 + the
   register number. The next four bytes are the immediate value in little
   endian order, i.e., with the low-order byte first. Assuming the
   assembler will be run on an x86/x64 processor, which uses little endian
   byte ordering natively, nothing special needs to be done to reorder the
   bytes--storing a 32-bit value in memory will store the bytes in little
   endian order.
uint8_t *mov_immediate(reg32_t dest, int32_t value, uint8_t *buf) {
        *buf++ = 0xB8 + dest;
        *((int32_t *)buf) = value; buf += sizeof(int32_t);
        return buf;
}

Load value from memory - mov reg, DWORD PTR [reg]

   So far, our instructions have had straightforward encodings with
   reasonably obvious patterns. This one gets a bit more interesting.
         Instruction           Encoding (hex)
   mov eax, DWORD PTR [eax]    8B 00
   mov eax, DWORD PTR [ecx]    8B 01
   mov eax, DWORD PTR [edx]    8B 02
   mov eax, DWORD PTR [ebx]    8B 03
   mov eax, DWORD PTR [esp]    8B 04 24
   mov eax, DWORD PTR [ebp]    8B 45 00
   mov eax, DWORD PTR [esi]    8B 06
   mov eax, DWORD PTR [edi]    8B 07

   mov ecx, DWORD PTR [eax]    8B 08
   mov ecx, DWORD PTR [ecx]    8B 09
   mov ecx, DWORD PTR [edx]    8B 0A
   mov ecx, DWORD PTR [ebx]    8B 0B
   mov ecx, DWORD PTR [esp]    8B 0C 24
   mov ecx, DWORD PTR [ebp]    8B 4D 00
   mov ecx, DWORD PTR [esi]    8B 0E
   mov ecx, DWORD PTR [edi]    8B 0F

   mov edx, DWORD PTR [eax]    8B 10
   mov edx, DWORD PTR [ecx]    8B 11
   ...
   mov edi, DWORD PTR [edi]    8B 3F

   This form of the mov instruction has a two-byte encoding with a fairly
   obvious pattern, except when the source operand is ESP or EBP... then
   it's a three-byte encoding with a not-so-obvious pattern.^1
uint8_t *mov_from_ptr(reg32_t dest, reg32_t src, uint8_t *buf) {
        *buf++ = 0x8B;
        if (src == ESP) {
                *buf++ = 8*dest + src;
                *buf++ = 0x24;
        } else if (src == EBP) {
                *buf++ = 0x45 + 8*dest;
                *buf++ = 0x00;
        } else {
                *buf++ = 8*dest + src;
        }
        return buf;
}

Store value into memory - mov DWORD PTR [reg], reg

   When mov is used to store a value in memory, the encodings are almost
   identical to the encodings for loading a value from memory, except the
   first byte is 89h and the source and destination operands are reversed
   when encoding the second byte.
         Instruction           Encoding (hex)
   mov DWORD PTR [eax], eax    89 00
   mov DWORD PTR [ecx], eax    89 01
   mov DWORD PTR [edx], eax    89 02
   mov DWORD PTR [ebx], eax    89 03
   mov DWORD PTR [esp], eax    89 04 24
   mov DWORD PTR [ebp], eax    89 45 00
   mov DWORD PTR [esi], eax    89 06
   mov DWORD PTR [edi], eax    89 07
   mov DWORD PTR [eax], ecx    89 08
   mov DWORD PTR [ecx], ecx    89 09
   mov DWORD PTR [edx], ecx    89 0A
   mov DWORD PTR [ebx], ecx    89 0B
   mov DWORD PTR [esp], ecx    89 0C 24
   mov DWORD PTR [ebp], ecx    89 4D 00
   mov DWORD PTR [esi], ecx    89 0E
   mov DWORD PTR [edi], ecx    89 0F
   mov DWORD PTR [eax], edx    89 10
   mov DWORD PTR [ecx], edx    89 11
   ...
   mov DWORD PTR [edi], edi    89 3F
uint8_t *mov_to_ptr(reg32_t dest, reg32_t src, uint8_t *buf) {
        *buf++ = 0x89;
        if (dest == ESP) {
                *buf++ = 8*src + dest;
                *buf++ = 0x24;
        } else if (dest == EBP) {
                *buf++ = 0x45 + 8*src;
                *buf++ = 0x00;
        } else {
                *buf++ = 8*src + dest;
        }
        return buf;
}

RM-encoded instructions: mov, add, sub, and, or, xor, cmp

   Next, we will tackle register-register mov, as well as add, sub, and,
   or, xor, and cmp. All of these instructions have a similar encoding: an
   opcode byte (that differs from one instruction to the next - hence the
   name, "operation code"), followed by a single byte indicating the
   source and destination registers.

   To see the pattern, consider mov and add:
   Instruction     Encoding (hex)          Instruction     Encoding (hex)
   mov eax, eax    8B C0                   add eax, eax    03 C0
   mov eax, ecx    8B C1                   add eax, ecx    03 C1
   mov eax, edx    8B C2                   add eax, edx    03 C2
   ...                                     ...
   mov eax, edi    8B C7                   add eax, edi    03 C7
   mov ecx, eax    8B C8                   add ecx, eax    03 C8
   mov ecx, ecx    8B C9                   add ecx, ecx    03 C9
   ...                                     ...
   mov ecx, edi    8B CF                   add ecx, edi    03 CF
   mov edx, eax    8B D0                   add edx, eax    03 D0
   ...                                     ...
   mov edi, edi    8B FF                   add edi, edi    03 FF

   The second byte of the encoding is hex C0, plus 8 times the destination
   register number, plus the source register number.
#define DEFINE_INSN_RM(mnemonic, opcode)                     \
uint8_t *mnemonic(reg32_t dest, reg32_t src, uint8_t *buf) { \
        *buf++ = opcode;                                     \
        *buf++ = 8*dest + 0xC0 + src;                        \
        return buf;                                          \
}

DEFINE_INSN_RM(mov, 0x8B)
DEFINE_INSN_RM(add, 0x03)
DEFINE_INSN_RM(sub, 0x2B)
DEFINE_INSN_RM(and, 0x23)
DEFINE_INSN_RM( or, 0x0B)
DEFINE_INSN_RM(xor, 0x33)
DEFINE_INSN_RM(cmp, 0x3B)

Instructions with opcodes beginning with F7: not, neg, mul, imul, div, idiv

   The not, neg, mul, imul, div, and idiv instructions also have similar
   encodings. The first byte of the encoding is F7. The second byte
   indicates both the operation and the operand (register).
   Instruction    Encoding (hex)          Instruction    Encoding (hex)
   not eax        F7 D0                   neg eax        F7 D8
   not ecx        F7 D1                   neg ecx        F7 D9
   ...
   not edi        F7 D7                   neg edi        F7 DF

   As a note, we named the C function for the div instruction div_, since
   the C standard library's stdlib.h includes the [13]div(3) instruction.
#define DEFINE_INSN_F7(mnemonic, reg_base)     \
uint8_t *mnemonic(reg32_t reg, uint8_t *buf) { \
        *buf++ = 0xF7;                         \
        *buf++ = reg_base + reg;               \
        return buf;                            \
}

DEFINE_INSN_F7( not, 0xD0)
DEFINE_INSN_F7( neg, 0xD8)
DEFINE_INSN_F7( mul, 0xE0)
DEFINE_INSN_F7(imul, 0xE8)
DEFINE_INSN_F7(div_, 0xF0)
DEFINE_INSN_F7(idiv, 0xF8)

Convert doubleword to quadword - cdq

   Both the div and idiv instructions take a 64-bit dividend (with the
   high 32 bits in EDX and the low 32 bits in EAX) and divide it by a
   32-bit divisor (the register operand). To divide two 32-bit values, the
   dividend must be extended to 64 bits. For unsigned division (div), this
   is easy: mov edx, 0. For signed division (idiv), the 32-bit value must
   be sign-extended to 64 bits. This is done by the cdq instruction: it
   copies the sign bit of EAX into all 32 bits of EDX.
uint8_t *cdq(uint8_t *buf) {
        *buf++ = 0x99;
        return buf;
}

Bit shift instructions - shl, shr, sar

   The bit shift instructions are interesting for two reasons:
     * The number of bits to shift can be an immediate value (0-255), or
       it can be stored in the CL register (another name for the lowest 8
       bits of the ECX register).
     * The encoding for a one-bit shift is different.

   Using the left shift instruction as an example:
    Instruction     Encoding (hex)          Instruction    Encoding (hex)
   shl eax, 0       C1 E0 00                shl eax, cl    D3 E0
   shl eax, 1       D1 E0                   shl ecx, cl    D3 E1
   shl eax, 2       C1 E0 02                shl edx, cl    D3 E2
   shl eax, 3       C1 E0 03                shl ebx, cl    D3 E3
   ...                                      ...
   shl ecx, 0FFh    C1 E1 FF
   shl ecx, 0       C1 E1 00
   shl ecx, 1       D1 E1
   shl ecx, 2       C1 E1 02
   shl ecx, 3       C1 E1 03
   ...
   shl ecx, 0FFh    C1 E1 FF
   ...

   We can implement this in our assembler as follows.
#define DEFINE_INSN_D1C1(mnemonic, reg_base)                  \
uint8_t *mnemonic(reg32_t reg, uint8_t bits, uint8_t *buf) {  \
        switch (bits) {                                       \
        case 1: /* 1-bit shifts have a different opcode */    \
                *buf++ = 0xD1;                                \
                *buf++ = reg_base + reg;                      \
                break;                                        \
        default:                                              \
                *buf++ = 0xC1;                                \
                *buf++ = reg_base + reg;                      \
                *buf++ = bits;                                \
        }                                                     \
        return buf;                                           \
}                                                             \
uint8_t *mnemonic##_cl(reg32_t reg, uint8_t *buf) {           \
        *buf++ = 0xD3;                                        \
        *buf++ = reg_base + reg;                              \
        return buf;                                           \
}

DEFINE_INSN_D1C1(shl, 0xE0)
DEFINE_INSN_D1C1(shr, 0xE8)
DEFINE_INSN_D1C1(sar, 0xF8)

Procedure calls: push, pop, call, ret

   The push, pop, call, and ret instructions are the four essential
   instructions for procedure calls. Their encodings follow similar
   patterns to those we've already seen, except with different opcode
   bytes.
uint8_t *push(reg32_t reg, uint8_t *buf) {
        *buf++ = 0x50 + reg;
        return buf;
}

uint8_t *pop(reg32_t reg, uint8_t *buf) {
        *buf++ = 0x58 + reg;
        return buf;
}

uint8_t *call(reg32_t reg, uint8_t *buf) {
        *buf++ = 0xFF;
        *buf++ = 0xD0 + reg;
        return buf;
}

   The encoding of ret is only slightly more interesting, since ret 0
   (which is often written as ret with no operand) is encoded differently
   than ret with a nonzero immediate operand, such as ret 4 or ret 16.
uint8_t *ret(uint16_t bytes, uint8_t *buf) {
        if (bytes == 0) {
                *buf++ = 0xC3;
        } else {
                *buf++ = 0xC2;
                *((uint16_t *)buf) = bytes; buf += sizeof(uint16_t);
        }
        return buf;
}

Jumps

   In x86 assembly language, jumps are usually written with labels. For
   example:
there: mov eax, 12345678h    ; b8 78 56 34 12
       jmp there             ; eb f9
       nop                   ; 90

   Recall that the EIP register is the instruction pointer. When the
   processor fetches an instruction to execute, it increments EIP to point
   to the following instruction. A jump changes the value of EIP. In our
   example, the effect of the jump is to move EIP backward by 7 bytes, so
   it will point to the start of the mov instruction.
                            EIP is here after the processor
                            fetches the  "jmp there" instruction
                            |v
B8  78  56  34  12  EB  F9  90
^|___________________________|
We want to move it 7 bytes backward
to place it here

   So, how is jmp encoded? Hex F9 is the two's complement representation
   of -7... so the encoding above (EB F9) is in essence "jump -7 bytes."

   Complicating things slightly, the jmp instruction is encoded with an EB
   opcode byte if the jump distance is between -128 and 127 bytes,
   inclusive, and with an E9 opcode if the jump distance is larger than
   that.
uint8_t *jmp(int32_t bytes, uint8_t *buf) {
        if (INT8_MIN <= bytes && bytes <= INT8_MAX) {
                *buf++ = 0xEB;
                *buf++ = (int8_t)bytes;
        } else {
                *buf++ = 0xE9;
                *((int32_t *)buf) = bytes; buf += sizeof(int32_t);
        }
        return buf;
}

   Conditional jumps are encoded similarly, except with different opcodes,
   of course.
#define DEFINE_INSN_JCC(mnemonic, byte_opcode)                     \
uint8_t *mnemonic(int32_t bytes, uint8_t *buf) {                   \
        if (INT8_MIN <= bytes && bytes <= INT8_MAX) {              \
                *buf++ = byte_opcode;                              \
                *buf++ = (int8_t)bytes;                            \
        } else {                                                   \
                *buf++ = 0x0F;                                     \
                *buf++ = byte_opcode + 0x10;                       \
                *((int32_t *)buf) = bytes; buf += sizeof(int32_t); \
        }                                                          \
        return buf;                                                \
}

DEFINE_INSN_JCC( jb, 0x72)
DEFINE_INSN_JCC(jae, 0x73)
DEFINE_INSN_JCC( je, 0x74)
DEFINE_INSN_JCC(jne, 0x75)
DEFINE_INSN_JCC(jbe, 0x76)
DEFINE_INSN_JCC( ja, 0x77)
DEFINE_INSN_JCC( jl, 0x7C)
DEFINE_INSN_JCC(jge, 0x7D)
DEFINE_INSN_JCC(jle, 0x7E)
DEFINE_INSN_JCC( jg, 0x7F)

5. What's next?

   So, we have a working x86 assembler. Not bad for 256 lines of code. You
   can download the complete source code below.

   In the next few posts, we'll:
     * show how to test this assembler (are you sure it actually works?).
     * show how to find the encodings of other instructions (in case you
       want to extend this assembler).
     * show how to actually execute the generated machine code.

   At some point in the future - maybe not right away - I'd like to
     * show how the Builder design pattern can make the assembler easier
       to use.
     * build an x64 assembler (since you're probably not running a 32-bit
       machine).

   But there are plenty of other non-assembler-related topics I'd like to
   blog about, so let's see what actually materializes.

Download the source code

   Source Code:    [14]x86asm.h      69 lines
                   [15]x86asm.c      171 lines
                   [16]demo.c        16 lines
                                     Total: 256 lines
   Makefiles:      [17]GNUmakefile   (GNU Make on Linux/macOS)
                   [18]Makefile      (NMAKE on Windows)

   ^1 If you're familiar with the x86 encoding scheme, [EBP] is actually
   encoded as [EBP+0] (i.e., EBP with an 8-bit displacement), and ESP is
   encoded using the SIB byte.

   Published on 15 Jan 2017 o 4019 words o Comments? [19]E-mail me!

   Copyright � 2017 Jeffrey L. Overbey. All rights reserved. Except for
   source code where an explicit license is given, no part of this blog
   may be copied, reproduced, published, translated, or distributed, in
   whole or in part, without the written permission of the copyright
   owner.

References

   1. http://blog.jeff.over.bz/rss.xml
   2. http://jeff.over.bz/
   3. http://blog.jeff.over.bz/refactoring/golang/godoctor/2018/04/22/building-a-godoctor-refactoring.html
   4. http://blog.jeff.over.bz/compilers/lexers/2017/09/09/lexical-analysis.html
   5. http://blog.jeff.over.bz/performance/statistics/2017/06/01/on-performance-improvements.html
   6. http://blog.jeff.over.bz/assembly/compilers/jit/2017/03/30/executing-dynamically-generated-machine-code.html
   7. http://blog.jeff.over.bz/assembly/2017/02/15/finding-machine-language-encodings.html
   8. http://blog.jeff.over.bz/assembly/compilers/jit/2017/01/15/x86-assembler.html
   9. http://blog.jeff.over.bz/blog/2016/12/15/first-post.html
  10. http://blog.jeff.over.bz/rss.xml
  11. http://jeff.over.bz/
  12. https://software.intel.com/en-us/articles/intel-sdm
  13. https://linux.die.net/man/3/div
  14. http://blog.jeff.over.bz/_posts/code/x86-assembler/x86asm.h
  15. http://blog.jeff.over.bz/_posts/code/x86-assembler/x86asm.c
  16. http://blog.jeff.over.bz/_posts/code/x86-assembler/demo.c
  17. http://blog.jeff.over.bz/_posts/code/x86-assembler/GNUmakefile
  18. http://blog.jeff.over.bz/_posts/code/x86-assembler/Makefile
  19. http://www.google.com/recaptcha/mailhide/d?k=01Y7tF2jw9w9xLZucF314wPA==&c=q6vUJyh7NztFNzwTojRuECzwRgJ8lixJT3LvIi58VCM=