<!DOCTYPE html>
<!-- saved from url=(0059)https://norasandler.com/2019/02/18/Write-a-Compiler-10.html -->
<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <title>C Compiler, Part 10: Global Variables</title>
  <meta name="description" content="This is the tenth post in a series. Read part 1 here.">

  <link rel="stylesheet" href="./C Compiler, Part 10_ Global Variables_files/main.css">
  <link rel="canonical" href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html">
  <link rel="alternate" type="application/rss+xml" title="Nora Sandler" href="https://norasandler.com/feed.xml">
  
</head>


  <body>

    <header class="site-header" role="banner">

  <div class="wrapper">
    
    
    <a class="site-title" href="https://norasandler.com/">Nora Sandler</a>
  
    
      <nav class="site-nav">
        <input type="checkbox" id="nav-trigger" class="nav-trigger">
        <label for="nav-trigger">
          <span class="menu-icon">
            <svg viewBox="0 0 18 15" width="18px" height="15px">
              <path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"></path>
              <path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"></path>
              <path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"></path>
            </svg>
          </span>
        </label>

        <div class="trigger">
          
            
            
              
            
          
            
            
              
              <a class="page-link" href="https://norasandler.com/about/">About</a>
              
            
          
            
            
              
              <a class="page-link" href="https://norasandler.com/archive/">Archive</a>
              
            
          
            
            
          
            
            
              
            
          
            
            
              
            
          
            
            
              
            
          
          <a class="page-link" href="https://github.com/nlsandler">Github</a>
          <a href="https://norasandler.com/feed.xml"><img id="rss" height="20" width="20" src="./C Compiler, Part 10_ Global Variables_files/rss.png"></a>

        </div>
      </nav>
    
  </div>
</header>


    <main class="page-content" aria-label="Content">
      <div class="wrapper">
        <article class="post h-entry" itemscope="" itemtype="http://schema.org/BlogPosting">

  <header class="post-header">
    <h1 class="post-title p-name" itemprop="name headline">C Compiler, Part 10: Global Variables</h1>
    <p class="post-meta">
      <time class="dt-published" datetime="2019-02-18T17:00:00+00:00" itemprop="datePublished">Feb 18, 2019
      </time></p>
  </header>

  <div class="post-content e-content" itemprop="articleBody">
    <p><em>This is the tenth post in a series. Read part 1 <a href="https://norasandler.com/2017/11/29/Write-a-Compiler.html">here</a>.</em></p>

<p>We’re back! I said I was going to do a non-compiler post next, but that turned out to be a lie. Instead, we’re going to implement global variables. This isn’t too complicated, but it lets us learn about some new sections of object files and program memory.</p>

<p>As always, tests are <a href="https://github.com/nlsandler/write_a_c_compiler">here</a>.</p>

<p><strong>Note for macOS Users:</strong> since the last post, Apple started phasing out support for 32-bit programs on macOS. What that means for us is that if you’re using the default C compiler on macOS Mojave, you’ll get an error if you try to compile for a 32-bit backend<sup id="anchor1"><a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#fn1">1</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-m32</span> example.c
ld: warning: The i386 architecture is deprecated <span class="k">for </span>macOS <span class="o">(</span>remove from the Xcode build setting: ARCHS<span class="o">)</span>
ld: warning: ignoring file /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/lib/libSystem.tbd, missing required architecture i386 <span class="k">in </span>file /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/lib/libSystem.tbd
ld: dynamic main executables must <span class="nb">link </span>with libSystem.dylib <span class="k">for </span>architecture i386
clang: error: linker <span class="nb">command </span>failed with <span class="nb">exit </span>code 1 <span class="o">(</span>use <span class="nt">-v</span> to see invocation<span class="o">)</span>
ld: warning: The i386 architecture is deprecated <span class="k">for </span>macOS <span class="o">(</span>remove from the Xcode build setting: ARCHS<span class="o">)</span>
</code></pre></div></div>

<p>But never fear! The Homebrew version of GCC works just fine, although it still emits a warning:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc-8 <span class="nt">-m32</span> static.c
ld: warning: The i386 architecture is deprecated <span class="k">for </span>macOS <span class="o">(</span>remove from the Xcode build setting: ARCHS<span class="o">)</span>
</code></pre></div></div>

<p>I’m pretty sure there’s a way to get the default compiler to build 32-bit programs as well, but I don’t know what it is.</p>

<p>When you run a 32-bit program (like the ones produced by <em>your</em> compiler), you might also get a warning that it isn’t optimized for your computer. This is also due to Apple’s efforts to phase out 32-bit programs, but you don’t need to do anything about it.</p>

<p>The bigger issue, of course, is that the next version of macOS won’t run 32-bit programs at all. I plan to update all my posts before that happens to cover 64-bit compilation too. And yes, I do regret targeting a 32-bit architecture to begin with, thank you for asking. Luckily, apart from calling conventions, all the differences so far are pretty minor.</p>

<p>With that out of the way, let’s move on to…</p>

<h1 id="part-10-global-variables">Part 10: Global Variables</h1>

<p>We can already handle local variables declared inside functions. Now we’ll add support for global variables, which any function can access.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">fun1</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">foo</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">fun2</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">fun1</span><span class="p">();</span>
    <span class="k">return</span> <span class="n">fun2</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note that global variables can be shadowed by local variables of the same name:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span> <span class="c1">//shadows global 'foo'</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">;</span> <span class="c1">// returns 4</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Global variables are similar to functions in that they can be declared many times, but defined (i.e. initialized) only once:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span><span class="p">;</span> <span class="c1">// declaration</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">;</span> <span class="c1">// returns 3</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span> <span class="c1">// definition</span>
</code></pre></div></div>

<p>And, like functions, global variables must be declared (but not necessarily defined) before they’re used:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">;</span> <span class="c1">// ERROR: not declared!</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="n">foo</span><span class="p">;</span>
</code></pre></div></div>

<p>Declaring a function and a global variable with the same name is an error:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">foo</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="mi">3</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span> <span class="c1">// ERROR</span>
</code></pre></div></div>

<p>Unlike local variables, global variables don’t need to be explicitly initialized. An uninitialized local variable has an indeterminate value, but an uninitialized global variable is guaranteed to start out as 0.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">foo</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">;</span> <span class="c1">// This could be literally anything</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">foo</span><span class="p">;</span> <span class="c1">// This will definitely be 0</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Note that we’re using the terms “declaration” and “definition” the same way we did for functions. This is a global variable declaration<sup id="anchor2"><a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#fn2">2</a></sup>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span><span class="p">;</span>
</code></pre></div></div>

<p>This is both a declaration and a definition:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">static</code> and <code class="language-plaintext highlighter-rouge">extern</code> keywords would add some extra complications, but we won’t support those yet.</p>

<p>Now let’s move on to…</p>

<h2 id="lexing">Lexing</h2>
<p>No new tokens this week, so we don’t have to touch the lexer.</p>

<h2 id="parsing">Parsing</h2>
<p>Previously, a program was a list of function declarations. Now it’s a list of top-level declarations, each of which is either a function declaration or a variable declaration.</p>

<p>So our top-level AST definitions now look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>toplevel_item = Function(function_declaration)
              | Variable(declaration)
toplevel = Program(toplevel_item list)              
</code></pre></div></div>

<p>And we need a corresponding change to the top-level grammar rule:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;program&gt; ::= { &lt;function&gt; | &lt;declaration&gt; }
</code></pre></div></div>
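<p>As a rough sketch in C, the parser can tell the two kinds of top-level item apart by looking one token past the name: <code class="language-plaintext highlighter-rouge">(</code> starts a function, while <code class="language-plaintext highlighter-rouge">=</code> or <code class="language-plaintext highlighter-rouge">;</code> means a variable. This assumes a hypothetical token stream that has already been split into strings, and a program where every item starts with <code class="language-plaintext highlighter-rouge">int</code> and a name. The names <code class="language-plaintext highlighter-rouge">TopLevelItem</code> and <code class="language-plaintext highlighter-rouge">parse_program</code> are made up for this example, and function bodies are skipped rather than parsed to keep it short.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST for one top-level item:
   toplevel_item = Function(...) | Variable(declaration) */
typedef enum { ITEM_FUNCTION, ITEM_VARIABLE } ItemKind;

typedef struct {
    ItemKind kind;
    const char *name;
    int has_init;    /* variables only: 1 if "= <int>" was present */
    int init_value;  /* variables only: the constant initializer */
} TopLevelItem;

/* <program> ::= { <function> | <declaration> }
   Assumes tokens are pre-split strings, every item starts with "int <name>",
   and the input is well-formed. Function bodies are skipped by counting
   braces instead of being parsed. Returns the number of items stored. */
int parse_program(const char **toks, int ntoks, TopLevelItem *items, int max_items) {
    int pos = 0, count = 0;
    while (pos < ntoks) {
        assert(count < max_items);
        TopLevelItem *it = &items[count++];
        it->name = toks[pos + 1];                /* toks[pos] is "int" */
        it->has_init = 0;
        it->init_value = 0;
        if (strcmp(toks[pos + 2], "(") == 0) {
            it->kind = ITEM_FUNCTION;
            pos += 3;
            while (strcmp(toks[pos], ")") != 0)  /* skip the parameter list */
                pos++;
            pos++;
            if (strcmp(toks[pos], ";") == 0) {   /* declaration without a body */
                pos++;
                continue;
            }
            int depth = 0;
            do {                                 /* skip the body, balancing braces */
                if (strcmp(toks[pos], "{") == 0) depth++;
                else if (strcmp(toks[pos], "}") == 0) depth--;
                pos++;
            } while (depth > 0);
        } else {
            it->kind = ITEM_VARIABLE;
            pos += 2;
            if (strcmp(toks[pos], "=") == 0) {   /* constant initializer */
                it->has_init = 1;
                it->init_value = atoi(toks[pos + 1]);
                pos += 2;
            }
            pos++;                               /* the closing ";" */
        }
    }
    return count;
}
```

<p>Fed the tokens of the declaration-before-definition example from earlier (<code class="language-plaintext highlighter-rouge">int foo; int main() { return foo; } int foo = 3;</code>), this produces three items: a variable <code class="language-plaintext highlighter-rouge">foo</code> with no initializer, the function <code class="language-plaintext highlighter-rouge">main</code>, and a second entry for <code class="language-plaintext highlighter-rouge">foo</code> with initializer 3. Keeping both <code class="language-plaintext highlighter-rouge">foo</code> entries in order matters, since code generation treats the first as a declaration and the second as the definition.</p>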

<h4 id="-task">☑ Task:</h4>
<p>Update the parsing pass to support global variables. The parsing stage should now succeed on all valid examples in stages 1-10.</p>

<h2 id="code-generation">Code Generation</h2>
<p>Global variables need to live somewhere in memory. They can’t live on the stack, because they need to be accessible from every stack frame. Instead, they live in a different chunk of memory, the data section. We’ve already seen what a running program’s stack looks like; now let’s step back and see how all of its memory is laid out<sup id="anchor3"><a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#fn3">3</a></sup>:</p>

<p><img class="small" style="width: 20%;" alt="Diagram of program memory layout. The stack starts at a high address and grows down into free space. The heap starts at a lower address and grows up into the same region of free space. Below the heap, from top to bottom, are Initialized Data, Uninitialized Data (BSS) and Text." src="./C Compiler, Part 10_ Global Variables_files/program_memory_layout.png"></p>

<p>The x86 instructions we’ve been dealing with so far all live in the text section. Our global variables will live in the data section, which we can further subdivide into initialized and uninitialized data—the uninitialized data section is usually called BSS<sup id="anchor4"><a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#fn4">4</a></sup>.</p>

<p>So far we’ve only generated assembly for the text section, which contains actual program instructions; let’s see what the assembly to describe a variable in the data section looks like:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nf">.globl</span> <span class="nv">_my_var</span> <span class="c1"># make this symbol visible to the linker</span>
    <span class="nf">.data</span>          <span class="c1"># what follows goes in the data section</span>
    <span class="nf">.align</span> <span class="mi">2</span>       <span class="c1"># this data should be aligned on 4-byte boundaries</span>
<span class="nl">_my_var:</span>
    <span class="nf">.long</span> <span class="mi">1337</span>     <span class="c1"># allocate a long integer with value 1337</span>
</code></pre></div></div>

<p>A couple of things to note here:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">.data</code> directive tells the assembler we’re in the data section. We’ll also need a <code class="language-plaintext highlighter-rouge">.text</code> directive to indicate when we switch back to the text section.</li>
  <li>A label like <code class="language-plaintext highlighter-rouge">_my_var</code> labels a memory address. The assembler and linker don’t care whether that address refers to an instruction in the text section or a variable in the data section; they’re going to treat it the same way.</li>
  <li>On macOS, <code class="language-plaintext highlighter-rouge">.align n</code> means “align the next thing to a multiple of 2<sup>n</sup> bytes”. So <code class="language-plaintext highlighter-rouge">.align 2</code> means we’re using a 4-byte alignment. On Linux, <code class="language-plaintext highlighter-rouge">.align n</code> means “align the next thing to a multiple of n bytes”, so you’d want <code class="language-plaintext highlighter-rouge">.align 4</code> to get the same result.</li>
</ul>
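<p>Here’s a minimal C sketch of emitting that assembly for one initialized global, assuming a hypothetical code generator that writes its output to a <code class="language-plaintext highlighter-rouge">FILE *</code>. The <code class="language-plaintext highlighter-rouge">Target</code> enum and <code class="language-plaintext highlighter-rouge">emit_initialized_global</code> function are invented for illustration; the only platform difference handled is the meaning of <code class="language-plaintext highlighter-rouge">.align</code> described above, and the label is passed in already decorated (e.g. <code class="language-plaintext highlighter-rouge">_my_var</code>).</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { TARGET_MACOS, TARGET_LINUX } Target;

/* Emit data-section assembly for a global like `int my_var = 1337;`.
   `label` is the assembly label, e.g. "_my_var". On macOS ".align 2" means
   2^2 = 4 bytes; on Linux the argument is a byte count, so we use ".align 4". */
void emit_initialized_global(FILE *out, Target target, const char *label, int value) {
    int align_arg = (target == TARGET_MACOS) ? 2 : 4;
    fprintf(out, "    .globl %s\n", label);
    fprintf(out, "    .data\n");
    fprintf(out, "    .align %d\n", align_arg);
    fprintf(out, "%s:\n", label);
    fprintf(out, "    .long %d\n", value);
    fprintf(out, "    .text\n");   /* switch back before emitting the next function */
}
```

<p>Emitting <code class="language-plaintext highlighter-rouge">.text</code> right after each global keeps the rule simple: whatever comes next, the assembler is back in the text section.</p>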

<p>Once you’ve allocated a variable, you can refer to its label directly in assembly:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nf">movl</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="nv">_my_var</span> <span class="c1"># move the value in %eax to the memory address of _my_var</span>
</code></pre></div></div>

<p>So the basic gist here is:</p>

<ol>
  <li>
    <p>When you encounter a <em>declaration</em> for a global variable, add it to the variable map. The variable map entry will be its label instead of a stack index:</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> var_map = var_map.put("my_var", "_my_var")
</code></pre></div>    </div>

    <p>Note that this new variable map entry must be visible when we generate later top-level items; this isn’t true of entries we add while processing function definitions.</p>
  </li>
  <li>When you encounter a <em>definition</em> of a global variable (a declaration with an initializer), emit assembly to allocate it in the data section. Then emit a <code class="language-plaintext highlighter-rouge">.text</code> directive before you go back to generating function definitions.</li>
  <li>When you encounter a <em>reference</em> to a variable, handle it the same way you did before. If its entry in the variable map is a label instead of a stack index, of course, you should use it directly instead of as an offset from <code class="language-plaintext highlighter-rouge">%ebp</code>. If it doesn’t have an entry, that’s an error.</li>
</ol>
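<p>One way to represent those variable map entries in C is a small tagged struct. The sketch below uses a plain array searched from the newest entry backwards instead of the immutable map from earlier posts; the names <code class="language-plaintext highlighter-rouge">VarEntry</code>, <code class="language-plaintext highlighter-rouge">lookup</code> and <code class="language-plaintext highlighter-rouge">format_operand</code> are hypothetical. Searching backwards gives local declarations priority over globals, which is the shadowing rule from the start of this post.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A variable-map entry is either a stack offset (local) or a label (global). */
typedef enum { LOC_STACK, LOC_LABEL } LocKind;

typedef struct {
    const char *name;     /* C-level name, e.g. "my_var" */
    LocKind kind;
    int stack_offset;     /* LOC_STACK: offset from %ebp */
    const char *label;    /* LOC_LABEL: assembly label, e.g. "_my_var" */
} VarEntry;

/* Find the most recent entry for `name`, so inner declarations shadow outer ones.
   Returns NULL if the variable was never declared, which is an error. */
const VarEntry *lookup(const VarEntry *map, int n, const char *name) {
    for (int i = n - 1; i >= 0; i--)
        if (strcmp(map[i].name, name) == 0)
            return &map[i];
    return NULL;
}

/* Write the assembly operand for an entry: "-4(%ebp)" for a local,
   or the label itself (e.g. "_my_var") for a global. */
void format_operand(const VarEntry *e, char *buf, size_t size) {
    if (e->kind == LOC_STACK)
        snprintf(buf, size, "%d(%%ebp)", e->stack_offset);
    else
        snprintf(buf, size, "%s", e->label);
}
```

<p>With a global <code class="language-plaintext highlighter-rouge">foo</code> followed by a local <code class="language-plaintext highlighter-rouge">foo</code> at offset -4, a lookup of <code class="language-plaintext highlighter-rouge">foo</code> yields <code class="language-plaintext highlighter-rouge">-4(%ebp)</code> inside the function and <code class="language-plaintext highlighter-rouge">_foo</code> once the local entry is gone.</p>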

<p>But there are a few wrinkles.</p>

<h3 id="uninitialized-variables">Uninitialized Variables</h3>
<p>If, by the end of the program, we have any variables left that have been declared but not defined, we need to declare them in a special section for uninitialized data. On Linux, all uninitialized data lives in the BSS section, which also includes any variables initialized to 0. On macOS it’s a little more complicated: uninitialized static variables go in BSS, and uninitialized global variables go in the common section, which indicates to the linker that they may be initialized in a different object file. We don’t support static variables yet, so on macOS we don’t need to store anything in BSS. Of course, we also don’t have any tests with multiple source files, so if you just use BSS instead of common, effectively making all global variables static, the tests will still pass.</p>

<p>The data section holds the actual values of our data; we can load it directly into memory and use it as-is. The BSS and common sections, on the other hand, don’t store the values of our uninitialized variables at all, since they would just be big blocks of zeros, and storing big blocks of zeros on disk would waste space. Instead, the binary only records how big BSS and common are, and that much zero-filled memory is allocated for them when the program is loaded. So keeping initialized and uninitialized variables separate is just a trick to reduce the size of binaries.</p>

<p>On macOS, we can allocate space in the common section using the <code class="language-plaintext highlighter-rouge">.comm</code> directive:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nf">.text</span>
    <span class="nf">.comm</span> <span class="nv">_my_var</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">2</span> <span class="c1"># allocate 4 bytes for symbol _my_var, with 4-byte (2^2) alignment</span>
</code></pre></div></div>

<p>Allocating space in BSS, on the other hand, looks almost exactly the same as allocating a non-zero variable, but we’ll use <code class="language-plaintext highlighter-rouge">.zero 4</code> to allocate 4 bytes of zeros instead of <code class="language-plaintext highlighter-rouge">.long n</code> to allocate a long integer with value <code class="language-plaintext highlighter-rouge">n</code>:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nf">.globl</span> <span class="nv">_my_var</span> <span class="c1"># make this symbol visible to the linker</span>
    <span class="nf">.bss</span>           <span class="c1"># what follows goes in the BSS section</span>
    <span class="nf">.align</span> <span class="mi">4</span>       <span class="c1"># this data should be aligned on 4-byte boundaries (Linux align directive)</span>
<span class="nl">_my_var:</span>
    <span class="nf">.zero</span> <span class="mi">4</span>        <span class="c1"># allocate 4 bytes of zeros</span>
</code></pre></div></div>

<p>Note that in assembly, unlike in C, it’s perfectly fine to reference a label like <code class="language-plaintext highlighter-rouge">_my_var</code> before that label is defined. That’s why we can wait until the end of the program to allocate any uninitialized variables.</p>
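<p>A sketch of that end-of-program step in C, assuming the code generator records each global’s label and whether it has been defined yet (the <code class="language-plaintext highlighter-rouge">GlobalInfo</code> struct and <code class="language-plaintext highlighter-rouge">emit_undefined_globals</code> function are hypothetical). It writes into a caller-supplied buffer and picks either the macOS-style <code class="language-plaintext highlighter-rouge">.comm</code> directive or a Linux-style BSS entry.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *label;   /* assembly label, e.g. "_my_var" */
    int defined;         /* 1 once an initializer was emitted in .data */
} GlobalInfo;

/* Reserve zeroed storage for every global that was declared but never defined.
   use_common != 0 emits macOS-style ".comm label,4,2" (2^2 = 4-byte alignment);
   otherwise emits a BSS entry using the Linux meaning of ".align".
   Returns the output length, or (size_t)-1 if the buffer is too small. */
size_t emit_undefined_globals(const GlobalInfo *globals, int n, int use_common,
                              char *out, size_t size) {
    assert(size > 0);
    size_t len = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (globals[i].defined)
            continue;                      /* already allocated in .data */
        const char *l = globals[i].label;
        int w;
        if (use_common)
            w = snprintf(out + len, size - len, "    .comm %s,4,2\n", l);
        else
            w = snprintf(out + len, size - len,
                         "    .globl %s\n    .bss\n    .align 4\n%s:\n    .zero 4\n", l, l);
        if (w < 0 || (size_t)w >= size - len)
            return (size_t)-1;
        len += (size_t)w;
    }
    return len;
}
```

<p>Because labels can be referenced before they are defined, calling this once after the last top-level item is enough; the functions emitted earlier already refer to these labels.</p>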

<h3 id="non-constant-initializers">Non-Constant Initializers</h3>
<p>Global variables are loaded into memory before the program starts, which means we can’t execute any instructions to calculate their initial values. Therefore their initializers need to be constants. For example, this isn’t valid:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">bar</span> <span class="o">=</span> <span class="n">foo</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// NOT A CONSTANT!</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The C standard does allow global variables to be initialized with constant expressions, though, like:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">foo</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">3</span> <span class="o">*</span> <span class="mi">5</span><span class="p">;</span>
</code></pre></div></div>

<p>This requires you to compute <code class="language-plaintext highlighter-rouge">2 + 3 * 5</code> at compile time. You can support this if you want, but you don’t have to; the test suite doesn’t check for it.</p>
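<p>If you do want to support constant initializers, a small recursive evaluator over the expression AST is enough. The C sketch below assumes a hypothetical <code class="language-plaintext highlighter-rouge">Exp</code> node type with only constants, variable references and four binary operators; <code class="language-plaintext highlighter-rouge">fold_constant</code> reports failure for anything that refers to a variable, which covers the <code class="language-plaintext highlighter-rouge">foo + 1</code> case above that must be rejected.</p>

```c
#include <assert.h>

typedef enum { EXP_CONST, EXP_VAR, EXP_ADD, EXP_SUB, EXP_MUL, EXP_DIV } ExpKind;

typedef struct Exp {
    ExpKind kind;
    int value;                      /* EXP_CONST */
    const char *name;               /* EXP_VAR */
    const struct Exp *lhs, *rhs;    /* binary operators */
} Exp;

/* Evaluate a global initializer at compile time.
   Returns 1 and stores the value in *out if the expression is constant;
   returns 0 if it refers to a variable or divides by zero, which the
   caller should report as an error. Signed overflow is not checked here. */
int fold_constant(const Exp *e, int *out) {
    int l, r;
    switch (e->kind) {
    case EXP_CONST: *out = e->value; return 1;
    case EXP_VAR:   return 0;        /* not a constant expression */
    default:        break;
    }
    if (!fold_constant(e->lhs, &l) || !fold_constant(e->rhs, &r))
        return 0;
    switch (e->kind) {
    case EXP_ADD: *out = l + r; return 1;
    case EXP_SUB: *out = l - r; return 1;
    case EXP_MUL: *out = l * r; return 1;
    case EXP_DIV:
        if (r == 0) return 0;
        *out = l / r;
        return 1;
    default: return 0;
    }
}
```

<p>For <code class="language-plaintext highlighter-rouge">2 + 3 * 5</code> this yields 17, which you can then emit with <code class="language-plaintext highlighter-rouge">.long 17</code> exactly as if the source had said <code class="language-plaintext highlighter-rouge">int foo = 17;</code>.</p>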

<h3 id="validation">Validation</h3>

<p>To recap, here’s what we need to validate:</p>
<ul>
  <li>Variables, including global variables, are declared before they are used.</li>
  <li>No global variable is defined more than once.</li>
  <li>No global variable is initialized with a non-constant value.</li>
  <li>No symbol is declared as both a function and a variable.</li>
</ul>

<p>It’s easy to validate the first bullet point during code generation; we’re doing that for local variables anyway. The remaining points can be validated either during code generation, or in a separate validation pass. I’d recommend handling them wherever you validate function definitions and calls.</p>
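<p>The remaining top-level checks can share one symbol table. Here is a C sketch, assuming a fixed-size array of symbols; <code class="language-plaintext highlighter-rouge">Symbol</code>, <code class="language-plaintext highlighter-rouge">CheckResult</code> and <code class="language-plaintext highlighter-rouge">declare_global</code> are hypothetical names. Every top-level declaration or definition, whether of a function or a variable, goes through the same function, so a name used for both kinds is caught no matter which comes first.</p>

```c
#include <assert.h>
#include <string.h>

typedef enum { SYM_FUNCTION, SYM_VARIABLE } SymKind;

typedef struct {
    const char *name;
    SymKind kind;
    int defined;     /* 1 once a body or initializer has been seen */
} Symbol;

typedef enum { CHECK_OK, ERR_KIND_CONFLICT, ERR_REDEFINITION, ERR_TABLE_FULL } CheckResult;

/* Record a top-level declaration (is_definition == 0) or definition of `name`,
   rejecting a symbol used as both a function and a variable, or defined twice. */
CheckResult declare_global(Symbol *table, int *count, int capacity,
                           const char *name, SymKind kind, int is_definition) {
    for (int i = 0; i < *count; i++) {
        if (strcmp(table[i].name, name) != 0)
            continue;
        if (table[i].kind != kind)
            return ERR_KIND_CONFLICT;
        if (is_definition && table[i].defined)
            return ERR_REDEFINITION;
        table[i].defined |= is_definition;
        return CHECK_OK;
    }
    if (*count == capacity)
        return ERR_TABLE_FULL;
    table[*count] = (Symbol){ name, kind, is_definition };
    (*count)++;
    return CHECK_OK;
}
```

<p>Applied to the function-and-variable example earlier in this post, the function <code class="language-plaintext highlighter-rouge">foo</code> is recorded first and the later <code class="language-plaintext highlighter-rouge">int foo = 4;</code> returns <code class="language-plaintext highlighter-rouge">ERR_KIND_CONFLICT</code>.</p>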

<h4 id="-task-1">☑ Task:</h4>
<p>Update the code generation pass (and your validation pass, if you have one) to fail with an error for all invalid stage 10 examples, and succeed on all valid stage 10 examples.</p>

<h2 id="pie-">PIE 🥧</h2>

<p>If you compile a program with global variables using a real compiler, the assembly will look quite different from what we described above. You may also notice, if you’re on macOS, that the linker will warn you about the assembly your compiler produces:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./my_compiler global.c
ld: warning: The i386 architecture is deprecated <span class="k">for </span>macOS <span class="o">(</span>remove from the Xcode build setting: ARCHS<span class="o">)</span>
ld: warning: PIE disabled. Absolute addressing <span class="o">(</span>perhaps <span class="nt">-mdynamic-no-pic</span><span class="o">)</span> not allowed <span class="k">in </span>code signed PIE, but used <span class="k">in </span>_main from /var/folders/9t/p20tf0zs4ql425tdktwnfjkm0000gn/T//cczcZcyQ.o. To fix this warning, don<span class="s1">'t compile with -mdynamic-no-pic or link with -Wl,-no_pie
</span></code></pre></div></div>

<p>PIE stands for “position-independent executable”, which means an executable consisting entirely of position-independent code. This section briefly explains what position-independent code is and why you might need it, but doesn’t explain how to implement it. Feel free to skip it if you’re not interested.</p>

<p>Position-independent code is code that can run no matter where it’s loaded in memory, because it never refers to absolute memory addresses. The code our compiler produces is not position-independent, because it has instructions like:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nf">movl</span> <span class="kc">$</span><span class="mi">3</span><span class="p">,</span> <span class="nv">_my_var</span>
</code></pre></div></div>

<p>In order for this instruction to run, the linker needs to replace <code class="language-plaintext highlighter-rouge">_my_var</code> with an absolute memory address. This works if we know the absolute address of the data and BSS sections in advance.</p>

<p>Position-independent code, on the other hand, never refers to the address of symbols like <code class="language-plaintext highlighter-rouge">_my_var</code> directly; instead, those addresses are calculated relative to the current instruction pointer. In case I didn’t have enough of a reason to regret targeting a 32-bit architecture, position-independent assembly is much simpler with a 64-bit instruction set:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">movl</span> <span class="kc">$</span><span class="mi">3</span><span class="p">,</span> <span class="nv">_my_var</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">)</span> <span class="c1"># address _my_var relative to the instruction pointer</span>
</code></pre></div></div>

<p>To get the same result with a 32-bit architecture you need something like this:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="nf">call</span>    <span class="nv">___x86.get_pc_thunk.ax</span>
<span class="nl">L1$pb:</span>
    <span class="nf">leal</span>    <span class="nv">_my_var</span><span class="o">-</span><span class="nv">L1$pb</span><span class="p">(</span><span class="o">%</span><span class="nb">eax</span><span class="p">),</span> <span class="o">%</span><span class="nb">eax</span>
    <span class="nf">movl</span>    <span class="p">(</span><span class="o">%</span><span class="nb">eax</span><span class="p">),</span> <span class="o">%</span><span class="nb">eax</span>
</code></pre></div></div>
<p>I won’t walk through exactly what this code is doing; if you’re curious, <a href="https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/">this article</a> gives a good overview of position-independent code for x86.</p>
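<p>The reason the 64-bit form is position-independent comes down to a little arithmetic. What gets encoded is the distance from the end of the current instruction to <code class="language-plaintext highlighter-rouge">_my_var</code>, and since the executable’s text and data are shifted by the same amount when it is loaded, that distance doesn’t change. The C sketch below uses made-up addresses and a hypothetical helper, <code class="language-plaintext highlighter-rouge">rip_displacement</code>; it assumes the two addresses are within 2 GiB of each other so the result fits in a 32-bit displacement.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Distance encoded in a RIP-relative operand: target address minus the
   address of the next instruction (where %rip points while this one runs).
   Assumes the two addresses are within +/-2 GiB of each other. */
int32_t rip_displacement(uint64_t next_insn_addr, uint64_t target_addr) {
    return (int32_t)(int64_t)(target_addr - next_insn_addr);
}
```

<p>Loading the same image at base <code class="language-plaintext highlighter-rouge">0x1000</code> or at base <code class="language-plaintext highlighter-rouge">0x555555554000</code> gives the same displacement, so the instruction bytes never need to change. The absolute address in <code class="language-plaintext highlighter-rouge">movl $3, _my_var</code>, by contrast, would have to be fixed up every time the program is loaded somewhere new.</p>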

<p>There are two reasons you might want to generate position-independent code:</p>

<ol>
  <li>
    <p>You’re compiling a shared library. Maybe this is a really widely used library, like libc. Maybe all or most processes on a system will want a copy of this library. It seems like a waste to have a separate copy for every process, eating up all your RAM. Instead, we can load the library into physical memory just once, then map it into the virtual memory of every process that needs it. But we can’t guarantee a library the same starting address in every process that loads it. So sharing one library between several processes only works if the library works no matter what memory address it’s at—which is to say, it needs to be position-independent. However, we’re compiling an executable, not a library, so this doesn’t apply to us.</p>
  </li>
  <li>
    <p>You have address space layout randomization (ASLR) enabled. ASLR is a security feature that makes some memory corruption attacks harder to carry out. Many of these attacks involve forcing program execution to jump to the instructions an attacker would like to execute. With ASLR enabled, memory segments are loaded at random locations<sup id="anchor5"><a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#fn5">5</a></sup>, which makes it harder for attackers to figure out what address to jump to. Code needs to be position-independent in order to run correctly when loaded at a random memory address. Since Apple really wants all macOS applications to support ASLR<sup id="anchor6"><a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#fn6">6</a></sup>, the linker will try to build a position-independent executable by default, and complain if it can’t.</p>
  </li>
</ol>
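<p>If you want to see ASLR in action, a small C sketch like the one below prints the address of a global variable (the names <code class="language-plaintext highlighter-rouge">counter</code>, <code class="language-plaintext highlighter-rouge">counter_address</code>, and <code class="language-plaintext highlighter-rouge">print_addresses</code> are made up for this example). Whether the global’s address actually changes between runs depends on your platform and toolchain: it generally does when the program is built as a position-independent executable and ASLR is enabled, and generally doesn’t otherwise. Many Linux distributions now build PIE by default; with GCC or Clang you can request it explicitly with <code class="language-plaintext highlighter-rouge">-fPIE -pie</code> or turn it off with <code class="language-plaintext highlighter-rouge">-no-pie</code>.</p>

```c
#include <stdint.h>
#include <stdio.h>

/* An initialized global variable: it lives in the data segment,
   not on the stack. Its exact address is decided at load time. */
int counter = 42;

/* Return the run-time address of `counter` as an integer.
   If the executable is position-independent and ASLR is enabled,
   this value can change from one run to the next; otherwise it is
   typically fixed when the executable is linked. */
uintptr_t counter_address(void) {
    return (uintptr_t)&counter;
}

/* Print the address of the global and of a local (stack) variable,
   so you can compare the output of several runs of the program. */
void print_addresses(void) {
    int local = 0;
    printf("global: %p\n", (void *)&counter);
    printf("stack:  %p\n", (void *)&local);
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">print_addresses()</code> from <code class="language-plaintext highlighter-rouge">main</code> and running the program a few times is a quick way to check whether your system randomizes where global data is loaded. On macOS, where the linker builds position-independent executables by default, you should usually see a different global address on each run.</p>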

<p>The fact that your compiler can’t generate position-independent executables is just one of many, many reasons you shouldn’t use it to build real software. I don’t have that much faith in these blog posts, and neither should you!</p>

<p>If you want to learn more about ASLR, I found <a href="http://security.cs.rpi.edu/courses/binexp-spring2015/lectures/15/09_lecture.pdf">these slides</a> helpful. Of course, there’s also <a href="https://en.wikipedia.org/wiki/Address_space_layout_randomization">Wikipedia</a>.</p>

<h2 id="up-next">Up Next</h2>

<p>So far, I’ve been implementing a compiler and writing posts as I go. This system worked really well for a while, but now it’s starting to work less well; I realized that some decisions I made in earlier stages made this stage harder to complete, so I had to go back and change them. I think I’m likely to run into more problems like that in later posts. So I’m going to take a break, finish building the compiler (whatever I decide “finished” means), and then come back and write the rest of this series. I probably won’t post another update for six months. So basically…I’m going to keep posting at about the same rate I have been.</p>

<p>When I come back, I’ll have a plan for what to cover in the rest of the series. See you then!</p>

<p><em>If you have any questions, corrections, or other feedback, you can <a href="mailto:nora@norasandler.com">email me</a> or <a href="https://github.com/nlsandler/write_a_c_compiler/issues">open an issue</a>.</em></p>

<div class="footnote">
  <p><sup id="fn1">1</sup>
The compiler that ships with the Xcode Command Line Tools (the one that was giving me this error) is actually <em>not</em> GCC. It’s <a href="https://en.wikipedia.org/wiki/Clang">Clang</a>, another open-source compiler, originally developed by Apple. Xcode installs Clang at <code class="language-plaintext highlighter-rouge">/usr/bin/gcc</code>, no doubt for very sound and legitimate reasons, although I don’t know what they are.
<a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#anchor1">↩</a></p>
</div>

<div class="footnote">
  <p><sup id="fn2">2</sup>
The standard actually considers this a <em>tentative definition</em> (section 6.9.2):</p>

  <blockquote>
    <p>A declaration of an identifier for an object that has file scope without an initializer, and without a
storage-class specifier or with the storage-class specifier static, constitutes a tentative definition.</p>
  </blockquote>

  <p>Basically, if the file doesn’t contain a real definition of the variable (one with an initializer), we can treat the declaration like a definition with an initial value of 0. We’re still going to call it a declaration, though.
<a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#anchor2">↩</a></p>
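  <p>As a small illustration (the variable names and accessor functions here are made up for this example), all of the following file-scope declarations are legal in a single C file under the tentative-definition rule in section 6.9.2:</p>

```c
/* File-scope declarations with no initializer (and no storage-class
   specifier, or `static`) are tentative definitions. Repeating them
   is legal. */
int x;
int x;
int x = 3;      /* the one real definition of x, with an initializer */

int y;          /* only tentative definitions of y in this file, so it
                   behaves as if it had been written `int y = 0;` */

static int z;   /* `static` without an initializer is also tentative,
                   so z is likewise initialized to 0 */

/* Accessors so callers can observe the resulting values. */
int get_x(void) { return x; }
int get_y(void) { return y; }
int get_z(void) { return z; }
```

  <p>Because <code class="language-plaintext highlighter-rouge">x</code> has a definition with an initializer, its tentative definitions just refer to that same object. <code class="language-plaintext highlighter-rouge">y</code> and <code class="language-plaintext highlighter-rouge">z</code> have no such definition, so each one ends up initialized to 0.</p>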
</div>

<div class="footnote">
  <p><sup id="fn3">3</sup>
<a href="https://commons.wikimedia.org/wiki/File:Typical_computer_data_memory_arrangement.png">Typical computer data memory arrangement</a> by Majenko is licensed under <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">CC BY-SA 4.0</a>.</p>

  <p>This diagram is an oversimplification; it doesn’t show every memory segment we might find in a running program. Also, sometimes memory segments are laid out in a different order—we’ll talk about that later. The point is that we have a dedicated chunk of memory for global variables.<a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#anchor3">↩</a></p>
</div>

<div class="footnote">
  <p><sup id="fn4">4</sup>
BSS stands for “Block Started by Symbol,” which is a relic of an assembler written in the 1950s(!). You can read more <a href="https://en.wikipedia.org/wiki/.bss#Origin">here</a> if you want to go down a bit of a Wikipedia rabbit hole.<a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#anchor4">↩</a></p>
</div>

<div class="footnote">
  <p><sup id="fn5">5</sup>
Exactly which memory segments are randomized, and how random their base addresses actually are, varies between systems. <a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#anchor5">↩</a></p>
</div>

<div class="footnote">
  <p><sup id="fn6">6</sup>
<a href="https://developer.apple.com/library/archive/qa/qa1788/_index.html">Source</a>. <a href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html#anchor6">↩</a></p>
</div>

  </div><a class="u-url" href="https://norasandler.com/2019/02/18/Write-a-Compiler-10.html" hidden=""></a>
</article>

      </div>
    </main>

    <footer class="site-footer">

  <div class="wrapper">
    <div class="footer-col-wrapper">
      <div class="footer-col footer-col-1">
        © 2023 Nora Sandler.
      </div>
    </div>
  </div>

</footer>


  


</body></html>