uflbbl - USTAR Floppy Linux BIOS Boot Loader
--------------------------------------------

A boot loader for old BIOS-based machines. It loads a set of floppies
which contain the kernel, the ramdisk, etc. in USTAR format. Currently
it can only load Linux kernels and ramdisks. Although it might be able
to boot an AMD64 kernel, this is not the primary focus: it is meant
to boot old IA-32 based machines which have an original floppy
drive.

The filenames recognized are:
- 'bzImage': the Linux kernel
- 'ramdisk.img': the initial ramdisk
- 'EOF': an empty file indicating the end of the tar file

Customization of boot parameters must currently be done in the source,
in 'KERNEL_CMD_LINE' in 'boot.asm'.

You can also change the greeting message in the variable 'MESSAGE_GREETING'
in 'boot.asm' to your liking.

An example boot sequence looks as follows:

Booting from Floppy...
UFLBB loading...
Checking A20 address gate.. + enabled
Switching to unreadl mode.. enabled
Boot parameters 0x00 0x04 0x02 0x13
 bzImage 00004727760 0013AFF0!
Number of real-mode kernel sectors: 1D
Number of protected-mode kernel sectors: 09BA
Linux boot protocol version: 02.0F
Linux kernel version: 6.2.10 (user@machine) #1 Mon Apr 10 11:33:20 CET 2023
 ramdisk.img 00012114554 0028996C!
Insert next floppy and press any key to continue..
Insert next floppy and press any key to continue..
Insert next floppy and press any key to continue..
 EOF 00000000000 00000000
Reached end of tar file..
Ramdisk address: 008000000
Ramdisk size: 0028996C
Booting kernel..
early console in setup code
early console in extract_kernel
input_data: 0x01328079
input_len: 0x00132e1c
output: 0x01000000
output_len: 0x0032ce60
kernel_total_size: 0x00472000
needed_size: 0x00472000
Decompressing Linux...
...

requirements
------------

nasm

how to build a set of bootable floppies
---------------------------------------

Create a 'bzImage' kernel and an initial ramdisk 'ramdisk.img'.

Assemble 'boot.asm' into 'boot.img', tar the kernel, the 'ramdisk.img'
and the 'EOF' file into a 'data.tar' file, concatenate the boot loader
file 'boot.img' and the 'data.tar' file into 'floppy.img', then split
it into floppy-sized pieces (assuming you have 3.5" 1.44MB floppies).

nasm -o boot.img boot.asm
touch EOF
tar -cvf data.tar -b1 bzImage ramdisk.img EOF
cat boot.img data.tar > floppy.img
./lstar floppy.img
split -d -b 1474560 floppy.img floppy
dd if=floppy00 of=/dev/fd0 bs=512
dd if=floppy01 of=/dev/fd0 bs=512
..

Boot the floppies in order, insert next floppy if asked by the boot
loader.
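
The number of floppies can be estimated before writing anything. The
following is a rough Python sketch, not part of this project. It assumes
that 'boot.img' is 1536 bytes (stage 1 plus stage 2, see "floppy format"),
that 'tar -b1' writes one 512-byte header per file, pads the data to
whole 512-byte blocks and ends the archive with two zero blocks. The
names 'tar_size' and 'floppies_needed' are made up for this example.

```python
FLOPPY_BYTES = 1474560   # same value as passed to 'split -b'
BLOCK = 512              # tar block size


def tar_size(file_sizes):
    """Estimated size of an uncompressed tar written with blocking factor 1."""
    total = 0
    for size in file_sizes:
        data_blocks = -(-size // BLOCK)         # round up to whole blocks
        total += BLOCK + data_blocks * BLOCK    # header block + padded data
    return total + 2 * BLOCK                    # end-of-archive marker (assumed)


def floppies_needed(boot_size, file_sizes):
    """Number of pieces 'split' would produce for boot loader + tar."""
    image = boot_size + tar_size(file_sizes)
    return -(-image // FLOPPY_BYTES)
```

Call it with the sizes of 'bzImage', 'ramdisk.img' and 'EOF' (0), for
instance floppies_needed(1536, [kernel_size, ramdisk_size, 0]).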

'lstar' is a small convenience program to list the entries in the tar
(of course 'tar tvf data.tar' works for this, too).

gcc -o lstar src/lstar.c -lbsd
./lstar floppy.img

testing
-------

tests/run_qemu.sh

When asked to change the floppy, switch to the QEMU console and
type 'change floppy0 floppy01' (likewise for all other floppies).

floppy format
-------------

512 bytes	MBR stage 1: simple boot loader and magic boot string.
		Loads stage 2, which directly follows stage 1, and assumes
		stage 2 fits on one track of the floppy, so we don't
		need a complicated loading method probing sectors per track.
1024 bytes	stage 2 boot loader: interprets the tar format one sector after
		stage 2 and reads files into memory (bzImage, ramdisk.img)
N		tar file format (no compression, we expect the files to
		be well compressed). 2 blocks .ustar format, file names
		are easily accessible (bzImage, ramdisk.img). We sacrifice 512
		bytes for easier reading of multiple disks (for instance
		a kernel disk, an initial ramdisk, a driver disk for
		SCSI, a root file system, etc.). We could even do multi-floppy
		kernels, so we can read a kernel distributed over more than
		one floppy.

ustar/tar format
----------------

offset			length		description				example
byte 0			100		filename in ascii, zero-term string	"bzImage"
byte 0x7c (124)		12		length in octal, zero-term string	"00004014360"
byte 0x94 (148)		8		checksum in octal, zero-term string	"012757"
					with an ending space for some reason
					sum the header bytes with the checksum
					bytes as spaces (0x20)
byte 0x101 (257)	6		UStar indicator, zero-term string	"ustar"
					one of the easiest ways to detect
					a tar header sector
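
The fields in the table can be decoded as in the following Python
sketch. It is only an illustration, not how 'boot.asm' or 'lstar' do
it. The function names are made up here; the test header is produced
with Python's standard 'tarfile' module.

```python
import tarfile


def parse_ustar_header(sector: bytes) -> dict:
    """Decode the header fields from the table above (one 512-byte sector)."""
    if len(sector) != 512:
        raise ValueError("a tar header is exactly one 512-byte sector")

    def text(offset, length):
        # Fields are zero-terminated ASCII strings, possibly space padded.
        raw = sector[offset:offset + length].split(b"\0", 1)[0]
        return raw.decode("ascii").strip()

    stored = int(text(0x94, 8) or "0", 8)
    # The checksum is the byte sum of the header with the 8 checksum
    # bytes themselves counted as spaces (0x20).
    blanked = sector[:0x94] + b" " * 8 + sector[0x94 + 8:]
    return {
        "name": text(0, 100),
        "size": int(text(0x7C, 12) or "0", 8),       # octal string -> number
        "is_ustar": sector[0x101:0x101 + 5] == b"ustar",
        "checksum_ok": stored == sum(blanked),
    }


def example_header(name: str, size: int) -> bytes:
    """Build a ustar header with the standard library, for testing only."""
    info = tarfile.TarInfo(name)
    info.size = size
    return info.tobuf(format=tarfile.USTAR_FORMAT)
```

For example, parse_ustar_header(example_header("bzImage", 0o4727760))
yields the size 0x13AFF0, the pair of values printed for 'bzImage' in
the example boot sequence.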

ramdisk
-------

find . | cpio -H newc -o -R root:root | xz --check=crc32 > ../ramdisk.img

memory layout
-------------

0x07c00 - 0x08fff	boot loader
0x09000 - 0x091ff	floppy read buffer
0x0e000	- 0x09200	stack of real mode kernel
0x10000 - 0x101ff	Linux zero page (first part)
0x10200 - 0x103ff	zero page (part two), real mode entry point at 0x10200
0x10400 - xxx		continue code of real mode kernel
0x1e000 - 0x1e0ff	cmd line for kernel
0x100000 - xxx		protected mode kernel code (at 1 MB)
0x800000 - xxx		ram disk (at 8 MB)
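
This document mixes linear addresses (0x10200) with real mode
segment:offset pairs (0x1020:0, [es:0x224] with es = 0x1000). A small
Python helper, written only for this README, shows how they relate:

```python
def linear(segment: int, offset: int) -> int:
    """Real mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


# Pairs used elsewhere in this document:
entry_point = linear(0x1020, 0x0000)    # target of 'jmp 0x1020:0'
stack_top = linear(0x1000, 0xE000)      # ss = 0x1000, sp = 0xe000
heap_end_ptr = linear(0x1000, 0x0224)   # [es:0x224] with es = 0x1000
```

These evaluate to 0x10200, 0x1e000 and 0x10224.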
		
state machine
-------------

tar state machine: reading metadata, reading data, we know
whether we are in the kernel, ramdisk, etc.
kernel substates:
- sector 1: read number of real mode sectors
- sector 2: read and check params, set params
- sector >2: always read and copy data from floppy to destination area
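
The state machine can be modelled in a few lines of Python. This is a
simplified sketch and not a translation of 'boot.asm': it walks a
complete image in memory instead of reading floppies, and it assumes
the tar data starts 1536 bytes into the image (stage 1 plus stage 2,
see "floppy format"). All function names are made up for the example.

```python
import io
import tarfile

SECTOR = 512
TAR_START = 3 * SECTOR   # assumed: 512 bytes stage 1 + 1024 bytes stage 2


def kernel_substate(sector_no):
    """Action for the n-th data sector of the kernel file."""
    if sector_no == 1:
        return "read number of real mode sectors"
    if sector_no == 2:
        return "read and check params, set params"
    return "copy data to destination area"


def walk_sectors(image, tar_start=TAR_START):
    """Return one (state, name, sector_no, action) tuple per sector read."""
    events = []
    state, name, remaining, sector_no = "METADATA", "", 0, 0
    for pos in range(tar_start, len(image), SECTOR):
        sector = image[pos:pos + SECTOR]
        if state == "METADATA":
            name = sector[:100].split(b"\0", 1)[0].decode("ascii")
            size_text = sector[0x7C:0x7C + 12].split(b"\0", 1)[0].decode("ascii")
            remaining = -(-int(size_text.strip() or "0", 8) // SECTOR)
            sector_no = 0
            events.append(("METADATA", name, 0, "parse header"))
            if name == "EOF":
                break                      # reached end of tar file
            if remaining:
                state = "DATA"
        else:
            sector_no += 1
            action = kernel_substate(sector_no) if name == "bzImage" else "copy data"
            events.append(("DATA", name, sector_no, action))
            remaining -= 1
            if remaining == 0:
                state = "METADATA"
    return events


def build_image(files):
    """Test helper: zero-filled boot loader area followed by a ustar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return bytes(TAR_START) + buf.getvalue()
```

Feeding it build_image([("bzImage", data), ("ramdisk.img", data),
("EOF", b"")]) yields the metadata and data events in the order the
boot loader would see the sectors.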

error codes
-----------

error codes consist of an error class (DISK, A20, KERN) and a code

ERR DISK 0x01	stage 1 read error while reading stage 2
ERR DISK 0x02	stage 1 short read error (we didn't read as many stage 2
		sectors as expected)
ERR DISK 0x03	reading and interpreting tar state machine error
ERR DISK 0xXX	other read errors (BIOS int 0x13 codes), stage 2
ERR A20  0x01	A20 address line not enabled
ERR KERN 0x01	kernel read state machine error
ERR KERN 0x02	kernel signature 'HdrS' not found
ERR KERN 0x03	kernel boot protocol too old
ERR KERN 0x04	kernel cannot be started (or better, we return from the
		real mode jump)

Linux IA-32 boot sequence
-------------------------

- load Kernel boot sector at 0x10000 (first 512 bytes)
- read 0x10000+0x1f1 number of sectors
  => minimal 4 sectors (if 0 is in 1f1), number of setup sectors
- read 0x10200 (second part of the zero page)
- compare 0x10202 to linux header 'HdrS', must be equal
- compare 0x10206 to linux boot protocol version, don't allow anything
  below 0x020f (protocol 2.15, the newest one) for now
- set various zero page data
  - test for KASLR enabled
    0x10211 has bit 1 set?
    (this we might not want to do for old i486 kernels and systems)
  - set 0xFF for non-registered boot loader in 0x10210
  - set 0x80 in loadflags 0x10211
    - CAN_USE_HEAP (bit 7)
    - LOADED_HIGH? where do we load protected mode code?
  - set heap_end_ptr 0x10224 to 0xde00
    ; heap_end = 0xe000
    ; heap_end_ptr = heap_end - 0x200 = 0xde00
      mov word [es:0x224], 0xde00 ;heap_end_ptr
    "Set this field to the offset (from the beginning of the real-mode
     code) of the end of the setup stack/heap, minus 0x0200."
  - set cmd_line_ptr 0x10228 to 0x1e000
     mov	dword [es:0x228], 0x1e000	; set cmd_line_ptr
     also copy your command line to 0x1e000, for now from the boot loader
     data segment (initialized data) area.
     At offset 0x0020 (word), “cmd_line_magic”, enter the magic number 0xA33F.
     At offset 0x0022 (word), “cmd_line_offset”, enter the offset of the kernel command line (relative to the start of the real-mode kernel).
     The kernel command line must be within the memory region covered by setup_move_size, so you may need to adjust this field.
- read to 0x10400 N-1 sectors (as much as we calculated above) as the
  real mode kernel part
- 0x101f4 is the size of the 32-bit protected mode kernel code to load,
  in 16-byte paragraphs -> transform to 512 byte sectors to read
- eventually get the preferred loading location for the kernel
- load the protected part to 0x100000 by loading it to low memory and
  copy it to high memory in unreal mode
- print the kernel version string (the word at 0x1020e holds its offset),
  but we must load the complete kernel first
- at end of kernel PM code read check if we have the same size as the tar
  entry
- run_kernel (real mode)
	cli
	mov	ax, 0x1000
	mov	ds, ax
	mov	es, ax
	mov	fs, ax
	mov	gs, ax
	mov	ss, ax
	mov	sp, 0xe000
	jmp	0x1020:0
- eventually get the preferred loading location for the ramdisk
  or highest possible location (should make the kernel happy), but
  then we have to know a little bit about the memory layout and size of
  the machine..
- read ram image
  - read octal size in tar metadata of ramdisk, convert to binary
  - set address and size in kernel zero page
    - 0x218/4 ramdisk image address
    - 0x21c/4 ramdisk image size
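
The header checks at the start of this sequence can be tried out on a
'bzImage' file with a short Python sketch. It is an illustration under
the assumptions of this section only (field offsets 0x1f1, 0x1f4, 0x202
and 0x206 relative to the start of the image) and not a port of
'boot.asm'. The function name is made up for the example.

```python
import struct


def parse_setup_header(image: bytes) -> dict:
    """Read the boot protocol fields used in the sequence above."""
    setup_sects = image[0x1F1] or 4            # 0 means 4 setup sectors
    syssize = struct.unpack_from("<I", image, 0x1F4)[0]   # 16-byte paragraphs
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("kernel signature 'HdrS' not found")
    version = struct.unpack_from("<H", image, 0x206)[0]
    return {
        "setup_sects": setup_sects,
        # paragraphs -> bytes -> 512-byte sectors, rounded up
        "pm_sectors": (syssize * 16 + 511) // 512,
        "protocol": (version >> 8, version & 0xFF),
        "protocol_ok": version >= 0x020F,
    }
```

With the values from the example boot sequence (0x1D real mode sectors,
0x09BA protected mode sectors) and one boot sector, the total is 2520
sectors, which is the 0x13AFF0 byte 'bzImage' rounded up to whole
sectors. So the printed real mode count appears to exclude the boot
sector.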

Bochs commands
--------------

# have a look at the boot.map file for the address of a symbol
# set breakpoint
b 0x7F93

# dump memory in floppy read buffer
x /30b 0x0008800

# dump real mode kernel code/data
x /30b 0x0010000

interrupts
----------

Relevant interrupts as documented in http://www.cs.cmu.edu/~ralf/files.html:

--------B-1302-------------------------------
INT 13 - DISK - READ SECTOR(S) INTO MEMORY
	AH = 02h
	AL = number of sectors to read (must be nonzero)
	CH = low eight bits of cylinder number
	CL = sector number 1-63 (bits 0-5)
	     high two bits of cylinder (bits 6-7, hard disk only)
	DH = head number
	DL = drive number (bit 7 set for hard disk)
	ES:BX -> data buffer
Return: CF set on error
	    if AH = 11h (corrected ECC error), AL = burst length
	CF clear if successful
	AH = status (see #00234)
	AL = number of sectors transferred (only valid if CF set for some
	      BIOSes)
Notes:	errors on a floppy may be due to the motor failing to spin up quickly
	  enough; the read should be retried at least three times, resetting
	  the disk with AH=00h between attempts
	most BIOSes support "multitrack" reads, where the value in AL
	  exceeds the number of sectors remaining on the track, in which
	  case any additional sectors are read beginning at sector 1 on
	  the following head in the same cylinder; the MSDOS CONFIG.SYS command
	  MULTITRACK (or the Novell DOS DEBLOCK=) can be used to force DOS to
	  split disk accesses which would wrap across a track boundary into two
	  separate calls
	the IBM AT BIOS and many other BIOSes use only the low four bits of
	  DH (head number) since the WD-1003 controller which is the standard
	  AT controller (and the controller that IDE emulates) only supports
	  16 heads
	AWARD AT BIOS and AMI 386sx BIOS have been extended to handle more
	  than 1024 cylinders by placing bits 10 and 11 of the cylinder number
	  into bits 6 and 7 of DH
	under Windows95, a volume must be locked (see INT 21/AX=440Dh/CX=084Bh)
	  in order to perform direct accesses such as INT 13h reads and writes
	all versions of MS-DOS (including MS-DOS 7 [Windows 95]) have a bug
	  which prevents booting on hard disks with 256 heads (FFh), so many
	  modern BIOSes provide mappings with at most 255 (FEh) heads
	some cache drivers flush their buffers when detecting that DOS is
	  bypassed by directly issuing INT 13h from applications.  A dummy
	  read can be used as one of several methods to force cache
	  flushing for unknown caches (e.g. before rebooting).
BUGS:	When reading from floppies, some AMI BIOSes (around 1990-1991) trash
	  the byte following the data buffer, if it is not arranged to an even
	  memory boundary.  A workaround is to either make the buffer word
	  aligned (which may also help to speed up things), or to add a dummy
	  byte after the buffer.
	MS-DOS may leave interrupts disabled on return from this function.
	Apparently some BIOSes or intercepting resident software have bugs
	  that may destroy DX on return or not properly set the Carry flag.
	  At least some Microsoft software frames calls to this function with
	  PUSH DX, STC, INT 13h, STI, POP DX.
	on the original IBM AT BIOS (1984/01/10) this function does not disable
	  interrupts for harddisks (DL >= 80h).	 On these machines the MS-DOS/
	  PC DOS IO.SYS/IBMBIO.COM installs a special filter to bypass the
	  buggy code in the ROM (see CALL F000h:211Eh)
SeeAlso: AH=03h,AH=0Ah,AH=06h"V10DISK.SYS",AH=21h"PS/1",AH=42h"IBM"
SeeAlso: INT 21/AX=440Dh/CX=084Bh,INT 4D/AH=02h
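
INT 13/AH=02h addresses sectors by cylinder, head and sector, while a
tar reader thinks in consecutive 512-byte sectors. The following Python
sketch shows the usual conversion. It assumes the standard 1.44MB
geometry (80 cylinders, 2 heads, 18 sectors per track, which gives the
1474560 bytes used with 'split'); whether 'boot.asm' computes it this
way is not shown here, and the function names are made up.

```python
SECTORS_PER_TRACK = 18   # assumed 1.44MB floppy geometry
HEADS = 2


def lba_to_chs(lba):
    """Convert a zero-based sector index to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector0 = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector0 + 1      # BIOS sectors count from 1


def int13_read_registers(lba, count=1, drive=0):
    """Register values for INT 13/AH=02h as described in the entry above."""
    cylinder, head, sector = lba_to_chs(lba)
    return {
        "AH": 0x02,
        "AL": count,
        "CH": cylinder & 0xFF,
        "CL": sector | (((cylinder >> 8) & 0x03) << 6),
        "DH": head,
        "DL": drive,
    }
```

Sector index 0 is the stage 1 boot sector at cylinder 0, head 0,
sector 1; index 2879 is the last sector of a 1.44MB floppy.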

--------B-1300-------------------------------
INT 13 - DISK - RESET DISK SYSTEM
	AH = 00h
	DL = drive (if bit 7 is set both hard disks and floppy disks reset)
Return: AH = status (see #00234)
	CF clear if successful (returned AH=00h)
	CF set on error
Note:	forces controller to recalibrate drive heads (seek to track 0)
	for PS/2 35SX, 35LS, 40SX and L40SX, as well as many other systems,
	  both the master drive and the slave drive respond to the Reset
	  function that is issued to either drive
SeeAlso: AH=0Dh,AH=11h,INT 21/AH=0Dh,INT 4D/AH=00h"TI Professional"
SeeAlso: INT 56"Tandy 2000",MEM 0040h:003Eh

--------B-1308-------------------------------
INT 13 - DISK - GET DRIVE PARAMETERS (PC,XT286,CONV,PS,ESDI,SCSI)
	AH = 08h
	DL = drive (bit 7 set for hard disk)
	ES:DI = 0000h:0000h to guard against BIOS bugs
Return: CF set on error
	    AH = status (07h) (see #00234)
	CF clear if successful
	    AH = 00h
	    AL = 00h on at least some BIOSes
	    BL = drive type (AT/PS2 floppies only) (see #00242)
	    CH = low eight bits of maximum cylinder number
	    CL = maximum sector number (bits 5-0)
		 high two bits of maximum cylinder number (bits 7-6)
	    DH = maximum head number
	    DL = number of drives
	    ES:DI -> drive parameter table (floppies only)
Notes:	may return successful even though specified drive is greater than the
	  number of attached drives of that type (floppy/hard); check DL to
	  ensure validity
	for systems predating the IBM AT, this call is only valid for hard
	  disks, as it is implemented by the hard disk BIOS rather than the
	  ROM BIOS
	the IBM ROM-BIOS returns the total number of hard disks attached
	  to the system regardless of whether DL >= 80h on entry.
	Toshiba laptops with HardRAM return DL=02h when called with DL=80h,
	  but fail on DL=81h.  The BIOS data at 40h:75h correctly reports 01h.
	may indicate only two drives present even if more are attached; to
	  ensure a correct count, one can use AH=15h to scan through possible
	  drives
	Reportedly some Compaq BIOSes with more than one hard disk controller
	  return only the number of drives DL attached to the corresponding
	  controller as specified by the DL value on entry.  However, on
	  Compaq machines with "COMPAQ" signature at F000h:FFEAh,
	  MS-DOS/PC DOS IO.SYS/IBMBIO.COM call INT 15/AX=E400h and 
	  INT 15/AX=E480h to enable Compaq "mode 2" before retrieving the count
	  of hard disks installed in the system (DL) from this function.
	the maximum cylinder number reported in CX is usually two less than
	  the total cylinder count reported in the fixed disk parameter table
	  (see INT 41h,INT 46h) because early hard disks used the last cylinder
	  for testing purposes; however, on some Zenith machines, the maximum
	  cylinder number reportedly is three less than the count in the fixed
	  disk parameter table.
	for BIOSes which reserve the last cylinder for testing purposes, the
	  cylinder count is automatically decremented
	on PS/1s with IBM ROM DOS 4, nonexistent drives return CF clear,
	  BX=CX=0000h, and ES:DI = 0000h:0000h
	machines with lost CMOS memory may return invalid data for floppy
	  drives. In this situation CF is cleared, but AX,BX,CX,DX,DH,DI,
	  and ES contain only 0.  At least under some circumstances, MS-DOS/
	  PC DOS IO.SYS/IBMBIO.COM just assumes a 360 KB floppy if it sees
	  CH to be zero for a floppy.
	the PC-Tools PCFORMAT program requires that AL=00h before it will
	  proceed with the formatting
	if this function fails, an alternative way to retrieve the number
	  of floppy drives installed in the system is to call INT 11h.
	In fact, the MS-DOS/PC-DOS IO.SYS/IBMBIO.COM attempts to get the
	  number of floppy drives installed from INT 13/AH=08h, when INT 11h
	  AX bit 0 indicates there are no floppy drives installed. In addition
	  to testing the CF flag, it only trusts the result when the number of
	  sectors (CL preset to zero) is non-zero after the call.
BUGS:	several different Compaq BIOSes incorrectly report high-numbered
	  drives (such as 90h, B0h, D0h, and F0h) as present, giving them the
	  same geometry as drive 80h; as a workaround, scan through disk
	  numbers, stopping as soon as the number of valid drives encountered
	  equals the value in 0040h:0075h
	a bug in Leading Edge 8088 BIOS 3.10 causes the DI,SI,BP,DS, and ES
	  registers to be destroyed
	some Toshiba BIOSes (at least before 1995, maybe some laptops???
	  with 1.44 MB floppies) have a bug where they do not set the ES:DI
	  vector even for floppy drives. Hence these registers should be
	  preset with zero before the call and checked to be non-zero on
	  return before using them.  Also it seems these BIOSes can return
	  wrong info in BL and CX, as S/DOS 1.0 can be configured to preset
	  these registers as for an 1.44 MB floppy.
	the PS/2 Model 30 fails to reset the bus after INT 13/AH=08h and
	  INT 13/AH=15h. A workaround is to monitor for these functions
	  and perform a transparent INT 13/AH=01h status read afterwards.
	  This will reset the bus. The MS-DOS 6.0 IO.SYS takes care of
	  this by installing a special INT 13h interceptor for this purpose.
	AD-DOS may leave interrupts disabled on return from this function.
	Some Microsoft software explicitly sets STI after return.
SeeAlso: AH=06h"Adaptec",AH=13h"SyQuest",AH=48h,AH=15h,INT 1E
SeeAlso: INT 41"HARD DISK 0"

(Table 00242)
Values for diskette drive type:
 01h	360K
 02h	1.2M
 03h	720K
 04h	1.44M
 05h	??? (reportedly an obscure drive type shipped on some IBM machines)
	2.88M on some machines (at least AMI 486 BIOS)
 06h	2.88M
 10h	ATAPI Removable Media Device
--------d-1308-------------------------------
INT 13 - V10DISK.SYS - SET FORMAT
	AH = 08h
	AL = number of sectors
	CH = cylinder number (bits 8,9 in high bits of CL)
	CL = sector number
	DH = head
	DL = drive
Return: AH = status code (see #00234)
Program: V10DISK.SYS is a driver for the Flagstaff Engineering 8" floppies
Note:	details not available
SeeAlso: AH=03h,AH=06h"V10DISK.SYS"

references
----------

- kernel boot up in all its details, really nice documentation:
  - https://www.kernel.org/doc/html/latest/x86/boot.html
  - https://www.kernel.org/doc/html/latest/x86/zero-page.html
  - https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html
  - https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-2.html
- debug kernel with bochs
  - https://bochs.sourceforge.io/doc/docbook/user/debugging-with-gdb.html
  - https://www.kernel.org/doc/html/v4.12/dev-tools/gdb-kernel-debugging.html
  - https://www.cs.princeton.edu/courses/archive/fall09/cos318/precepts/bochs_gdb.html
- interrupt list and BIOS documentation
  - http://www.cs.cmu.edu/~ralf/files.html
  - https://members.tripod.com/vitaly_filatov/ng/asm/
- unreal mode
  - https://wiki.osdev.org/Unreal_Mode
  - http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/
- Linux boot protocol
  - https://docs.kernel.org/x86/boot.html
  - https://www.spinics.net/lists/linux-integrity/msg14580.html: version string
- get available memory
  - http://www.uruk.org/orig-grub/mem64mb.html
  - https://wiki.osdev.org/Detecting_Memory_(x86)
- create ramdisk.img:
  https://people.freedesktop.org/~narmstrong/meson_drm_doc/admin-guide/initrd.html
- tar format
  - https://wiki.osdev.org/USTAR
  - https://en.wikipedia.org/wiki/Tar_(computing)#UStar_format
  - https://github.com/calccrypto/tar
  - https://github.com/Papierkorb/tarfs
- other minimal bootloader projects
  - https://github.com/wikkyk/mlb
  - https://github.com/owenson/tiny-linux-bootloader and
    https://github.com/guineawheek/tiny-floppy-bootloader
  - http://dc0d32.blogspot.com/2010/06/real-mode-in-c-with-gcc-writing.html (Small C and 16-bit code,
    leads to a quite big boot loader, in the end we didn't use C but Unreal mode 16/32-bittish assembly)
  - https://wiki.syslinux.org/wiki/index.php?title=The_Syslinux_Project
  - Lilo (but the code is hard to read and looks quite chaotic)
  - Linux 1.x old boot floppy code
- alternative implementations of a real mode boot loader
  - http://dc0d32.blogspot.com/2010/06/real-mode-in-c-with-gcc-writing.html