uflbbl - USTAR Floppy Linux BIOS Boot Loader -------------------------------------------- For old BIOS based booting, loading a set of floppies which contain the kernel, the ramdisk, etc. in USTAR format. It can only load Linux kernels and ramdisks currently. And though it might be able to boot a AMD64 kernel this is not the primary focus. It is there to boot on old IA-32 based machines which have an original floppy drive. The filenames recognized are: - 'bzImage': the Linux kernel - 'ramdisk.img': the initial ramdisk - 'EOF': an empty file indicating the end of the tar file Customization of boot parameters must be done in source currently in 'KERNEL_CMD_LINE' in 'boot.asm'. You can also change the greeting mesage in varialble 'MESSAGE_GREETING' in 'boot.asm' to your likeing. An example boot sequence looks as follows: Booting from Floppy... UFLBB loading... Checking A20 address gate.. + enabled Switching to unreadl mode.. enabled Boot parameters 0x00 0x04 0x02 0x13 bzImage 00004727760 0013AFF0! Number of real-mode kernel sectors: 1D Number of protected-mode kernel sectors: 09BA Linux boot protocol version: 02.0F Linux kernel version: 6.2.10 (user@machine) #1 Mon Apr 10 11:33:20 CET 2023 ramdisk.img 00012114554 0028996C! Insert next floppy and press any key to continue.. Insert next floppy and press any key to continue.. Insert next floppy and press any key to continue.. EOF 00000000000 00000000 Reached end of tar file.. Ramdisk address: 008000000 Ramdisk size: 0028996C Booting kernel.. early console in setup code early console in extract_kernel input_data: 0x01328079 input_len: 0x00132e1c output: 0x01000000 output_len: 0x0032ce60 kernel_total_size: 0x00472000 needed_size: 0x00472000 Decompressing Linux... ... requirements ------------ nasm how to build a set of bootable floppies --------------------------------------- Create a 'bzImage' kernel and an initial ramdisk 'ramdisk.img'. Assemble 'boot.asm', put it to start of 'floppy.img', tar the kernel, the 'ramdisk.img' and the 'EOF' file into a 'data.tar' file, concatenate the boot loader file 'boot.img' and the 'data.tar' file, then split into floppy size (assuming you have 3 1/4" 1.44MB floppies). nasm -o boot.img boot.asm touch EOF tar -cvf data.tar -b1 bzImage ramdisk.img EOF cat boot.img data.tar > floppy.img ./lstar floppy.img split -d -b 1474560 floppy.img floppy dd if=floppy00 of=/dev/fd0 bs=512 dd if=floppy01 of=/dev/fd0 bs=512 .. Boot the floppies in order, insert next floppy if asked by the boot loader. 'lstar' is a small convenience program to list the entries in the tar (of course also 'tar xvf' works for this). gcc -lbsd -o lstar src/lstar.c ./lstar floppy.img testing ------- tests/run_qemu.sh when asked to change the floppy change to the Qemu console and type 'change floppy0 floppy01' (same for all other floppies). floppy format ------------- 512 bytes MBR stage 1 simple boot loader and magic boot string loads stage 2 directly following stage 1, also assumes stage 2 fits on one track of the floppy, so we don't need a complicated loading method probing tracks per sector 1024 bytes stage 2 boot loader, interprets tar format one sector after stage 2 and reads files into memory (vmlinuz, ramdisk.img) N tar file format (no compression, we expect the files to be well compressed). 2 blocks .ustar format, file names are easy accessible (vmlinuz, ramdisk.img). We sacrifice 512 bytes for easier reading in multiple disks (for instance a kernel disk, an initial ramdisk, a driver disk for SCSI, a root file system, etc.), we could even do multi-floppy kernels, so we can read the kernel distributed on more than one floppy. ustar/tar format ---------------- offset length description example byte 0 100 filename in ascii, zero-term string "bzImage" byte 0x7c (124) 12 length in octal, zero-term string "00004014360" byte 0x94 (148) 8 checksum in octal, zero-term string "012757" with an ending space for some reason sum the header bytes with the checksum bytes as spaces (0x20) byte 0x101 (257) 6 UStar indicator, zero-term string "ustar" one of the easiest ways to detect a tar header sector ramdisk ------- find . | cpio -H newc -o -R root:root | xz --check=crc32 > ../ramdisk.img memory layout ------------- 0x07c00 - 0x08fff boot loader 0x09000 - 0x091ff floppy read buffer 0x0e000 - 0x09200 stack of real mode kernel 0x10000 - 0x101ff Linux zero page (first part) 0x10200 - 0x103ff zero page (part two), real mode entry point at 0x10200 0x10400 - xxx continue code of real mode kernel 0x1e000 - 0xe0ff cmd line for kernel 0x100000 - xxx protected mode kernel code (at 1 MB) 0x800000 - xxx ram disk (at 8 MB) state machine ------------- tar state machine: reading metadata, reading data, we know whether we are in the kernel, ramdisk, etc. kernel substates: - sector 1: read number of real mode sectors - sector 2: read and check params, set params - sector >2: always read and copy data from floppy to destination area error codes ----------- error codes consist of a error class (DISK, KERN) and a code ERR DISK 0x01 stage 1 read error while reading stage 2 ERR DISK 0x02 stage 1 short read error (we didn't read as many stage 2 sectors as expected) ERR DISK 0x03 reading and interpreting tar state machine error ERR DISK 0xXX other read errors (BIOS int 0x13 codes), stage 2 ERR A20 0x01 A20 address line not enabled ERR KERN 0x01 kernel read state machine error ERR KERN 0x02 kernel signature 'HdrS' not found ERR KERN 0x03 kernel boot protocol too old ERR KERN 0x04 kernel cannot be started (or better, we return from the real mode jump) Linux IA-32 boot sequence ------------------------- - load Kernel boot sector at 0x10000 (first 512 bytes) - read 0x10000+0x1f1 number of sectors => minimal 4 sectors (if 0 is in 1f1), number of setup sectors - read 0x10200 (second part of the zero page) - compare 0x10202 to linux header 'HdrS', must be equal - compare 0x10206 to linux boot protocol version, don't allow anything below 0x215 (the newest one) for now - set various zero page data - test for KASLR enabled 0x10211 has bit 1 set? (this we might not want to do for old i486 kernels and systems) - set 0xFF for non-registered boot loader in 0x10210 - set 0x80 in loadflags 0x10211 - CAN_USE_HEAP (bit 7) - LOADED_HIGH? where do we load protected mode code? - set head_end_ptr 0x10224 to 0xde00 ; heap_end = 0xe000 ; heap_end_ptr = heap_end - 0x200 = 0xde00 mov word [es:0x224], 0xde00 ;head_end_ptr "Set this field to the offset (from the beginning of the real-mode code) of the end of the setup stack/heap, minus 0x0200." - set 0x10228 to 0x1e000 set to mov dword [es:0x228], 0x1e000 ;cmd line ptr mov dword [es:0x228], 0x1e000 ; set cmd_line_ptr also copy your command line to 0x1e000, for now from the boot loader data segment (initialized data) area. At offset 0x0020 (word), “cmd_line_magic”, enter the magic number 0xA33F. At offset 0x0022 (word), “cmd_line_offset”, enter the offset of the kernel command line (relative to the start of the real-mode kernel). The kernel command line must be within the memory region covered by setup_move_size, so you may need to adjust this field. - read to 0x10400 N-1 sectors (as much as we calculated above) as the real mode kernel part - 0x1001f4 is the 16-byte paragraphs of 32-bit code for protected mode kernel to load -> transform to 512 byte sectors to read - eventually get the prefered loading location for the kernel - load the protected part to 0x100000 by loading it to low memory and copy it to high memory in unreal mode - print kernel version number, 020E, offset, but we must load the complete kernel first - at end of kernel PM code read check if we have the same size as the tar entry - run_kernel (real mode) cli mov ax, 0x1000 mov ds, ax mov es, ax mov fs, ax mov gs, ax mov ss, ax mov sp, 0xe000 jmp 0x1020:0 - eventually get the prefered loading location for the ramdisk or highest possible location (should make the kernel happy), but then we have to know a little bit about the memory layout and size of the machine.. - read ram image - read octal size in tar metadata of ramdisk, convert do decimal - set address and size in kernel zero page - 0x218/4 ramdisk image address - 0x21c/4 ramdisk image size Bochs commands -------------- # have a look at the boot.map file for the address of a symbol # set breakpoint b 0x7F93 # dump memory in floppy read buffer x /30b 0x0008800 # dump real mode kernel code/data x /30b 0x0010000 interrupts ---------- Relevant interrupts as documented in http://www.cs.cmu.edu/~ralf/files.html: --------B-1302------------------------------- INT 13 - DISK - READ SECTOR(S) INTO MEMORY AH = 02h AL = number of sectors to read (must be nonzero) CH = low eight bits of cylinder number CL = sector number 1-63 (bits 0-5) high two bits of cylinder (bits 6-7, hard disk only) DH = head number DL = drive number (bit 7 set for hard disk) ES:BX -> data buffer Return: CF set on error if AH = 11h (corrected ECC error), AL = burst length CF clear if successful AH = status (see #00234) AL = number of sectors transferred (only valid if CF set for some BIOSes) Notes: errors on a floppy may be due to the motor failing to spin up quickly enough; the read should be retried at least three times, resetting the disk with AH=00h between attempts most BIOSes support "multitrack" reads, where the value in AL exceeds the number of sectors remaining on the track, in which case any additional sectors are read beginning at sector 1 on the following head in the same cylinder; the MSDOS CONFIG.SYS command MULTITRACK (or the Novell DOS DEBLOCK=) can be used to force DOS to split disk accesses which would wrap across a track boundary into two separate calls the IBM AT BIOS and many other BIOSes use only the low four bits of DH (head number) since the WD-1003 controller which is the standard AT controller (and the controller that IDE emulates) only supports 16 heads AWARD AT BIOS and AMI 386sx BIOS have been extended to handle more than 1024 cylinders by placing bits 10 and 11 of the cylinder number into bits 6 and 7 of DH under Windows95, a volume must be locked (see INT 21/AX=440Dh/CX=084Bh) in order to perform direct accesses such as INT 13h reads and writes all versions of MS-DOS (including MS-DOS 7 [Windows 95]) have a bug which prevents booting on hard disks with 256 heads (FFh), so many modern BIOSes provide mappings with at most 255 (FEh) heads some cache drivers flush their buffers when detecting that DOS is bypassed by directly issuing INT 13h from applications. A dummy read can be used as one of several methods to force cache flushing for unknown caches (e.g. before rebooting). BUGS: When reading from floppies, some AMI BIOSes (around 1990-1991) trash the byte following the data buffer, if it is not arranged to an even memory boundary. A workaround is to either make the buffer word aligned (which may also help to speed up things), or to add a dummy byte after the buffer. MS-DOS may leave interrupts disabled on return from this function. Apparently some BIOSes or intercepting resident software have bugs that may destroy DX on return or not properly set the Carry flag. At least some Microsoft software frames calls to this function with PUSH DX, STC, INT 13h, STI, POP DX. on the original IBM AT BIOS (1984/01/10) this function does not disable interrupts for harddisks (DL >= 80h). On these machines the MS-DOS/ PC DOS IO.SYS/IBMBIO.COM installs a special filter to bypass the buggy code in the ROM (see CALL F000h:211Eh) SeeAlso: AH=03h,AH=0Ah,AH=06h"V10DISK.SYS",AH=21h"PS/1",AH=42h"IBM" SeeAlso: INT 21/AX=440Dh/CX=084Bh,INT 4D/AH=02h --------B-1300------------------------------- INT 13 - DISK - RESET DISK SYSTEM AH = 00h DL = drive (if bit 7 is set both hard disks and floppy disks reset) Return: AH = status (see #00234) CF clear if successful (returned AH=00h) CF set on error Note: forces controller to recalibrate drive heads (seek to track 0) for PS/2 35SX, 35LS, 40SX and L40SX, as well as many other systems, both the master drive and the slave drive respond to the Reset function that is issued to either drive SeeAlso: AH=0Dh,AH=11h,INT 21/AH=0Dh,INT 4D/AH=00h"TI Professional" SeeAlso: INT 56"Tandy 2000",MEM 0040h:003Eh --------B-1308------------------------------- INT 13 - DISK - GET DRIVE PARAMETERS (PC,XT286,CONV,PS,ESDI,SCSI) AH = 08h DL = drive (bit 7 set for hard disk) ES:DI = 0000h:0000h to guard against BIOS bugs Return: CF set on error AH = status (07h) (see #00234) CF clear if successful AH = 00h AL = 00h on at least some BIOSes BL = drive type (AT/PS2 floppies only) (see #00242) CH = low eight bits of maximum cylinder number CL = maximum sector number (bits 5-0) high two bits of maximum cylinder number (bits 7-6) DH = maximum head number DL = number of drives ES:DI -> drive parameter table (floppies only) Notes: may return successful even though specified drive is greater than the number of attached drives of that type (floppy/hard); check DL to ensure validity for systems predating the IBM AT, this call is only valid for hard disks, as it is implemented by the hard disk BIOS rather than the ROM BIOS the IBM ROM-BIOS returns the total number of hard disks attached to the system regardless of whether DL >= 80h on entry. Toshiba laptops with HardRAM return DL=02h when called with DL=80h, but fail on DL=81h. The BIOS data at 40h:75h correctly reports 01h. may indicate only two drives present even if more are attached; to ensure a correct count, one can use AH=15h to scan through possible drives Reportedly some Compaq BIOSes with more than one hard disk controller return only the number of drives DL attached to the corresponding controller as specified by the DL value on entry. However, on Compaq machines with "COMPAQ" signature at F000h:FFEAh, MS-DOS/PC DOS IO.SYS/IBMBIO.COM call INT 15/AX=E400h and INT 15/AX=E480h to enable Compaq "mode 2" before retrieving the count of hard disks installed in the system (DL) from this function. the maximum cylinder number reported in CX is usually two less than the total cylinder count reported in the fixed disk parameter table (see INT 41h,INT 46h) because early hard disks used the last cylinder for testing purposes; however, on some Zenith machines, the maximum cylinder number reportedly is three less than the count in the fixed disk parameter table. for BIOSes which reserve the last cylinder for testing purposes, the cylinder count is automatically decremented on PS/1s with IBM ROM DOS 4, nonexistent drives return CF clear, BX=CX=0000h, and ES:DI = 0000h:0000h machines with lost CMOS memory may return invalid data for floppy drives. In this situation CF is cleared, but AX,BX,CX,DX,DH,DI, and ES contain only 0. At least under some circumstances, MS-DOS/ PC DOS IO.SYS/IBMBIO.COM just assumes a 360 KB floppy if it sees CH to be zero for a floppy. the PC-Tools PCFORMAT program requires that AL=00h before it will proceed with the formatting if this function fails, an alternative way to retrieve the number of floppy drives installed in the system is to call INT 11h. In fact, the MS-DOS/PC-DOS IO.SYS/IBMBIO.COM attempts to get the number of floppy drives installed from INT 13/AH=08h, when INT 11h AX bit 0 indicates there are no floppy drives installed. In addition to testing the CF flag, it only trusts the result when the number of sectors (CL preset to zero) is non-zero after the call. BUGS: several different Compaq BIOSes incorrectly report high-numbered drives (such as 90h, B0h, D0h, and F0h) as present, giving them the same geometry as drive 80h; as a workaround, scan through disk numbers, stopping as soon as the number of valid drives encountered equals the value in 0040h:0075h a bug in Leading Edge 8088 BIOS 3.10 causes the DI,SI,BP,DS, and ES registers to be destroyed some Toshiba BIOSes (at least before 1995, maybe some laptops??? with 1.44 MB floppies) have a bug where they do not set the ES:DI vector even for floppy drives. Hence these registers should be preset with zero before the call and checked to be non-zero on return before using them. Also it seems these BIOSes can return wrong info in BL and CX, as S/DOS 1.0 can be configured to preset these registers as for an 1.44 MB floppy. the PS/2 Model 30 fails to reset the bus after INT 13/AH=08h and INT 13/AH=15h. A workaround is to monitor for these functions and perform a transparent INT 13/AH=01h status read afterwards. This will reset the bus. The MS-DOS 6.0 IO.SYS takes care of this by installing a special INT 13h interceptor for this purpose. AD-DOS may leave interrupts disabled on return from this function. Some Microsoft software explicitly sets STI after return. SeeAlso: AH=06h"Adaptec",AH=13h"SyQuest",AH=48h,AH=15h,INT 1E SeeAlso: INT 41"HARD DISK 0" (Table 00242) Values for diskette drive type: 01h 360K 02h 1.2M 03h 720K 04h 1.44M 05h ??? (reportedly an obscure drive type shipped on some IBM machines) 2.88M on some machines (at least AMI 486 BIOS) 06h 2.88M 10h ATAPI Removable Media Device --------d-1308------------------------------- INT 13 - V10DISK.SYS - SET FORMAT AH = 08h AL = number of sectors CH = cylinder number (bits 8,9 in high bits of CL) CL = sector number DH = head DL = drive Return: AH = status code (see #00234) Program: V10DISK.SYS is a driver for the Flagstaff Engineering 8" floppies Note: details not available SeeAlso: AH=03h,AH=06h"V10DISK.SYS" references ---------- - kernel boot up in all it's details, really nice documentation: - https://www.kernel.org/doc/html/latest/x86/boot.html - https://www.kernel.org/doc/html/latest/x86/zero-page.html - https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-1.html - https://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-2.html - debug kernel with bochs - https://bochs.sourceforge.io/doc/docbook/user/debugging-with-gdb.html - https://www.kernel.org/doc/html/v4.12/dev-tools/gdb-kernel-debugging.html - https://www.cs.princeton.edu/courses/archive/fall09/cos318/precepts/bochs_gdb.html - interrupt list and BIOS documentation - http://www.cs.cmu.edu/~ralf/files.html - https://members.tripod.com/vitaly_filatov/ng/asm/ - unreal mode - https://wiki.osdev.org/Unreal_Mode - http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/ - Linux boot protocol - https://docs.kernel.org/x86/boot.html - https://www.spinics.net/lists/linux-integrity/msg14580.html: version string - get available memory - http://www.uruk.org/orig-grub/mem64mb.html - https://wiki.osdev.org/Detecting_Memory_(x86) - create ramdisk.img: https://people.freedesktop.org/~narmstrong/meson_drm_doc/admin-guide/initrd.html - tar format - https://wiki.osdev.org/USTAR - https://en.wikipedia.org/wiki/Tar_(computing)#UStar_format - https://github.com/calccrypto/tar - https://github.com/Papierkorb/tarfs - other minimal bootloader projects - https://github.com/wikkyk/mlb - https://github.com/owenson/tiny-linux-bootloader and https://github.com/guineawheek/tiny-floppy-bootloader - http://dc0d32.blogspot.com/2010/06/real-mode-in-c-with-gcc-writing.html (Small C and 16-bit code, leads to a quite big boot loader, in the end we didn't use C but Unreal mode 16/32-bittish assembly) - https://wiki.syslinux.org/wiki/index.php?title=The_Syslinux_Project - Lilo (but the code is hard to read and looks quite chaotic) - Linux 1.x old boot floppy code - alternative implementations of a real mode boot loader - http://dc0d32.blogspot.com/2010/06/real-mode-in-c-with-gcc-writing.html