From 3a467eff5e435b5709d48f9d6cb48859925be5b8 Mon Sep 17 00:00:00 2001
From: root
Date: Sat, 21 Mar 2009 11:49:00 +0100
Subject: checked in initial version

---
 doc/tutorial.txt | 390 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 390 insertions(+)
 create mode 100644 doc/tutorial.txt

diff --git a/doc/tutorial.txt b/doc/tutorial.txt
new file mode 100644
index 0000000..68a7cad
--- /dev/null
+++ b/doc/tutorial.txt
@@ -0,0 +1,390 @@
+Cross Compiling Linux - 2 hour tutorial
+
+This is a practical introduction to cross compiling, during which we'll build
+a working cross-compiler, use it to cross-compile a native uClibc-based Linux
+development environment, and boot this new environment under QEMU.
+
+Attendees may choose arm, mips, x86, x86_64, sparc, or PPC as the platform
+they wish to build for. The author's Firmware Linux project (which already
+does all this) will be used as an example. Attendees should bring a reasonably
+fast laptop with net access and at least 256 megs of RAM.
+
+General outline:
+
+1) Terminology: cross compiling, native compiling, host/target, toolchain, etc.
+2) Why cross compiling is hard, and why we need to do it anyway.
+3) Building a cross compiler toolchain from linux, binutils, gcc, and uClibc.
+4) Making a native build environment (adding make, busybox, and bash).
+5) Packaging disk images, booting, and running under QEMU.
+6) Optimizations and alternatives.
+ (distcc, armulator, boards/bootloaders, nfs, tsrpm)
+7) Where to from here? (LFS, gentoo, etc.)
+
+
+-----------------------------------------------------------------------------
+Links:
+ http://www.landley.net/writing/docs/cross-compiling.html
+ http://www.landley.net/code/firmware/about.html
+ http://www.landley.net/code/firmware/design.html
+ http://cross-lfs.org/files/BOOK/1.0.0/
+ http://www.gentoo.org/proj/en/base/embedded/index.xml
+ http://gentoo-wiki.com/Embedded_Gentoo
+
+http://www.quietearth.us/articles/2006/08/16/Building-deb-package-from-source
+
+http://qemu-forum.ipi.fi/qemu-snapshots/
+git://git.kernel.dk/data/git/qemu.git
+-----------------------------------------------------------------------------
+
+Today's agenda:
+
+ - learn about cross compiling
+ - build a working cross-compiler
+ - use it to cross-compile a native uClibc-based Linux development environment
+ - boot this new environment under QEMU.
+
+Platforms:
+ - What platforms does Linux support?
+ - To get the full list, from the kernel source: cd include; echo asm-* | sed 's/asm-//g'
+ - alpha arm arm26 avr32 cris frv generic h8300 i386 ia64 m32r
+ m68k m68knommu mips parisc powerpc ppc s390 sh sh64 sparc sparc64 um
+ v850 x86_64 xtensa
+ - Not quite architectures: generic=shared code, um=User Mode Linux
+ - What dominates the big iron space?
+ - Top 500 supercomputers list: http://top500.org/stats/28/procfam/
+ x86-64 (44%), x86 (24%), and PPC (18%).
+ - Note: s390 is important but not general purpose.
+ - What dominates the embedded space?
+ - The big four: arm, mips, i386/x86_64, ppc.
+ - important but not general purpose:
+ - sh (SuperH, from Hitachi): used in Japan, especially in the auto industry.
+ - coldfire (m68knommu): used in a small number of high volume devices.
+ - blackfin: up-and-coming, not merged yet. Employs interesting people.
+ - in decline (used to be more important) but still in use:
+ alpha, ia64, sparc, parisc.
+
+ - price, power consumption, performance, features
+ - power consumption == heat, so the high end and low end converge.
+ - features could be software or integrated peripherals.
+
+ - close up on arm
+ - best power/performance ratio, owns over 80% of cell phone market.
+ - entire arm core only 43,000 transistors, 34 instructions.
+ - armv3 vs arm7: architecture vs processor. We focus on architecture.
+ - armv3 introduced 32-bit addressing, but is obsolete. armv4 is now low-end.
+ - most systems now being manufactured are armv5 or up.
+ - newer chips run older code. Recompiling armv4 code for armv5 gives a
+ 25% speedup.
+ - arm26: obsolete 26-bit addressing mode, like x86's 16-bit mode.
+ - can be LE or BE. Linux supports both, but only one at a time.
+ The chip doesn't care, but the motherboard might.
+
+ - close up on mips
+ - customizable: sold as a library, FPGA version, fab your own, used in SOC.
+ - there is now a 64-bit version. Probably for bragging rights.
+ - reasonable power consumption, reasonable performance.
+ - can be LE or BE.
+
+ - close up on ppc
+ - can be LE or BE; long ago software-selectable, now almost universally BE.
+ - Apple/IBM/Motorola. Apple switched to x86, Motorola spun off Freescale.
+ - power.org: an attempt to stir up third party interest, with some success.
+ - Game consoles give it serious volume, but Apple TV is x86.
+ - Models:
+ - Everything can run 7xx code except 4xx (IBM) and 8xx (Motorola).
+ - 74xx is "G4", 970 is "G5" and is 64-bit.
+ - High power consumption, high performance
+ - Cell would have been very interesting if it had shipped in 2005, but
+ it's too late to matter outside gaming consoles now. Too much power
+ for the embedded space, and on big iron it needs specialized assembly
+ programming.
+
+ - close up on x86/x86-64
+ - Intel and AMD have stopped making non-embedded 32-bit processors.
+ - Once existing inventory is sold, it's 64-bit only from here on out.
+ - A number of smaller players, like VIA.
+ - Best price/performance, often the best general purpose absolute performance.
+ - Historically terrible power/performance, hence the fan.
+ - Recently paying attention to that, but there's a ways to go.
+ On x86 a fanless heat sink is a victory, whereas most arm chips run cool
+ to the touch.
+
+Compiling software for different platforms:
+ - Native compiling on different platforms:
+ - endianness, word size, alignment, sign of char, optimizations
+ - nommu is its own can of worms: stack, malloc, vfork, mmap
+ - x86 is the common case, but this changes to x86-64 soon.
+ - Neither Intel nor AMD is making x86 outside of the embedded space anymore.
+ Once current inventory is sold, it's all 64-bit from here.
+ - cross compiling
+ - host and target are two contexts; most programs only know about one.
+ - For most programs there is only one context: the target. Worrying about
+ the host is the compiler's job, not yours.
+ - The compiler tells the program what target it's building for with
+ #defines (__i386__, __arm__, __mips__, etc.).
+ - To see all predefined symbols: gcc -dM -E - < /dev/null
+ - The headers specify endianness, #include
+ - confusing the two contexts:
+ - May have to build and run programs on the host, as with menuconfig or
+ unifdef.
+ - Need a host compiler for that, HOSTCC.
+ - Two compilers; keep track of which to use where.
+ - ./configure asks questions about the machine it's building on to
+ determine what kind of program to build. Assumes host==target.
+ Fundamentally wrong for cross compiling, at the design level.
+ - With two compilers (host and target), they get each other's files
+ mixed up.
+ - #include paths, library paths, gcc calls the wrong ld...
+ - prefixing the names helps, but isn't complete ("collect2").
+ - gcc falls back to a "default" search path when it can't find something.
+ - gcc doesn't know where to find the files it installed.
+
+ - Everybody cares about native compiling, nobody cares about cross compiling.
+ - Most projects don't care about cross compiling, and never will.
+ - gentoo has over 4000 packages; gentoo embedded cross compiles maybe 300.
+ - Cross compiling complicates the build system while restricting options;
+ this infrastructure is a source of bugs even when it's not used, and
+ only a tiny minority of the userbase will ever want it.
+ - Everybody cares about native compiling.
+ - If they don't, they'll probably take patches.
+ - simple to fix, non-intrusive, generally considered a good thing.
+ - Most developers don't know there's more to it than "build on arm".
+
+ - Building natively on real hardware can be problematic
+ - 200 MHz, 32 megs of RAM, only access through serial port
+ - We have one piece of real hardware and five developers.
+
+ - But there are emulators, and desktops are cheap and powerful these days.
+ - Throwing hardware at the problem is cheaper than throwing developers.
+ - State of the art is QEMU, which is GPL.
+
+ - A certain amount of cross compiling can't be avoided.
+ - where do you get your development environment from?
+ - bootstrapping a new platform.
+ - a recent ubuntu desktop CD for little-endian mips?
+
+ - So cross compile to bootstrap a native environment, then build natively
+ under emulation.
+
+ - Trick: have the emulator call out to the cross compiler with distcc.
+ - ./configure, make, preprocessing, and linking all run natively.
+ - only the heavy lifting of compilation is farmed out, and that part
+ is hard to screw up.
+
+Firmware Linux Walkthrough.
+ - Prepare (download source, build some optional tools)
+ - Build cross compiler (cross-compiler.sh)
+ - Cross-compile native build environment (mini-native.sh)
+ - Package this so qemu can run it
+ - Run it under qemu
+ - Build hello world natively.
+
+Build binutils.
+ Fairly straightforward, no target dependencies.
+
+Build gcc.
+ Needs a target version of binutils.
+ Pain in the ass because gcc is "special".
+
+Beating GCC into submission
+ - gcc is NOT SPECIAL. But it thinks it is.
+ - special case, special olympics, very special episode, isn't that special.
+ - Compiler turns input into output. So does a docbook to pdf converter.
+ - Explicit and implicit input files.
+ - xmlto has cmdline files, plus fonts and stylesheets.
+ - gcc has headers and libraries and such.
+
+ - A compiler doesn't try to run the programs it's building.
+ - Reads C source, assembly, and ELF files.
+ - Outputs C source, assembly, and ELF files.
+ - It reads and generates ELF, a.out, and archive formats, the same
+ formats handled by readelf, ar, ldd, nm, objdump, and libbfd.
+ - Not special. Just files.
+
+ - In theory, all you have to tell GCC during ./configure is what target it
+ should produce binaries for.
+ - What host the resulting gcc you're building _runs_ on is determined by
+ the host compiler you're building gcc with, and it's got the same
+ __i386__ #defines as every other platform if it really wants details.
+ - That some compilers produce binaries that can be immediately executed
+ is sheer coincidence, nothing more. So can sed when it outputs a shell
+ script.
+
+ - Targets are described as tuples.
+ - What's a tuple?
+ - It's designed to conflate together several characteristics to reduce
+ orthogonality in the configuration. (No, this is not a good thing.)
+ Luckily, most of it doesn't apply to Linux.
+ - Just append "-unknown-linux" to your architecture.
+ - armv4l, armv5l, mipsel, mips, sparc, powerpc, i686, x86_64
+ - So nothing like the kernel's ARCH= values?
+ - Not really.
+
+ - In practice, gcc wants to build itself with itself.
+ - No other compiler could possibly build a usable copy of gcc!
+ - gcc is _special_
+ - so build a temporary version (xgcc) with the tainted other compiler,
+ then rebuild a _clean_ version with xgcc.
+ - That's not enough; it wants to build itself three times.
+ - 1) build xgcc, 2) rebuild, 3) rebuild with #2 and compare 2 & 3.
+ - Serious paranoia. Emotional scars. It was abused as a child.
+ - I'm sure the developers can tell us horror stories of platforms
+ where this was a vitally important safeguard. Those platforms were
+ not Linux.
+ - Dear gcc: you can't always get what you want.
+
+ - This redundant build is not compatible with cross compiling.
+ - When gcc is building a cross compiler, it can't do this crazy stuff.
+ - Two vastly different build systems within the same project?
+ - But of course. It's from the FSF, where bigger is better.
+ - How do you tell it to produce a cross-compiler?
+ - When host and target differ, it behaves almost rationally.
+ --host=i686-unknown-linux --target=armv4l-unknown-linux
+
+ - But what if I'm building for an i686-uclibc target from an i686-glibc
+ host? If gcc does the three-step, xgcc won't run because uclibc isn't
+ installed on the host.
+ - Lie to gcc, like so:
+ --host=i686-walrus-linux --target=i686-unknown-linux
+ - Then it thinks it's cross compiling, and will WORK.
+
+ - The wrapper script:
+ - "pathological", adjective, "having to do with the path logic in gcc".
+ - gcc can't find files it installed.
+ - It searches in lots of different places using crazy paths with lots
+ of "subdir/../../../newdir" in them. Run it under strace sometime,
+ it's frightening.
+ - Paths from environment variables, paths from built-in spec files,
+ paths supplied from ./configure, paths added on the command line,
+ paths hardwired into the C code...
+ - Every time the previous layer bit-rots to the point of collapse,
+ they add yet another layer. They never _remove_ anything; they
+ just stick it at the front of the list and fall back to looking
+ in all the other locations when they can't find it.
+ - All this is based on an absolute path from the root, hardwiring
+ into the gcc binary the path to the directory gcc was built in.
+ (What if you want to install a toolchain into your home directory,
+ without needing root access?)
+ - At the end it falls back to default locations like /usr/include,
+ which contains host stuff, not target stuff.
+ - If it can't find the headers or libraries it needs, it happily
+ substitutes the ones out of the host compiler. THIS IS WRONG.
+
+ - How do we untangle this?
+
+ - gcc has seven important sources of input (spec files don't count):
+ - Explicit input (C files listed on the command line)
+ - system library headers
+ - #include <stdio.h>. From libc, zlib, etc.
+ - default is /usr/include
+ - compiler headers
+ - #include <stdarg.h>, for va_arg and such.
+ - default looks like /usr/lib/gcc/i486-linux-gnu/4.1.2/include
+ - system libraries
+ - libc.so, libz.so, libncurses.so.5.5
+ - search path, includes /lib:/usr/lib and elsewhere.
+ - try ldd /bin/ls
+ - note .a vs .so vs .so.6
+ - compiler libraries
+ - libgcc_s.so (is evil).
+ - stack unwinding, 64-bit (long long) division on 32-bit platforms,
+ soft float...
+ - at compile time, /usr/lib/gcc/i486-linux-gnu/4.1.2
+ - at run time libgcc_s.so is in /lib
+ - Executable search path
+ - In theory, so gcc can call ld.
+ - In practice, also collect2 and cc1, because gcc is bloated.
+ - You'd think this would use $PATH, but that would be too easy.
+ - Install binaries in $PATH? But I'm _SPECIAL_
+
+ - Bypass gcc's path logic entirely. It's the only way to be sure.
+ - Wrapper: parse the gcc command line, call gcc with -nostdinc and
+ -nostdlib (so it won't fall back to leaking in host stuff),
+ explicitly specify header and library search paths (since we know
+ where they are), and edit $PATH so it can call ld and such.
+ - While we're at it, specify the location of the shared library loader.
+ - Not actually used at compile time, just written into the binary to
+ be used at runtime. But it's something else we need to get right
+ to make usable binaries.
+
+ - Hang on, if we've got a wrapper, why do we need to recompile for the
+ i686-glibc to i686-uClibc case?
+ - Because libgcc_s.so links against glibc, thus leaking a reference to
+ the wrong C library. (Thanks gcc!) If we rebuild libgcc_s.so against
+ uClibc, it's at least leaking a reference to the right library. (Or
+ better yet, configure gcc with --disable-shared so it has libgcc.a
+ instead and doesn't do this at all.)
+
+ - Ok, that was the hard part. It's all downhill from here.
+
+So back to bootstrapping a toolchain.
+ Binutils builds by itself with no dependencies on other target packages.
+ gcc needs a target version of binutils.
+ gcc also needs a wrapper script (or _extensive_ patching) to have sane
+ path logic, but the wrapper script has no dependencies on anything else.
+ uClibc needs:
+ a gcc for the target
+ linux kernel headers for the target (but the kernel headers have no
+ dependencies and can be installed first).
+
+ - start with binutils, wrapper, and/or linux kernel headers.
+ - gcc comes after binutils.
+ - uClibc comes after gcc and kernel headers.
+
+Now build a native build environment for the target.
+ In theory, you could follow Linux From Scratch from this point on.
+ In practice, a minimal development environment only needs seven packages:
+ binutils, gcc, kernel (headers _and_ vmlinux this time), uClibc,
+ busybox, make, bash.
+ (Why bash? Busybox shell isn't good enough yet.)
+ - might have been fixed in newer versions, I don't follow it anymore.
+ From that, you can build an entire Linux From Scratch system.
+ I spent ~3 years fixing up busybox to the point where it works for this.
+
+ In theory, the smallest self-bootstrapping system would be four packages:
+ Linux, uClibc, busybox, tcc.
+ But that doesn't work yet. :)
+
+How do you build a native compiler for another platform?
+ Build twice:
+ Use your host compiler to build a compiler targeting the platform.
+ This runs on the host to produce target binaries, so it's a
+ cross compiler.
+ Use the cross compiler to build a compiler targeting the platform.
+ This runs on the target to produce target binaries, so it's a
+ native compiler for the target.
+ Technically, the general case (where build, host, and target can all be
+ different) is called a "canadian cross".
+ - They had to come up with a special name for it. It's NOT SPECIAL.
+ - Your cross compiler could target one platform and the second compiler
+ could target another, so you could use your x86 machine to build a
+ cross compiler that runs on sparc and outputs arm binaries. So what?
+ - This is what gcc is doing internally anyway if you tell it
+ --host arm and --target arm on your x86 machine. Except in _that_
+ case it wants to know --build. Why? Don't go there.
+
+General outline:
+
+1) Terminology: cross compiling, native compiling, host/target, toolchain, etc.
+
+2) Why cross compiling is hard, and why we need to do it anyway.
+
+3) Building a cross compiler toolchain from linux, binutils, gcc, and uClibc.
+ Bootstrapping issues, what depends on what?
+
+4) Making a native build environment (adding make, busybox, and bash).
+ Building a new system.
+
+5) Packaging disk images, booting, and running under QEMU.
+
+6) Optimizations and alternatives.
+ (distcc, armulator, boards/bootloaders, nfs, tsrpm)
+
+7) Where to from here? (LFS, gentoo, etc.)