Diffstat (limited to 'doc/tutorial.txt')
-rw-r--r-- doc/tutorial.txt 390
1 files changed, 390 insertions, 0 deletions
diff --git a/doc/tutorial.txt b/doc/tutorial.txt
new file mode 100644
index 0000000..68a7cad
--- /dev/null
+++ b/doc/tutorial.txt
@@ -0,0 +1,390 @@
+Cross Compiling Linux - 2 hour tutorial
+
+This is a practical introduction to cross compiling, during which we'll build
+a working cross-compiler, use it to cross-compile a native uClibc-based Linux
+development environment, and boot this new environment under QEMU.
+
+Attendees may choose arm, mips, x86, x86_64, sparc, or PPC as the platform
+they wish to build for. The author's Firmware Linux project (which already
+does all this) will be used as an example. Attendees should bring a reasonably
+fast laptop with net access and at least 256 megs of ram.
+
+General outline:
+
+1) Terminology: cross compiling, native compiling, host/target, toolchain, etc.
+2) Why cross compiling is hard, and why we need to do it anyway.
+3) Building a cross compiler toolchain from linux, binutils, gcc, and uClibc.
+4) Making a native build environment (adding make, busybox, and bash).
+5) Packaging disk images, booting, and running under QEMU.
+6) Optimizations and alternatives.
+ (distcc, armulator, boards/bootloaders, nfs, tsrpm)
+7) Where to from here? (LFS, gentoo, etc.)
+
+
+-----------------------------------------------------------------------------
+Links:
+ http://www.landley.net/writing/docs/cross-compiling.html
+ http://www.landley.net/code/firmware/about.html
+ http://www.landley.net/code/firmware/design.html
+ http://cross-lfs.org/files/BOOK/1.0.0/
+ http://www.gentoo.org/proj/en/base/embedded/index.xml
+ http://gentoo-wiki.com/Embedded_Gentoo
+
+http://www.quietearth.us/articles/2006/08/16/Building-deb-package-from-source
+
+http://qemu-forum.ipi.fi/qemu-snapshots/
+git://git.kernel.dk/data/git/qemu.git
+-----------------------------------------------------------------------------
+
+Today's agenda:
+
+ - learn about cross compiling
+ - build a working cross-compiler
+ - use it to cross-compile a native uClibc-based Linux development environment
+ - boot this new environment under QEMU.
+
+Platforms:
+ - What platforms does Linux support?
+ - To get the full list: cd include; echo asm-* | sed 's/asm-//g'
+ - alpha arm arm26 avr32 cris frv generic h8300 i386 ia64 m32r
+ m68k m68knommu mips parisc powerpc ppc s390 sh sh64 sparc sparc64 um
+ v850 x86_64 xtensa
+ - Not quite architectures: generic=shared code, um=User Mode Linux
+ - What dominates the big iron space?
+ - Top 500 supercomputers list: http://top500.org/stats/28/procfam/
+ x86-64 (44%), x86 (24%), and PPC (18%).
+ - Note: s390 important but not general purpose
+ - What dominates the embedded space?
+ - The big four: arm, mips, i386/x86_64, ppc.
+ - important but not general purpose:
+ - sh (SuperH, from Hitachi): used in Japan, especially in the auto industry.
+ - coldfire (m68knommu): used in a small number of high volume devices.
+ - blackfin: up-and-coming, not merged yet. Employs interesting people.
+ - in decline (used to be more important) but still in use:
+ alpha, ia64, sparc, parisc.
+
+ - price, power consumption, performance, features
+ - power consumption == heat, high end and low end converge due to this.
+ - features could be software or integrated peripherals.
+
+ - close up on arm
+ - best power/performance ratio, owns over 80% of cell phone market.
+ - entire arm core only 43,000 transistors, 34 instructions.
+ - armv3 vs arm 7, architecture vs processor. We focus on architecture.
+ - armv3 introduced 32-bit, but obsolete. armv4 now low-end.
+ - most systems now being manufactured armv5 or up.
+ - newer runs older instructions. armv4->armv5 25% speedup for recompile.
+ - arm26 obsolete 26 bit addressing mode, like x86's 16-bit mode.
+ - can be LE or BE. Linux supports both, but only one at a time.
+ Chip doesn't care but motherboard might.
+
+ - close up on mips
+ - customizable: sold as library, FPGA version, fab your own, used in SOC.
+ - there is now a 64 bit version. Probably for bragging rights.
+ - reasonable power consumption, reasonable performance.
+ - can be LE or BE.
+
+ - close up on ppc
+ - can be LE or BE. Long ago software selectable, now almost universally BE.
+ - Apple/IBM/Motorola. Apple switched, Motorola spun off Freescale.
+ - power.org attempt to stir up third party interest, some success.
+ - Game consoles give it serious volume, but Apple TV is x86-64.
+ - Models:
+ - Everything can run 7xx code except 4xx (ibm) and 8xx (motorola).
+ - 74xx is "G4", 970 is "G5" and is 64-bit.
+ - High power consumption, high performance
+ - Cell would have been very interesting if it had shipped in 2005, but
+ it's too late to matter outside gaming consoles now. Too much power
+ for embedded space, and programming it for big iron takes specialized
+ assembly.
+
+ - close up on x86/x86-64
+ - Intel and AMD have stopped making non-embedded 32-bit processors.
+ - Once existing inventory sold, it's 64-bit only from here on out.
+ - A number of smaller players like VIA.
+ - Best price/performance, often best general purpose absolute performance
+ - Historically terrible power/performance, hence the fan.
+ - Recently paying attention to that, but a ways to go. On x86 a
+ fanless heat sink is a victory, whereas most arm runs cool to the touch.
+
+Compiling software for different platforms:
+ - Native compiling on different platforms:
+ - endianness, word size, alignment, sign of char, optimizations
+ - nommu is its own can of worms: stack, malloc, vfork, mmap
+ - x86 is the common case, but this changes to x86-64 soon.
+ - Neither Intel nor AMD makes 32-bit x86 outside of the embedded space
+ anymore. Once current inventory is sold, it's all 64-bit from here.
+ - cross compiling
+ - host and target are two contexts, most programs are used to one.
+ - For most programs there is only one context: the target. Worrying about
+ host is the compiler's job, not yours.
+ - The compiler tells the program what target it's building for with
+ #defines (__i386__, __arm__, __mips__, etc.)
+ - To see all predefined symbols: gcc -dM -E - < /dev/null
+ - The headers specify endianness, #include <endian.h>
+ - confusing the two contexts:
+ - May have to build and run programs on host, ala menuconfig or unifdef.
+ - Need a host compiler for that, HOSTCC.
+ - Two compilers, keep track of which to use where
+ - ./configure asks questions about the machine it's building on to
+ determine what kind of program to build. Assumes host==target.
+ Fundamentally wrong for cross compiling, at the design level.
+ - two compilers (host and target), they get each other's files mixed up.
+ - #include paths, library paths, gcc calls wrong ld...
+ - prefix the names, but this isn't complete ("collect2").
+ - gcc falls back to "default" search path when it can't find something.
+ - gcc doesn't know where to find the files it installed.
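The predefined #defines mentioned above are how a program learns which target it is being built for. As a minimal sketch (the function name arch_from_defines is made up for illustration, and it only knows the handful of macros listed here), the dump printed by "gcc -dM -E -" can be mapped to an architecture name like this:

```shell
# Hypothetical helper: read a macro dump (as printed by
# "gcc -dM -E - < /dev/null") on stdin and print an architecture name.
arch_from_defines() {
    while read -r hash name rest; do
        # Only "#define NAME VALUE" lines are interesting.
        [ "$hash" = "#define" ] || continue
        case "$name" in
            __x86_64__) echo x86_64; return 0 ;;
            __i386__)   echo i386;   return 0 ;;
            __arm__)    echo arm;    return 0 ;;
            __mips__)   echo mips;   return 0 ;;
        esac
    done
    # None of the known macros showed up.
    echo unknown
    return 1
}
```

Typical use would be "gcc -dM -E - < /dev/null | arch_from_defines", substituting the prefixed name of a cross compiler for gcc to ask that compiler instead. The answer describes the target of whichever compiler produced the dump, not the machine the command ran on.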
+
+ - Everybody cares about native compiling, nobody cares about cross compiling.
+ - Most projects don't care about cross compiling, and never will.
+ - gentoo over 4000 packages, gentoo embedded cross compiles maybe 300.
+ - Cross compiling complicates build system while restricting options,
+ this infrastructure is a source of bugs even when it's not used, and
+ only a tiny minority of the userbase will ever want it.
+ - Everybody cares about native compiling.
+ - If they don't, they'll probably take patches.
+ - simple to fix, non-intrusive, generally considered a good thing.
+ - Most developers don't know there's more to it than "build on arm".
+
+ - Building natively on real hardware can be problematic
+ - 200 MHz, 32 megs of ram, only access through serial port
+ - We have one piece of real hardware and five developers.
+
+ - But there are emulators, and desktops are cheap and powerful these days.
+ - Throwing hardware at the problem cheaper than developers.
+ - State of the art is QEMU, which is GPL.
+
+ - A certain amount of cross compiling can't be avoided.
+ - where do you get your development environment from?
+ - bootstrap new platform.
+ - recent ubuntu desktop CD for little-endian mips?
+
+ - So cross compile to bootstrap a native environment, then build natively
+ under emulation.
+
+ - Trick: have the emulator call out to the cross compiler with distcc.
+ - ./configure, make, preprocessing, and linking all run native.
+ - heavy lifting of compilation farmed out, but that's hard to screw up.
+
+Firmware Linux Walkthrough.
+ - Prepare (download source, build some optional tools)
+ - Build cross compiler (cross-compiler.sh)
+ - Cross-compile native build environment (mini-native.sh)
+ - package this so qemu can run it
+ - Run it under qemu
+ - Build hello world natively.
+
+Build binutils.
+ Fairly straightforward, no target dependencies.
+
+Build gcc.
+ Needs a target version of binutils.
+ Pain in the ass because gcc is "special".
+
+Beating GCC into submission
+ - gcc is NOT SPECIAL. But it thinks it is.
+ - special case, special olympics, very special episode, isn't that special.
+ - Compiler turns input into output. So does a docbook to pdf converter.
+ - Explicit and implicit input files.
+ - xmlto has cmdline files, plus fonts and stylesheets.
+ - gcc has headers and libraries and such.
+
+ - A compiler doesn't try to run the programs it's building.
+ - Reads C source, assembly, and ELF files.
+ - Outputs C source, assembly, ELF files.
+ - It reads and generates ELF and a.out as archive formats
+ - readelf, ar, ldd, nm, objdump, libbfd
+ - Not special. Just files.
+
+ - In theory, all you have to tell GCC during ./configure is what target it
+ should produce binaries for.
+ - What host the resulting gcc you're building _runs_ on is determined by
+ the host compiler you're building gcc with, and it's got the same
+ __i386__ #defines as every other platform if it really wants details.
+ - That some compilers produce binaries that can be immediately executed
+ is sheer coincidence, nothing more. So can sed when it outputs a shell
+ script.
+
+ - Targets are described as tuples.
+ - What's a tuple?
+ - It's designed to conflate together several characteristics to reduce
+ orthogonality in the configuration. (No, this is not a good thing.)
+ Luckily, most of it doesn't apply to Linux.
+ - just append "-unknown-linux" to your architecture.
+ - armv4l, armv5l, mipsel, mips, sparc, powerpc, i686, x86_64
+ - So nothing like the kernel's ARCH= values?
+ - Not really.
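As a small sketch of the tuple rule above, the helper below appends "-unknown-linux" to an architecture name and derives the prefixed tool names a cross toolchain would install. The function names are hypothetical, chosen only for this example.

```shell
# Hypothetical helper: architecture name in, target tuple out.
arch_to_tuple() {
    echo "$1-unknown-linux"
}

# Hypothetical helper: name of a prefixed cross tool, e.g. the gcc or ld
# belonging to the toolchain for architecture $1.
cross_tool() {
    echo "$(arch_to_tuple "$1")-$2"
}
```

With these, "cross_tool armv4l gcc" gives the compiler name for the armv4l toolchain and "cross_tool mipsel ld" gives the matching linker, which is the "prefix the names" convention mentioned earlier.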
+
+ - In practice, gcc wants to build itself with itself.
+ - No other compiler could possibly build a usable copy of gcc!
+ - gcc is _special_
+ - so build a temporary version (xgcc) with the tainted other compiler,
+ then rebuild a _clean_ version with xgcc.
+ - That's not enough, it wants to build itself three times.
+ - 1) build xgcc, 2) rebuild, 3) rebuild with #2 and compare 2 & 3.
+ - Serious paranoia. Emotional scars. It was abused as a child.
+ - I'm sure the developers can tell us horror stories of platforms
+ where this was a vitally important safeguard. Those platforms were
+ not Linux.
+ - Dear gcc: you can't always get what you want.
+
+ - This redundant build is not compatible with cross compiling.
+ - When gcc is building a cross compiler, it can't do this crazy stuff.
+ - Two vastly different build systems within the same project?
+ - But of course. It's from the FSF, where bigger is better.
+ - How do you tell it to produce a cross-compiler?
+ - When host and target differ, it behaves almost rationally.
+ --host=i686-unknown-linux --target=armv4l-unknown-linux
+
+ - But what if I'm building for i686-uclibc target from i686-glibc host?
+ If gcc does the three-step, xgcc won't run because uclibc isn't
+ installed on the host.
+ - Lie to gcc, like so:
+ --host=i686-walrus-linux --target=i686-unknown-linux
+ - Then it thinks it's cross compiling, and will WORK.
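The --host/--target choice above, including the lie about the host, can be sketched as a helper that prints the flags. This is an illustration under assumptions: the function name is made up, "walrus" is just the placeholder vendor field from the outline, and a real build would pass further ./configure options.

```shell
# Hypothetical helper: print the --host/--target pair for gcc's
# ./configure. $1 = host architecture, $2 = target architecture.
# When both are the same CPU (e.g. glibc host, uClibc target), swap a
# made-up vendor field into the host tuple so the two tuples differ
# and gcc believes it is cross compiling.
gcc_cross_flags() {
    host_arch=$1
    target_arch=$2
    if [ "$host_arch" = "$target_arch" ]; then
        host_vendor=walrus
    else
        host_vendor=unknown
    fi
    echo "--host=$host_arch-$host_vendor-linux --target=$target_arch-unknown-linux"
}
```

Calling "gcc_cross_flags i686 armv4l" reproduces the first flag pair above, and "gcc_cross_flags i686 i686" reproduces the walrus pair.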
+
+ - The wrapper script:
+ - "pathological", adjective, "having to do with the path logic in gcc".
+ - gcc can't find files it installed.
+ - It searches in lots of different places using crazy paths with lots
+ of "subdir/../../../newdir" in them. Run it under strace sometime,
+ it's frightening.
+ - Paths from environment variables, paths from built-in spec files,
+ paths supplied from ./configure, paths added on the command line,
+ paths hardwired into the C code...
+ - Every time the previous layer bit-rots to the point of collapse,
+ they add yet another layer. They never _remove_ anything, they
+ just stick it at the front of the list and fall back to looking
+ in all the other locations when they can't find it.
+ - All this is based on an absolute path from the root, hardwiring
+ into the gcc binary the path to the directory gcc was built in.
+ (What if you want to install a toolchain into your home directory,
+ without needing root access?)
+ - At the end it falls back to default locations like /usr/include
+ which contains host stuff, not target stuff.
+ - If it can't find the headers or libraries it needs, it happily
+ substitutes the ones out of the host compiler. THIS IS WRONG.
+
+ - How do we untangle this?
+
+ - gcc has six important sources of input (spec files don't count):
+ - Explicit input (C files listed on the command line)
+ - system library headers
+ - #include <stdio.h>. From libc, zlib, etc.
+ - default is /usr/include
+ - compiler headers
+ - #include <stdarg.h>, for va_arg and such.
+ - default looks like /usr/lib/gcc/i486-linux-gnu/4.1.2/include
+ - system libraries
+ - libc.so, libz.so, libncurses.so.5.5
+ - search path, includes /lib:/usr/lib and elsewhere.
+ - try ldd /bin/ls
+ - note .a vs .so vs .so.6
+ - compiler libraries
+ - libgcc_s.so (is evil).
+ - stack unwinding, 64-bit division on 32-bit platforms, soft float...
+ - at compile time, /usr/lib/gcc/i486-linux-gnu/4.1.2
+ - at run time libgcc_s.so is in /lib
+ - Executable search path
+ - In theory so gcc can call ld.
+ - in practice, collect2 and cc1 because gcc is bloated.
+ - You'd think this would use $PATH, but that would be too easy.
+ - Install binaries in $PATH? But I'm _SPECIAL_
+
+ - Bypass gcc's path logic entirely. It's the only way to be sure.
+ - Write a wrapper: parse the gcc command line, call gcc with -nostdinc
+ and -nostdlib (so it won't fall back to leaking in host stuff),
+ explicitly specify header and library search paths (since we know
+ where they are), and edit $PATH so it can call ld and such.
+ - While we're at it, specify the location of the shared library loader.
+ - Not actually used at compile time, just written into the binary to
+ be used at runtime. But it's something else we need to get right
+ to make usable binaries.
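A minimal sketch of the wrapper idea, under loudly stated assumptions: the directory layout, the name "rawgcc" for the renamed real compiler, and the loader path /lib/ld-uClibc.so.0 are all hypothetical choices for this example, not a description of any particular project's wrapper. The function only prints the command it would run, so the result can be inspected.

```shell
# Hypothetical layout, for illustration only:
#   $top/bin/rawgcc      the real compiler driver, renamed
#   $top/include         libc and kernel headers
#   $top/gcc/include     compiler headers (stdarg.h and friends)
#   $top/lib             libc and other system libraries
#   $top/gcc/lib         compiler libraries (libgcc)
# The loader path is only written into the output binary for use at
# run time on the target; /lib/ld-uClibc.so.0 is assumed here.
wrapped_gcc_cmd() {
    top=$1
    shift
    printf '%s' "$top/bin/rawgcc"
    # Switch off the built-in search paths, then spell out our own.
    printf ' %s' -nostdinc -nostdlib \
        -isystem "$top/include" -isystem "$top/gcc/include" \
        "-L$top/lib" "-L$top/gcc/lib" \
        "-Wl,--dynamic-linker=/lib/ld-uClibc.so.0" "$@"
    printf '\n'
}
```

A real wrapper would run this command with $top/bin placed at the front of $PATH so the compiler finds the matching ld. Because -nostdlib also drops the startup files and default libraries, it would additionally have to add those (crt1.o, crti.o, crtn.o, -lc, -lgcc) itself whenever the command line is a link rather than a compile-only step; the sketch leaves that part out.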
+
+ - Hang on, if we've got a wrapper why do we need to recompile for the
+ i686-glibc to i686-uClibc case?
+ - Because libgcc_s.so links against glibc, thus leaking a reference to
+ the wrong C library. (Thanks gcc!) If we rebuild libgcc_s.so against
+ uClibc, it's at least leaking a reference to the right library. (Or
+ better yet, configure gcc with --disable-shared so it has libgcc.a
+ instead and doesn't do this at all.)
+
+ - Ok, that was the hard part. It's all downhill from here.
+
+So back to bootstrapping a toolchain.
+ Binutils builds by itself with no dependencies on other target packages.
+ gcc needs a target version of binutils
+ gcc also needs a wrapper script (or _extensive_ patching) to have sane
+ path logic, but the wrapper script has no dependencies on anything else.
+ uClibc needs:
+ a gcc for the target
+ linux kernel headers for the target (but the kernel headers have no
+ dependencies and can be installed first).
+
+ - start with binutils, wrapper, and/or linux kernel headers.
+ - gcc comes after binutils.
+ - uClibc comes after gcc and kernel headers.
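The ordering above is a small dependency graph, so it can be checked mechanically. The sketch below feeds the edges from this section to the standard tsort utility; the function name and the package labels are just names chosen for the example.

```shell
# Each input line is "prerequisite package": the left side has to be
# built before the right side. Edges are taken from the list above.
build_order() {
    tsort <<'EOF'
binutils gcc
wrapper gcc
gcc uClibc
linux-headers uClibc
EOF
}
```

Running build_order prints one valid build sequence. Several orders satisfy the constraints (binutils, the wrapper, and the kernel headers can go in any order among themselves), which is the "and/or" in the list above.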
+
+Now build a native build environment for the target.
+ In theory, you could follow Linux From Scratch from this point on.
+ In practice, a minimal development environment only needs seven packages:
+ binutils, gcc, kernel (headers _and_ vmlinux this time), uClibc,
+ busybox, make, bash.
+ (Why bash? Busybox shell isn't good enough yet.)
+ - might have been fixed in newer versions, I don't follow it anymore.
+ From that, you can build an entire Linux From Scratch system.
+ I spent ~3 years fixing up busybox to the point where it works for this.
+
+ In theory, the smallest self-bootstrapping system would be four packages:
+ Linux, uclibc, busybox, tcc.
+ But that doesn't work yet. :)
+
+How do you build a native compiler for another platform?
+ Build twice:
+ Use your host compiler to build a compiler targeting the platform.
+ This runs on the host to produce target binaries, so it's a
+ cross compiler.
+ Use the cross compiler to build a compiler targeting the platform.
+ This runs on the target to produce target binaries, so it's a
+ native compiler for the target.
+ Loosely, this is called a "canadian cross" (strictly, that name is for the
+ case where build, host, and target all differ).
+ - They had to come up with a special name for it. It's NOT SPECIAL.
+ - Your cross compiler could target one platform and the second compiler
+ could target another, so you could use your x86 machine to build a
+ cross compiler that runs on sparc and outputs arm binaries. So what?
+ - This is what gcc is doing internally anyway if you tell it
+ --host arm and --target arm on your x86 machine. Except in _that_
+ case it wants to know --build. Why? Don't go there.
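The two passes above differ only in which tuples are handed to gcc's ./configure. A minimal sketch, assuming a hypothetical helper name and leaving out every other configure option a real build needs:

```shell
# Hypothetical helper: print the tuple flags for each pass.
#   $1 = pass number (1 = cross compiler, 2 = native compiler for target)
#   $2 = tuple of the machine doing the building
#   $3 = tuple of the target platform
native_bootstrap_flags() {
    case "$1" in
        # Pass 1: runs on the build machine, emits target binaries.
        1) echo "--host=$2 --target=$3" ;;
        # Pass 2: built with the pass 1 compiler; runs on the target and
        # emits target binaries, so --build has to be stated explicitly.
        2) echo "--build=$2 --host=$3 --target=$3" ;;
        *) echo "unknown pass: $1" >&2; return 1 ;;
    esac
}
```

For example, "native_bootstrap_flags 2 i686-unknown-linux armv4l-unknown-linux" prints the flags for building an arm-hosted, arm-targeting gcc on an x86 machine.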
+
+General outline:
+
+1) Terminology: cross compiling, native compiling, host/target, toolchain, etc.
+
+2) Why cross compiling is hard, and why we need to do it anyway.
+
+3) Building a cross compiler toolchain from linux, binutils, gcc, and uClibc.
+ Bootstrapping issues, what depends on what?
+
+4) Making a native build environment (adding make, busybox, and bash).
+ Building a new system.
+
+5) Packaging disk images, booting, and running under QEMU.
+
+6) Optimizations and alternatives.
+ (distcc, armulator, boards/bootloaders, nfs, tsrpm)
+
+7) Where to from here? (LFS, gentoo, etc.)