CS 452 S04 Tutorial

Introduction

This informal tutorial is designed to get everybody comfortable with the architecture we will be using. By the end of this tutorial you should have a general feel for how programming the train is done.

The information covered in this tutorial will be slightly useful for Assignment 1; however, the tutorial's main focus is material that is relevant to Kernel Projects 1 & 2.

Assumptions

It will be assumed that you:

Have had a little exposure to assembly language programming (dlx, 68000, x86).
Know Unix pretty well (emacs, vi).
Are familiar with standard Unix programming tools, like gcc, nm, make, ar.

If you cringe after reading the above list because you suspect you may be ill-prepared, please contact the instructor or the TAs for advice.

In this tutorial:

Introduction.
High-level overview of the Intel x86 Eos architecture which you will be dealing with.
Compiling and running "Hello world."
Assembly language tutorial. How to integrate x86 assembly language into your C code.
The 486 register set, segmentation, and Eos memory layout.
Software tools and engineering tips.

Much of the material here overlaps the material in Chapter 3 of your course notes, Software Information. You should peruse this chapter at some point to familiarize yourself with the CS 452 software environment.

Setting up the environment

Before you do anything else, you need to get the right path settings for the CS452 tools. We provide scripts that you can use directly to do so. If your shell is csh, run

source /u/cs452/public/env.csh

If your shell is bash, run

source /u/cs452/public/env.sh

This should set up everything. You probably want to put the appropriate line in your .cshrc or .bashrc respectively.

"Hello, world" Example

The source for "Hello, world" can be found in the directory:

  /u/cs452/public/examples/cs452-demo

You can copy the files in this directory to a private directory of your own in order to do some experimentation. The directory contains:

a Makefile for making the targets.
linker script loader.cpp.x
Multiboot header multiboot.S / multiboot.h
some serial routines serial.cc
some text console routines video.cc

Look at main.cc to see what it does. You'll note that instead of argc and argv[], main takes the arguments magic and mbiaddr. magic is a magic number supplied by the boot loader, which you can check to confirm that your kernel was booted correctly. mbiaddr is the address of the Multiboot information structure that the boot loader fills in. You'll be needing this when you get to your actual kernel.
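As a rough sketch of how magic might be checked: the Multiboot specification gives 0x2BADB002 as the value a compliant boot loader hands to the kernel. The macro name BOOT_MAGIC and the helper boot_ok below are illustrative assumptions; they are not taken from the demo's main.cc or multiboot.h, which may use different names.

```c
/* Value a Multiboot-compliant boot loader passes to the kernel
   (from the Multiboot specification). The macro name is an assumption;
   the demo's multiboot.h may call it something else. */
#define BOOT_MAGIC 0x2BADB002UL

/* Hypothetical helper: returns 1 if the kernel was started by a
   Multiboot boot loader such as Grub, and 0 otherwise. */
int boot_ok(unsigned long magic)
{
    return magic == BOOT_MAGIC;
}
```

A kernel's main could call boot_ok(magic) first and refuse to touch mbiaddr when the check fails.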

Compiling the Source

If you look at the Makefile you'll notice the C flags:

-nostdinc -nostdinc++ -nodefaultlibs -nostdlib
-I$(CS452DIR)/include -I$(CS452DIR)/i586-elf/include -I$(CS452DIR)/include/cs452
-I$(CS452DIR)/lib/gcc-lib/i586-elf/3.3.3/include/

and the linker flags:

-L$(CS452DIR)/lib -lstdc++ -lgcc -lc

Instead of using the standard C library, we've provided an implementation of the Newlib library (http://sources.redhat.com/newlib/). Newlib is a C library intended for use on embedded systems. Note, however, that certain C functions, such as printf(), require OS services, which of course don't exist here. The file stubs.cc provides stubs for all the OS functions Newlib requires. Generally speaking, if a function requires file or console I/O, you can't use it without specific kernel support. See chapter 12 of the Newlib C documentation for more information.

Using Newlib is the same as using the regular C library. You simply include the header and link with the library. Because GCC doesn't use Newlib by default, we include the above compilation and linking flags. See the gcc man page or documentation for more info about them.
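To illustrate the kind of library call that works without kernel support, here is a minimal sketch that uses only string routines, which touch memory and nothing else. The function build_greeting is a made-up name for this example and is not part of the demo.

```c
#include <string.h>  /* strncat, strlen */

/* Hypothetical helper: writes "Hello, <name>" into buf (truncating if
   buf is too small) and returns the length of the resulting string.
   It performs no file or console I/O, so no OS stub is exercised. */
size_t build_greeting(char *buf, size_t size, const char *name)
{
    if (size == 0)
        return 0;
    buf[0] = '\0';
    strncat(buf, "Hello, ", size - 1);
    strncat(buf, name, size - 1 - strlen(buf));
    return strlen(buf);
}
```

The resulting buffer could then be handed to whatever console routine your kernel provides; the string work itself needs nothing from the OS.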

Here is what happens when you compile the "Hello world" sample program.

[cs452@lagrange]...public/examples/cs452-demo> pwd
/u3/cs452/public/examples/cs452-demo
[cs452@lagrange]...public/examples/cs452-demo> make
/u3/cs452/i586-3.3.3//bin/i586-elf-g++ -nostdinc -nostdinc++ -nodefaultlibs -nostdlib
-O6 -Wall -I.  -I/u3/cs452/i586-3.3.3//include -I/u3/cs452/i586-3.3.3//i586-elf/include 
-I/u3/cs452/i586-3.3.3//include/cs452 -I/u3/cs452/i586-3.3.3//lib/gcc-lib/i586-elf/3.3.3/include/ 
-c main.cc -o main.o
main.cc: In function `int main(long unsigned int, long unsigned int)':
main.cc:10: warning: unused variable `int i'

/u3/cs452/i586-3.3.3//bin/i586-elf-g++ -nostdinc -nostdinc++ -nodefaultlibs -nostdlib 
-O6 -Wall -I.  -I/u3/cs452/i586-3.3.3//include -I/u3/cs452/i586-3.3.3//i586-elf/include 
-I/u3/cs452/i586-3.3.3//include/cs452 -I/u3/cs452/i586-3.3.3//lib/gcc-lib/i586-elf/3.3.3/include/ 
-c serial.cc -o serial.o

/u3/cs452/i586-3.3.3//bin/i586-elf-g++ -nostdinc -nostdinc++ -nodefaultlibs -nostdlib 
-O6 -Wall -I.  -I/u3/cs452/i586-3.3.3//include -I/u3/cs452/i586-3.3.3//i586-elf/include
-I/u3/cs452/i586-3.3.3//include/cs452 -I/u3/cs452/i586-3.3.3//lib/gcc-lib/i586-elf/3.3.3/include/ 
-c video.cc -o video.o

/u3/cs452/i586-3.3.3//bin/i586-elf-g++ -nostdinc -nostdinc++ -nodefaultlibs -nostdlib  
-O6 -Wall -I.  -I/u3/cs452/i586-3.3.3//include -I/u3/cs452/i586-3.3.3//i586-elf/include
-I/u3/cs452/i586-3.3.3//include/cs452 -I/u3/cs452/i586-3.3.3//lib/gcc-lib/i586-elf/3.3.3/include/ 
-c stubs.cc -o stubs.o
stubs.cc: In function `clock_t times(tms*)':
stubs.cc:120: warning: return of negative value `-1' to `clock_t'
stubs.cc:120: warning: argument of negative value `-1' to `long unsigned int'
stubs.cc: In function `void _exit(int)':
stubs.cc:18: warning: `noreturn' function does return

/u3/cs452/i586-3.3.3//bin/i586-elf-gcc -nostdinc -nostdinc++ -nodefaultlibs -nostdlib 
-O6 -Wall -I.  -I/u3/cs452/i586-3.3.3//include -I/u3/cs452/i586-3.3.3//i586-elf/include 
-I/u3/cs452/i586-3.3.3//include/cs452 -I/u3/cs452/i586-3.3.3//lib/gcc-lib/i586-elf/3.3.3/include/ 
-c multiboot.S -o multiboot.o

/u3/cs452/i586-3.3.3//bin/i586-elf-g++ -nostdinc -nostdinc++ -nodefaultlibs -nostdlib  
-O6 -Wall -I.  -I/u3/cs452/i586-3.3.3//include -I/u3/cs452/i586-3.3.3//i586-elf/include 
-I/u3/cs452/i586-3.3.3//include/cs452 -I/u3/cs452/i586-3.3.3//lib/gcc-lib/i586-elf/3.3.3/include/ 
-Wl,-T,loader.cpp.x multiboot.o main.o serial.o video.o stubs.o -L/u3/cs452/i586-3.3.3//lib 
-lstdc++ -lgcc -lc  -o cs452_demo

/u3/cs452/i586-3.3.3//bin/i586-elf-strip cs452_demo -o cs452_demo-stripped

Posting the application

Once compiled, we have two i586 executables, cs452_demo and cs452_demo-stripped. Now we need to make our program available to run. We do this by running the 452-post.py program. The Makefile is already set up to do so for us:

[cs452@lagrange]...public/examples/cs452-demo> make post
Acquiring lock for grub listing
Got lock
Reading grub entries
Cleaning up kernel/module list
Adding entry: ['cs452_demo']
Saving grub list
Copying your files to /software/undergrad.math/data/cs452-sysfiles/cs452/
All done

In the /software/undergrad.math/data/cs452-sysfiles/ directory 452-post will create a directory with your username (if it doesn't already exist) and copy your "kernel" and any modules to it. The "kernel" is just the application Grub will initially boot. The string "Hello World" is how your application will show up in the Grub listing. Looking at the file /software/undergrad.math/data/cs452-sysfiles/.grub.lst we see it now contains:

title cs452: Hello World
root (nd)
kernel /software/undergrad.math/data/cs452-sysfiles/cs452/cs452_demo

The listing also keeps whatever entries it contained before. Entries from 452-post will be removed if the kernel or any modules the Grub listing entry points to are removed.

Running the Program

Here are the steps you take to run the Hello program.

Walk over to the Eos system.
Boot with the Grub boot disk, or select "Update Menu"
Select "Hello World", or whatever title you gave 452-post
View the results on the WYSE and CRT displays.

The results of running your hello program are not extremely exciting. You will see the message "Hello World", and the computer will promptly halt.

Links

Newlib: http://sources.redhat.com/newlib/
Grub: http://www.gnu.org/software/grub/manual/
Multiboot: http://www.gnu.org/software/grub/manual/multiboot/multiboot.html

Assembly Language Tutorial

You will not be needing assembly language for your first assignment. However, the sooner you learn how to deal with assembly language in your code, the better off you will be.

For more detailed information about the architecture and about processor instructions, you will need access to a 486 (or 386+) microprocessor manual. There is one in the lab, but you may want your own so that you don't have to share. The one I like is entitled The 80386 book, by Ross P. Nelson. It is much more readable than the book in the lab. (This book is copyright 1988 by Microsoft Press, ISBN 1-55615-138-1.) You can also refer to Chapter 3, Software Information, of your course notes.

Intel processor manuals may also be found at http://www.x86.org/intel.doc/586manuals.htm.

The GNU Assembler, gas, uses a different syntax from what you will likely find in any x86 reference manual, and the two-operand instructions have the source and destination in the opposite order. Here are the forms of gas instructions:

    opcode                    (e.g., pushal)
    opcode operand            (e.g., pushl %edx)
    opcode source,dest        (e.g., movl %edx,%eax) (e.g., addl %edx,%eax)

Where there are two operands, the leftmost one is the source and the rightmost one is the destination.
For example, movl %edx,%eax means "move the contents of the edx register into the eax register", and addl %edx,%eax means "add the contents of the edx and eax registers, and place the sum in the eax register".

Included in the syntactic differences between gas and Intel assemblers is that all register names used as operands must be preceded by a percent (%) sign, and instruction names usually end in either "l", "w", or "b", indicating the size of the operands: long (32 bits), word (16 bits), or byte (8 bits), respectively. For our purposes, we will usually be using the "l" (long) suffix.

80386+ Register Set

There are different names for the same register depending on what part of the register you want to use. To use the first set of 8 bits of eax (bits 0-7), you would use %al. For the second set of 8 bits (bits 8-15) of eax you would use %ah. To refer to the lowest 16 bits of eax (bits 0-15) together you would use %ax. For the entire 32 bits you would use %eax (90% of the time this is what you will be using). The form of the register name must agree with the size suffix of the instruction.
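The relationship between the register names can be modelled in portable C with masks and shifts. This is only a sketch for checking your understanding; the function names are invented here, and real code would simply use %al, %ah or %ax in an instruction.

```c
#include <stdint.h>

/* Each function takes the full 32-bit value of EAX and returns the
   part that the corresponding gas register name refers to. */
uint8_t reg_al(uint32_t eax)  { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
uint8_t reg_ah(uint32_t eax)  { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */
```

For example, with 0x12345678 in EAX, the model gives 0x78 for %al, 0x56 for %ah and 0x5678 for %ax.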

Here are the important processor registers:

    EAX,EBX,ECX,EDX - "general purpose", more or less interchangeable

    EBP             - used to access data on stack
                    - when this register is used to specify an address, SS is
                      used implicitly

    ESI,EDI         - index registers, relative to DS,ES respectively

    SS,DS,CS,ES,FS,GS - segment registers
                      - (when Intel went from the 286 to the 386, they figured
                         that providing more segment registers would be more
                         useful to programmers than providing more general-
                         purpose registers... now, they have an essentially
                         RISC processor with only _FOUR_ GPRs!)
                      - these are all only 16 bits in size

    EIP            - program counter (instruction pointer), relative to CS

    ESP            - stack pointer, relative to SS

    EFLAGS         - condition codes, a.k.a. flags

Segmentation

We are using the 32-bit segment addressing feature of the 486. Using 32-bit addressing as opposed to 16-bit addressing gives us many advantages:

No need to worry about 64K segments. Segments can be 4 gigabytes in length under the 32-bit architecture.
32-bit segments have a protection mechanism for segments, which you have the option of using.

You don't have to deal with any of that ugly 16-bit crud that is used in other operating systems for the PC, like DOS or OS/2; 32-bit segmentation is really a thing of beauty in comparison to that.

i486 addresses are formed from a segment base address plus an offset. To compute an absolute memory address, the i486 figures out which segment register is being used, and uses the value in that segment register as an index into the global descriptor table (GDT). The entry in the GDT tells (among other things) what the absolute address of the start of the segment is. The processor takes this base address and adds on the offset to come up with the final absolute address for an operation. You'll be able to look in a 486 manual for more information about this or about the GDT's organization.
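The lookup described above can be sketched in C as follows. The model is deliberately simplified: struct seg_entry keeps only a base address (a real GDT entry also carries a limit and access bits), limit checking is omitted, and all names are assumptions made for this example.

```c
#include <stdint.h>

/* Simplified stand-in for a GDT entry: only the segment base is kept. */
struct seg_entry { uint32_t base; };

/* The upper 13 bits of a selector are the table index; the low 3 bits
   hold the table indicator and the requested privilege level. */
uint32_t selector_index(uint16_t selector)
{
    return (uint32_t)selector >> 3;
}

/* Absolute address = base of the selected segment + offset. */
uint32_t linear_address(const struct seg_entry *gdt,
                        uint16_t selector, uint32_t offset)
{
    return gdt[selector_index(selector)].base + offset;
}
```

With a segment whose base is 0, the offset and the absolute address are the same number, which is why a flat memory layout is convenient.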

i486 has 6 16-bit segment registers, listed here in order of importance:

CS: Code Segment Register
Added to address during instruction fetch.
SS: Stack Segment Register
Added to address during stack access.
DS: Data Segment Register
Added to address when accessing a memory operand that is not on the stack.
ES, FS, GS: Extra Segment Registers
Can be used as extra segment registers; also used in special instructions that span segments (like string copies).

Initially there is no (guaranteed) GDT once Grub boots your kernel. Grub will place your kernel's code at address 0x00100000 (1M), and the initial GDT Grub creates sets the base of the CS and DS segments to the start of memory and the length of the segments to 4GB. What this means is that you must tell the linker to link your kernel's code at 1M (see the linker script provided in the "Hello World" example).

To explain the contradiction in the above paragraph with the GDT we note that once your kernel is running, you're in protected mode. To get into protected mode Grub had to set up a GDT. Once in the kernel though, until the processor executes an instruction that looks at the GDT (such as when performing a context switch), the GDTR doesn't need to be valid. The Multiboot specification states that we can't assume the GDTR points to anything valid, which is why you need to create your own GDT as soon as possible. This is only necessary for your actual kernel, not your first assignment.
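When you do build your own GDT, each entry is an 8-byte descriptor whose base and limit fields are split across several bit ranges. The following is a sketch of the packing, following the descriptor layout in the Intel manuals; make_descriptor is a hypothetical helper name, and loading the table (lgdt, reloading the segment registers) is not shown.

```c
#include <stdint.h>

/* Packs one 8-byte segment descriptor.
   base   - 32-bit segment base address
   limit  - 20-bit segment limit
   access - access-rights byte (present bit, privilege level, type)
   flags  - 4-bit nibble holding the granularity and default-size bits */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);             /* limit bits 15..0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base bits 23..0   */
    d |= (uint64_t)access << 40;                  /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit bits 19..16 */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* flags nibble      */
    d |= (uint64_t)(base >> 24) << 56;            /* base bits 31..24  */
    return d;
}
```

A flat 4GB code segment (base 0, limit 0xFFFFF, access 0x9A, flags 0xC) packs to 0x00CF9A000000FFFF, and the matching data segment (access 0x92) packs to 0x00CF92000000FFFF.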

The x86 architecture supports different addressing modes for the operands. A discussion of all modes is out of the scope of this tutorial, and you may refer to your favourite x86 reference manual for a painfully detailed discussion of them. Segment registers are special: you can't do a

    movw seg-reg, seg-reg

You can, however, do

    movw seg-reg,memory
    movw memory,seg-reg
    movw seg-reg,reg
    movw reg,seg-reg

Note: If you movw %ss,%ax, then you should xorl %eax,%eax first to clear the high-order 16 bits of %eax, so you can work with long values.
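The reason for the note is that a 16-bit move changes only the low half of the 32-bit register. A small portable model, with a made-up function name, shows the effect:

```c
#include <stdint.h>

/* Models a 16-bit move into %ax: bits 0-15 of EAX are replaced and
   bits 16-31 keep whatever they held before. */
uint32_t write_ax(uint32_t eax, uint16_t value)
{
    return (eax & 0xFFFF0000u) | value;
}
```

If EAX held 0xDEADBEEF and the segment register holds 0x0010, the result is 0xDEAD0010; clearing EAX first gives the expected 0x00000010.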

Common/Useful Instructions

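mov (especially with segment registers)
    - e.g.,:
        movw %es,%ax
        movl %cs:4,%esp
        movw _processControlBlock,%ds

    - note:     mov's do NOT set flags
    - note:     %cs cannot be the destination of a mov; it changes only
                through far jumps, far calls, interrupts and returns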

pushl, popl       - push/pop long
pushal, popal     - push/pop EAX,ECX,EDX,EBX,ESP,EBP,ESI,EDI
                    (pushed in this order; popal discards the saved ESP)

call  (jumps to piece of code, saves return address on stack)
         e.g., call _cFunction

int   - call a software interrupt

ret   (returns from piece of code entered due to call instruction)
iretl (returns from piece of code entered due to hardware or software
       interrupt)


sti, cli - set/clear the interrupt bit to enable/disable interrupts
           respectively
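Since pushal is the usual way to save a task's registers, it helps to know exactly what ends up where on the stack. The Intel manuals give the push order as EAX, ECX, EDX, EBX, the ESP value from before the instruction, EBP, ESI, EDI. The sketch below models that in C; the struct, the function name and the use of a word index in place of a byte address are simplifications assumed for this example.

```c
#include <stdint.h>

struct regs { uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi; };

/* Models pushal on a downward-growing stack of 32-bit words.
   r->esp is an index into stack[] (in words), not a byte address.
   The ESP value stored is the one from before the first push. */
void model_pushal(struct regs *r, uint32_t *stack)
{
    uint32_t old_esp = r->esp;
    stack[--r->esp] = r->eax;
    stack[--r->esp] = r->ecx;
    stack[--r->esp] = r->edx;
    stack[--r->esp] = r->ebx;
    stack[--r->esp] = old_esp;
    stack[--r->esp] = r->ebp;
    stack[--r->esp] = r->esi;
    stack[--r->esp] = r->edi;
}
```

After the call, EDI sits at the lowest address (the new top of stack) and EAX at the highest, eight words above it.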

Mixing C and Assembly Language

The way to mix C and assembly language is outlined on page 33 of your Course Notes. You use the "asm" directive. To access C-language variables from inside of assembly language, you simply use the C identifier name as a memory operand. These variables cannot be local to a procedure, and also cannot be static inside a procedure. They must be global (but can be static global). The newline characters are necessary.


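unsigned long a1, r;   /* globals, so the asm can name them directly */
void junk( void )
{
   asm(
        "pushl %eax \n"       /* save the registers we are about to use */
        "pushl %ebx \n"
        "movl $100,%eax \n"   /* eax = 100 */
        "movl a1,%ebx \n"     /* ebx = a1 */
        "int $69 \n"          /* software interrupt 69 */
        "movl %eax,r \n"      /* r = eax */
        "popl %ebx \n"        /* restore in reverse order */
        "popl %eax \n"
   );
}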

This example does the following:

Pushes the values stored in %eax and %ebx onto the stack.
Puts a value of 100 into %eax.
Copies the value in global variable a1 into %ebx.
Executes a software interrupt number 69.
Copies the value in %eax into the global variable r.
Restores (pops) the contents of the temporary registers %eax and %ebx.
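GCC also offers an extended form of asm in which you list input and output operands and let the compiler choose the registers; this form can work with local variables and parameters. The sketch below is an illustration of that GCC feature rather than something taken from the course notes, and the function name is invented. On non-x86 targets it falls back to plain C so that it still compiles.

```c
/* Adds b to a. On x86 with GCC the addition is done by an addl
   instruction through extended asm; elsewhere plain C is used. */
unsigned int add_with_asm(unsigned int a, unsigned int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r"(a): a is read and written, in a register chosen by GCC.
       "r"(b):  b is an input in a register.
       "cc":    tells GCC that the flags are modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

Because the compiler knows which registers are used, no explicit pushl/popl pair is needed around the instruction.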

Software Engineering Tools and Tips

Software Tools

Here is a list of software tools that you have at your disposal. You can peruse the man page of each tool in order to become more familiar with it.

gcc
This is of course your compiler.
nm, objdump
These tools print out the symbol table for object files, executables, and libraries.
strip
This tool strips the symbol table out of the desired executable, object file, or library.
make
This tool is used to build entire projects. If you are using make and you change one file, only the object files, executables and libraries that depend on that file will be rebuilt, not the entire project.
ar
Packages a bunch of object files into a library.
gripe
Use this tool to report problems with the hardware and the software.

Note that you do not have a debugger at your disposal. If you wish to have a debugger for your system, you must write one. Note that this is a non-trivial exercise and depends on how interesting you want to make your debugger.

Software Engineering Tips

The projects in this course are a great test of your software engineering skill. They will test not only your programming speed, but also your ability to design and debug code, as well as your ability to interact with the other person(s) in your group. The course notes contain numerous software engineering hints that will help you speed your development. We also have a few pieces of advice for you:

Use the -S option of gcc to generate assembly language. This will allow you to see exactly what the compiler is generating as assembly code for your C statements.
Use CVS, RCS or some other version control tool.
Learn the make tool well. Use the SUFFIXES feature of "make".
You can compile on the l-hosts.
Keep assembly-level programming to a minimum.
Look into using Bochs, an x86 PC emulator.
Start early on projects.