CS 350 - Operating Systems

Hints for A2

This file may be updated, so check back periodically


Quick Topic Links

General

  • Don't forget to re-configure your kernel for Assignment 2, and to do all of your kernel builds in kern/compile/ASST2
    % cd $HOME/cs350-os161/os161-1.99/kern/conf
    % ./config ASST2
    % cd ../compile/ASST2
    % bmake depend
    % bmake
    % bmake install
    

  • Don't forget that if you add any new source files to the kernel code, you need to update the file kern/conf/conf.kern and you need to reconfigure your kernel. If you don't do this, the code in your new source file will not be included in the kernel build.

  • Don't worry about trying to reclaim physical frames in the kernel when they are no longer used. However, you should make sure that you kfree kernel memory that you've allocated using kmalloc but that the kernel no longer needs.

  • You may like to have lots of debugging information printing out from your kernel, but the kernel you submit to us must not do so. OS/161's DEBUG facility that can help with this problem - you can bury lots of debugging statements into your code, and you can turn them all off by just changing the value of a single variable. You can even turn them off conditionally by labeling them properly.

    You encountered the DEBUG mechanism in A0, but as a reminder:


  • Clarification of Lock Behaviour

    You need to enforce some constraints on how locks are used. One that is a common source of questions states roughly: A common question is how to handle these types of problems. Some possible options are:
    1. Do not release the lock but do not report an error. This is the least desirable option since a programmer calling lock release in an incorrect way will not be notified of the error.
    2. Crash the kernel using panic().
    3. Use KASSERT() to test for and catch the error.
    Between panic() and KASSERT(), KASSERT is preferred, since lock_acquire() and lock_release() will be called within the kernel, so there is a high degree of confidence that they will be called correctly. If they are not there is a bug in the kernel and we want to be sure to fix the bug.

    OS/161 crash with no output

    When the OS/161 kernel starts to boot, there is a period of time during which the kernel cannot produce any output because the output device has not yet been initialized. If OS/161 crashes during this early phase, it will exit without producing any output, like this:
    @cpu20.student[104]% sys161 kernel
    sys161: System/161 release 1.13, compiled May 19 2007 21:48:06
    sys161: 213925 cycles (88758k, 0u, 125167i)
    sys161: 0 irqs 0 exns 0r/0w disk 0r/0w console 0r/0w/1m emufs 0r/0w net
    sys161: Elapsed real time: 0.017304 seconds (12.3627 mhz)
    sys161: Elapsed virtual time: 0.008557000 seconds (25 mhz)
    @cpu20.student[105]% 
    
    In the example above, the only output is from the simulator program (sys161), which reports some execution statistics when the simulated machine shuts down. All of the usual boot messages from the OS/161 kernel are missing.

    It is also true that the kernel creates and uses locks very early in the boot process. Thus, if you start working on locks and you find that the kernel starts failing without producing any output, as shown above, a likely explanation is that your lock code is responsible for an early crash of the kernel. Note that you cannot debug this problem by adding kprintfs to your kernel, because the failure is happening before the kernel can display the output from your kprintf. (In addition, kprintf actually uses a lock, so adding more kprintfs may affect the problem you are trying to debug, or may cause new problems.)

    You need to use gdb to help you determine where the problem lies. A good thing to start with is to set breakpoints at your lock functions, and step through them when they are called to ensure that they are behaving as you expect. Another useful idea is to set a breakpoint at the panic function, which the kernel will invoke if an assertion fails.


    Weird/Garbled output from sy2

    If the output you are seeing appears to be a mess that looks something like the sample bit of output shown below, it is most likely because you have not implemented locks correctly. kprintf (which is used to print output when tests are run) requires lock_acquire and lock_release to be implemented correctly in order to prevent too much text from different threads from being completely intermingled. When locks don't work correctly the sy2 test prints out messages indicating failure but because the locks aren't implemented correctly the output becomes very hard to read because characters from the output from different threads are interleaved.
    Starting lock test...
    thread 0: Mismatch ont thhrteeads rtval2/3:te thMrsismetval1
    aTtceehs t faaildoed
    n t hreaadttheths2r: rd  eeMitsattahtrdev1 mtathr1t3dtaahhr7le 2hrdc: hea/dar2e :rMte0hd 1o 1ita:t hr Methreed ans4:htvea1add te5t aMlastdht:t  Mhi 2ir11d :hr et7shm1rs:is  maMtMsemr2et1hm1: dai Miis6m
    tTaetar9:ththr etsasMiahhreathtch: e:chrv area:ddtsadcera sattaMi otdmm M  i22r madhd eh toaMi1hresa2s 5ce  tt9: M1Mhrrsn namaett:m:haoccd 2aiislem3et6 datahc a  dnhh f3ds8m12a0ae: tcdtrhMtMon   a: m8t a6tcas ct 2ee iciooi a /nud:h:ttMih:dc4sa1oshsnn lMestm  o e12s8 Mh:tvm nmi e3m tv:
    nM s0 :h  aa 5ad st1tiattea Titvld
      oMl1:/ tch27McchMesa1:t: oMno tin seome:ihs isml/ evMnis mchu mtnasesstvotannMsai naMmtos ttv atma fsaat  ote l1toenmtctvalM2tslictiuns/si
    ctTs athcat2i/vm2lhvsmmanmat tcn cheltc/steased 
    uth eotsnt oh s1ehnmatt/o
    aTlcveaets vn ott/s ualvctnem1shhlsvma ttevestont1a e s
    /Tn otall2enaantc/lhvmtstteut 2n3o /st sl1vhn1
    ale2sessmt/n/u ttctt/ua u T
    /Tntt
     vnnomfavthevnmlomoeeu Tvfa n
    uetveil sau
    Tn n tsftaal m
    saTTaeeotlm1e
    sesmaieliltlseltvsd
    

    Testing Synchronization Primitives

    Locks and Condition Variables

    We will use some combination of sy2, uw1, and sy3 to test locks and condition variables. We will run these tests with different numbers of processors. (You can change the number of processors in the simulated machine by changing the sys161.conf file before starting sys161.)

    Controlling the Number of Processors

    You can control the number of processors in the simulated machine by changing the SYS/161 configuration file, sys161.conf. Look for a line like this:

    31	mainboard  ramsize=524288  cpus=1
    
    The cpus=1 indicates that that simulated machine has a single CPU (processor). To simulate more CPUs, just use change the CPU count. For example, changing the line above to
    31	mainboard  ramsize=524288  cpus=4
    
    will give you a system with 4 CPUs.

    So that you're not constantly editing the configuration file, you can create multiple different configuration files and pass the one you want to the simulator when it starts, using the -c option, e.g,

    sys161 -c my-config-file.conf kernel
    

    GDB tips for Synchronization

    This is an example gdb session to show what output to expect and how to up up the stack trace to look at a simple variable. This session was produced by Tim Brecht using a previous (older) version of OS/161 (so your code may not look the same). His comments are on the lines that start with "//". Each comment refers to the output that follows it.

    This example assumes that there is a .gdbinit file like the one described in the A0 hints.

    // Start cs350-gdb.
    % cs350-gdb kernel
    GNU gdb 6.0
    Copyright 2003 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "--host=i686-pc-linux-gnu --target=mips-elf"...
    __start () at ../../arch/mips/mips/start.S:24
    24         addiu sp, sp, -20
    Current language:  auto; currently asm
    Breakpoint 1 at 0x80010948: file ../../lib/kprintf.c, line 111.
    
    // Continue execution. This stops and waits for input into os161.
    // In the window running sys161 -w kernel after I type something like
    // p testbin/program it runs and eventually panics.
    (gdb) continue
    Continuing.
    
    Breakpoint 1, panic (fmt=0x8002fe6c "Assertion failed: %s, at %s:%d (%s)\n") at ../../lib/kprintf.c:111
    111             if (evil==0) {
    Current language:  auto; currently c
    
    // I want to find out who called panic and why.
    // Where shows the stack trace of who called what and all
    // the parameters.
    // panic is where we are currently.
    // It was called from lock_destroy at line 181 of thread/synch.c
    // It was called from proc_entry_init at line 601 of userprog/syscalls.c
    // etc.
    // proc_entry_init was the function that called lock_destroy.
    (gdb) where
    #0  panic (fmt=0x8002fe6c "Assertion failed: %s, at %s:%d (%s)\n") at ../../lib/kprintf.c:111
    #1  0xffffffff8001740c in lock_destroy (lock=0x800c3e20) at ../../thread/synch.c:181
    #2  0xffffffff80022b30 in proc_entry_init (pid=2) at ../../userprog/syscalls.c:601
    #3  0xffffffff800258dc in sys_waitpid (pid=2, status=0x7fffff90, options=0, retval=0x800fcf18)
        at ../../userprog/syscalls.c:1367
    #4  0xffffffff8000c6b8 in mips_syscall (tf=0x800fcf6c) at ../../arch/mips/mips/syscall.c:145
    #5  0xffffffff8000cb68 in mips_trap (tf=0x800fcf6c) at ../../arch/mips/mips/trap.c:134
    
    // Move up the stack.
    (gdb) up
    #1  0xffffffff8001740c in lock_destroy (lock=0x800c3e20) at ../../thread/synch.c:181
    181             assert(lock->held == 0);
    
    // Move up the stack.
    (gdb) up
    #2  0xffffffff80022b30 in proc_entry_init (pid=2) at ../../userprog/syscalls.c:601
    601                     lock_destroy(p->ft_lock);
    
    // Move up the stack.
    (gdb) up
    #3  0xffffffff800258dc in sys_waitpid (pid=2, status=0x7fffff90, options=0, retval=0x800fcf18)
        at ../../userprog/syscalls.c:1367
    1367              proc_entry_init(pid);
    
    // Print out some info about the current thread.
    (gdb) print *curthread
    $1 = {t_pcb = {pcb_switchstack = 2148519408, pcb_kstack = 2148519936, pcb_ininterrupt = 0, pcb_badfaultfunc = 0,
        pcb_copyjmp = {2148517880, 2147537544, 0, 0, 0, 0, 0, 0, 0, 0, 2148517880}},
      t_name = 0x800d2d00 "ourtests/conc-io", t_sleepaddr = 0x0, t_stack = 0x800fc000 "o?=\021o?=3", t_vmspace = 0x800f7360,
      t_cwd = 0x800d2f40, pid = 1}
    
    // Print out the value of pid as it is inside of userprog/syscalls.c in the
    // function sys_waitpid at the time of the call to proc_entry_init.
    (gdb) print pid
    $2 = 2
    
    // Print out the value of pid in hex.
    (gdb) print/x pid
    $2 = 0x2
    
    // x/i Examine (x) specified memory location as an instruction (i).
    // Print out the instruction located at address 0xffffffff8000c6b8.
    // Note that I got this address from the above stack trace.
    // It shows you that this is address is 520 bytes inside of mips_syscall.
    // This can be really helpful if OS/161 dies and sys161 just prints
    // out the epc (which contains the address of the instruction
    // that generated the execption.
    (gdb) x/i 0xffffffff8000c6b8
    0x8000c6b8 :  j       0x8000c6f0 
    
    // x/i Examine (x) specified memory location as an instruction (i).
    // Print out the instruction located at address 0xffffffff8001740c.
    // Note that I got this address from the above stack trace.
    // It shows you that this address is 148 bytes inside of lock_destroy.
    (gdb) x/i 0xffffffff8001740c
    0x8001740c :  lw      v0,32(s8)
    
    
    The list command is useful for finding the line of source that contains the faulting instruction (note the *):
    
    Something else that can be done (from a different version of the code)
    (gdb) list *0x800197c4
    0x800197c4 is in kmain (../../main/main.c:164).
    159     kmain(char *arguments)
    160     {
    161         char *bad_ptr = NULL;
    162         boot();
    163         
    164         *bad_ptr = 0;
    165
    166         menu(arguments);
    167
    168         /* Should not get here *
    

    _exit


    fork


    Kernel include files


    proper use of kmalloc and kfree


    System Call Testing


    Passing Arguments


    File I/O


    gdb