Name

bittersuite3 — A framework for running tests via RST

Description

BitterSuite3 is a testing framework that works on top of rst. Instead of providing full runTests and computeMarks programs, the user simply provides stubs that redirect to the main BitterSuite3 code. The goal is to minimize the amount of scripting required by course staff, so the focus instead can be on creating tests.

The output file (generated by rst, and suitable either for mailing testing output to students or for generating a PostScript printout for hand marking) contains by default a processed version of the staff-provided mark-scheme, a nicely formatted summary of all of the autotesting, side-by-side output comparisons for generated output that differs from the expected output, and optionally the text of all submitted files.

RST hooks

Ideally, the runTests and computeMarks in /u/isg/bittersuite3 should be set as defaults in .rstrc. This way, once a course is set up to use BitterSuite, course staff no longer have to think about this for every suite for every assignment.

Alternatively, the stubs that should be provided in the RST testdir are very short. The contents of runTests should be

         #!/bin/sh

         exec /u/isg/bittersuite3/runTests
      

and the contents of computeMarks should be

         #!/bin/sh

         exec /u/isg/bittersuite3/computeMarks
      

with any appropriate flags appended. The -f flag means that any student-submitted files should appear on the output; this is desirable for output to be marked by TAs, but not for a public test system. The intent-indicating -q flag and the mark-scheme modifying -n and -i flags are described below.
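As a sketch of what a stub with flags appended might look like, the following shell fragment writes both stubs into a suite directory and marks them executable. The temporary directory, the choice of -q and -i, and the need for the executable bit are assumptions made for illustration; only the stub contents come from the text above.

```sh
#!/bin/sh
# Sketch: create the two RST stubs in a hypothetical suite directory.
suite_dir=$(mktemp -d)    # stand-in for the real RST testdir

cat > "$suite_dir/runTests" <<'EOF'
#!/bin/sh

exec /u/isg/bittersuite3/runTests
EOF

# computeMarks with two flags appended: -q (top-level directories of
# "in" are questions) and -i (interpret the mark-scheme with bash).
cat > "$suite_dir/computeMarks" <<'EOF'
#!/bin/sh

exec /u/isg/bittersuite3/computeMarks -q -i
EOF

# Assumption: rst expects both stubs to be executable.
chmod +x "$suite_dir/runTests" "$suite_dir/computeMarks"
```

Because each stub is a single exec line, changing the flags for a suite means editing one line in one file.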

Test directory setup

In addition to the runTests and computeMarks scripts in a given test suite directory, there are other files and subdirectories standardized by BitterSuite.

config.ss

This file specifies options that will apply to this entire BitterSuite run. They should be specified in key-value pairs within S-expressions.

interpret-mark-scheme
A boolean value, set to false by default. If changed to true, bash expressions in the mark-scheme file, such as the variables holding autotesting grades, will be evaluated.
nroff-mark-scheme
A boolean value, set to false by default. If changed to true, nroff commands in the mark-scheme file, such as .ti, will be evaluated.
print-by-question
A boolean value, set to false by default, which determines how the autotesting results are displayed in the output file of rst. If true, autotesting totals for each question will be displayed, and the tests will be labelled as "Question X, test Y". If false, autotesting totals for each question will not be displayed, and the tests will be labelled as "Test X_Y".
print-submit-files
A boolean value, set to false by default. If changed to true, the student's code will be included in the output file of rst.
test-account
A string specifying which account the tests will be run on (default: csXXXt)
test-connect
A list of strings specifying the command used to connect to the test account. This command will be run as test-connect test-account (default: ("ssh" "-o" "NoHostAuthenticationForLocalhost=yes" "-o" "NumberOfPasswordPrompts=0" "localhost" "-l"))
verbosity
A number between 0 and 10 specifying the amount of output you would like BitterSuite to produce (default: 1). This refers specifically to output visible to the person running the tests, not to the students.
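A small config.ss might look like the one written by the fragment below. This is only a sketch: it assumes that each option is a two-element list of the form (key value), one per line, and that booleans are spelled true and false as in the descriptions above. Neither detail is confirmed here, so check an existing suite before relying on it. The temporary directory is a stand-in for the suite directory.

```sh
#!/bin/sh
# Sketch: write a hypothetical config.ss into a stand-in suite directory.
config_dir=$(mktemp -d)

# Assumed syntax: one (key value) S-expression per option.
cat > "$config_dir/config.ss" <<'EOF'
(print-by-question true)
(verbosity 3)
(test-account "csXXXt")
EOF

cat "$config_dir/config.ss"
```

Options that are left out keep the defaults listed above.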

mark-scheme

This is a file that contains the marking scheme that should be at the top of the student output. If no behaviour-modifying flags are supplied to computeMarks, then this is just used verbatim.

If the -n flag is used, the mark-scheme is formatted by nroff, allowing nroff directives to specify cleaner, page-size independent formatting.

If the -i flag is used, the mark-scheme is interpreted by the bash shell. This makes a number of environment variables, which mirror locations inside the in hierarchy (see below), available for use in the marking scheme. The variables te and to specify the total earned marks and the total "out of" marks, respectively. The mark total for all tests housed under the directory in/3/2 would be t3_2o, and the earned mark would be t3_2e. The script is interpreted by the shell by doing the equivalent of

               eval echo "`cat mark-scheme`"
            

The simplest way to access the variables made available by -i is to enclose the entire contents of mark-scheme in double-quotes (and avoid the use of unescaped double-quotes elsewhere in the file). The shell will simply replace all variables with their values. The more complex way is to encase the marking scheme in $( ... ), which enables the use of scripting commands (and requires explicit output commands to construct the output marking scheme).
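The double-quoted form can be tried outside BitterSuite with a few lines of shell. In this sketch the variable values are stand-ins set by hand (in a real run BitterSuite supplies them), and the wording of the marking scheme is invented for illustration.

```sh
#!/bin/sh
# Sketch: interpret a double-quoted mark-scheme the way -i is described.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# The entire file is wrapped in one pair of double quotes.
cat > mark-scheme <<'EOF'
"Question 3, part 2: $t3_2e / $t3_2o
Autotesting total:  $te / $to"
EOF

# Stand-in values; BitterSuite would normally provide these.
t3_2e=4 t3_2o=5 te=17 to=20

# The same command the documentation gives for -i.
rendered=$(eval echo "`cat mark-scheme`")
echo "$rendered"
```

With the stand-in values above, this prints "Question 3, part 2: 4 / 5" followed by "Autotesting total:  17 / 20" on the next line. Line breaks and spacing inside the quotes are preserved, which is why the quoted form is the simplest way to lay out a marking scheme.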

provided

The provided directory simply contains any files you want to be available to the student files, but which should not have been submitted. This may include Scheme modules used for marking, provided C header files, and so on.

in

The in directory is where all of the tests are kept in a hierarchy of subdirectories. Tests occur at the leaf directories, and scores propagate recursively back from there to the root directory. Any options.ss or options.scm files that are encountered are parsed, and the key-value S-expressions in each file modify the state of the tester at the given directory level. Options are parsed in order, so the state changes made by option N are visible to option N+1. All state changes are propagated to the child directories, but are not propagated back to the parent directory. If the -q flag is provided to computeMarks, it indicates that the top-level subdirectories of in each represent individual questions, and the autotesting output will be formatted accordingly.

Default Behaviour

BitterSuite3 provides a set of default behaviours that are meant to apply to all domain-specific languages. Some of the implementation is left to these other languages (the timeout and memory options, as well as the handling of input files), but they should honour the intent of these settings as closely as they possibly can.

Allowable options

language
This should be followed by a single string or symbol specifying which language should be used from this point on to interpret any options/files before the default behaviour is used as a fallback. Details of these alternate languages are in a later section.
value
This should be followed by a single S-expression which specifies a number representing the number of points any encountered tests will be worth. This S-expression will be evaluated immediately. Ideally, this expression will result in a rational value, as in the case of (/ 4 2) or 3/5. However, expressions such as (begin (require scheme/math) pi) are also valid. The default value is 1.
desc, description
This should be followed by a single string representing a description of the current test. The default value is the empty string.
timeout
This should be followed by a single number. It represents the length of time, in seconds, that a test should be given to execute before it is timed out (likely under the assumption that it will continue executing indefinitely). The default value is 15.
memory
This should be followed by a single number. It represents the amount of memory, in MB, a test should be allowed to consume before it is killed. The default value is 50.
diff
This should be followed by a single string. It specifies a program that takes two filenames as parameters and will be used to compare output from a student test to output from the model solution. This program will output a number representing the percentage earned on this question to file descriptor three, and an explanation for that earned mark to standard output. The default is a wrapper for diff -ibB -q which gives a mark of 100 if that command finds no differences, and 0 if it does.
thread-children
This should be followed by a single boolean. It specifies whether child directories should be processed in parallel or not. The default value is #f.

This is an option that must be treated very carefully. If any mutable state is shared among several tests, then this state may become incorrect. So in that case, while top-level directories *may* be able to run in parallel, the low-level ones would not be able to. As an example, python tests should not be run in parallel, nor should tests on a single evaluator in advanced student scheme that has global mutable state.

On the Solaris systems, initial testing produced quite unsatisfactory results. Multiple runs confirmed that, on a beginning student sample, threading increased the runtime from about 2.5 minutes to about 3 minutes, and a question tested in intermediate student consistently timed out because an evaluator had not been created after 15 seconds, suggesting that the thread scheduling algorithm does quite a poor job of sharing time slices. Threading may still be useful under one of three conditions: tests in a language like external or C, where the OS can take control of some processes; Linux machines, where the implementation may be better; or future versions of mzscheme that are properly multithreaded. Further testing is needed to confirm any of these scenarios.

Any other value types for these keys, or any other keys, will result in a fatal error that kills the testing suite so course staff can fix the problem.
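The contract for a diff program (two filenames in, a percentage on file descriptor three, an explanation on standard output) can be sketched in shell as follows. This is not the default wrapper: it is a hypothetical comparator that ignores line order, and the function name, the messages, and the argument order (student output first, expected output second) are all assumptions.

```sh
#!/bin/sh
# Sketch: a comparator that ignores line order.
# Assumed argument order: $1 = student output, $2 = expected output.
compare_sorted() {
    if [ "$(sort "$1")" = "$(sort "$2")" ]; then
        echo 100 >&3    # percentage earned goes to file descriptor 3
        echo "All expected lines are present (order ignored)."
    else
        echo 0 >&3
        echo "Output differs from the expected output, even ignoring order."
    fi
}

# Demonstration with stand-in files; fd 3 is captured in a file.
demo_dir=$(mktemp -d)
printf 'b\na\n' > "$demo_dir/student.out"
printf 'a\nb\n' > "$demo_dir/expected.out"
explanation=$(compare_sorted "$demo_dir/student.out" \
                             "$demo_dir/expected.out" 3> "$demo_dir/mark")
mark=$(cat "$demo_dir/mark")
echo "mark=$mark: $explanation"
```

To use something like this as a diff option value, the function body would be saved as its own executable script that reads "$1" and "$2". Anything written to standard output becomes the explanation, so diagnostic messages should go to standard error instead.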

Allowable non-option files

input
A file whose contents will be used as input on standard input for any encountered tests.

Any other encountered files will result in a fatal error that kills the testing suite so course staff can fix the problem.
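To show how the in hierarchy, the option files, and the input file fit together, the fragment below builds a small hypothetical suite. The directory names (1, 2, 001, 002), the option values, and the (key value) syntax of the option files are assumptions for illustration. A real suite would normally also select a language, which is omitted here.

```sh
#!/bin/sh
# Sketch: lay out a hypothetical "in" hierarchy with inherited options.
suite_dir=$(mktemp -d)    # stand-in for the test suite directory

# Two top-level directories; the leaf directories hold the tests.
mkdir -p "$suite_dir/in/1/001" "$suite_dir/in/1/002" "$suite_dir/in/2/001"

# Seen by every test under in/1, but not by in/2.
cat > "$suite_dir/in/1/options.ss" <<'EOF'
(value 2)
(timeout 30)
EOF

# Overrides the timeout for in/1/002 only; in/1/001 keeps 30 seconds.
cat > "$suite_dir/in/1/002/options.ss" <<'EOF'
(desc "Large input")
(timeout 60)
EOF

# Standard input for the test in in/1/002.
printf '1 2 3\n' > "$suite_dir/in/1/002/input"

find "$suite_dir/in" | sort
```

Since in/2 has no option file of its own, its test runs with the defaults: a value of 1 and a timeout of 15 seconds.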

Languages

The current language, as selected by the language option, is what provides interesting behaviour for BitterSuite3. Any languages that are supported directly are housed in /u/isg/bittersuite3/languages. Alternate domain-specific or testing languages may also be placed in /u/csXXX/bittersuite_languages. The current centrally-supported languages are the Scheme family (scheme/module, scheme/beginner, and so on), C, external and python. Documentation for these languages is on the ISG TWiki, but should eventually be propagated to their own man pages (named bittersuite3-scheme, etc.).

Providing Language Implementations

definitions.ss

Each language is defined by a Scheme module named definitions.ss. The module must be placed in a subdirectory, named after the language itself, of one of the language directories mentioned in the previous section. It must provide four functions, which are called from the main testing code and which define the behaviour for this particular language.

initialize: hashtable -> ?
Mutates the provided hashtable so it contains an appropriate default state for this language. The produced value is discarded.
parse-option: hashtable symbol [value1 ... valueN] -> symbol
Mutates the provided hashtable so its state reflects the changes dictated by the provided symbol key and the list of values provided for that key. If the return value is 'not-handled, this indicates that the key was not recognized by this handler, and the default attempts to handle the key. If the return value is the symbol 'bad-value, the suite will die with an error so course staff can fix the problem. If the return value is 'handled, this indicates to the test suite that the option was recognized and handled successfully. Any other return value is an error.
interpret-file: hashtable path -> symbol
Mutates the provided hashtable so its state reflects the changes dictated by the provided file. If the return value is 'not-handled, it indicates the language handler could not make use of the given file, and the default handler attempts to make use of it instead. If the return value is 'handled, this indicates to the test suite that the file was recognized and handled successfully. Any other return value is an error.
run-test: hashtable -> (values (union number symbol) string)
Run a test, given all of the state provided in the hashtable. There are two return values: either a number representing the percentage earned on this particular test or the symbol 'defer indicating output comparison tests need to be done later, and a string specifying a message explaining the mark earned.

hierarchy-runner/common

All language modules have hierarchy-runner added to their module search paths, which enables the use of a number of helper functions and constants when a module contains (require hierarchy-runner/common).

cond-print: integer>0 string ... -> (void)
The first parameter specifies the minimum verbosity required for the following information to be printed. If only a string is provided with no other parameters, then the string will be printed verbatim followed by a newline. If other parameters are provided, they and the initial string are passed to the format function, and the result is displayed with a trailing newline. The minimum verbosity should be chosen with care so that the end user gets an appropriate amount of feedback.
additional-indent
This is a parameter that determines how much more each subsequent line should be indented by cond-print. So, for example, passing a value of 2 to additional-indent if the current value is 6 will make every subsequent line be indented by 8 space characters. The value passed must be an integer value.
cond-set!: hash-table any any -> (void)
The second parameter specifies the key. If this key already exists in the hash table, then nothing is done; otherwise, the third parameter is inserted in the hash table as the value associated with that key. This is particularly useful in the initialize function of the language modules.
hash-set!&success
Applies hash-set! to the provided arguments, and returns the value 'handled. This is particularly useful in the parse-option and interpret-file functions whenever a provided key/value pair or file can be handled successfully.

computeMarks-postprocess

If an executable by this name exists, it is run at the end of computeMarks, after all diff checking and mark-scheme processing has been done, and the default set of files have been kept for the output. It gives the option, for example, to run keepFile on any extra files that were generated during testing in this particular language.

Suite-specific configuration

There are four executables that, if they are provided in the test suite directory along with runTests and computeMarks, provide entry points into BitterSuite for various kinds of customization.

Note that these scripts often lend themselves to "quick hack" solutions that could often be implemented more properly as a customized language implementation.

runTests-preprocess
Run after student-submitted files and the files from provided are linked into the temporary directory, but before the in hierarchy has been processed.
runTests-postprocess
Run after the in hierarchy has been processed.
computeMarks-preprocess
Run near the beginning of computeMarks, after the environment has been set up but before other processing has been done.
computeMarks-postprocess
Run at the end of computeMarks, after the default files have been kept for the marking output, and after any and all language-specific postprocessors have been run.
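Because these entry points only take effect when an executable with the right name exists, a quick check of a suite directory can catch a hook that was added without its executable bit. The helper below is a hypothetical convenience, not part of BitterSuite. Only the four hook names come from the list above.

```sh
#!/bin/sh
# Sketch: report the state of the four optional hooks in a suite directory.
report_hooks() {
    for hook in runTests-preprocess runTests-postprocess \
                computeMarks-preprocess computeMarks-postprocess; do
        if [ -x "$1/$hook" ] && [ ! -d "$1/$hook" ]; then
            echo "$hook: executable"
        elif [ -e "$1/$hook" ]; then
            echo "$hook: present but not executable"
        else
            echo "$hook: absent"
        fi
    done
}

# Demonstration on a stand-in suite directory.
demo_suite=$(mktemp -d)
printf '#!/bin/sh\n' > "$demo_suite/runTests-preprocess"
chmod +x "$demo_suite/runTests-preprocess"
printf '#!/bin/sh\n' > "$demo_suite/computeMarks-postprocess"   # no chmod
hook_report=$(report_hooks "$demo_suite")
echo "$hook_report"
```

In the demonstration, runTests-preprocess is reported as executable, computeMarks-postprocess as present but not executable, and the other two as absent.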