Name

file_filter — Perform filtering on files students submit

Synopsis

file_filter

Usage

Inside of the course's /u/csXXX/.submitrc there should be the following line:

         file_filter=/u/isg/bin/file_filter
      

Description

This script filters each individual file passed to submit. Each course account must be configured, as shown above, to point at the executable that performs the filtering.

Filtering Features

This program first checks for course-specific filter scripts in two locations: /u/csXXX/bin/file_filter and /u/csXXX/handin/assign/.file_filter. If a program is found in either location, it is executed and the default behaviour described in the rest of this page is skipped. If the default filtering is still desired, the auxiliary script can itself call the original filter script, whose full path is provided in the ISG_FILTER environment variable; the original filter then performs its filtering tasks without calling the auxiliary script again. See the man pages (man 4 file_filter) if you wish to provide one of these filters. Make sure the auxiliary script exits with an appropriate, unique error code when errors occur so that the logging system can track them.
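As an illustration of the delegation described above, here is a minimal sketch of an auxiliary filter that runs its own checks and then hands off to the default filter. Only the ISG_FILTER variable comes from this page; the function names, the choice to forward the arguments untouched, and the use of exit code 1 when the variable is missing are assumptions. The real calling convention is documented in man 4 file_filter.

```python
import os
import subprocess


def run_default_filter(args, env=None, runner=subprocess.call):
    """Forward the same arguments to the default filter named by ISG_FILTER.

    Returns the default filter's exit code so it can be handed straight
    back to submit. The argument list is passed through unchanged because
    the calling convention is not described on this page.
    """
    env = os.environ if env is None else env
    default_filter = env.get("ISG_FILTER")
    if not default_filter:
        # Assumption: a missing variable is treated as a staff-side error.
        return 1
    return runner([default_filter, *args])


def auxiliary_filter(args, env=None, runner=subprocess.call):
    # Course-specific checks would go here, each returning its own
    # unique non-zero exit code on failure.
    # Afterwards, fall through to the default filtering behaviour.
    return run_default_filter(args, env=env, runner=runner)
```

A real script would presumably finish with something like sys.exit(auxiliary_filter(sys.argv[1:])) so that the exit code reaches submit and the log.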

The first filter checks file size. By default, any file larger than 1 MB is rejected. To override this, put the maximum file size (in bytes) in the file /u/csXXX/handin/assign/.max_size.
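The size check can be sketched roughly as follows. This is not the filter's actual code: the function names are hypothetical, and reading "1 MB" as 1024 * 1024 bytes is an assumption, since the exact default is not stated here.

```python
import os

DEFAULT_MAX_SIZE = 1024 * 1024  # assumption: "1 MB" read as 2**20 bytes
EXIT_TOO_LARGE = 5              # exit code 5, as listed under Logging Features


def max_size_for(assign_dir):
    """Return the byte limit from .max_size, or the default if the file is absent."""
    override = os.path.join(assign_dir, ".max_size")
    try:
        with open(override) as fh:
            return int(fh.read().strip())
    except FileNotFoundError:
        return DEFAULT_MAX_SIZE


def check_size(path, assign_dir):
    """Return 0 if the file is within the limit, otherwise exit code 5."""
    if os.path.getsize(path) <= max_size_for(assign_dir):
        return 0
    return EXIT_TOO_LARGE
```

Note that a .max_size file containing anything other than a single integer would raise ValueError in this sketch; how the real filter reacts to a malformed file is not specified here.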

There is also a filter for binary files. Perl attempts to verify that the file is plain text; if it does not appear to be, the file is rejected. To allow binary files for the current assignment, create the file /u/csXXX/handin/assign/.binary_allow. To allow only certain submitted files to be binary, place a series of glob patterns in this file, in a format similar to that of /u/csXXX/handin/assign/.subfiles. One common cause of files being flagged as binary is UTF-16 text encoding; if the file appears to be in this format (detected by checking for a byte order mark), a customized message is shown to the user.
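To make the binary check more concrete, here is a rough sketch. The heuristic (a NUL byte, or more than a third "odd" bytes in the first block) only approximates the kind of test Perl's -T operator applies; whether the filter uses exactly that test is an assumption. The function names, the messages, and the one-pattern-per-entry representation of .binary_allow are also hypothetical.

```python
import codecs
from fnmatch import fnmatchcase

EXIT_BINARY = 6                 # exit code 6, as listed under Logging Features
TEXT_CONTROL = {9, 10, 12, 13}  # tab, line feed, form feed, carriage return


def has_utf16_bom(data):
    """True if the data starts with a UTF-16 byte order mark (either endianness)."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


def looks_binary(data, sample=512):
    """Heuristic: a NUL byte, or more than a third 'odd' bytes, in the first block."""
    chunk = data[:sample]
    if not chunk:
        return False
    if 0 in chunk:
        return True
    odd = sum(1 for b in chunk if (b < 32 and b not in TEXT_CONTROL) or b >= 127)
    return odd * 3 > len(chunk)


def binary_allowed(filename, patterns):
    """patterns is None if .binary_allow is absent, [] if it exists but is empty."""
    if patterns is None:
        return False
    if not patterns:
        return True  # an empty .binary_allow allows every file to be binary
    return any(fnmatchcase(filename, p) for p in patterns)


def check_binary(filename, data, patterns=None):
    """Return (exit_code, message); exit code 0 means the file passes."""
    if not looks_binary(data) or binary_allowed(filename, patterns):
        return 0, ""
    if has_utf16_bom(data):
        return EXIT_BINARY, "file appears to be UTF-16 text; re-save it as plain text"
    return EXIT_BINARY, "file appears to be binary"
```

UTF-16 text trips the check because nearly every ASCII character is stored with a zero byte next to it, which is why a dedicated message is worthwhile.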

Next is the rejection list: a file listing filenames that will unconditionally be refused in a submission. This is convenient when wildcards are used to specify the submitted files but you wish to provide some files of your own. For example, if *.java is accepted but you are going to provide Apple.java, listing it prevents students from submitting their own version. The list goes in the file /u/csXXX/handin/assign/.reject.
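A minimal sketch of the rejection-list lookup follows. It assumes that .reject holds one filename per line and that names are compared exactly; neither detail is stated on this page, and the function names are hypothetical.

```python
import os

EXIT_REJECTED = 7  # exit code 7, as listed under Logging Features


def load_reject_list(assign_dir):
    """Read .reject (assumed: one filename per line); empty set if absent."""
    path = os.path.join(assign_dir, ".reject")
    try:
        with open(path) as fh:
            return {line.strip() for line in fh if line.strip()}
    except FileNotFoundError:
        return set()


def check_reject(filename, rejected):
    """Return exit code 7 if the file's base name is on the rejection list."""
    return EXIT_REJECTED if os.path.basename(filename) in rejected else 0
```

With Apple.java listed in .reject, a submission matching *.java would still be accepted for Orange.java but refused for Apple.java.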

Then there are two Scheme-specific checks. If the filename ends in .ss or .scm, the filter tries to determine whether the file is in "WXME format." This is a special format DrScheme uses to represent non-textual data, including images, comment boxes, and non-integer numbers copied from the interactions window to the definitions window. A file saved in this format is unreadable for the marking TAs and breaks the testing scripts, so it is better to reject the file up front than to accept it and then assign a mark of 0 after testing is complete.
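One plausible way to detect WXME files is sketched below. It assumes that such files carry a marker of the form WXME followed by four digits and " ## " near the start of the file (for example WXME0108 ## ); this page does not say how the filter actually performs the detection, so treat both the marker and the function name as assumptions.

```python
import re

EXIT_WXME = 8  # exit code 8, as listed under Logging Features
SCHEME_SUFFIXES = (".ss", ".scm")

# Assumed marker: "WXME", a four-digit version, then " ## ".
WXME_MARKER = re.compile(rb"WXME\d{4} ## ")


def check_wxme(filename, data):
    """Return exit code 8 if a Scheme file appears to be in WXME format."""
    if not filename.endswith(SCHEME_SUFFIXES):
        return 0
    # Only the start of the file is inspected, where the header would sit.
    return EXIT_WXME if WXME_MARKER.search(data[:256]) else 0
```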

The other Scheme-specific check applies when file.ss and file.scm are both accepted for this assignment and the user attempts to submit both. In that case, both files are rejected. The rationale is that, if the two files have different contents, course staff have no sure way of determining which one was intended to be submitted. It is therefore preferable to reject both and force the user to decide which one to submit.
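The pairing rule can be sketched as below. Because the filter sees one file at a time, how it learns which other files are being submitted is not described here; this sketch simply assumes that both the accepted names and the names being submitted are available as sets. The function name is hypothetical.

```python
import os

EXIT_DUPLICATE_GROUP = 9  # exit code 9, as listed under Logging Features
SCHEME_GROUP = (".ss", ".scm")


def check_scheme_pair(filename, accepted, submitting):
    """Return exit code 9 if the .ss/.scm sibling is also accepted and submitted.

    accepted:   set of filenames accepted for the assignment (assumed known)
    submitting: set of filenames the user is trying to submit (assumed known)
    """
    stem, ext = os.path.splitext(filename)
    if ext not in SCHEME_GROUP or filename not in accepted:
        return 0
    for other_ext in SCHEME_GROUP:
        sibling = stem + other_ext
        if other_ext != ext and sibling in accepted and sibling in submitting:
            return EXIT_DUPLICATE_GROUP
    return 0
```

Run against each file of the pair in turn, the check rejects both, which matches the behaviour described above.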

After all other filtering has been done, there is a final filter to limit the number of files students are allowed to submit, which can help prevent flooding the course account with data. By default, no more than 20 files may be submitted. To override this, place the desired maximum number of files in the file /u/csXXX/handin/.max_files.
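The file-count limit might look like the sketch below. How the filter counts the files already submitted is not described on this page, so the count is simply passed in; the function names are hypothetical.

```python
import os

DEFAULT_MAX_FILES = 20  # default stated above
EXIT_TOO_MANY = 10      # exit code 10, as listed under Logging Features


def max_files_for(handin_dir):
    """Return the limit from .max_files, or the default if the file is absent."""
    override = os.path.join(handin_dir, ".max_files")
    try:
        with open(override) as fh:
            return int(fh.read().strip())
    except FileNotFoundError:
        return DEFAULT_MAX_FILES


def check_file_count(count_including_this_file, handin_dir):
    """Return exit code 10 once the submission would exceed the limit."""
    if count_including_this_file <= max_files_for(handin_dir):
        return 0
    return EXIT_TOO_MANY
```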

Logging Features

This filter script attempts to hack logging features into submit via a mechanism that was designed to do no such thing. Since the file filter examines single files one at a time instead of by submission, the logs also must be kept file by file instead of submission by submission. A very reasonable guess about which files were submitted together can be made by examining the timestamps in the log.

For every student, there is a log file /u/csXXX/course/submitlog/assign/student. For each file the student submits, it records the following information, in order, in CSV format:

  • filename
  • timestamp
  • exit code of the filtering portion
  • file size
  • MD5sum of the file
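A log record with the fields above could be produced roughly as follows. The field order comes from the list; the timestamp format and the function name are assumptions, since the exact format written by the filter is not shown here.

```python
import csv
import hashlib
import io
import os
import time


def log_row(path, exit_code, now=None):
    """Build one CSV line: filename, timestamp, exit code, size, MD5 sum."""
    with open(path, "rb") as fh:
        data = fh.read()
    # Assumed timestamp format; now=None means the current time.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = [
        os.path.basename(path),
        stamp,
        exit_code,
        len(data),
        hashlib.md5(data).hexdigest(),
    ]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()
```

Appending the returned line to the student's log file yields one record per submitted file, which is what makes grouping by timestamp possible later.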

These logs provide a rough history of each student's file submissions, including rejections. They also help after a serious error such as accidental file deletion by course staff (which is possible due to the strange file permissions submit leaves on directories): comparing MD5 sums gives a reasonable way to check whether a file a student later provides is the same one that was originally submitted.
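Checking a file a student provides against the log could be sketched like this. The column order follows the list above; the function name is hypothetical, and treating an exit code of 0 as "accepted" is an assumption, since only the failure codes are listed on this page.

```python
import csv
import hashlib


def matches_logged_submission(log_lines, filename, candidate_bytes):
    """True if the log has an accepted record of filename with the same MD5 sum.

    log_lines is any iterable of CSV lines, for example an open log file.
    """
    digest = hashlib.md5(candidate_bytes).hexdigest()
    for row in csv.reader(log_lines):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        name, _stamp, exit_code, _size, logged_digest = row
        # Assumption: exit code 0 marks a file that passed every filter.
        if name == filename and exit_code == "0" and logged_digest == digest:
            return True
    return False
```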

Because the filter script runs setgid to the course account but still as the calling user, every file it creates is owned by the student who called submit. It is therefore essential that file permissions be set very carefully, so that students cannot access these directories on their own and cannot modify their own logs. Taking the following steps while logged into the course account will suffice:

            mkdir -p /u/csXXX/course/submitlog/assign
            chmod 710 /u/csXXX/course/submitlog
            chmod 770 /u/csXXX/course/submitlog/assign
         

Note that if the final_script submit_subversion_hook is used, this directory hierarchy will be created automatically the first time a submission is made and will be in place for every subsequent submission.

The exit codes returned by the default filter are as follows:

  • 1 - Course staff error prevented student submission
  • 5 - File was rejected because it surpassed the maximum size limit
  • 6 - File was rejected because it appeared to be binary instead of plain text
  • 7 - File was rejected because it was flagged in the rejection list
  • 8 - File was rejected because it appeared to be in PLT Scheme's unusable WXME format
  • 9 - File was rejected because an equivalent file from a common file-ending group also existed, and only one is permissible (for example, matching XX.ss and XX.scm)
  • 10 - File was rejected because the maximum number of allowed files was exceeded
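When reading the logs programmatically, the codes above can be given names. The following sketch mirrors the list; the identifier names and the describe helper are hypothetical and do not appear in the filter itself.

```python
from enum import IntEnum


class FilterExit(IntEnum):
    """Exit codes of the default filter, as listed above."""
    STAFF_ERROR = 1       # course staff error prevented the submission
    TOO_LARGE = 5         # file surpassed the maximum size limit
    BINARY = 6            # file appeared to be binary, not plain text
    REJECT_LIST = 7       # file was flagged in the rejection list
    WXME_FORMAT = 8       # file appeared to be in WXME format
    DUPLICATE_GROUP = 9   # equivalent file from the same ending group exists
    TOO_MANY_FILES = 10   # maximum number of allowed files was exceeded


def describe(code):
    """Return the symbolic name for a logged exit code, if it is a known one."""
    try:
        return FilterExit(code).name
    except ValueError:
        # Codes outside the list may come from an auxiliary filter script.
        return "UNKNOWN_%d" % code
```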