file_filter — Perform filtering on files students submit
file_filter
Inside of the course's /u/csXXX/.submitrc there should be the following line:
file_filter=/u/isg/bin/file_filter
This script defines a filter that is applied to each individual file passed into submit. Each course account needs to be configured, as specified above, to run a particular executable that performs the filtering.
This program first checks for file filter scripts in the following
locations: /u/csXXX/bin/file_filter and
/u/csXXX/handin/assign/.file_filter.
If a program is found in either of those locations, then it will be
executed and the behaviour specified by the rest of this script is
discarded. However, if the default filtering behaviour is still
desired, this auxiliary script can itself call
the original filter script; the full path to it will be provided
in the ISG_FILTER environment variable.
It will then perform its
filtering tasks without again calling the auxiliary script. See the
man pages (man 4 file_filter) if you wish to
provide one of those filters. Make sure you exit with an appropriate
unique error code when errors occur so the logging system can
track this.
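As an illustration, an auxiliary filter might be built around the following shell sketch. It assumes, without confirmation from this page, that the name of the file being filtered arrives as the first argument (man 4 file_filter describes the real interface). The function names custom_check and aux_filter, the rule against .class files, and the exit code 42 are all hypothetical; only ISG_FILTER comes from the description above.

```shell
# ASSUMPTION: the file name is passed as the first argument.
# custom_check and aux_filter are hypothetical names; 42 is an arbitrary
# exit code chosen so the logs can tell this rejection apart from others.

# Return 0 if the file passes the course-specific check, 42 otherwise.
custom_check() {
    case "$1" in
        *.class)
            echo "Compiled .class files are not accepted." >&2
            return 42
            ;;
    esac
    return 0
}

# Run the custom check first, then fall back to the default filter,
# whose full path is provided in the ISG_FILTER environment variable.
aux_filter() {
    custom_check "$1" || return $?
    if [ -n "${ISG_FILTER:-}" ]; then
        "$ISG_FILTER" "$@"
    fi
}
```

A real script would finish by calling aux_filter "$@" and exiting with its status.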
The first filter is against file size. Any files larger than 1 MB are
rejected by default. To override this, simply put a maximum file size
(in bytes) inside of the file /u/csXXX/handin/assign/.max_size.
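The size check can be pictured roughly as follows. This is only a sketch of the idea, not the filter's actual code: the function name size_ok is hypothetical, and the default of 1048576 bytes is an assumed reading of the 1 MB limit.

```shell
# size_ok FILE LIMIT_FILE
# Succeeds when FILE is no larger than the limit. The limit is read from
# LIMIT_FILE (a .max_size file) when it exists; otherwise an assumed
# default of 1048576 bytes applies.
size_ok() {
    file=$1
    limit_file=$2
    max_bytes=1048576
    if [ -r "$limit_file" ]; then
        # Strip whitespace so a trailing newline does not break the test.
        max_bytes=$(tr -d '[:space:]' < "$limit_file")
    fi
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -le "$max_bytes" ]
}
```

For instance, writing 5242880 into the .max_size file would raise the limit to 5 MiB for that assignment.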
There is also a filter for binary files. Perl attempts to verify that
the file is plain text, and if it does not appear to be, the file is
rejected. If you wish to allow binary files to be submitted for the
current assignment, then simply create a file
/u/csXXX/handin/assign/.binary_allow.
If you wish to allow only certain submitted files to be binary, then
a series of glob patterns can be placed in this file, in a format similar
to that of /u/csXXX/handin/assign/.subfiles.
One common cause of files being flagged as binary is UTF-16 text
encoding; if the file appears to be in this format (by checking for a
byte order marker), then a customized message is provided to the user.
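The allow-list rules above can be sketched in shell as follows. The function name binary_allowed is hypothetical, and the sketch assumes one glob pattern per line, which this page does not confirm (it only says the format is similar to that of .subfiles).

```shell
# binary_allowed NAME ALLOW_FILE
# Succeeds when a binary file called NAME may be submitted.
# ASSUMPTION: ALLOW_FILE holds one glob pattern per line.
binary_allowed() {
    name=$1
    allow_file=$2
    [ -e "$allow_file" ] || return 1   # no file: binaries are rejected
    [ -s "$allow_file" ] || return 0   # empty file: every binary is allowed
    while IFS= read -r pattern || [ -n "$pattern" ]; do
        [ -n "$pattern" ] || continue  # skip blank lines
        case "$name" in
            $pattern) return 0 ;;      # unquoted so it is matched as a glob
        esac
    done < "$allow_file"
    return 1
}
```

With a .binary_allow file containing the lines *.png and *.pdf, a file named logo.png would be let through while a.exe would still be rejected.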
Next is the rejection list. This is a file with a list of filenames
which unconditionally will not be accepted in a
submission. This is convenient if wildcards are used to specify
submitted files, but you wish to provide some files of your own. For
example, *.java may be accepted while you are going
to provide a file Apple.java yourself and do not want students
to be able to submit their own version. The list of files goes in the
file /u/csXXX/handin/assign/.reject.
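A lookup against the rejection list could be sketched like this. The function name is_rejected is hypothetical, and the sketch assumes the list holds one exact filename per line, which this page does not spell out.

```shell
# is_rejected NAME REJECT_FILE
# Succeeds when NAME appears in REJECT_FILE as a whole line.
# ASSUMPTION: REJECT_FILE holds one exact filename per line.
is_rejected() {
    # -F: fixed string, -x: match the whole line, -q: no output
    [ -r "$2" ] && grep -Fxq -- "$1" "$2"
}
```

With Apple.java listed in the .reject file, a submission of Apple.java would be refused even though *.java is otherwise accepted.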
Then there are two Scheme-particular checks. If the file ends in .ss
or .scm, then the filter tries to
determine if the file is in "WXME format." This is a special format
DrScheme uses to represent non-textual data, including images, comment
boxes, and non-integer numbers copied from the interactions window to
the definitions window. It converts the file into a format that is
unreadable for the marking TAs and which breaks the testing scripts,
so it's better to reject the file up front than it is to accept it and
then assign a mark of 0 after the testing is complete.
The other Scheme-particular check is to see whether file.ss and
file.scm are both
being accepted for this assignment and the user is attempting to submit
both files. If so, then both files are
rejected. The rationale is that, if the two files have different
contents, course staff have no sure way of determining which file
was intended to be submitted. As such, it is preferable to reject
both, and force the user to decide which one to submit.
After all other filtering has been done, there is a final filter to
limit the number of files students are allowed to submit, which can
help prevent flooding the course account with data. By default, no
more than 20 files may be submitted. To override this, place
the desired maximum number of files in the file
/u/csXXX/handin/.max_files.
This filter script attempts to hack logging features into submit via a mechanism that was designed to do no such thing. Since the file filter examines single files one at a time instead of by submission, the logs also must be kept file by file instead of submission by submission. A very reasonable guess about which files were submitted together can be made by examining the timestamps in the log.
For every file submitted by every student, there is a log file /u/csXXX/course/assign/student which logs the following information in order in CSV format:
This will allow a rough history of file submissions from students, including file rejections and, in the case of a serious error such as accidental file deletion by course staff (which is possible due to the strange file permissions submit leaves on directories), a reasonable way to check via md5sum if a file a student provides to us is the same one that was submitted.
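Checking a file a student provides against a logged checksum might look like the sketch below. The function name same_md5 is hypothetical, and the expected checksum is assumed to have been copied by hand out of the student's log file, since the log's column layout is not reproduced here.

```shell
# same_md5 FILE EXPECTED_MD5
# Succeeds when the MD5 checksum of FILE equals EXPECTED_MD5.
# ASSUMPTION: EXPECTED_MD5 was copied by hand from the student's log file.
same_md5() {
    actual=$(md5sum < "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

A match is strong evidence that the provided file is byte-for-byte the one that was submitted; a mismatch means the contents differ in some way.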
Because the filter script runs setgid course account but remains the calling user, all files created will have ownership of the student calling submit. Because of this, it is essential that file permissions be crafted very carefully so students cannot gain access to these directories on their own, and cannot modify their own logs. Taking the following steps while logged into the course account will suffice:
mkdir /u/csXXX/course/submitlog/assign
chmod 710 /u/csXXX/course/submitlog
chmod 770 /u/csXXX/course/submitlog/assign
Note that if the final_script submit_subversion_hook is used, this directory hierarchy will be created automatically the first time a submission is made and will be in place for every subsequent submission.
The set of exit codes returned by the default filter is as follows:
XX.ss and XX.scm)