M10: Structures

Hint: If possible, make your browser wide enough to place the slides and commentary side-by-side.

Slides

Commentary

The idea of compound data has been around for a very long time. Our presentation of compound data in CS135 and the language we use to describe it is heavily influenced by Object-Oriented Programming, a programming paradigm you’ll learn more about in CS246 (typically a 2A course).

We have predicates we can use to idenfity other data types: number?, list?, boolean?, string? and symbol? consume an Any and produce a Bool. Can we do the same for a student?

We can’t just check for a three-element list. There are lots of lists that fit that description. What we’ll do is embed an extra value in the list that is highly unlikely to be in any other list. If that value is present, then we’ll assume the list represents a student. Otherwise, we’ll assume it does not.

The data definition allows us to use Std in contracts.

make-std makes a Std value. It takes the three attributes of a student (name, program, classes) and bundles them together with STD-TAG, that extra value discussed on the previous slide.

Here’s the student predicate. Note that it can consume anything at all. So we need to make sure it’s a non-empty list (cons?). A Std will have the three pieces of data (name, program classes) and that extra value (length 4). Finally, STD-TAG needs to be the first value.

An even more robust check would ensure that the name and program are strings and the last item is a list. We’re not doing that because the Racket feature we’re leading up to doesn’t do it either.

The function std-name selects the name part of a student value. For that reason it’s called an “selector function”.

std-name checks to ensure that it consumed a Std value. If it didn’t, it gives an error message. The built-in function error is useful for that. check-error is used to test that an error was produced appropriately.

You can hopefully appreciate, already, that having structures that hold related values is really useful. Having robust error checks that help identify errors early is also really useful.

But, it’s SO much work! Is it really worth it?

Great News! We can have all that really useful goodness with almost no work! All it takes is one line of Racket code. (define-struct ...) creates all the functions we laboriously defined over three slides – without the need to test them!

The define-struct is followed by a data definition that says what types each of the structure’s fields are expected to be. The data definition for a structure always follows the form

;; A <type-name> (<English-name>) is a (make-<sname> <type> ...)

where you fill in the parts between angle brackets. <sname> is the first argument to define-struct.

The relationship between the define-struct and its data definition is similar to a function and its contract. Just as you wouldn’t dream of writing a function without a contract, you shouldn’t have a structure definition without a data definition.

The data definition is just a comment, so it has no meaning to Racket. Therefore, Racket does not know what types we expect each field to have. It can’t check, for example, that the name and program are strings and that classes is a list.

If define-struct is given n field names, it will produce n+2 functions. Each function will include <sname> in its name, where <sname> is the first argument to define-struct.

The first of the created functions is make-<sname>. It’s called a “constructor” function and is used to construct values.

The second is a predicate.

Then there is one selector function for each of the n fields. Each is named <sname>-<fname> where <fname> is the name of a field.

Here’s an example that could turn up in homework.

The last note means that (make-std "Jo" "CS" (list "Math 137")) is a value, just like 3.14, "Hello, world!", and (list 1 2 3) are values.

We represented our version of a student as a list. Internally, DrRacket may do that; or it might not. We don’t know.

We need to figure out the substitution rules for structures. This slide formalizes what we’ve just said in preparation for those rules.

Self-Check:

Given (define-struct 3d (x y z)), which of the following are values?
Correct! Try again.

Here’s an example of the substitution rules in action.

The fields are listed in the same order in the template as in the define-struct so a new structure can be easily constructed with them.

Students who have programmed in an imperative language will often ask how to change the contents of a field. The answer is that you don’t. You create a new structure with the same information except for the field(s) you want to change.

Self-Check:

Given (define-struct 3d (x y z)), which of the following functions would normally appear in the template for functions known to be consuming a 3d structure?
Correct! Try again.

The following is worked example, as you might need to prepare for an assignment. However, the test data is repeated on each slide, which you obviously wouldn’t do.

There are two data definitions here. The first is for Std (a student) and is associated with the structure definition. It is required.

The second is the data definition for a Classlist. We could do without it by writing (listof Std) in contracts, but this is so much easier and more understandable. There is no Racket code that corresponds to this data definition.

In the examples, we can refer to all the information for a given student with just that student’s identifier (e.g. aj). That’s the big advantage of structures!

class-names consumes a Classlist which is simply a (listof Std). So the template to start with is the listof-X-template.

;; listof-X-template: (listof X) -> Any
(define (listof-X-template lox)
  (cond [(empty? lox) ...]
        [(cons? lox) (... (first lox)
                          (listof-X-template (rest lox)))]))

But notice that (first lox) is a student. So if we were to write out the full classlist-template we would apply the std-template:

;; classlist-template: Classlist -> Any
(define (classlist-template clst)
  (cond [(empty? clst) ...]
        [(cons? clst)  (... (std-template (first clst))
                            (classlist-template (rest clst)))])
)

There is a close correspondence between this version of the template and the code for class-names.

add-std is a straight-forward list-processing function. It uses the std-name selector function to access the name of each student on the list as well as the name of the student to add.

all-enrolled? could also be written without the cond as a pure Boolean expression.

Types like Student and Classlist are for our benefit only. They help us understand the data that our functions consume and produce.

However, DrRacket doesn’t know anything about them; they only appear in comments. That’s different than the structures ustd and gstd. Racket does know about them, thanks to the define-struct expressions.

An alternative for student-template would define and then use ustd-template and gstd-template:

(define (ustd-template s) 
   (... (ustd-name s) ... 
        (ustd-prog s) ... 
        (ustd-classes s) ...))
(define (gstd-template s) 
   (... (gstd-name s) ... 
        (gstd-prog s) ...
        (gstd-supervisor s) ... 
        (ustd-classes s) ...))

(define (student-template s) 
  (cond [(ustd? s) (... ustd-template s) ...)]
        [(gstd? s) (... gstd-template s) ...)]))

Implementing a function would then mean writing three functions based on these templates. For a simple function it may not be worth the work. But for a more complex function, breaking the problem down into parts makes the solution more approachable, easier to test, and in general easier.

filter-prog is a straight-forward application of the list template.

The mixed data comes into play with the helper function, in-prog?. It tests whether the student is an undergrad or a grad student, and applies the appropriate program selector function to select the program.

Towards the end of the course we’ll study “abstract list functions”, including the built-in function filter. Using filter we can substantially reduce the amount of code to solve this problem.

Self-Check:

Consider the following code and previously defined data definitions. What would be the most appropriate contract for f?

(define (f x)
   (cond [(gstd? x) x]
         [(ustd? x) (make-gstd (ustd-name x) 
                               (ustd-prog x) 
                               "Ian" 
                               empty)]
         [(symbol? x) 15]))
Correct! Try again.

The fact that constructor functions do not type-check their arguments means that we can write the following nonsense:

(define-struct ustd (name prog classes))

(define (dist s1 s2)
  (sqrt (+ (sqr (- (ustd-name s1) (ustd-name s2)))
           (sqr (- (ustd-prog s1) (ustd-prog s2)))
           (sqr (- (ustd-classes s1) (ustd-classes s2))))))

(check-expect (dist (make-ustd 0 0 0)
                    (make-ustd 3 4 0)) 5)

The above “works”. But is sure is confusing to use a student structure to do a 3D distance calculation!

The safe-make-ustd prevents this kind of misuse of a structure.

Why might you do this?

  • Catching errors as early as possible makes debugging easier.
  • Some applications are safety-critical; a bug could mean someone dies. This technique helps ensure that the program is using valid data, particularly at important points in the computation.
  • This can be done with regular functions, too. It doesn’t have to be a constructor function.

Here are two different ways to construct a class list. The first uses a list of structures. The second uses a list of lists.

Self-Check:

What is '()?
Correct! Try again.

Self-Check:

Which of the following produce true?
Correct! Try again.

Self-Check:

Which of the following produce true?
Correct! Try again.

Self-Check:

What is the representation of the following in list notation?

(cons 4 (cons "Rose" (cons 'big empty)))

Correct! Try again.

Self-Check:

What is the representation of the following in quoted list notation?

(cons 4 (cons "Rose" (cons 'big empty)))

Correct! Try again.