M08: More Lists

Hint: If possible, make your browser wide enough to place the slides and commentary side-by-side.

Slides

Commentary

We’re now returning to working with lists – but upping our game to include more complex situations.

Video: Download m08/m08.10_isort

video m08/m08.10_isort

Here’s the code for insertion sort, ready to pop into DrRacket.

;; (sort lon) sorts the elements of lon in nondecreasing order
;; Example:
(check-expect (sort (cons 3 (cons 4 (cons 2 (cons 5 (cons 1 empty))))))
              (cons 1 (cons 2 (cons 3 (cons 4 (cons 5 empty))))))

;; sort: (listof Num) -> (listof Num)
(define (sort lon)
  (cond [(empty? lon) empty]
        [else (insert (first lon) (sort (rest lon)))]))


;; (insert n slon) inserts the number n into the sorted list slon
;;     so that the resulting list is also sorted.
;; Example:
(check-expect (insert 3 (cons 1 (cons 4 (cons 5 empty))))
              (cons 1 (cons 3 (cons 4 (cons 5 empty)))))

;; insert: Num (listof Num) -> (listof Num)
;;     requires: slon is sorted in nondecreasing order
(define (insert n slon)
  (cond [(empty? slon) (cons n empty)]
        [(<= n (first slon)) (cons n slon)]
        [else (cons (first slon) (insert n (rest slon)))]))

We’ll come back to sorting in M15 when we study Quicksort. The end of this module discusses merging two sorted lists, which represents a large fraction of the Mergesort algorithm.

List abbreviations

Has is seemed like our notation for lists is rather laborious? Do we really need to write down cons over and over and over?

A fundamental part of lists is that they grow and shrink by one element at the front of the list. cons together with first and rest reinforces that.
With all the practise you’ve had, you are now ready for a more compact way to write lists. cons will still be important; this is just a more compact way to write lists of a known length.

List abbreviations are not available in the “Beginning Student” language level. You’ll need to change your language level. See M02 for a reminder of the steps.

Students often get confused by cons and list. Play with small examples in DrRacket until you can accurately predict what each does.

Self-Check: m08_010

(list ) is equivalent to which expression(s)?
Correct! Try again.

Self-Check: m08_020

(list empty) is equivalent to which expression(s)?
Correct! Try again.

Self-Check: m08_030

(cons "Hi" (cons 'bye (cons 3 empty))) is equivalent to which expression(s)?
Correct! Try again.

Lists containing lists

Remember that in (cons v lst), v is a value. Any value.

(cons 1 (cons 2 empty)) is a value. As a value, it can be put into a list just like any other value.

There’s a lot going on here! Take it slow and break it down into the separate pieces.

The primary concern is introducing the notion that a list can contain a list.

Second, we are just getting used to list. It was introduced because it’s really handy to write down lists that contain lists.

Third, lists that contain lists are hard to draw when the whole value (a list!) goes in a box (top diagram), so we’re also introducing a new diagramming convention (bottom diagram).

Consider the list (define lst (cons "a" (cons "b" empty))). lst is clearly a two-element list.

Replace "a" with (cons 1 (cons 2 empty)) and replace "b" with (cons 3 (cons 4 empty)) to get

(define lst (cons (cons 1 (cons 2 empty))
                  (cons (cons 3 (cons 4 empty))
                        empty))

This is still a two-element list and is the list shown in the slide except that it has a name, lst.

What is (first lst)? What is (rest lst)? Most students get this wrong, so check your answer with DrRacket. Use “Beginning Student with List Abbreviations” which will give you the answers using list rather than cons. Confused? Go back to the cons notation and work it out.

Refer back to tax-payable if you don’t remember it.

The payroll is shown visually using the two different diagramming notations. Both represent the same list. The payroll is a three element list. Each of those elements is a two-element list containing a name and a salary.

Recall the (listof X) data definition:

;; A (listof X) is one of:
;; * empty
;; * (cons X (listof X))

A payroll is the same as a (listof X) where X is a two-element list that we represent with (list Str Num). The first element is a string and the second is a number. Because the data definition has been made more specific, we give it a specific name: Payroll.

Intuitively, an empty list is a Payroll. If you want a longer Payroll, cons the name of an employee and their salary onto a Payroll.

Note the new notation: (list Str Num). That is, list can appear in data definitions just like cons and (sub1 n). It simply means a two-element list where the first element is a string and the second is a number. In general, we use (list X Y Z ...) to indicate a fixed-length list with the given types.

We will often use fixed-length lists to group information that belongs together – like a name and salary. In M10 we’ll see another way to group information, structures.

Instead of writing this data definition we could write (listof (list Str Num)) in contracts. But that doesn’t help us derive the template (and therefore code that consumes a Payroll) and a contract of Payroll -> TaxOwed is more expressive (easier to write and understand) than (listof (list Str Num)) -> (listof (list Str Num)).

We’ll use the Payroll data definition to create a template to help write our payroll code.

Our goal is to capture everything that’s known from the data definition in the template.

Recall that (first pr) produces the first element of the payroll. That first element is a two-element list with a name and salary. So (first (first pr)) produces the name while (first (rest (first pr))) produces the salary.

This code is not very readable. We could use (second (first pr)) instead of (first (rest (first pr))) but that’s only a minor improvement.

A better solution is to use well-named helper functions such as name and salary.

With a template in hand, we can now start to solve the actual problem of computing taxes for a payroll.

The slide shows the template after a number of steps in the design recipe:

  • Writing the purpose
  • Writing some examples (represented here by constants test-payroll and test-taxes; definitions aren’t shown)
  • Writing the header (including renaming payroll-template to something more problem-specific) and contract (note the use of the types we defined earlier, Payroll and TaxOwed)
  • Refining the purpose

The classic questions for a (listof X) function apply here. Their answers direct how to fill in the gaps in the template.

  1. What do you do in the base case?
    There are no taxes owed if no one earned a salary. The contract says to produce a (listof X). empty fits both observations.
  2. How do you transform the first element on the list?
    In this case, compute the taxes on the salary. Put that number together with the name into a two element list.
  3. What value is produced by the recursive processing of the rest of the list?
    A list of taxes owed by the employees on the rest of the list.
  4. How do you combine the value produced in 2 with the value produced in 3?
    Add the newly calculated taxes owed to the beginning of the list.

A really useful strategy is to break the problem into two parts:

  • the part that handles the list and
  • the part that handles one item from the list.

This makes clear the overall structure of the code is simply the (listof-X-template) and allows one to focus on the distinct part of how one item is handled.

This is very similar to the “refined” (listof X) template discussed in M06.

What we’ve just seen – a list of fixed-length flat lists – is just the beginning. That’s as complex a list structure as we’ll see in this module. In later modules we’ll explore more complex arrangements.

The remainder of this module will explore uses for lists of lists and processing multiple lists or a list and a number.

Dictionaries

An important aspect of the definition is that keys are unique. Given a key, we can look it up in a dictionary and get, at most, one value. We say “at most” because it’s possible the key isn’t there. But there won’t be two of them.

Values, on the other hand, may be duplicated. It’s unlikely but possible that the stocks for GOOGL and AAPL have exactly the same price.

lookup is also commonly called find or search. add is sometimes called insert.

You might start to think about what the contracts for these functions should be. In particular, what do they produce?

Computer scientists have come up with an amazing number of ways to implement a dictionary. The big advantage of an association list is that it is simple to implement. The big disadvantage is being increasingly slow as the dictionary gets larger (has more entries).

Other methods of implementing dictionaries trade off complexity for some set of other advantages, typically speed. Some might be blazingly fast to search but slow to insert or delete; others might try to balance those operations in different ways.

We’ll explore at least one other dictionary implementation strategy, binary search trees, in M11.

It may see easier to just write (listof (list Num Str)) in your contracts. But usually not. The term AL or “association list” helps the reader understand the intent of the parameter is probably to look something up. The listof format doesn’t convey that meaning. It’s also surprizing how quickly the keystrokes for (listof (list Num Str)) add up when writing contracts.

We have three options for documenting this data type. We could write the full data definition, as shown on the previous slide. This gives the maximum amount of help in writing the template and therefore the maximum help in writing the code. It also gives a name that is useful in writing contracts.

Because it’s a list number-string pairs, we could use (listof (list Num Str)) every place we need the data type, principally in contracts. However, he term AL or “association list” helps the reader understand the intent of the parameter is probably to look something up. The listof format doesn’t convey that meaning. It’s also surprizing how quickly the keystrokes for (listof (list Num Str)) add up when writing contracts. This form also ignores the requirement that keys are unique.

A middle ground that gives a meaningful name to use in contracts, includes the requires statement, but doesn’t help with the templates is to write

;; An AL (association list) is a `(listof (list Num Str))`
;;   Requires: each key (Num) is unique

This is very similar to the payroll template. Like it, helper functions could make this even more readable although the comments help a lot.

Returning either a string or false can’t be represented with our current contract notation. Changes coming right up! But first, let’s write the code.

This code is similar to the insert function we wrote earlier in that the template doesn’t tell the whole story. What both functions have in common is that there are two base cases; two situations in which we stop.

For insert the two stopping situations were (1) when we got to the end of the list and (2) when the item we were inserting belonged at the beginning of the list.

For lookup the two stopping situations are (1) when we get to the end of the list and (2) when we find what we’re looking for.

In both cases the template was a good starting point even though the final form of the function didn’t match it really well.

Self-Check: m08_100

Assuming the contract

foo: (anyof Num Str) (anyof 1 2 3) -> Any

which of the following are legitimate applications of foo?

Correct! Try again.

Self-Check: m08_110

Assuming the contract

foo: (listof (anyof Num Str)) -> Any

which of the following are legitimate applications of foo?

Correct! Try again.

Self-Check: m08_120

What is the most appropriate contract for lookup?
Correct! Try again.

2D data

Here is a slight revision of the countup-to code we saw in the previous module. Instead of counting to b it counts until b (that is, b is not included).

;; (countup-until n b) produces an increasing list from n to b-1
;; Example: 
(check-expect (countup-until 6 9) (cons 6 (cons 7 (cons 8 empty))))
(check-expect (countup-until 9 9) empty)

;; countup-until: Int Int -> (listof Int)
(define (countup-until n b)
  (cond [(>= n b) empty]
        [else (cons n (countup-until (add1 n) b))]))

cols-to is identical except for:

  • naming (function name, the counter is called c instead of n)
  • it has an extra parameter, r, that goes along for the ride
  • the counter, c, is transformed before being consed onto the list

cols-to constructs one row for the multiplication table by figuring out what value each column should have.

rows-to is also very similar to countup-until. This time the counter is called r. Again, we have one argument going along for the ride (nc, the number of columns that cols-to will produce for each row).

The big difference is that instead of just consing n onto the list we’re making a whole row with cols-to and consing that row/list onto the list. The result is a list of lists.

Finally, mult-table is just a wrapper function that calls rows-to with the correct values.

We will return to this example two more times. The first time we will “bundle” the two helper functions with mult-table to make a tidier package. The second time we will reduce these eight lines of code down to three lines.

Processing two lists simultaneously

In this section we’ll develop functions with two parameters, each of which is a list:

;; twolist-template: (listof X) (listof X) -> Any
(define (twolist-template lst1 lst2) ... )

Each of those two lists might be empty or not. So we might expect code similar to this:

(define (twolist-template lst1 lst2)
  (cons [(and (empty? lst1) (empty? lst2)) ...]
        [(and (empty? lst1) (cons? lst2)) ...]
        [(and (cons? lst1) (empty? lst2)) ...]
        [(and (cons? lst1) (cons? lst2)) ...]))

This is getting complicated fast! We can actually break it down into three cases:

  • We only need to process one list; the other “goes along for the ride”.
  • Both lists are the same length; one element from each is processed at each step.
  • The general case: unequal lengths, process one or the other or both.

We’ll take these three cases in order of increasing complexity with a concrete example for each.

In the first case of processing two lists, we only actually process one. The second one just goes along for the ride. That means that of the four tests identified above, we only need to consider two: whether the first list is empty or not.

Video: Download m08/m08.30_process_one_of_two

video m08/m08.30_process_one_of_two

In the second case of processing two lists, we only need to check one of the lists for empty or non-empty – but for a different reason than the first case (e.g. my-append).

Video: Download m08/m08.40_lockstep

video m08/m08.40_lockstep

The third case is the most general where the lists may be of different lengths and may be processed at different rates. As a result, either one or both could be empty. We’ll need to test all four possibilities.

We’ve focusing for the moment on the second and third possibilities where exactly one list is empty. In the template, extract the first element and the rest of the list for each non-empty list.

When one list is empty, it could be that all we need is the non-empty list without change. No recursion needed.

But it could be that the non-empty list needs further processing, leading to still more recursion.

Fortunately, each of these moves us closer to a base case. That’s important!

A short video to introduce the merging problem:

Video: Download m08/m08.50_merge

video m08/m08.50_merge

The first three examples are base cases: no further recursion is necessary because one of the consumed lists is the result. Note that either or both lists may be empty.

In the recursive cases, there is still more reasoning to be done.

If (first lon1) is the smaller one, then we need the first of the natural recursions ealier.

If (first lon2) is the smaller one, then we need the second of three natural recursions.

Here is the finished merge function.

After understanding this trace, consider the code on the previous slide again. A number of transformations are possible that reduce the amount of work the computer needs to perform.

First, note that in the first question/answer pair lon2 is empty and the answer produced could be replaced with lon2. The first two question/answer pairs are then

[(and (empty? lon1) (empty? lon2)) lon2]
[(and (empty? lon1) (cons? lon2)) lon2]

and it becomes apparent that both produce lon2 regardless of whether lon2 is empty or not. So the two pairs can be replaced with just

[(empty? lon1) lon2]

Second, the last question/answer pair can be replaced with else. We then have [else (cond ...)]. That’s considered bad style and the question/answer pairs of the inner cond can be promoted to the outer cond:

(define (merge lon1 lon2)
  (cond [(empty? lon1) lon2] ; first two cases
        [(empty? lon2) lon1] ; third case
        [(< (first lon1) (first lon2))
         (cons (first lon1) (merge (rest lon1) lon2))]
        [else (cons (first lon2) (merge lon1 (rest lon2)))]))

Is this “better” code? That depends on your metric.

It’s shorter. There are fewer instructions for the computer to perform making it faster. That’s good.

But it’s harder to understand and that’s bad.

If the code is in a library used by many people or performance really is critical, then this version is probably “better”. Otherwise, the original is probably “better”.

Aside: Mergesort

When we introduced insertion sort at the beginning of this module we mentioned that merge is a large fraction of the MergeSort algorithm. This is not officially part of CS135; but just in case you’re curious…

mergesort consumes an unsorted (listof Num) and produces a sorted list. merge produces a sorted list, too. If only we had a way to split the list that mergesort consumes and sort them, we could then merge them together to produce the entire sorted list.

Ah, but mergesort can sort those shorter lists. The only thing we need to finish mergesort are two functions to split the consumed list. one-half might produce the items in the even positions while other-half produces the items in the odd positions, for example.

Combined, we get:

(define (mergesort lon)
  (cond [(empty? lon) empty]
        [else (merge (mergesort (one-half lon))
                     (mergesort (other-half lon)))])
)

Note that this is not simple recursion! That’s because the lists consumed by the recursive application of mergesort are not one step closer to the base case. The code for merge is simple recursion.

mergesort is more complicated than insertion sort but on longer lists its speed will whip the speed of insertion sort.

List equality

It seems reasonable that someone might need a predicate to check whether two lists of numbers are equal or not. We’ll define “equal” to mean exactly the same elements in the same order. (list 1 2 3) and (list 2 3 4) are obviously not equal. (list 1 2 3) and (list 1 2 3) are equal.

(list 1 2 3) and (list 1 3 2) are not equal even though they have the same elements.

We also need to be able to compare lists with different lengths (which will always return false).

Handling unequal length lists leads to the same starting template code as we had for merge (shown in the slide). As with merge we can’t write a complete template because more problem-specific analysis is needed.

The observations on the previous slide lead directly to this code.

List & Number

Take a careful look at the contract. It consumes a list of anything. The lists here all have elements of the same type (because that’s the most common) but the lists could also have mixed type such as (list 1 'a 2 'b). Also note that the thing we’re looking for is of type Any – which may or may not match the type that’s in the list.

In the tests, it’s always true that there are at least 0 examples of anything at all on a list – even if the list is empty. 0 is one of the base cases.

;; (at-least? n elem lst) determines if elem appears
;;       at least n times in lst.
(check-expect (at-least? 0 'red (list 1 2 3)) true)
(check-expect (at-least? 3 "hi" empty) false)
(check-expect (at-least? 2 'red (list 'red 'blue 'red 'green)) true)
(check-expect (at-least? 3 'red (list 'red 'blue 'red 'green)) false)
(check-expect (at-least? 1 7 (list 5 4 0 5 3)) false)

;; at-least?: Nat Any (listof Any) -> Bool
(define (at-least? n elem lst)
  (cond [(and (zero? n) (empty? lst)) ...]
        [(and (zero? n) (cons? lst)) ...]
        [(and (> n 0) (empty? lst)) ...]
        [(and (> n 0) (cons? lst)) ...]))