# M11: Trees

Hint: If possible, make your browser wide enough to place the slides and commentary side-by-side.

### Commentary In this video we’ll look at some introductory examples (including one not on the slides) and the definition of a tree.     In some trees, each internal node will always have exactly two children while others may have an unlimited number. That’s a useful thing to know when writing the code! Similarly, it’s useful to know whether the order of the children matters or not.

We’ll be refering back to these characteristics for most of the trees that we discuss.

In the mean time, consider the binary expression tree. Each internal node has exactly two children. That’s implied by the “binary” in the name – it only deals with operators having a left and a right operand. If we added/allowed an unary minus then each internal node would have at most two children (because the unary minus would have only 1).

Later in the module we’ll look at another kind of arithmetic expression tree that can have any number of children. It represents `(+ 1 2 3 4)` as a “plus” node that has four children.

The order of the children matters in a binary expression tree. If we swapped the children of the division operator, we would get a different result when evaluating the expression. The order of the children in the phylogenetic tree does not matter.

All of the trees we’ve looked at so far derive their structure from the data. That won’t be the case for a binary search tree (BST). In that case the structure will be very important to enable fast searching, but it will be a structure that we decide rather than one that comes from the data itself. ## Binary Trees

Binary trees are about as simple as trees get. Characteristics:

• Each internal node has at most two children.
• Our examples will have labels on all the nodes, although that’s not a requirement of binary trees.
• Order of the children does not matter.
• Structure is for convenience. Work through the next four slides. The last one is deliberately incomplete. Attempt to complete the template on your own. Then watch the video to check your process and the result. The `BT` data definition is an example of mixed data, which we studied in the previous module. One case is a particular kind of list, an empty list. The other case is a structure.

We could have used many other values instead of `empty` – any value that we can distinguish from a `Node`. We could have used `0` or `false` or `'emptyTree`. We chose `empty` simply because its an existing value that strongly suggests “empty tree”. The fact that `empty` has significance for lists is not relevant here. Self-Check:

Hogwarts professors are debating how to start the binary tree template:
Correct! Try again.  It’s worth taking a look at the `check-expect`s to see how a tree is built. Use the structure constructor `make-node` and provide it with three values: the key at the root of this (sub)tree, the left subtree and the right subtree.

The subtrees may be `empty` or they may be a non-empty subtree constructed with (you guessed it!), `make-node`. Note the similarity to the template for `sum_keys`, `count-nodes`, and `increment`. That’s no accident!

It’s important to break tree problems like these into recursive subproblems. Each of these three short videos talks about the strategy for doing that. Self-Check:

Which pattern of recursion do `sum-keys`, `count-nodes`, and `increment` use?
Correct! Try again.  Self-Check:

How does the efficiency of `search-bt` compare to searching a list?
Correct! Try again.

`contains?` might be a better name for `search-bt`. It is a predicate, after all, and the question mark in `contains?` follows our convention for predicates.

Since it produces a Boolean, we might think about whether we can write this as a Boolean expression, without the `cond`.

Consider:

``````(define (contains? tree k)
(or (= k (node-key tree))
(contains? (node-left tree) k)
(contains? (node-right tree) k)))
``````

Self-Check:

Does this implementation of `contains?` fail?
Correct! Try again.  Remember the quote notation from the previous lecture module. `'(right right left)` is short-hand for `(list 'right 'right 'left)`.

What’s the path to `6`? Start with what’s given: the path from 6 to 9. Based on that, it seems reasonable that `'(right right)` would be the path from 6 to one of the 3’s and just `'(right)` is the path from 6 to 10. Making the path one element shorter, `'()` or `empty` gives the path from 6 to 6.

This is shown in the second `check-expect` on the next slide.  What’s the strategy for solving this problem?

Well, a tree has three important parts: the root, the left subtree and the right subtree.

If the root contains `k`, the value we’re searching for, produce `empty` and we’re done.

Otherwise, if only we could find a path to `k` in the left subtree, we could `cons` `'left` onto that path and we’d have the answer from the root.

If we can’t find `k` in the left subtree, we could look in the right subtree. If we can find a path there, all we need to do is `cons` the value `'right` on to that path and we’d have the path from the root to `k`.

If `k` isn’t in the root, isn’t in the left subtree, and isn’t in the right subtree, then it’s not in the tree at all and we should produce `false`.

How do we check if `k` is in the left subtree? Apply `search-bt-path`. If it produces a list, we know `k` was found. We could also check for the result to be not `empty`.

However, after we’ve checked if `k` is in the left subtree we still need to have the path to `k` so we can produce `(cons 'left path-to-k)`. To get the path to `k` we need to call `search-bt-path` a second time.

This strategy leads to the code shown on the slide. The code is correct and works as intended. However, that second application leads to incredibly inefficient run-time, as we saw in max-list. We’ll fix that in a moment.

Before we do that fix, note that this code doesn’t look that much like the binary tree template:

``````;; bt-template: BT -> Any
(define (bt-template t)
(cond [(empty? t) ...]
[(node? t) (... (node-key t)
(bt-template (node-left t))
(bt-template (node-right t)))]))
``````

There are some points of similarity:

• Both check for the empty tree as the first clause in the `cond`.
• Both apply themselves recursively to both the left and right subtrees.
• Both do something with the key at the root.
• The last four clauses of the `cond` could have been nested inside the 2nd clause of the template, making it look more similar.

The next version of `search-bt-path` actually looks much more like the template while also solving that efficiency issue. We solved `list-max` three different ways:

• The very, very slow version which applied `list-max` recursively multiple times for each recursive application.
• Using a helper function, `max` (which happens to be built-in).
• Using an accumulator.

We could use either a helper function or an accumulator to solve this efficiency problem, too. The helper function approach is simpler and is the approach pursued here.

The insight is that once a value is passed to a parameter it can be used multiple times without recalculating it. So we search for `k` in both the left and right subtrees, passing the results to the helper function. `choose-path` can use those values multiple times without incurring the efficiency penalty.

One might object that this always searches the entire tree. In particular, the right subtree is always searched even if `k` has already been found in the left subtree. Isn’t that wasteful? Yes. But not nearly as wasteful as those double applications we got rid of. We could improve this further, but it would be a pain. Besides, we’ll see better approaches in the next module.    Self-Check:

Which of these are not binary search trees? Correct! Try again.

Given:

• 0 nodes, there is exactly 1 binary search tree (the empty tree).
• 1 node, there is exactly 1 binary search tree.
• 2 nodes, there are two possible binary search trees.
• 3 nodes, there are five possible binary search trees.
• 4 nodes, there are fourteen possible binary search trees.

What’s the pattern? Can you write a (recursive) function that consumes a Nat and produces the number of binary search trees with that number of nodes?

Self-Check:

Given an unlabelled binary tree (e.g. just the picture without the numbers) with four nodes and the numbers 10, 18, 30, and 36, how many ways are there to place the numbers in the binary tree such that the result is a binary search tree?
Correct! Try again.   Once again, this function doesn’t look that much like the binary tree template. Like the template, it distinguishes between the empty tree (first clause) and non-empty tree (the other three clauses). Like the template, it recurses on both the left and right subtrees (but selectively, depending on the value of the key).

The function could be rewritten to echo the template more closely:

``````(define (search-bst n t)
(cond [(empty? t) false]
[else <boolean-expression>]))
``````

The question is, of course, what is the `<boolean-expression>`?. Once you’ve got that, it’s a simple matter to get rid of the `cond` complately. Remember the `increment` function we examined earlier? It consumed a tree, t, and produced a transformed tree, t'. This is doing essentially the same thing and will use the same techniques.      Refer back to the code for `search-bst` and note how little has changed:

• Add `val` to the `node` structure.
• If `(= k (node-key t))` is `true`, produce `(node-val t)` instead of `true`. Self-Check:

Suppose you wanted a dictionary to store stock market quotes. For example, “AAPL” (Apple Computer) is \$445 and “GOOG” (Alphabet, the parent company of Google) is \$1,494.

Which of these changes would you need to make to the `search-bst-dict` slide?

Correct! Try again.    If you’re curious about how these trees related to the COVID-19 pandemic, take a look at this video. It’s short and not required for CS135, but is on topic.       What’s the strategy for solving this problem? If `t` is a common ancestor, then count the number of current species in each of its two subtrees. Add those numbers together and we’re done. If `t` is a current species, we have exactly one current species (because there are no subtrees). So just return `1`.

How do we count the number of current species in a subtree? The same way we count an entire tree – see the previous paragraph.

Compare this function to the `evotree-template` we just wrote. What are the differences?

• Function names, of course.
• In `count-current`, throw away the unneeded fields from the structure and just return `1`.
• In `count-ancestor`, throw away two unneeded fields and add a `+`.  “Visiting” a node just means doing something with it. It might be checking to see if it has the value you’re looking for, collecting some information from it, or transforming it in some way. The key idea is that each node is visited exactly once.

The choice of “pre-order”, “in-order” or “post-order” traversal often affects the function’s result. That will be the case for listing the names contained in an EvoTree. The functions presented here are based on the `evotree-template` with functions renamed and irrelevant fields discarded.

The contract for `list-names` says that it produces a `(listof Str)`. If we look at the code for `list-names` we see that whatever `list-cnames` and `list-anames` produced is what `list-names` produces. Therefore, each of them must also produce a `(listof Str)`. Add that to their contracts.

In `list-cnames` the name is a single string. The natural thing to do is just produce that string, but that violates the contract. The contract now says it needs to produce a list of strings. So change the body to `(list (currrent-name cs))`.

In `list-anames` we again need to produce a list of strings. The recursive calls will do exactly that (assuming they implement their contracts). The easiest way to combine several lists into one list is with `append`. But that doesn’t work for the ancestor’s name (which is just a string). So like `list-cnames`, use `list` to turn it into a one-element list that can be used with `append`.

The finished code is:

``````;; list-names: EvoTree -> (listof Str)
(define (list-names t)
(cond [(current? t) (list-cnames t)]
[(ancestor? t) (list-anames t)]))

;; list-cnames: Current -> (listof Str)
(define (list-cnames cs)
(list (current-name cs)))

;; list-anames: Ancestor -> (listof Str)
(define (list-anames as)
(append (list (ancestor-name as))
(list-names (ancestor-left as))
(list-names (ancestor-right as))))
``````

Self-Check:

``````(define (list-anames as)
(append (list (ancestor-name as))
(list-names (ancestor-left as))
(list-names (ancestor-right as))))
``````

The code above produces which list for the following tree? Correct! Try again.

Self-Check:

``````(define (list-anames as)
(append (list-names (ancestor-left as))
(list (ancestor-name as))
(list-names (ancestor-right as))))
``````

The code above produces which list for the following tree? Correct! Try again.

Self-Check:

``````(define (list-anames as)
(append (list-names (ancestor-left as))
(list-names (ancestor-right as))
(list (ancestor-name as))))
``````

The code above produces which list for the following tree? Correct! Try again. When we implement a function to list the names using an accumulator, `list-names` becomes a wrapper function. The real work is done with `list-names/acc`. It uses a parameter, `names`, which is the list of names seen so far in the traversal of the tree.

As with our `evotree-template`, we split the problem into two subproblems: listing the names if the subtree is a current species and listing the names if it is an ancestor species.

Each of the helper functions consumes the list of names seen so far and produces that list plus the names in the (sub)tree `t`. `list-cnames` consumes the list of names seen so far and simply `cons`es on the name of the current species, thus producing a list of strings as indicated by the contract.

Figuring out `list-anames` is a little harder. It needs to apply `list-names/acc` to each of the subtrees. It also needs to supply the names of the nodes visited so far in each of those applications.

It’s easy (but wrong) to think that we just do it like we did with the first version:

``````(append (list (ancestor-name as))
(list-names/acc (ancestor-left as) names)
(list-names/acc (ancestor-right as) names))
``````

The problem with this is that when `list-names/acc` is applied the second time the `names` argument is wrong. Nodes have now been visited (the first application) that are not included in `names`.

The solution is to get rid of `append`. `(list-names/acc (ancestor-right as) names)` produces a list of names from the nodes that have been visited. Use that list in the second application of `list-names/acc`, as shown in the slide.

Self-Check:

Which traversal does the code shown in the slide implement?
Correct! Try again.  ## Binary expression trees

Binary expression trees were one of the motivating examples at the beginning of this module. To briefly recap, a binary expression tree structures arithmetic expressions into a tree which makes it easy to calculate the value of the expression. This core idea is used in programs that consume an expression from a user and then calculates the value. It’s also used in compilers and interpreters that consume source code (like your Racket programs!) and execute them. What’s in a name? Let’s start at the back:

• Tree: we’re organizing information into a tree structure
• Arithmetic Expression: The information we’re organizing are “arithmetic expressions” – expressions that represent arithmetic, like 5 + 2. There are other kinds of expressions. For example, we could represent Racket `cond` expressions in a tree. But we’re not trying to handle them at this time.
• Binary: We’re not even trying to handle all arithmetic expressions, only those that have two operands. So functions like square root or cosine are out. So is the unary minus operator. The smallest possible binary arithmetic expression tree is a single number (the first example). The next smallest tree is two numbers and an operator (see the second example). A binary arithmetic expression tree can grow to be arbitrarily large. Calculating the value of this BinExp is simple: calculate the value of the left subtree, calculate the value of the right subtree, and then divide.

Calculating the value of the right subtree, `(5 - 3)`, is similar: calculate the value of the left subtree, calculate the value of the right subtree, and then subtract. When the subtree is just a number, the value of the subtree is that number.

When an expression is represented this way, the tree automatically encodes the precedence rules. For example, `1 + 2 * 3` is interpreted as `1 + (2 * 3)` or 7 thanks to our usual rules. That corresponds to an expression tree with `+` at the root and `2 * 3` in the right subtree.

If we wanted the other interpretation, `(1 + 2) * 3`, the expression tree would have the `*` at the root and `1 + 2` in the left subtree. Either way, the tree eliminates the need for the parentheses or special rules. The template follows naturally from the data definition.

The `binode-template` could have been absorbed into the `binexp-template`. The `eval` function consumes a `BinExp` and produces its value as a number.

If the binary expression is just a number, the value is that same number. Otherwise, it’s a more complicated internal node with subtrees – so we apply a function based on the `binode-template`. For each of the four arithmetic operators, we check which one it is, calculate the values of the left and right subtrees (using `eval`) and then do the right thing for that operator using the built-in function.

Unfortunately, there is a lot of repeated code here:

• We evaluate the left subtree at four different places.
• We evaluate the right subtree at four different places.
• We extract the operator from the `binode` structure at four different places.

In the following slide we pull these three operations into `eval` and pass the results to `eval-binode` instead of the subtree itself. The result is much cleaner code without the repetition. Self-Check:

What is the contract for `eval-binode`?
Correct! Try again.

What happens if the binary arithmetic expression tree contains a modulus operation? Nothing evaluates to `true` in `eval-binode`'s `cond` expression, generating an error. It would also be easy to implement the modulus operator, but then what about exponentiation? It might be wise to add an `else (error "Unimplemented operator")` to make debugging this kind of error easier. Try it!

In Module 13 we’ll be able to take this one step farther and even eliminate testing which symbol we have in the expression tree. What’s the secret? We’ll store the function itself in the tree instead of a symbol. It’ll be fun! ## General trees

Handling any number of children needs a better solution than just adding a bunch more fields to the node structure. The simplest solution we have for handling an unknown number of things is a list.

That is, we’ll put the children (subtrees) into a list. In the example, note that `+` at the root has four children. This tree represents the Racket expression on the previous slide.

The term “general tree” simply means there aren’t restrictions on the number of children a node may have.

To evaluate this tree, we evaluate all of the subtrees (four of them!) and then apply the operator at the root (`+`) to those values. Note that the operator must be able to handle a variable number of children.  In these examples, everything comes in pairs: two (related) data definitions, two function templates, two functions.

Note that the `ainode` structure contains a list of arithmetic expressions to handle the variable number of children in the tree. That will add a little more complexity to our templates and functions. Remember the guidelines for writing templates! Most of these examples are using quote notation to conserve space.

To help visualize our data definition, lets draw on the diagrams we did earlier for lists and add a new element: structures. To represent a structure, we’ll use a rounded box for each field. Each box is labeled with the field’s name. The contents of the field will be drawn inside the box.

For example, here is an `ainode` structure corresponding to the third example. It has two fields, `op` and `args`. The contents of the first field is the `+` symbol. The contents of the second field is a list of numbers. As we saw earlier, when things get complicated in our diagrams we won’t nest one structure inside another. Instead, we’ll draw an arrow to it. Here is a representation of the fourth example: The point, here, is that the `args` field of the `ainode` structure contains a list. That allows us to handle a variable number of subtrees.

How would you diagram the first and last examples?

Self-Check:

Given the following data definitions, what needs to change to include Boolean expressions?

``````(define-struct ainode (op args))                              ; (1)
;; An AINode is a (make-ainode (anyof '* '+ (listof AExp)))   ; (2)
;; An AExp is (anyof Num AINode)                              ; (3)
``````
Correct! Try again. Here’s the template for functions that consume an Arithmetic Expression (`AExp`). Perhaps the most straight-forward development of the templates would have resulted in three functions: `aexp-template`, `ainode-template`, and `listof-aexp-template`.

`aexp-template` is as we’ve so often seen before. Just distinguish the two cases:

``````(define (aexp-template ae)
(cond [(number? ex) (... ex ...)]
[(ainode? ex) (... (ainode-template ex) ...)]))
``````

The `ainode-template` is a straight-forward structure template with the extension of asking what the type of the `args` field is and applying the relevant template:

``````(define (ainode-template ex)
(... (ainode-op ex)
... (listof-aexp-template (ainode-args ex))
))
``````

`listof-aexp-template` is just the list template, appropriately renamed.

We’ve chosen to combine `aexp-template` and `ainode-template` since `ainode-template` is a pretty simple structure template. It would be a mistake to try to combine all three into a single function. This `eval` function is a straight-forward application of the template to the problem at hand: evaluating the expression. The names of `eval` and `apply` are chosen deliberately because they exist in the full Racket language with similar purposes.

Note the examples:

• The first one is an instance of the smallest possible expression – a single number.
• The second one involves multiple arguments and is a typical case.
• The third one is interesting. It corresponds to the Racket code of `(+ )`. Our data definition does not prohibit this. The next Racket language level will allow it. So what is `(+ )`? Zero. One way to explain it is to work from an accepted example. `(+ 3 4)` is obviously 7. What would `(+ 3)` be? Seems like 7 - 4 or 3. And then `(+ )` would be 7 - 4 - 3 or 0. Notice that 0 is the identity for addition. What about `(eval (make-ainode '* '()))`? `apply` is based on the `listof-aexp-template`. It looks just a little different because we’re splitting the `else` clause into two checks, one for `'+` and one for `'*`.

In the base case, when the list of arguments is empty, we check the operator and produce the appropriate identity.

If the operator is `'+`, the list template suggests that we should evaluate the first thing (expression) on the list and then somehow combine that with the result of recursively processing the rest of the list. Evaluating the first expression and processing the rest of the list of arguments both produce numbers. When the operator is `'+`, the natural way to combine those two values is with the `+` function.

Multiplication is similar. This is a L-O-N-G trace, even in its condensed form. One approach is to just work your way through it from start to finish.

Another approach is work your way through the steps until you reach a recursive function application. Then assert the inductive hypothesis (that is, assume it will produce the correct value given its purpose). Then skip ahead and insert that value and finish the trace.                     