M04: Simple Data

Slides

Commentary

So far our programs have only been able to use numbers. We need to broaden that out to other kinds of data, most notably Boolean values (true or false; yes or no). Boolean values will be the key to programs that do different things, depending on the data they are given.

We’ll also look at symbols and strings.

A literal is the way you write something down. The literal for the number seven is 7.

The term Boolean comes from the name of the British mathematician George Boole, who defined an algebra on these values.

The sample application, (= x y), will give an error unless x and y have been defined. To test this in DrRacket, you might enter

(define x 3)
(define y 4)
(= x y)

x and y could also be provided by a function’s parameters.

and and or are actually special forms, like define. That is, they look like functions but actually are not because their arguments are not evaluated before the “function” is applied.

You may have noticed already that special forms are typeset in a bolder font in the slides. The special forms we’ve seen so far are define, and, and or. Another one, cond, will be introduced shortly.

The practise of only evaluating as many arguments as necessary is sometimes called “short circuiting”.

In the first example, a computer can determine if a number is odd or greater than 2 quickly and easily. In contrast, determining if a number is prime may involve a huge number of division operations making it a very slow test.

The following stepper shows the short-circuit evaluation but replaces prime? with a simpler test.

In the steep and not-steep examples, the expressions are logical opposites. For any pair of values for delta-x and delta-y exactly one of the two functions will be true and the other will be false.

steep? avoids division by zero because or stops evaluating arguments (and thus doesn’t reach the division by zero) when it finds that the first one is true – and thus the whole expression must be true.

Similarly, the and in not-steep? stops evaluating arguments when it finds that one is false – and thus the whole expression must be false.

A convention is something that isn’t necessary but is commonly done. In this case, it’s commonly done to improve the readability of the code.

We’re already seen predicates that do not follow this convention: <, =, etc.

The following video introduces conditional expressions, the key tool in Racket that allows us to vary the execution of the program depending on the conditions it encounters.

It uses a slightly more complex example with four question/answer pairs.

Video: Download m04/m04.50_cond

video m04/m04.50_cond

Properly nested brackets: [()]. Improperly nested brackets: [(]] or [(]).

The code shown on the slide would normally be included in a function. Here it’s named my-abs because Racket won’t let us redefine the built-in function.

(define (my-abs x)
   (cond [(< x 0) (- x)]
         [(>= x 0) x]))

The code shown is a straight-forward (and correct!) implementation of the conditions shown on the previous slide.

But when (>= grade 40) is evaluated in the second question/answer pair, we already know that grade is at least 40. Because of Racket’s top-to-bottom evaluation of a cond, we cannot reach the second question/answer pair unless grade is at least 40. We don’t need to test it again.

Similarly, we cannot reach the third question/answer pair unless grade is at least 50 (and therefore don’t need to test again).

Note that the code as given on the previous slide will produce the correct answer if the order of the question/answer pairs is rearranged. This code depends on the order. Rearrange it and you’ll get wrong answers.

Earlier we made the distinction between closed-box and open-box tests. Now that we have cond expressions, the need for open-box tests is more evident. We need to make sure that all of the code in the conditional is tested and thus must choose our test cases carefully with the specific questions in the conditional in mind.

Please watch the video for more details.

Video: Download m04/m04.60_testing

video m04/m04.60_testing

Closed-box testing is also known, more crypticly, as “black-box” testing. The corresponding term for “open-box” is “white-box”.

This tax computation is an extended example to pull together a bunch of things we’ve learned:

  • using a cond expression
  • defining constants
  • using a helper function
  • choosing test cases (and testing!)
  • using the design recipe process.

The background is using “interval” notation. A square bracket means the value is included while a round bracket means it’s excluded. So “(45000, 90000]” is a concise way of saying “all values x such that 45000 < x <= 90000”.

Note that we’re following the design recipe. We did a draft of the purpose; now examples.

Note that we have some “magic numbers” appearing – numbers like 0.15 and 45000. It might be good to make these into constants.

There are also some repeated calculations based on those proposed constants:
0.15 * 45000 occurs twice. If we did an example for $160,000, 0.15 * 45000 + 0.20 * (90000-45000) would show up multiple times as well. These might be good constants to have.

The tax payable calculation is split into a number of cases. That makes the calculation a natural fit for the cond expression. The question/answer pairs are “what’s your income?” and “here’s how much tax you’ll owe.”

The questions have already been simplified based on their order.

What is the minimum number of tests for this code? Refer back to the course-after-cs135 example if you need a refresher.

A helper function is simply a function that is defined to help simplify the definition of another function. More in a moment.

Reducing repeated code is sometimes referred to as “DRY”: “Don’t Repeat Yourself”.
Being DRY reduces the amount of code you need to write, debug, and maintain. If there is a bug, there’s only one place to go to fix it.

Factoring out complex calculations gives a separate function that is usually easier to test.

Humans can’t keep too much in our heads at once. We need to chunk things. Factoring out calculations into a helper function is one way to do that. It’s often worth it even if the helper function is only used once.

Having meaningful names in the code to identify an operation also helps with that chunking. “Oh yeah, I know what that does.”

Use judgement and don’t go overboard on providing short definitions just to introduce a name.

Racket’s grandparent language, Lisp, was created in 1958 largely out of frustration with the limited abilities of existing languages to handle non-numeric data.

The contract for symbol=? is Sym Sym -> Bool. It consumes two symbols and produces a Boolean result.

Comparing symbols for equality is pretty much the only thing we’ll do with them in CS135. That’s pretty limited, computationally! We can do so much more with numbers (addition, multiplication, comparison, etc). But the self-documenting nature of symbols makes the a worthwhile data type to have available.

Racket has many more functions on strings. See the online documentation or use the help desk in DrRacket. There’s no need to memorize all those functions. We’ll use a pretty small subset in CS135.

Symbols are like multiple choice; strings are like short answer.

Applying (symbol=? 'blue 100) breaks the contract that we gave earlier. We said that symbol=? consumes two symbols. If you break the contract and give it a symbol and a number, then it can do whatever it wants. In this case, it gives an error.

equal?, on the other hand, has a contract of Any Any -> Bool. That is, you can give it any pair of Racket values and it has to do the “right thing”. In the case of (equal? 'blue 100), the “right thing” is to produce false.