The Syntax and Semantics of PURPLE

Copyright © 1989 by Matthew Belmonte.

GRAMMAR FOR ARITHMETIC EXPRESSIONS:
Expr -> ArithExpr end-token
ArithExpr -> ArithExpr TermOp Term | Term
TermOp -> plus-token | minus-token
Term -> Term FactorOp Factor | Factor
FactorOp -> mult-token | div-token
Factor -> number-token | left-p-token ArithExpr right-p-token

ADDITIONS FOR BASIC PURPLE:
(remove Expr -> ArithExpr end-token)
Program -> StatementList end-token
StatementList -> StatementList semicolon-token Statement | Statement
Statement -> in-token identifier-token | out-token ArithExpr | identifier-token assign-token ArithExpr
Factor -> identifier-token

ADDITIONS FOR EXTENDED PURPLE:
Statement -> do-token BoolExpr then-token StatementList od-token
| if-token BoolExpr then-token StatementList fi-token
| if-token BoolExpr then-token StatementList else-token StatementList fi-token
BoolExpr -> BoolExpr ClauseOp Clause | Clause
ClauseOp -> and-token | or-token
Clause -> not-token PositiveClause | PositiveClause
PositiveClause -> ArithExpr RelOp ArithExpr
RelOp -> less-token | lesseq-token | greater-token | greatereq-token | eq-token | noteq-token

LEXICAL DEFINITIONS FOR ARITHMETIC EXPRESSIONS:
number-token: (0+1+2+3+4+5+6+7+8+9)+
plus-token: + (that is, the symbol `+', not the regular expression operator `+')
minus-token: -
mult-token: *
div-token: /
left-p-token: (
right-p-token: )
end-token: .

ADDITIONS FOR BASIC PURPLE:
semicolon-token: ;
in-token: IN
out-token: OU
identifier-token: A + B + C + ... + Z
assign-token: <-

ADDITIONS FOR EXTENDED PURPLE:
do-token: DO
then-token: ->
od-token: OD
if-token: IF
fi-token: FI
else-token: ||
and-token: &
or-token: |
not-token: ~
less-token: <
lesseq-token: <=
greater-token: >
greatereq-token: >=
eq-token: =
noteq-token: <>

ARITHMETIC SEMANTICS:
The arithmetic expressions part of the language is straightforward: to calculate the value of a tree, apply the operator represented at the root of the tree to the values of the left and right subtrees. The value of a leaf is simply the number stored in that leaf. Your evaluator should catch any attempt to divide by zero and report an error.

BASIC PURPLE SEMANTICS:
To evaluate a sequence of two statements joined by a semicolon, first evaluate the statement on the left, and then evaluate the statement on the right. To evaluate an IN statement, read a value from the keyboard and assign it to the identifier specified in the IN statement (modify the global variable bindings). To evaluate an OU statement, evaluate the associated arithmetic expression, and display its value on the screen. To evaluate an identifier, look up the value that has been associated with it (use the global variable bindings). To evaluate an assignment statement, find the value of the arithmetic expression on the right-hand side of the assignment operator and assign it to the identifier on the left-hand side (modify the global variable bindings).

EXTENDED PURPLE SEMANTICS:
To evaluate a Boolean expression, apply the operator represented at the root of the tree to the values of the left and right subtrees. To evaluate a DO statement, evaluate the Boolean condition. If the result is true, then evaluate the statement list. After you've evaluated the statement list, re-evaluate the Boolean condition and keep going around this loop until the condition becomes false. To evaluate an IF statement, evaluate the Boolean condition. If the result is true, then evaluate the first statement list and ignore the second (if it exists). If, on the other hand, the condition is false, then evaluate the second statement list (if it exists).

HOW TO DO IT:
You will be writing a scanner, a parser, and an evaluator for arithmetic expressions. Your scanner should be based on the lexical definitions given above, and your parser should take its structure from the structure of the grammar given above. Use as guides the sample scanner, parser, and evaluator for the WFF language, which were discussed in class. Remember, though, that you are writing an interpreter for PURPLE, not for WFFs---the basic principles will be the same but the details of the program will be completely different.

DO NOT DEVELOP YOUR PROGRAMS ON THE COMPUTERS. Write them out on paper first, and submit them for checking before typing them in. This will catch bugs and errors in style before they become buried beneath other segments of your program. Use EdScheme's comment feature where you think the program deserves some explanation; any line beginning with a semicolon is treated as a comment. In particular, you should annotate your more complex recursive functions with the invariants that you used to develop them.

After typing and debugging each segment of the program, have us check it on the computer to make sure that it runs correctly. When you've finished all three segments (scanner, parser, and evaluator) for arithmetic expressions, begin work on extending your scanner, parser, and evaluator to handle Basic PURPLE. Again, work on each of the three phases separately, and submit all changes and additions to us on paper before typing them into the computer. (You may want to print out a copy of your arithmetic expression evaluator and illustrate on it where the changes will be made.)

If you have time after you get Basic PURPLE working, you can start on Extended PURPLE. Same routine: no typing before it's checked on paper, and work on each phase separately.

IMPORTANT:

Before you start on Basic PURPLE, please submit diagrams of both the full parse trees and the abbreviated parse trees corresponding to the following PURPLE programs, so that we know that you know what you're doing (we'll endeavour to get them back to you as quickly as possible):

	IN X;
	Y <- 1;
	DO X > 0 ->
		Y <- Y*X;
		X <- X-1
	OD;
	OU Y.

	IN X;
	IN Y;
	IF X = 6 & Y = 9 ->
		OU 42
	||
		OU X*Y
	FI.

You may use the following pre-defined functions. (These are available for EdScheme for the Macintosh (unpack with UnStuffit), for MIT Scheme for Windows (unpack with pkunzip), for MIT Scheme for UNIX (unpack with gzcat and tar -xf -), and for Doctor Scheme for Windows.

PRE-DEFINED SCANNER FUNCTIONS:
init-scanner should be called before you use any of the other pre-defined functions. Call it only once, before you first run the scanner. If you make the mistake of calling init-scanner every time you run the scanner, then you will lose any saved characters.
next-char takes zero parameters and returns the next character typed from the keyboard.
space? takes a character as its parameter, and is true (#t) if that character is a space or a carriage return, false (#f) otherwise.
letter? takes a character as its parameter, and is true (#t) if that character is a capital letter of the alphabet A-Z, false (#f) otherwise. (The letters A-Z are the only legal variable names, or `identifiers', in PURPLE.)
digit? takes a character as its parameter, and is true (#t) if that character is a digit 0-9, false (#f) otherwise.
symbol? takes a character as its parameter, and is true (#t) if that character is a legal PURPLE symbol (i.e., something that is not a digit and not a letter but still a legal character, such as `+', `&', or `>'). You can use the predicates letter?, digit?, and symbol? to select among several helping functions that handle the scanning for these various types of characters.
char->num takes a digit and returns the number between 0 and 9 that that digit represents. Note that there's a difference between the digit `1' and the number one that is represented by that digit. In order to arithmetic on numbers that are typed in from the keyboard, you must first translate them from strings of digits into the computer's internal representation for numbers. If a number consists of n digits, then in order to translate it you must use char->num n times.
save-char takes as its parameter the character that has most recently been returned by next-char, and pushes it back out onto the input stream, so that it'll be returned by the next call to next-char. This is useful, for example, in a situation in which you've reached the end of a number and have read the character following the number. You don't want to consume this character inside your number-scanning function, because it needs to be available for the function that'll be scanning the token that follows the number. So you need a way to make the machine act as if this character had not been read. Having a lot of calls to save-char in your program would make it confusing, so use this function sparingly! Also, if you've just saved a character, you can't save another one on top of it; that is, at any time there can be only one character that has been saved and is waiting to be returned again from next-char. If you write your scanner clearly and elegantly, a single character should be all you ever need to save.
error takes any number of parameters, displays these parameters on the screen, and then halts execution of the program. error is useful for reporting error conditions that prevent further execution of your program, such as input that is not well-formed.
begin sequences the evaluation of Scheme expressions. Although in a purely functional language any pair of functional expressions f and g have the same values regardless of whether f is evaluated before g or vice versa, Scheme is not exclusively a functional language, and it is possible for the evaluation of functional expressions in Scheme to have side effects. For example, what value is returned by a call to next-token depends on when the call occurs during the course of tokenising the input stream. begin can be abused, especially by those of you who have programming experience in some imperative language such as Pascal. Check the sample programs for good examples of when--and when not--to use begin. Note that begin is not essential to Scheme; it's possible to define a version of begin that works for any pair of functional expressions f and g:

	(define begin				;evaluate f, but pass g to begin-help as is.
		(lambda (f g)
			(begin-help (f) g)))
	(define begin-help			;throw away f's result, and return g's result.
		(lambda value-of-f g)
			(g)))

In reality, the Scheme interpreter defines begin as a special form that works for any number of functional expressions, not just two.

let defines local variables. It takes two parameters, the first of which is a list of pairs of variables and bindings, and the second of which is a functional expression within which these bindings will be effective. So, for example, (let ((right-answer 42) (wrong-answer (* 6 9))) (- wrong-answer right-answer)) evaluates to 12. Like begin, let is useful for sequencing the evaluation of functional expressions that have side effects, since every expression in the bindings list is evaluated before the final expression. But the main use of let is to avoid re-computation of complex expressions by allowing these expressions to be evaluated only once, and then referred to several times. Take a moment to think for yourself about how let can be implemented using helping functions that bind their parameters to new names. Depending on how you write your scanner, you may never need to use let in your scanner. If you do use it in your scanner, you certainly won't need to use it a lot. It will come in handy, however, in many functions in your parser. Check the sample programs for examples of situations in which let is useful.

PRE-DEFINED PARSER FUNCTIONS:
init-parser performs a similar function for the predefined parsing-related functions. It in turn calls init-scanner, so when you write your parser you won't need to include a separate call to init-scanner.
save-token acts analogously to save-char, but for tokens instead of characters. Again, you can't have more than one token saved at any time.
get-next-token first checks to see whether a token has been saved and not yet returned. If so, it returns the saved token. If not, it calls the top-level scanner function next-token, which you define. Thus, since you'll find it convenient to use save-token occasionally in your parser, your parser should use get-next-token as an interface to the scanner, not plain old next-token. If you were to use next-token, you would be ignoring any saved token.

PRE-DEFINED EVALUATOR FUNCTIONS:
init-evaluator sets up bindings for the evaluator and calls init-parser (which in turn calls init-scanner). It defines a data structure called an assoc list, which stores all the variable bindings of the PURPLE program that's being evaluated. Look in The Schemer's Guide to learn about assoc lists, and look at the sample evaluator for WFFs to see how they can be used.
set! is a special form that re-sets the value of a Scheme variable (don't confuse this with a PURPLE variable!). So, for example, after evaluating the sequence `(define foo (* 6 9)) (set! foo 42)' the value of the Scheme variable foo would be 42 instead of 54. You should need to use set! only in one place, to modify the bindings. DON'T use it anywhere else. Imperative functions such as set! are dirty pool in a functional language such as Scheme and usually make for confusing programs, so it's best to stay away from them.
display takes any Scheme expression as a parameter and displays it on the screen. It's useful for issuing prompts and for printing values.

THINGS TO REMEMBER:
--The top-level function of your scanner should be called next-token (so that the predefined function get-next-token will be able to refer to it) and should take zero arguments and return the next token in the input stream.
--Don't use ultra-generic variable names such as a, b, x, y, l, &c. Use names that tell what kind of data your variables hold, and what the significance of that data is. Look at the sample programs for examples.
--Annotate all complex recursive functions with invariants, and think about these invariants before you write the functions.