Human Readable mathematical expressions, like "1+2*(3+4)-5" pose a moderate challenge to the programmer. Existing algorithms typically build a tree from the expression. Once built, the tree is easy to evaluate, but building the tree is complex and difficult. The author has devised a new algorithm which simplifies the programmer's task. The new algorithm is significant in its conceptual clarity, ease-of-implementation, and complete lack of recursion (explicit or implicit) or trees.
The algorithm was developed when the author was working on a children's math game. In the game, students were asked to discover a correct mathematical expression from a series of digits by inserting mathematical operators and parenthesis. While tree based methods are widely known, their complexity was undesirable for such a simple application, and the author began looking for other algorithms.
The algorithm revolves around a simple observation: a sorting operation can impose the necessary order-of-operations (parentheses included) for any mathematical expression. The initial version of the algorithm had an ugly partial-result-alias-resolving-table that was later obviated by the linked list described below. More information about the algorithm can be solicited from the author.
The algorithm has room for improvement. One obvious area is the sorting. While some sorting seems to be necessary, it should be noted that a "perfect" sort is not. Since partial results do not always affect each other immediately, different sorts can yield the same answer. Generalizing this property, however, is difficult, and remains an area of study.
Lastly, it should be pointed out that this algorithm is easily and obviously extensible to algebraic expressions with operators of arbitrary complexity. This functionality, while possible, will not be persued further here.
The algorithm can be broken up into three phases: text parsing, sorting, and evaluation. As in all programs, parsing the user's text robustly is the most challenging (if tedious) portion. However, the power of this algorithm is that the remaining steps are so simple that they can be completed through a single call to a standard library or just a few simple lines of code.
During text parsing, two data structures are allocated: a doubly linked list and an array. The doubly linked list contains terms and operators, each in its own record, in the same order as the human-readable expression. The array contains a record for each operand which in turn contains a pointer to the node in the linked list containing that operator, the parenthesis depth of the operator, and the number of operators to the left of the operator. The parenthesis depth can easily be maintained by incrementing and decrementing a variable as opening and closing parentheses are encountered. The number of operators to the left of this operator, during the building of the array, is equal to the record number.
The next step is sorting. The operator record array is sorted simultaneously on three fields. In order of sorting priority, those fields are descending parenthesis depth, descending operator priority, and ascending operator number. The first enforces parenthetical sub-expression evaluation, the second: order-of-operations, and the third: left-to-right evaluation. All are necessary to yield correct results, and any standard sorting routine can be used.
The priorities of operators are, in ascending priority (equal priority operators are listed together):
Evaluation follows sorting. Each sorted operand record is evaluated in turn. The operator is extracted from the linked list by way of the record's pointer. Depending on the operator, the relevant arguments are identified (by following links to adjacent nodes in the linked list) and deleted from the linked list. When deleting, all next and previous links must made coherent with respect to the remaining nodes. A value is computed from the operator and arguments, and the node in the list which contained the operator is over-written to contain the resulting value.
This process repeats until all operators have been evaluated. When finished, only one node will remain in the linked list, and that node will contain the value of the expression.
ExampleOriginal Expression: 1+2*(3+4)-5 Text ParsingThe doubly linked list would contain (note that only operators and terms are present):
The array of op records would be:
SortingA standard sorting algorithm sorts by descending parenthesis depth, descending operator priority, and increasing operator count. After sorting, the array of op records would be:
EvaluationEvaluation begins by examining each of the op records in order. Note that we no longer need either the Parenthesis depth or Operator count fields-- just the Pointer into the doubly linked list.
|
This algorithm is expected to largely replace tree-based implementations. While the sorting step is likely to make the algorithm more computationally expensive than tree-building algorithms, the simplicity ensures rapid and correct implementation. Further, great speed is not often needed by such algorithms, and most expressions will have relatively small numbers of operators, so the sorting penalty is likely to be unimportant.
The author places this algorithm in the public domain, but requests that the source of the algorithm be credited. Comments, observations, or suggestions may be directed to the author.
(C) 1998 by Edwin Olson