A New Algorithm for Evalutating Human-Readable Mathematical Expressions

(C) 1998 by Edwin Olson

Abstract

Human Readable mathematical expressions, like "1+2*(3+4)-5" pose a moderate challenge to the programmer. Existing algorithms typically build a tree from the expression. Once built, the tree is easy to evaluate, but building the tree is complex and difficult. The author has devised a new algorithm which simplifies the programmer's task. The new algorithm is significant in its conceptual clarity, ease-of-implementation, and complete lack of recursion (explicit or implicit) or trees.

History

The algorithm was developed when the author was working on a children's math game. In the game, students were asked to discover a correct mathematical expression from a series of digits by inserting mathematical operators and parenthesis. While tree based methods are widely known, their complexity was undesirable for such a simple application, and the author began looking for other algorithms.

The algorithm revolves around a simple observation: a sorting operation can impose the necessary order-of-operations (parentheses included) for any mathematical expression. The initial version of the algorithm had an ugly partial-result-alias-resolving-table that was later obviated by the linked list described below. More information about the algorithm can be solicited from the author.

The algorithm has room for improvement. One obvious area is the sorting. While some sorting seems to be necessary, it should be noted that a "perfect" sort is not. Since partial results do not always affect each other immediately, different sorts can yield the same answer. Generalizing this property, however, is difficult, and remains an area of study.

Lastly, it should be pointed out that this algorithm is easily and obviously extensible to algebraic expressions with operators of arbitrary complexity. This functionality, while possible, will not be persued further here.

Description

The algorithm can be broken up into three phases: text parsing, sorting, and evaluation. As in all programs, parsing the user's text robustly is the most challenging (if tedious) portion. However, the power of this algorithm is that the remaining steps are so simple that they can be completed through a single call to a standard library or just a few simple lines of code.

During text parsing, two data structures are allocated: a doubly linked list and an array. The doubly linked list contains terms and operators, each in its own record, in the same order as the human-readable expression. The array contains a record for each operand which in turn contains a pointer to the node in the linked list containing that operator, the parenthesis depth of the operator, and the number of operators to the left of the operator. The parenthesis depth can easily be maintained by incrementing and decrementing a variable as opening and closing parentheses are encountered. The number of operators to the left of this operator, during the building of the array, is equal to the record number.

The next step is sorting. The operator record array is sorted simultaneously on three fields. In order of sorting priority, those fields are descending parenthesis depth, descending operator priority, and ascending operator number. The first enforces parenthetical sub-expression evaluation, the second: order-of-operations, and the third: left-to-right evaluation. All are necessary to yield correct results, and any standard sorting routine can be used.

The priorities of operators are, in ascending priority (equal priority operators are listed together):

  • equality operators, e.g., = < <= !=
  • +, -
  • *, /
  • ^ (exponentiation)
  • unary operators, e.g., ! (factorial)

    Evaluation follows sorting. Each sorted operand record is evaluated in turn. The operator is extracted from the linked list by way of the record's pointer. Depending on the operator, the relevant arguments are identified (by following links to adjacent nodes in the linked list) and deleted from the linked list. When deleting, all next and previous links must made coherent with respect to the remaining nodes. A value is computed from the operator and arguments, and the node in the list which contained the operator is over-written to contain the resulting value.

    This process repeats until all operators have been evaluated. When finished, only one node will remain in the linked list, and that node will contain the value of the expression.

    Example


    Original Expression:

    1+2*(3+4)-5

    Text Parsing

    The doubly linked list would contain (note that only operators and terms are present):

    Memory AddressABCDEFGHI
    Value1+2*3+4-5

    The array of op records would be:

    Record NumberPointerParenthesis depthOperator Count
    0B00
    1D01
    2F12
    3H03

    Sorting

    A standard sorting algorithm sorts by descending parenthesis depth, descending operator priority, and increasing operator count. After sorting, the array of op records would be:

    Record NumberPointerParenthesis depthOperator Count
    0F12
    1D01
    2B00
    3H03

    Evaluation

    Evaluation begins by examining each of the op records in order. Note that we no longer need either the Parenthesis depth or Operator count fields-- just the Pointer into the doubly linked list.

  • First record's pointer is F
  • F is "+"=addition
  • F.previous=E=3
  • F.next=G=4
  • 3+4=7
  • Delete E and G from linked list and set F to 7. This leaves the linked list:
    Memory AddressABCDFHI
    Value1+2*7-5

  • Next record's pointer is D.
  • D is "*"=multiplication
  • D.prev=C=2
  • D.next=F=7
  • 2*7=14
  • Delete C and F from linked list and set D to 14. This leaves the linked list:
    Memory AddressABDHI
    Value1+14-5

  • Next record's pointer is B.
  • B is "+"=addition
  • B.prev=A=1
  • B.next=D=14
  • 1+14=15
  • Delete A and D from linked list and set B to 15. This leaves the linked list:
    Memory AddressBHI
    Value15-5

  • Last record's pointer is H.
  • H is "-"=subtraction
  • H.prev=B=15
  • H.next=I=5
  • 15-5=10
  • Delete B and I from linked list and set H to 10. This leaves the linked list:
    Memory AddressH
    Value10

  • We have no more records.
  • The value of the expression is the first element of the linked list, H, 10.

  • This algorithm is expected to largely replace tree-based implementations. While the sorting step is likely to make the algorithm more computationally expensive than tree-building algorithms, the simplicity ensures rapid and correct implementation. Further, great speed is not often needed by such algorithms, and most expressions will have relatively small numbers of operators, so the sorting penalty is likely to be unimportant.

    The author places this algorithm in the public domain, but requests that the source of the algorithm be credited. Comments, observations, or suggestions may be directed to the author.

    (C) 1998 by Edwin Olson