I’m enjoying a closer look at the history of algorithms development in electronic records management systems.
The book I’m reading at the moment uses Pascal to describe algorithm movements—that’s right, it was written in 1988—but the high-level descriptions prompt me to think of my experiences with ForeMost a decade ago, and Documentum, FileNet, and SharePoint since then (no offense meant to other fine ERMSs, but these are the tools I know). Some of the excerpts herald today’s lessons learned from ERMS implementations.
Random notes pop out at me. For example:
On abstract data types—think search results.
The defining characteristic of an abstract data type is that nothing outside of the definitions of the data structure and the algorithms operating on it should refer to anything inside, except through function and procedure calls for the fundamental operations. Stacks and queues are classic examples of abstract data types…
On trees—think classification.
Each node (except the root) has exactly one node above it which is called its parent; the nodes directly below a node are called its children. Nodes with no children are sometimes called leaves, or terminal nodes. Any node is the root of a subtree consisting of it and the nodes below it. A set of trees is called a forest…
On recursion—think visual representation.
[There is a] general method of algorithm design where we solve a problem by first solving trivial subproblems, then combining those solutions to solve slightly bigger subproblems, etc. until the whole problem is solved. This approach might be called “combine and conquer.” While it is always possible to get an equivalent nonrecursive implementation of any recursive program, it is not always possible to rearrange the computations in this way—many recursive programs depend on the subproblems being solved in a particular order. This is a bottom-up approach as contrasted with the top-down orientation of divide and conquer. Recursively defined geometric patterns are sometimes called fractals.
On frameworks—think search parameters.
It is not difficult to find an upper bound on the running time of a program—the challenge is to find the best running bound, one which could actually be achieved if the worst input were encountered…the average case normally requires a rather sophisticated mathematical analysis. In principle, the performance of an algorithm often can be analyzed to an extremely precise level of detail, limited only by uncertainty about the performance of the computer or by the difficulty of determining the mathematical properties of the some of the abstract quantities.
On selecting an algorithm—think system performance.
The most common mistake made in selecting an algorithm is to ignore performance characteristics. Faster algorithms are often more complicated, and implementers are often willing to accept a slower algorithm to avoid having to deal with complexity. But a faster algorithm is often not much more complicated, and dealing with slight added complexity is a small price to pay to avoid dealing with a slow algorithm.
On priority queues—think objects.
In many applications, records with keys must be processed in order, but not necessarily in full sorted order and not necessarily all at once. Often a set of records must be collected, then the largest processed, then perhaps more records collected, then the next largest processed, and so forth. An appropriate data structure in such an environment is one which supports the operations of inserting a new element and deleting the largest element. Applications…include simulations systems, job scheduling and numerical computations.
I’m about halfway through the book, and since I’m not a natural programmer, it’s a slow read. Big heap Records and Information Manager is trying, though. I can’t help but think to myself that the history of algorithms is an important component to forensic computing in my organization. You might think so too.
*With sincerest thanks to Mr. Robert Sedgewick.
#ElectronicRecordsManagement #forensics #electronicrecordsmanagementsystems #algorithms