Review for midterm -- Pic10C, Spring '98

Below is a list of topics which should serve as a study outline for you as you make sure you've covered all of the bases in studying for the midterm. It may or may not be completely comprehensive -- I am making it up from the reading assignment. If you have any questions about any of this, please email me. Where page numbers are given, they are references to Carrano (unless otherwise stated) and refer you to the part of the reading assignments dealing with the particular issue. --Ami

2-3 Trees (p.563ff)
The defining property: Each node is either a 2-node or a 3-node. If it is a 2-node, it has 1 data item and 2 children. If it is a 3-node, it has 2 data items and 3 children. In any case all of the children must have the exact same height (which we allow to be zero, in which case the node is a leaf). Furthermore, each data item must be greater than all of the data items in the subtrees to its left, and less than all of the data items in the subtrees to its right.
For traverse and search pseudo-code, see Carrano, p.566-567. Here's pseudo-code to extract a range of values:

VisitRange(T, fromKey, toKey)
// Visits each item in T whose key lies between fromKey and toKey
if (T's root is a leaf)
    if (fromKey <= T.ldata <= toKey)
        visit T.ldata
    if (T has 2 items) and (fromKey <= T.rdata <= toKey)
        visit T.rdata
else
    if (T has 2 items)
        if (fromKey < T.ldata)
            VisitRange(T.lChild, fromKey, toKey)
        if (fromKey <= T.ldata <= toKey)
            visit T.ldata
        if ((fromKey < T.rdata) and (toKey > T.ldata))
            VisitRange(T.mChild, fromKey, toKey)
        if (fromKey <= T.rdata <= toKey)
            visit T.rdata
        if (toKey > T.rdata)
            VisitRange(T.rChild, fromKey, toKey)
    else // T has 1 data item
        if (fromKey < T.ldata)
            VisitRange(T.lChild, fromKey, toKey)
        if (fromKey <= T.ldata <= toKey)
            visit T.ldata
        if (toKey > T.ldata)
            VisitRange(T.mChild, fromKey, toKey)

Insertion: First locate the leaf at which the search for the value to insert would terminate, and insert the value there. If the leaf now contains at most 2 data items, we are done. Otherwise, we split the leaf into two new leaves, one containing the smallest of its three values and one containing the greatest, attach the new leaves to the parent, and move the middle data item up to the parent. In case the parent now has 3 data items, we re-do the above splitting, until we reach the root; if the root itself overflows, splitting it creates a new root and grows the tree by one level. For code for this, see my code for the first programming assignment.

Deletion: Find the value to delete in the tree. If it is not in a leaf, swap it with its inorder successor (which *must* live in a leaf -- make sure you understand why this is!). Then the value to delete lives in a leaf. If it is not the only value in that leaf, we may just delete it. However, if it is the only value in the leaf, we delete it and then "fix" the resulting tree (since the leaf now contains no data items and as such is not a valid 2-3 Node). To fix, you either redistribute from the parent and sibling or merge with a sibling as illustrated on p.580. Again, for code for this, see my code for the first programming assignment.

Efficiency: One should note that although a 2-3 tree is not always guaranteed to be "full" in the sense of a full binary search tree (i.e., with every node having the maximal number of children and data items possible), it is guaranteed to be "semi-full": no node ever has children of different heights, so no linear "branches" can develop. This means that all table operations can be completed in O(log(n)) time.




2-3-4 Trees (p.583ff)
The defining property: Each node is either a 2-node, a 3-node, or a 4-node. If it is a 2-node, it has 1 data item and 2 children. If it is a 3-node, it has 2 data items and 3 children. If it is a 4-node, it has 3 data items and 4 children. In any case all of the children must have the exact same height (which we allow to be zero, in which case the node is a leaf). Furthermore, each data item must be greater than all of the data items in the subtrees to its left, and less than all of the data items in the subtrees to its right.
Traverse and search for 2-3-4 Trees are simple extensions of the same functions for 2-3 trees, as is range extraction. You should make sure you can write pseudo-code to perform these three functions without too much heartache.

Insertion: Insertion is NOT a simple extension of 2-3 Tree insertion. The insertion in 2-3 trees requires two trips to be made in the tree: once down to find the leaf and once back up to fix anything that might have been upset. The trick to avoiding the second trip in 2-3-4 trees is to split nodes that might overflow on the way down, guaranteeing that no overflow happens. The algorithm is simple: Traverse the tree from the root searching for the leaf where the value to be inserted should go (i.e., the leaf where a search for that value would terminate). While performing this traversal, if you come upon a 4-node, split it as illustrated in figures 12-28, 12-29, and 12-30 on p.587-588. Once you reach the leaf, insert the value there. The leaf is guaranteed not to overflow (if it was already a 4-node, it would have been split), so we are done. For code for this, see my solution to the first programming assignment.

Deletion: Similarly to the deletion from 2-3 trees, we begin by swapping the value to be deleted with its inorder successor if the value doesn't live in a leaf. So we always delete values out of leaves. If the leaf is a 3-node or a 4-node, we can just delete the value and be done. However, if there is a 2-node there, we are in trouble. We circumvent this problem by transforming each 2-node we see on the way from the root to the leaf into a 3 or 4-node, depending on what siblings are available, according to the figures 12-28, 12-29, and 12-30, with the left-to-right arrows being reversed.




Red-Black Trees (p.589ff)
The defining property: We liked 2-3-4 trees because they allow us to delay the process of re-balancing the tree somewhat (instead of rebalancing the entire tree after every insertion we get to perform ~log(N) insertions before having to rebalance, and even then we only rebalance the relevant parts), while still guaranteeing a pretty good worst-case performance. The downside is that the memory overhead of maintaining data members and child pointers to non-existent entities is noticeable and inelegant. The way around this is to use Red-Black trees. A Red-Black tree is a normal binary search tree except that each of a node's child pointers is "colored" either red or black. We use red pointers to signify that two nodes together represent a 3-node or a 4-node of a 2-3-4 tree, and black pointers to signify normal child relationships in the 2-3-4 tree structure. So 3-nodes are represented as:
                                 
  [ S L ]            \         [L]                [S] 
  /  |  \    ---------\       {   \              /   }
 /   |   \   ---------/      [S]   c     OR:    a     [L]
a   b    c           /       / \                      / \
                            a   b                    b   c
and 4-nodes as:
                                                 
                                                      
  [ S M L ]              \          [M]               
  /  | |  \     ----------\        {   }              
 /   | |   \    ----------/      [S]   [L]            
a    b c    d            /       / \   / \            
                                a   b c   d           
(where the {'s and }'s denote red pointers). Insertion and Deletion in Red-Black trees are inherited from 2-3-4 trees, by converting a red-black tree to 2-3-4, operating there, and converting back. Note that one 2-3-4 tree may have more than one red-black representation, but a given red-black tree has precisely one 2-3-4 tree corresponding to it.




AVL Trees (p.592ff)
The defining property: An AVL Tree is a binary search tree in which, at every node, the heights of the two subtrees differ by at most one. This forces almost complete balance on the tree, while requiring little to no storage overhead (depending on implementation) beyond that of a normal binary search tree.

Search and Traverse are exactly the same as for a binary search tree.

Insertion and deletion are implemented similarly to BSTs as well. To insert or delete a value in an AVL tree, you first insert or delete it as if the tree were a plain binary search tree. If the balance condition was not upset, we are done. Otherwise, we work from the leaf we inserted into/deleted from up towards the root, and at each step, if the node is unbalanced (i.e., the height of its left subtree differs from the height of its right subtree by more than one), we perform either a single or a double rotation. For diagrams of the two cases, see figures 12-38 through 12-43, p.595-597. You can see some (unguaranteed and *very* shabby) implementation of AVL trees by me from a couple of years ago for illustration.




C++ Classes, Constructors, Members, Access, etc. (Stroustrup, p.223ff)
Here are the key concepts from this chapter (ch.10 in Stroustrup). If you know the words, what they mean, and what surrounds them, you should be OK. Otherwise, *READ* the sections pertaining to what you're having trouble with. Of course, you can always email/ask me or the professor:

Inheritance (p.344)
Inheritance refers to the ability of one class to derive properties from a previously defined one. That means that if you have a group of classes that have internal relationships (A *is a* B, B *has a* C, etc.), you can represent these relationships at the class definition level. Again, here is a list of key concepts. You should make sure you understand all of them, including how they apply in the context of inheritance/polymorphism.

Virtual Functions and Late Binding (p.354)
A virtual member function in a derived class can override a virtual member function in the base class if they have the same declaration. The reason to do this is to allow the compiler to perform late-binding. For example, if you have the classes:
class Base {
  public:
    void mytype() { cout << "My type is Base" << endl; }
};

class Derived : public Base {
  public:
    void mytype() { cout << "My type is Derived" << endl; }
};
and then you had a sequence of calls like the following:

Derived d;
Base *bptr=&d; 
Derived *dptr=&d;

bptr->mytype();
dptr->mytype();                                           
then even though both calls go through pointers to the same object (d), two different functions will be executed: the computer will print out Base the first time and Derived the second, because each call is bound at compile time according to the declared type of the pointer. If we change the definition of mytype() in Base to:
                      
    virtual void mytype() { cout << "My type is Base" << endl; }
then the decision of which function to execute is made at run time, according to the type of object that each pointer is actually pointing to, rather than at compile time, based on what type of object the pointer was declared to be pointing at. With the virtual version, both calls print Derived.




Class Templates (p.371ff)
Recall the syntax for a template class:
template < class TYPE >
class ClassName {
  ...
};
Recall the syntax for a template member function:
template < class TYPE >
[return-type] ClassName< TYPE >::FunctionName( parameters ) {
   ...
}
[return-type] is the specification of the return type. The template < class TYPE > prefix must be repeated, and ClassName< TYPE >:: uses the scope resolution operator to tell the compiler that the function being defined is a member of the class template "ClassName."

You should have seen and gotten comfortable with templates in PIC 10B. If you haven't, see me asap.




Overloaded Operators (p.377ff)
Key concepts: