## AVL Trees

An ordered tree (binary search tree) is used when we wish to store objects with (numerical) keys in a binary tree so that lookups can be done in on-the-order-of-log2N time, where N is the number of objects in the tree. But an ordered tree that is seriously ``unbalanced,'' that is, one where paths from the root to the leaves have dramatically different lengths, will ruin the desired lookup behavior.

The worst-case example of an unbalanced ordered tree is the tree built by inserting a sorted sequence of objects (we show the numerical keys only; the objects attached to the keys are unimportant):

```
1  2  3  4
```
The tree looks like this:
```
  1
 / \
.   2
   / \
  .   3
     / \
    .   4
       / \
      .   .
```
Obviously, a lookup in this tree is just a linear search, which is slower than log-time.

How can we maintain an ordered tree so that, regardless of the order of insertions, the tree remains balanced? There are several sophisticated techniques for doing so; here we consider one of the most elegant, AVL trees.

#### Definition of an AVL tree

An AVL-tree is an ordered tree that has the height-balanced property. Here are the basic definitions:
The height of a tree is the length of the longest path from the tree's root to one of its leaves. (Here the leaves are the empty subtrees, drawn as dots in the diagrams below, so an empty tree has height 0 and a node with two empty subtrees has height 1.)

A Node is balanced if the heights of its left and right subtrees differ by at most one.

A binary tree has the height-balanced property if all of its Nodes are balanced.
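These definitions translate directly into code. Here is a minimal sketch (the class and function names are illustrative, not from the lecture); an empty subtree is represented by `None`, so an empty tree has height 0 and a node with two empty subtrees has height 1, matching the diagrams below:

```python
# A minimal sketch of the AVL definitions (illustrative names).

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    # Length of the longest path from this node down to an empty subtree.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_balanced(node):
    # A node is balanced when its subtree heights differ by at most one.
    return abs(height(node.left) - height(node.right)) <= 1

def has_height_balance(root):
    # The height-balanced property: every node in the tree is balanced.
    if root is None:
        return True
    return (is_balanced(root)
            and has_height_balance(root.left)
            and has_height_balance(root.right))
```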

Here is an example of an AVL-tree, whose root holds 44:

```
                 44
              /       \
           17           78
          /  \         /   \
         .    32     50      88
             /  \   /  \     / \
            .    . 48    62 .   .
                  / \    / \
                 .   .  .   .
```
For clarity, we redraw the tree and write the heights next to each Node, in parentheses:
```
                 44(4)
              /        \
          17(2)           78(3)
          /   \          /      \
         .   32(1)    50(2)     88(1)
             /  \     /    \     / \
            .    .  48(1) 62(1) .   .
                    / \    / \
                   .   .  .   .
```
Notice that, for every node in the tree, the heights of its two subtrees differ by at most one.

(By the way, the initials, ``AVL,'' denote the two Russian researchers, G.M. Adel'son-Vel'skii and Y.M. Landis, who developed the key definitions and algorithms for the tree format.)

The example shows that an AVL-tree is not ``exactly'' balanced, or complete (like a heap), but it is ``balanced well enough''. Here is the reason why it is ``well enough'':

An AVL-tree with N nodes has a height that is less than (2 log2N) + 2.
This implies that a lookup takes on the order of log2N node comparisons. A lookup in a complete ordered tree, where the balancing is perfect, would take, at worst, (log2N)+1 comparisons. So, an AVL-tree is only twice as ``slow'' as the optimal ordered tree --- this is still very good!

(For example, if the tree held 2048 objects, then lookup in a complete tree takes 12 comparisons, worst case, whereas AVL lookup takes 24 comparisons, worst case. Linear search would take 1024 comparisons, on average, and 2048 comparisons, in worst case.)
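The height bound can be checked numerically. The sketch below uses the standard recurrence for the minimum number of nodes in an AVL tree of height h (a root plus minimal AVL subtrees of heights h-1 and h-2); the function name is illustrative:

```python
import math

def min_nodes(h):
    # Fewest nodes an AVL tree of height h can have: a root whose two
    # subtrees are minimal AVL trees of heights h-1 and h-2.
    if h <= 1:
        return h   # height 0: empty tree; height 1: a single node
    return 1 + min_nodes(h - 1) + min_nodes(h - 2)

# Every AVL tree of height h has at least min_nodes(h) nodes, so for a
# tree with N nodes the height satisfies h < (2 log2 N) + 2.
for h in range(1, 25):
    N = min_nodes(h)
    assert h < 2 * math.log2(N) + 2
```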

#### Insertion into AVL-trees: Insertion then Rotation

We must find an efficient way to do an insertion into an AVL-tree that preserves orderedness and the height-balanced property. Our strategy will be to do an insertion in the usual, naive way, and then rebuild the tree to recover the height-balanced property. The second step is called rotation.

Let's reconsider the above tree and insert 54 into it. We know how to perform insertion into ordered trees, and this gives us:

```
                 44(5)!
              /         \
          17(2)           78(4)!
          /   \          /      \
         .   32(1)    50(3)     88(1)
             /  \     /    \     / \
            .    .  48(1) 62(2) .   .
                    / \    /  \
                   .   . 54(1) .
                         /  \
                        .    .
```
Again, the heights are listed next to the nodes. The insertion of 54 has unbalanced the two nodes marked by ! --- they are no longer balanced. In general, many nodes, all located along the path from the root to the insertion position, might have their heights changed and become unbalanced. We must rebuild the tree so that the unbalanced nodes become balanced again.

#### A general picture of an imbalance

Why did the above AVL tree become unbalanced? Some careful thinking will convince us that an insertion generates an unbalanced node when the inserted number is placed in a subtree that is already ``full'' and whose height is already one greater than that of its sibling tree.

A general diagram of the problematic situation looks like this: Say that an AVL tree has a subtree, Z, that has two differently heighted subtrees, and say that the larger-height subtree, Y, is ``full'':

```
       root
      /    \
     .      .
      .       .
               Z(n+2)
              /      \
          *(n)      Y(n+1)
          /  \      /    \
                 *(n)    *(n)
                 / \     / \
```
(As usual, we write the heights in parentheses next to the nodes. Subtree Y is ``full'' in the sense that both of its subtrees have the same height, and adding one more value to either subtree will cause Y's height to increase.) Say that we must insert a new number, k, and the insertion places k inside subtree Y, and this generates an imbalance:
```
       root
      /    \
     .      .
      .       .
               Z(n+3)!
              /      \
          *(n)      Y(n+2)
          /  \      /    \
                 *(n)    *(n+1)
                 / \      /  \
                         k
```
Subtree Y is still balanced, but since its height increased by one, this ruins Z's balance.

#### Rotation

The above diagram suggests that ``too many'' objects are placed within subtree Y; perhaps we can find a way of using some of the ``empty space'' within Z's left subtree, reducing Y's (and Z's) height by one. If we can do this, then we have rebalanced the tree.

It is remarkable that we can repair the situation by moving --- rotating --- not k but the nodes above it. Indeed, we will rotate just three nodes.

Here is the clever strategy:

1. Locate the unbalanced node closest to the insertion point; call it ``Node Z''.
2. Of Node Z's two subtrees, locate the subtree in which the insertion was done and call the root of this subtree, ``Node Y.''
3. Of Node Y's two subtrees, locate the subtree in which the insertion was done and call the root of this subtree, ``Node X.'' (Note that Node X might be the object just inserted.)
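The three steps above might be sketched as follows. This is an illustrative sketch, not the lecture's own code: `insert_with_path` performs the usual naive ordered-tree insertion while recording the root-to-leaf path, and `find_zyx` walks that path back upward:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

def insert_with_path(root, key):
    # Usual naive ordered-tree insertion (root assumed non-empty);
    # returns the path of nodes from the root down to the new node.
    path, node = [], root
    while True:
        path.append(node)
        side = 'left' if key < node.key else 'right'
        if getattr(node, side) is None:
            setattr(node, side, Node(key))
            path.append(getattr(node, side))
            return path
        node = getattr(node, side)

def find_zyx(path):
    # Step 1: walk up from the insertion point; the first unbalanced
    # node found is Z.  Steps 2-3: Y and X are Z's descendants that lie
    # on the insertion path (X might be the newly inserted node itself).
    for i in range(len(path) - 1, -1, -1):
        if abs(height(path[i].left) - height(path[i].right)) > 1:
            return path[i], path[i + 1], path[i + 2]
    return None  # every node is still balanced; no rotation needed
```

For the running example, inserting 54 locates Z = 78, Y = 50, X = 62.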
For the earlier example, we have this arrangement: Node Z holds 78, Y holds 50, and X holds 62; here again is the example:
```
                 44(5)!
              /         \
          17(2)          Z78(4)!
          /   \          /      \
         .   32(1)   Y50(3)     88(1)
             /  \     /    \     / \
            .    .  48(1) X62(2).   .
                    / \    /  \
                   .   . 54(1) .
                         /  \
                        .    .
```
Here is a more general diagram of the situation, with node heights indicated in parentheses:
```
          Z(n+3)!
         /       \
      *(n)      Y(n+2)
      /  \      /     \
            *(n)     X(n+1)
            / \       /  \
                     k
```
That is, the insertion of the new object has caused Node X's height to increase by one, and this makes Node Y's height increase by one. (Note that Node X might itself be the new object, and note that Node Y is still balanced. Hence, the heights must be related as shown.) The result is that Node Z's height has increased from n+2 to n+3 and is unbalanced.

Rather than try to move Node k, our objective is to rearrange Nodes Z, Y, and X, so that the subtree becomes balanced again and recovers the height of n+2, which it had prior to the insertion. All three nodes might be moved, but their subtrees, and the rest of the tree, will be unaltered.

Now remember that each of Nodes Z, Y, and X hold numerical values, and remember that the three nodes are arranged in a path, a linear sequence, in the tree, like this:

```
   Node Z
        \
       Node Y
       /
   Node X
```
To reduce the height of this structure, it makes sense to rearrange the three nodes into this pattern:
```
        Node b
       /      \
   Node a    Node c
```
This would reduce the height by one! But which of Z, Y, X, should be Node b? Node a? Node c?

The answer is simple: We compare the values held in Nodes Z, Y, and X. The node that holds the smallest number will be ``Node a''; the node that holds the middle number will be ``Node b''; and the node that holds the largest number will be ``Node c''.

Another way of identifying Nodes a,b, and c is to visually ``read'' the tree from left to right; the leftmost node is Node a; the middle is Node b; and the rightmost is Node c. (This left-to-right ``reading'' can be coded as an in-order tree traversal.)

The above picture is incomplete, because Nodes Z, Y, and X have their own subtrees. We must label the four subtrees attached to Nodes a, b, and c, as T0, T1, T2, and T3. Again, we can do this by ``reading'' the tree from left to right.

But here is a more precise statement of the algorithm: Given Nodes Z, Y, and X:

1. Attach the label, ``Node a'', to the Node that has the smallest value; attach the label, ``Node b'', to the Node that has the middle value; and attach the label, ``Node c'', to the Node that has the largest value;
2. Label Node a's left subtree, ``T0''.
3. If Node a's right subtree is neither Node b nor Node c, then label it ``T1''; else label Node b's left subtree ``T1''.
4. If Node b's right subtree is not Node c, then label it ``T2''; else label Node c's left subtree ``T2''.
5. Label Node c's right subtree, ``T3''.
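The five labelling steps, followed by the rotation, might be sketched like this (the `Node` class and the name `restructure` are illustrative; the caller is responsible for re-attaching the returned node where Z used to hang):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def restructure(z, y, x):
    # Step 1: order the three nodes by their keys to name a, b, c.
    a, b, c = sorted([z, y, x], key=lambda n: n.key)
    # Steps 2-5: read off the four subtrees T0..T3 from left to right.
    t0 = a.left
    t1 = b.left if a.right in (b, c) else a.right
    t2 = c.left if b.right is c else b.right
    t3 = c.right
    # Rotate: b becomes the subtree root, with a and c as its children.
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b
```

For the running example, restructuring Z = 78, Y = 50, X = 62 returns the node holding 62, with 50 and 78 as its children.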

Here is the labelling for the example:

```
                Z78c
               /     \
          Y50a         88T3
          /   \         / \
       48T0    X62b    .   .
       /  \    /   \
      .    . 54T1   .T2
             /  \
            .    .
```
Now, given Nodes a, b, c, and given subtrees T0, T1, T2, T3, we rotate the nodes and reconnect them to their subtrees like this:
```
            b
         /     \
        a       c
       / \     / \
      T0  T1  T2  T3
```
For the example, the result of the rotation looks like this:
```
               62b
             /     \
         50a         78c
         /  \        /  \
     48T0    54T1  .T2   88T3
     / \     / \         / \
    .   .   .   .       .   .
```
If you carefully consider the algorithm we used to attach the labels, a, b, and c, and T0, T1, T2, and T3, you can verify that the rotated tree is still an ordered tree. Further, you can visually verify that the height of the tree has been reduced by one!

#### The four forms of rotation

To validate the reduction in height, we can draw diagrams of all the possible combinations of Nodes a,b, and c that can arise, and we see that each rotation reduces the height of the root node by one. There are four cases:

```
CASE 1:

   Node Z a              ==>            Y
   /     \                           /     \
  T0    Node Y b                    Z       X
        /     \                    / \     / \
       T1    Node X c             T0  T1  T2  T3
             /     \
            T2     T3
```
```
CASE 2:

        Node Z c         ==>            Y
        /      \                     /     \
   Node Y b     T3                  X       Z
   /      \                        / \     / \
  Node X a  T2                    T0  T1  T2  T3
  /     \
 T0     T1
```
```
CASE 3:

   Node Z a              ==>            X
   /     \                           /     \
  T0    Node Y c                    Z       Y
        /      \                   / \     / \
    Node X b    T3                T0  T1  T2  T3
    /     \
   T1     T2
```
```
CASE 4:

        Node Z c         ==>            X
        /      \                     /     \
   Node Y a     T3                  Y       Z
   /      \                        / \     / \
  T0    Node X b                  T0  T1  T2  T3
        /     \
       T1     T2
```
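The four cases can also be checked mechanically. The following sketch (illustrative names; the `restructure` function implements the labelling-and-rotation algorithm of the previous section) builds the smallest instance of each configuration, rotates it, and confirms that the height drops by one and the middle value becomes the new root:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

def restructure(z, y, x):
    # Trinode rotation: order z, y, x by key, relabel subtrees, rebuild.
    a, b, c = sorted([z, y, x], key=lambda n: n.key)
    t0, t3 = a.left, c.right
    t1 = b.left if a.right in (b, c) else a.right
    t2 = c.left if b.right is c else b.right
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b

# The smallest instance of each case (T0..T3 all empty):
cases = {
    1: Node(10, None, Node(20, None, Node(30))),  # Z a, Y b, X c
    2: Node(30, Node(20, Node(10), None), None),  # Z c, Y b, X a
    3: Node(10, None, Node(30, Node(20), None)),  # Z a, Y c, X b
    4: Node(30, Node(10, None, Node(20)), None),  # Z c, Y a, X b
}
for n, z in cases.items():
    y = z.left or z.right        # Z's only (and taller) child
    x = y.left or y.right        # likewise, one level down
    assert height(z) == 3
    b = restructure(z, y, x)
    assert height(b) == 2        # the height dropped by one
    assert b.key == 20           # the middle value is the new root
```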

#### Node deletion from an AVL-tree

The basic idea: as before, consider a node deletion as the deletion of the root (of a subtree). Replace the deleted root by an ``innermost'' node, as described in the lecture on ordered trees.

The promotion of the innermost node to the root might make one half of the rebuilt tree too shallow and unbalance the tree, and (repeated) rotations might be needed to repair the problem. The rotations would proceed along the path from the location of the former innermost node to the root of the overall tree.

So, starting from the leaf that replaced the promoted innermost node, search along the path from the leaf to the overall root, examining each node to see if it is unbalanced. If an unbalanced node is located, call it Node Z.

We must revise the definition of the Y-X-nodes:

1. Of Node Z's two subtrees, locate the subtree with the greater height and call the root of this subtree, ``Node Y.''
2. Of Node Y's two subtrees, locate the subtree with the greater height and call the root of this subtree, ``Node X.'' (If both subtrees have the same height, then choose either one.)
Now, the rotation proceeds as before.

Unfortunately, additional rotations might be necessary, so the technique is repeated for the parent node of the newly installed rotated root, and its parent, etc., until the root of the entire tree is balanced. Although this is more work, there are at most on the order of log2N rotations, so the deletion operation is relatively efficient.
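The deletion-side rebalancing might be sketched as follows. The names are illustrative; `restructure` is the trinode rotation from the insertion sections, and `path` is assumed to be the list of nodes from the overall root down to the parent of the removed node:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

def restructure(z, y, x):
    # Trinode rotation from the insertion sections.
    a, b, c = sorted([z, y, x], key=lambda n: n.key)
    t0, t3 = a.left, c.right
    t1 = b.left if a.right in (b, c) else a.right
    t2 = c.left if b.right is c else b.right
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b

def rebalance_after_delete(root, path):
    # Walk from the deletion point back up to the overall root,
    # rotating at every unbalanced node found along the way.
    for i in range(len(path) - 1, -1, -1):
        z = path[i]
        if abs(height(z.left) - height(z.right)) > 1:
            # Revised rule: Y is Z's taller child, and X is Y's taller
            # child (on a tie, either choice works; we pick the left).
            y = z.left if height(z.left) >= height(z.right) else z.right
            x = y.left if height(y.left) >= height(y.right) else y.right
            b = restructure(z, y, x)
            if i == 0:
                root = b              # Z was the overall root
            elif path[i - 1].left is z:
                path[i - 1].left = b  # re-attach where Z used to hang
            else:
                path[i - 1].right = b
    return root
```

For example, deleting the leaf 17 from the tree 44(17, 78(50, 88)) unbalances the root, and one rotation makes 50 the new root.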