What is this data structure

lep

As the title says i’m interested if anyone has used a data structure similar to this. I tried to search around a bit but i couldn’t find much so guess i will just put out a description and let the internet tell me everything that i’ve missed.

I wanted a data structure which i could do range queries against, that is find me elements between two keys (in my case timestamps but it could be any Ord instance). But my elements are Semigroups. Hence i actually want the “concatination” of all the elements in the range-query. Now my idea was that i wanted to keep the computation for the query on the lower side because it will be queried many more times than added to. To achieve this i took a balanced binary tree and augmented it with a “summary” value which can be thought of as the sconcat of all the elements in that subtree. I chose an AA Tree because it seemed the least hassle to implement but you could use any old BST ¹.

Now let’s look at the definition:

data AATree k v
  = Nil
  | Node { key :: k
         , bounds :: (Min k, Max k)
         , value :: v
         , level :: Int32
         , summary :: v
         , left :: AATree k v
         , right :: AATree k v
         }

Standard BST stuff: we store the key, value and both sub-trees. The level field comes from the AA Tree so my additions are the summary and bounds fields. The bounds field also gives a kind of summary of which range the keys of that tree span. Using those both in conjuction we can potentially eliminate much computation.

Let’s see now how it is actually done:

query :: (Ord k, Semigroup v) => k -> k -> AATree k v -> Maybe v
query qlo qhi Nil = Nothing
query qlo qhi x
-- case 1
  | qhi < hlo && qhi < hlo =
        Nothing
  
-- case 2
  | qlo > hhi && qlo > hlo =
        Nothing

-- case 3
  | qlo <= hlo && hhi <= qhi =
        Just $ summary x
  
-- case 4
  | qlo <= key x && key x <= qhi =
       Just (value x)
    <> query qlo (key x) (left x)
    <> query (key x) qhi (right x)
    
-- case 5
  | key x <= qlo && key x <= qhi =
        query qlo qhi (right x)

-- case 6
  | qlo <= key x && qhi <= key x =
        query qlo qhi (left x)
  where
    (hlo, hhi) = (getMin *** getMax) (bounds x)

We have a bunch of cases which i think can be better understood by not looking at the code. Let Q be the query interval, H be the interval given by the bounds field of the current tree and k the key of the current tree.

Now we can handle a few cases very easily.

Q ∩ H = ∅ means we have no overlap and just return Nothing (case 1,2).
H ⊆ Q, that is our query space is larger than (or equal) the trees interval we return Just $ summary x because we have exhausted the search space (case 3).
k < Q, we recurse into the right sub-tree because of the usual BST rules that the keys of the right sub-tree are bigger than the key of the parent tree (case 5).
k > Q, we recurse into the left sub-tree for the same reason as above (case 6).
k ∈ Q: this is the most complicated case because we couldn’t really restrict the search space yet so we just recurse into each subtree and combine their results using <> ². (case 4)

So that’s my problem and how i solved it. For now i’m happy with it but as i said im interested how others would (or have) solve this problem. So if you’ve used something like that feel free to post it here.

I guess ideally you would use some B-Tree instead.↩︎
Note how we use the Semigroup instance of Maybe. Nice.↩︎

lep . blog