What is this data structure

lep

As the title says i’m interested if anyone has used a data structure similar to this. I tried to search around a bit but i couldn’t find much so guess i will just put out a description and let the internet tell me everything that i’ve missed.

I wanted a data structure which i could do range queries against, that is find me elements between two keys (in my case timestamps but it could be any Ord instance). But my elements are Semigroups. Hence i actually want the “concatination” of all the elements in the range-query. Now my idea was that i wanted to keep the computation for the query on the lower side because it will be queried many more times than added to. To achieve this i took a balanced binary tree and augmented it with a “summary” value which can be thought of as the sconcat of all the elements in that subtree. I chose an AA Tree because it seemed the least hassle to implement but you could use any old BST 1.

Now let’s look at the definition:

data AATree k v
  = Nil
  | Node { key :: k
         , bounds :: (Min k, Max k)
         , value :: v
         , level :: Int32
         , summary :: v
         , left :: AATree k v
         , right :: AATree k v
         }

Standard BST stuff: we store the key, value and both sub-trees. The level field comes from the AA Tree so my additions are the summary and bounds fields. The bounds field also gives a kind of summary of which range the keys of that tree span. Using those both in conjuction we can potentially eliminate much computation.

Let’s see now how it is actually done:

query :: (Ord k, Semigroup v) => k -> k -> AATree k v -> Maybe v
query qlo qhi Nil = Nothing
query qlo qhi x
-- case 1
  | qhi < hlo && qhi < hlo =
        Nothing
  
-- case 2
  | qlo > hhi && qlo > hlo =
        Nothing

-- case 3
  | qlo <= hlo && hhi <= qhi =
        Just $ summary x
  
-- case 4
  | qlo <= key x && key x <= qhi =
       Just (value x)
    <> query qlo (key x) (left x)
    <> query (key x) qhi (right x)
    
-- case 5
  | key x <= qlo && key x <= qhi =
        query qlo qhi (right x)

-- case 6
  | qlo <= key x && qhi <= key x =
        query qlo qhi (left x)
  where
    (hlo, hhi) = (getMin *** getMax) (bounds x)

We have a bunch of cases which i think can be better understood by not looking at the code. Let Q be the query interval, H be the interval given by the bounds field of the current tree and k the key of the current tree.

Now we can handle a few cases very easily.

So that’s my problem and how i solved it. For now i’m happy with it but as i said im interested how others would (or have) solve this problem. So if you’ve used something like that feel free to post it here.


  1. I guess ideally you would use some B-Tree instead.↩︎

  2. Note how we use the Semigroup instance of Maybe. Nice.↩︎

lep . blog