As the title says i’m interested if anyone has used a data structure similar to this. I tried to search around a bit but i couldn’t find much so guess i will just put out a description and let the internet tell me everything that i’ve missed.
I wanted a data structure which i could do range queries against,
that is find me elements between two keys (in my case timestamps but it
could be any Ord
instance). But my elements are Semigroup
s.
Hence i actually want the “concatination” of all the elements in the
range-query. Now my idea was that i wanted to keep the computation for
the query on the lower side because it will be queried many more times
than added to. To achieve this i took a balanced binary tree and
augmented it with a “summary” value which can be thought of as the sconcat
of all the elements in
that subtree. I chose an AA Tree because it
seemed the least hassle to implement but you could use any old BST 1.
Now let’s look at the definition:
data AATree k v
= Nil
| Node { key :: k
bounds :: (Min k, Max k)
, value :: v
, level :: Int32
, summary :: v
, left :: AATree k v
, right :: AATree k v
, }
Standard BST stuff: we store the key, value and both sub-trees. The
level
field comes from the AA Tree so my additions are the
summary
and bounds
fields. The
bounds
field also gives a kind of summary of which range
the keys of that tree span. Using those both in conjuction we can
potentially eliminate much computation.
Let’s see now how it is actually done:
query :: (Ord k, Semigroup v) => k -> k -> AATree k v -> Maybe v
Nil = Nothing
query qlo qhi
query qlo qhi x-- case 1
| qhi < hlo && qhi < hlo =
Nothing
-- case 2
| qlo > hhi && qlo > hlo =
Nothing
-- case 3
| qlo <= hlo && hhi <= qhi =
Just $ summary x
-- case 4
| qlo <= key x && key x <= qhi =
Just (value x)
<> query qlo (key x) (left x)
<> query (key x) qhi (right x)
-- case 5
| key x <= qlo && key x <= qhi =
query qlo qhi (right x)
-- case 6
| qlo <= key x && qhi <= key x =
query qlo qhi (left x)where
= (getMin *** getMax) (bounds x) (hlo, hhi)
We have a bunch of cases which i think can be better understood by
not looking at the code. Let Q be the query interval, H be the interval given by the
bounds
field of the current tree and k the key
of the
current tree.
Now we can handle a few cases very easily.
Nothing
(case
1,2).Just $ summary x
because we have exhausted the search space (case 3).<>
2. (case 4)So that’s my problem and how i solved it. For now i’m happy with it but as i said im interested how others would (or have) solve this problem. So if you’ve used something like that feel free to post it here.
lep . blog