Stream (Con)fusion


Stream (Con)fusion

Sorry the title is a bad pun. I've been learning about stream fusion lately. It looks like there are a few common techniques:

  • stream/unstream: text
  • stream/unstream + bundle: vector
  • foldr/build: Data.List

I've noticed that the stream/unstream approach has a performance issue that becomes pretty noticable as you chain together more and more operations. I've put together a gist with a benchmark that illustrates the problem with text. I'll explain what I believe is happening.

Stream is defined as:

data Stream a = forall s. Stream (s -> Step s a) -- stepper function !s -- current state !Size -- size hint 

Now let's look at the streaming version of append:

data E l r = L !l | R !r append :: Stream Char -> Stream Char -> Stream Char append (Stream next0 s01 len1) (Stream next1 s02 len2) = Stream next (L s01) (len1 + len2) where next (L s1) = case next0 s1 of Done -> Skip (R s02) Skip s1' -> Skip (L s1') Yield x s1' -> Yield x (L s1') next (R s2) = case next1 s2 of Done -> Done Skip s2' -> Skip (R s2') Yield x s2' -> Yield x (R s2') 

As you chain together appends, the existentially quantified s in Stream is going to become a nested E a (E b (E c (E ...))). This ends up degrading performance considerably when you append a lot of short strings.

When we run the benchmark in the gist I provided earlier, these are the results:

benchmarking Text Append Fusion/Five Chars time 187.8 ns (187.5 ns .. 188.3 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 188.0 ns (187.7 ns .. 188.5 ns) std dev 1.213 ns (737.3 ps .. 1.771 ns) benchmarking Text Append Fusion/Ten Chars time 580.9 ns (580.3 ns .. 581.6 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 581.2 ns (580.5 ns .. 582.6 ns) std dev 2.938 ns (1.813 ns .. 5.212 ns) benchmarking Text Append Fusion/Ten Chars (left-associated append) time 480.8 ns (479.9 ns .. 481.9 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 480.6 ns (480.0 ns .. 481.5 ns) std dev 2.312 ns (1.708 ns .. 3.078 ns) benchmarking Text Append Fusion/Twenty Chars time 2.207 μs (2.204 μs .. 2.212 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.214 μs (2.208 μs .. 2.224 μs) std dev 26.14 ns (16.05 ns .. 38.77 ns) benchmarking Text Append Fusion/Twenty Chars (left-associated append) time 1.739 μs (1.736 μs .. 1.742 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.739 μs (1.737 μs .. 1.746 μs) std dev 12.50 ns (5.356 ns .. 24.04 ns) 

That is, chaining append exhibits non-linear performance characteristics. (I am uncertain why left associating append improves things somewhat). I expect that all of the functions in Data.Text.Internal.Fusion.Common that layer additional data structures like this would have similar performance characteristics. This includes cons,drop,take,intersperse, and others.

The benchmark I provided is contrived. It's simple to rewrite all of the functions in it to avoid the use of so many appends. But there are other algorithms that I'm interested in where I do need to use append like this. I'm particularly interested in responses (or links to papers/blogs) that address:

  • How do approaches other than stream/unstream (like foldr/build) compare on this particular issue?
  • What trade-offs have to be made to improve this? For example, I'm pretty sure that pipes, which uses monadic bind (the >> one without the equals at the end) to accomplish the same kind of appending, doesn't have to build up a nested data type. I assume that the free monad-inspired abstraction pipes uses is unsuitable for stream fusion though, but I'm not sure why.

Submitted September 11, 2016 at 11:20PM by andrewthad
via reddit http://ift.tt/2cP7OF3

New to Haskell, wondering how to make this code more concise.


New to Haskell, wondering how to make this code more concise.

Hi everyone, I've got some helper functions that all do very similar things, and I'm wondering what the best way to go about combining some of them. I have a feeling b and d could be made into one function, and a, c, and e can be another.

feedback :: [Card] -> [Card] -> (Int, Int, Int, Int, Int) feedback guess ans = ((a guess ans), (b guess ans) ,(c guess ans) , (d guess ans), (e guess ans)) a [] _ = 0 a (x:xs) ans | x `elem` ans = 1 + (a xs ans) | otherwise = (a xs ans) b _ [] = 0 b guess (x:xs) | (minimum (map rank guess)) > (rank x) = 1 + (b guess xs) | otherwise = (b guess xs) c [] _ = 0 c (x:xs) ans | (rank x) `elem` (map rank ans) = 1 + (c xs ans) | otherwise = (c xs ans) d _ [] = 0 d guess (x:xs) | (minimum (map rank guess)) < (rank x) = 1 + (d guess xs) | otherwise = (d guess xs) e [] _ = 0 e (x:xs) ans | (suit x) `elem` (map suit ans) = 1 + (e xs ans) | otherwise = (e xs ans) 

Thanks for your help in advance.

Submitted September 11, 2016 at 10:40PM by TwelveAndWhatIsThis
via reddit http://ift.tt/2cbBbS1