Build Your Own Transducer and Impress Your Cat - Part 3
This post is a part of a serie:
- Introduction to transducers
- Anatomy of a transducer
- Stateful transducers (this post)
- Early termination in transducers
- Functions which are using transducers
- Transducer exercises and solutions
Stateful transducers
This article describes briefly how to implement a stateful transducer.
When do we need them
We need them each time some output elements depend on multiple input elements.
Some information related to previous input values need to be stored until the transducer has enough of them to start outputting values.
The information stored could simply be the input values, or it could be a derived information (which usually takes less space in memory).
Stringifier Transducer
Suppose that we have a stream of chars representing zero terminated strings and we want to have a stream of strings.
(defn string-builder-transducer [separator]
; The local state *should not* be defined here.
(fn [rf]
; The local state of the transducer comes here.
(let [state (volatile! [])]
(fn ([] (rf))
([result] (-> result
(rf (apply str @state))
(rf)))
([result input]
(let [chars @state]
(if (= separator input)
(do (vreset! state [])
(rf result (apply str chars)))
(do (vreset! state (conj chars input))
result))))))))
(into []
(string-builder-transducer 0)
(list \H \e \l \l \o 0 \C \l \o \j \u \r \e 0 \w \o \r \l \d \!))
Three notes:
- The position where the local state should be introduced matters. If you introduce it at the wrong place, some local state will linger when the transducer is used in multiple pipelines and cause bugs.
volatile!
andvreset!
are used instead of atoms for efficiency.- In this example, the transducer simply stores the encountered chars until the end of each chunk and then used them all at once to create a string. It does not have to always be this way (e.g. the next paragraph).
Chunk Sum Transducer
Suppose that we have a stream of numbers separated by a special keyword :|
and we want to compute the sum of each chunk of numbers.
(defn chunk-sum-transducer [separator]
(fn [rf]
(let [state (volatile! 0)]
(fn ([] (rf))
([result] (-> result
(rf @state)
(rf)))
([result input]
(let [acc @state]
(if (= separator input)
(do (vreset! state 0)
(rf result acc))
(do (vreset! state (+ acc input))
result))))))))
(into []
(chunk-sum-transducer :|)
(list 1 2 3 4 :| 42 :| :| 3 5))
Note that our transducer is efficient in terms of memory as it only keep a sum of numbers in its local state instead of all the elements of the current chunk.
Packet Transducer
Another example a little more complex, this transducer groups messages together in packets of data so that the size of the packet does not exceed its maximum limit (if possible, otherwise it uses bigger packets).
Here we use a map containing multiple values as the local state.
(defn packet-transducer [max-size]
(fn [rf]
(let [state (volatile! {:packet []
:size 0})]
(fn ([] (rf))
([result]
(let [{:keys [packet size]} @state]
(cond-> result
(pos? size) (rf packet) ; Flush the local state to output.
:always (rf)))) ; Transmit the flush signal.
([result input]
(let [{:keys [packet size]} @state
input-size (count input)
new-size (+ size input-size)]
(if (<= new-size max-size)
(do (vreset! state {:packet (conj packet input)
:size new-size})
result)
(do (vreset! state {:packet [input]
:size input-size})
(cond-> result
(pos? size) (rf packet))))))))))
(into []
(packet-transducer 5)
(list [1 1] [2 2] [3 3 3] [4 4] [5] [6 6 6 6 6]))
What’s next
In the next part of this blog serie, I talk about how to support early termination in a data pipeline by using reduced
and reduced?
.