Benchmarks indicate that malloc accounts for a significant amount of the
runtime of queries. The message buffer accounts for ~half of that (the
other being channels), and this change should eliminate it.
Our codec implementation originally just parsed single messages out of
the stream buffer. However, if a query returns a bunch of rows, we're
spending a ton of time shipping those individual messages from the
connection back to the Query stream. Instead, collect blocks of unparsed
messages that are as large as possible and send those back.
This cuts the processing time of the following query in half, from ~10
seconds to ~5:
`SELECT s.n, 'name' || s.n FROM generate_series(0, 9999999) AS s(n)`
At this point, almost all of the remainder of the time is spent parsing
the rows.
cc #450
copy_in data was previously copied ~3 times - once into the copy_in
buffer, once more to frame it into a CopyData frame, and once to write
that into the stream.
Our Codec is now a bit more interesting. Rather than just writing out
pre-encoded data, we can also send along unencoded CopyData so they can
be framed directly into the stream output buffer. In the future we can
extend this to e.g. avoid allocating for simple commands like Sync.
This also allows us to directly pass large copy_in input directly
through without rebuffering it.