
S2 Compression

S2 is an extension of Snappy.

S2 is aimed at high throughput, which is why it features concurrent compression for bigger payloads.

Decoding is compatible with Snappy compressed content, but content compressed with S2 cannot be decompressed by Snappy. This means that S2 can seamlessly replace Snappy without converting compressed content.

S2 is designed to have high throughput on content that cannot be compressed. This is important so you don't have to worry about spending CPU cycles on already compressed data.

Benefits over Snappy

  • Better compression
  • Concurrent stream compression
  • Faster decompression
  • Ability to quickly skip forward in compressed stream
  • Compatible with reading Snappy compressed content
  • Offers an alternative, more efficient, but slightly slower compression mode
  • Smaller block size overhead on incompressible blocks
  • Block concatenation
  • Automatic stream size padding
  • Snappy compatible block compression

Drawbacks over Snappy

  • Not optimized for 32 bit systems.
  • Uses slightly more memory (4MB per core) due to larger blocks and concurrency (configurable).

Usage

Installation: go get -u github.com/klauspost/compress/s2

Full package documentation can be found on godoc.

Compression

func EncodeStream(src io.Reader, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    _, err := io.Copy(enc, src)
    if err != nil {
        enc.Close()
        return err
    }
    // Blocks until compression is done.
    return enc.Close() 
}

You should always call enc.Close(), otherwise you will leak resources and your encode will be incomplete.

For the best throughput, you should attempt to reuse the Writer using the Reset() method.
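
As an illustration, a minimal sketch of reusing a single Writer across several independent streams via Reset (the helper name and the one-destination-per-stream shape are made up):

func EncodeStreams(dsts []io.Writer, srcs []io.Reader) error {
    // One Writer is allocated once and rebound to each destination.
    enc := s2.NewWriter(nil)
    for i, src := range srcs {
        enc.Reset(dsts[i])
        if _, err := io.Copy(enc, src); err != nil {
            enc.Close()
            return err
        }
        // Close finishes the current stream; the Writer can be Reset again.
        if err := enc.Close(); err != nil {
            return err
        }
    }
    return nil
}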

The Writer in S2 is always buffered, therefore NewBufferedWriter in Snappy can be replaced with NewWriter in S2. It is possible to flush any buffered data using the Flush() method. This will block until all data sent to the encoder has been written to the output.
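
For illustration, a small sketch of flushing after each logical message, for example on a long-lived connection (the helper is made up):

func writeMessage(enc *s2.Writer, msg []byte) error {
    if _, err := enc.Write(msg); err != nil {
        return err
    }
    // Flush blocks until everything written so far has been
    // compressed and handed to the underlying writer.
    return enc.Flush()
}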

S2 also supports the io.ReaderFrom interface, which will consume all input from a reader.
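
A sketch of the stream encoder above written with ReadFrom instead of io.Copy (the function name is made up):

func EncodeFromReader(src io.Reader, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    // ReadFrom consumes src until EOF and compresses it.
    if _, err := enc.ReadFrom(src); err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}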

Finally, if you have a single block of data you would like to have encoded as a stream, a slightly more efficient option is the EncodeBuffer method. This will take ownership of the buffer until the stream is closed.

func EncodeStream(src []byte, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    // The encoder owns the buffer until Flush or Close is called.
    err := enc.EncodeBuffer(src)
    if err != nil {
        enc.Close()
        return err
    }
    // Blocks until compression is done.
    return enc.Close()
}

Each call to EncodeBuffer will result in discrete blocks being created without buffering, so it should only be used a single time per stream. If you need to write several blocks, you should use the regular io.Writer interface.

Decompression

func DecodeStream(src io.Reader, dst io.Writer) error {
    dec := s2.NewReader(src)
    _, err := io.Copy(dst, dec)
    return err
}

Similar to the Writer, a Reader can be reused using the Reset method.
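
A minimal sketch of reusing one Reader for several streams (the helper name is made up):

func DecodeStreams(dst io.Writer, srcs ...io.Reader) error {
    // One Reader is allocated once and rebound to each source.
    dec := s2.NewReader(nil)
    for _, src := range srcs {
        dec.Reset(src)
        if _, err := io.Copy(dst, dec); err != nil {
            return err
        }
    }
    return nil
}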

For the best possible throughput, there is an EncodeBuffer(buf []byte) function available. However, it requires that the provided buffer isn't used after it is handed over to S2 and until the stream is flushed or closed.

For smaller data blocks, there is also a non-streaming interface: Encode(), EncodeBetter() and Decode(). Do however note that these functions (similar to Snappy) do not provide validation of data, so data corruption may be undetected. Stream encoding provides CRC checks of data.

It is possible to efficiently skip forward in a compressed stream using the Skip() method. For big skips the decompressor is able to skip blocks without decompressing them.
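
For illustration, a sketch of skipping a number of decompressed bytes before copying the rest (the helper name is made up):

func DecodeFrom(src io.Reader, dst io.Writer, offset int64) error {
    dec := s2.NewReader(src)
    // Skip the first offset bytes of decompressed output; whole blocks
    // are skipped without being decompressed when possible.
    if err := dec.Skip(offset); err != nil {
        return err
    }
    _, err := io.Copy(dst, dec)
    return err
}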

Single Blocks

Similar to Snappy, S2 offers single block compression. Blocks do not offer the same flexibility and safety as streams, but may be preferable for very small payloads, less than 100K.

Using a simple dst := s2.Encode(nil, src) will compress src and return the compressed result. It is possible to provide a destination buffer; if it has a capacity of s2.MaxEncodedLen(len(src)) it will be used, otherwise a new one will be allocated. Alternatively, EncodeBetter can be used for better, but slightly slower, compression.

Similarly, to decompress a block you can use dst, err := s2.Decode(nil, src). Again, an optional destination buffer can be supplied. s2.DecodedLen(src) can be used to get the minimum capacity needed; if that is not satisfied, a new buffer will be allocated.

Block functions always operate on a single goroutine, since they should only be used for small payloads.
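
As a sketch, a simple block round trip using the functions above (the helper names are made up):

func compressBlock(src []byte) []byte {
    // Pre-allocate a destination of MaxEncodedLen bytes so Encode can reuse it.
    dst := make([]byte, s2.MaxEncodedLen(len(src)))
    return s2.Encode(dst, src)
}

func decompressBlock(block []byte) ([]byte, error) {
    // Passing nil lets Decode allocate a buffer of DecodedLen(block) bytes.
    return s2.Decode(nil, block)
}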

Commandline tools

Some very simple command line tools are provided: s2c for compression and s2d for decompression.

Binaries can be downloaded on the Releases Page.

Installing from source requires Go to be installed. To install the tools, use:

go install github.com/klauspost/compress/s2/cmd/s2c && go install github.com/klauspost/compress/s2/cmd/s2d

To build binaries to the current folder use:

go build github.com/klauspost/compress/s2/cmd/s2c && go build github.com/klauspost/compress/s2/cmd/s2d

s2c

Usage: s2c [options] file1 file2

Compresses all files supplied as input separately.
Output files are written as 'filename.ext.s2'.
By default output files will be overwritten.
Use - as the only file name to read from stdin and write to stdout.

Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt

Options:
  -bench int
    	Run benchmark n times. No output will be written
  -blocksize string
    	Max  block size. Examples: 64K, 256K, 1M, 4M. Must be power of two and <= 4MB (default "4M")
  -c	Write all output to stdout. Multiple input files will be concatenated
  -cpu int
    	Compress using this amount of threads (default CPU_THREADS)
  -faster
    	Compress faster, but with a minor compression loss
  -help
    	Display help
  -pad string
    	Pad size to a multiple of this value, Examples: 500, 64K, 256K, 1M, 4M, etc (default "1")
  -q	Don't write any output to terminal, except errors
  -rm
    	Delete source file(s) after successful compression
  -safe
    	Do not overwrite output files

s2d

Usage: s2d [options] file1 file2

Decompresses all files supplied as input. Input files must end with '.s2' or '.snappy'.
Output file names have the extension removed. By default output files will be overwritten.
Use - as the only file name to read from stdin and write to stdout.

Wildcards are accepted: testdir/*.txt will decompress all files in testdir ending with .txt
Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt

Options:
  -bench int
    	Run benchmark n times. No output will be written
  -c	Write all output to stdout. Multiple input files will be concatenated
  -help
    	Display help
  -q	Don't write any output to terminal, except errors
  -rm
    	Delete source file(s) after successful decompression
  -safe
    	Do not overwrite output files

Performance

This section will focus on comparisons to Snappy. This package is solely aimed at replacing Snappy as a high speed compression package. If you are mainly looking for better compression, zstandard gives better compression, but typically at speeds slightly below “better” mode in this package.

Compression is increased compared to Snappy, mostly around 5-20%, and throughput is typically 25-40% higher (single threaded) than the Snappy Go implementation.

Streams are concurrently compressed. The stream will be distributed among all available CPU cores for the best possible throughput.

A “better” compression mode is also available. This allows trading a bit of speed for a minor compression gain. The content compressed in this mode is fully compatible with the standard decoder.
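
A sketch of enabling it on a stream, assuming the WriterBetterCompression option (EncodeBetter serves the same purpose for single blocks):

func encodeBetterStream(src io.Reader, dst io.Writer) error {
    // WriterBetterCompression switches the stream to “better” mode.
    enc := s2.NewWriter(dst, s2.WriterBetterCompression())
    if _, err := io.Copy(enc, src); err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}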

Snappy vs S2 compression speed on 16 core (32 thread) computer, using all threads and a single thread (1 CPU):

File | S2 speed | S2 Throughput | S2 % smaller | S2 “better” | “better” throughput | “better” % smaller
---- | -------- | ------------- | ------------ | ----------- | ------------------- | ------------------
rawstudio-mint14.tar | 12.70x | 10556 MB/s | 7.35% | 4.15x | 3455 MB/s | 12.79%
(1 CPU) | 1.14x | 948 MB/s | - | 0.42x | 349 MB/s | -
github-june-2days-2019.json | 17.13x | 14484 MB/s | 31.60% | 10.09x | 8533 MB/s | 37.71%
(1 CPU) | 1.33x | 1127 MB/s | - | 0.70x | 589 MB/s | -
github-ranks-backup.bin | 15.14x | 12000 MB/s | -5.79% | 6.59x | 5223 MB/s | 5.80%
(1 CPU) | 1.11x | 877 MB/s | - | 0.47x | 370 MB/s | -
consensus.db.10gb | 14.62x | 12116 MB/s | 15.90% | 5.35x | 4430 MB/s | 16.08%
(1 CPU) | 1.38x | 1146 MB/s | - | 0.38x | 312 MB/s | -
adresser.json | 8.83x | 17579 MB/s | 43.86% | 6.54x | 13011 MB/s | 47.23%
(1 CPU) | 1.14x | 2259 MB/s | - | 0.74x | 1475 MB/s | -
gob-stream | 16.72x | 14019 MB/s | 24.02% | 10.11x | 8477 MB/s | 30.48%
(1 CPU) | 1.24x | 1043 MB/s | - | 0.70x | 586 MB/s | -
10gb.tar | 13.33x | 9254 MB/s | 1.84% | 6.75x | 4686 MB/s | 6.72%
(1 CPU) | 0.97x | 672 MB/s | - | 0.53x | 366 MB/s | -
sharnd.out.2gb | 2.11x | 12639 MB/s | 0.01% | 1.98x | 11833 MB/s | 0.01%
(1 CPU) | 0.93x | 5594 MB/s | - | 1.34x | 8030 MB/s | -
enwik9 | 19.34x | 8220 MB/s | 3.98% | 7.87x | 3345 MB/s | 15.82%
(1 CPU) | 1.06x | 452 MB/s | - | 0.50x | 213 MB/s | -
silesia.tar | 10.48x | 6124 MB/s | 5.67% | 3.76x | 2197 MB/s | 12.60%
(1 CPU) | 0.97x | 568 MB/s | - | 0.46x | 271 MB/s | -
enwik10 | 21.07x | 9020 MB/s | 6.36% | 6.91x | 2959 MB/s | 16.95%
(1 CPU) | 1.07x | 460 MB/s | - | 0.51x | 220 MB/s | -

Legend

  • S2 speed: Speed of S2 compared to Snappy, using 16 cores and 1 core.
  • S2 throughput: Throughput of S2 in MB/s.
  • S2 % smaller: How much smaller the S2 output is than the Snappy output, in percent.
  • S2 "better": Speed when enabling “better” compression mode in S2 compared to Snappy.
  • "better" throughput: Throughput of S2 “better” mode in MB/s.
  • "better" % smaller: How much smaller the “better” output is than the Snappy output, in percent.

There is a good speedup across the board when using a single thread and a significant speedup when using multiple threads.

Machine generated data gets by far the biggest compression boost, with size being reduced by up to 45% compared to Snappy.

The “better” compression mode sees a good improvement in all cases, but usually at a performance cost.

Incompressible content (sharnd.out.2gb, 2GB random data) sees the smallest speedup. This is likely dominated by synchronization overhead, which is confirmed by the fact that single threaded performance is higher (see above).

Decompression

S2 attempts to create content that is also fast to decompress, except in “better” mode where the smallest representation is used.

S2 vs Snappy decompression speed. Both operating on single core:

File | S2 Throughput | vs. Snappy | Better Throughput | vs. Snappy
---- | ------------- | ---------- | ----------------- | ----------
rawstudio-mint14.tar | 2117 MB/s | 1.14x | 1738 MB/s | 0.94x
github-june-2days-2019.json | 2401 MB/s | 1.25x | 2307 MB/s | 1.20x
github-ranks-backup.bin | 2075 MB/s | 0.98x | 1764 MB/s | 0.83x
consensus.db.10gb | 2967 MB/s | 1.05x | 2885 MB/s | 1.02x
adresser.json | 4141 MB/s | 1.07x | 4184 MB/s | 1.08x
gob-stream | 2264 MB/s | 1.12x | 2185 MB/s | 1.08x
10gb.tar | 1525 MB/s | 1.03x | 1347 MB/s | 0.91x
sharnd.out.2gb | 3813 MB/s | 0.79x | 3900 MB/s | 0.81x
enwik9 | 1246 MB/s | 1.29x | 967 MB/s | 1.00x
silesia.tar | 1433 MB/s | 1.12x | 1203 MB/s | 0.94x
enwik10 | 1284 MB/s | 1.32x | 1010 MB/s | 1.04x

Legend

  • S2 Throughput: Decompression speed of S2 encoded content.
  • Better Throughput: Decompression speed of S2 “better” encoded content.
  • vs. Snappy: Decompression speed compared to Snappy, for the preceding throughput column.

While the decompression code hasn't changed, there is a significant speedup in decompression speed. S2 prefers longer matches and will typically only find matches that are 6 bytes or longer. While this reduces compression a bit, it improves decompression speed.

The “better” compression mode will actively look for shorter matches, which is why it has a decompression speed quite similar to Snappy.

Decompression is also very fast without assembly. Single goroutine decompression speed, no assembly:

File | S2 speedup | S2 throughput
---- | ---------- | -------------
consensus.db.10gb.s2 | 1.84x | 2289.8 MB/s
10gb.tar.s2 | 1.30x | 867.07 MB/s
rawstudio-mint14.tar.s2 | 1.66x | 1329.65 MB/s
github-june-2days-2019.json.s2 | 2.36x | 1831.59 MB/s
github-ranks-backup.bin.s2 | 1.73x | 1390.7 MB/s
enwik9.s2 | 1.67x | 681.53 MB/s
adresser.json.s2 | 3.41x | 4230.53 MB/s
silesia.tar.s2 | 1.52x | 811.58 MB/s

Even though S2 typically compresses better than Snappy, decompression speed is always better.

Block compression

When compressing blocks, no concurrent compression is performed, just as with Snappy. This is because blocks are for smaller payloads and generally will not benefit from concurrent compression.

An important change is that incompressible blocks will not be more than 10 bytes bigger than the input. In rare, worst-case scenarios Snappy blocks could be significantly bigger than the input.

Mixed content blocks

The most reliable benchmark is a wide dataset. For this we use webdevdata.org-2015-01-07-subset, 53927 files, total input size: 4,014,526,923 bytes. Single goroutine used.

Compressor | Input | Output | Reduction | MB/s
---------- | ----- | ------ | --------- | ----
S2 | 4014526923 | 1062282489 | 73.54% | 861.44
S2 Better | 4014526923 | 981221284 | 75.56% | 399.54
Snappy | 4014526923 | 1128667736 | 71.89% | 741.29
S2, Snappy Output | 4014526923 | 1093784815 | 72.75% | 843.66

S2 delivers both the best single threaded throughput with regular mode and the best compression rate with “better” mode.

When producing Snappy compatible output, it still delivers better throughput (100 MB/s more) and better compression.

As can be seen from the other benchmarks, decompression should also be easier on the S2 generated output.

Standard block compression

Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns. So individual benchmarks should only be seen as a guideline and the overall picture is more important.

These micro-benchmarks are with data in cache and trained branch predictors. For a more realistic benchmark see the mixed content above.

Block compression. Parallel benchmark running on 16 cores, 16 goroutines.

AMD64 assembly is used for both S2 and Snappy.

Absolute Perf | Snappy size | S2 size | Snappy speed | S2 speed | Snappy dec | S2 dec
------------- | ----------- | ------- | ------------ | -------- | ---------- | ------
html | 22843 | 21111 | 16246 MB/s | 17438 MB/s | 40972 MB/s | 49263 MB/s
urls.10K | 335492 | 287326 | 7943 MB/s | 9693 MB/s | 22523 MB/s | 26484 MB/s
fireworks.jpeg | 123034 | 123100 | 349544 MB/s | 273889 MB/s | 718321 MB/s | 827552 MB/s
fireworks.jpeg (200B) | 146 | 155 | 8869 MB/s | 17773 MB/s | 33691 MB/s | 52421 MB/s
paper-100k.pdf | 85304 | 84459 | 167546 MB/s | 101263 MB/s | 326905 MB/s | 291944 MB/s
html_x_4 | 92234 | 21113 | 15194 MB/s | 50670 MB/s | 30843 MB/s | 32217 MB/s
alice29.txt | 88034 | 85975 | 5936 MB/s | 6139 MB/s | 12882 MB/s | 20044 MB/s
asyoulik.txt | 77503 | 79650 | 5517 MB/s | 6366 MB/s | 12735 MB/s | 22806 MB/s
lcet10.txt | 234661 | 220670 | 6235 MB/s | 6067 MB/s | 14519 MB/s | 18697 MB/s
plrabn12.txt | 319267 | 317985 | 5159 MB/s | 5726 MB/s | 11923 MB/s | 19901 MB/s
geo.protodata | 23335 | 18690 | 21220 MB/s | 26529 MB/s | 56271 MB/s | 62540 MB/s
kppkn.gtb | 69526 | 65312 | 9732 MB/s | 8559 MB/s | 18491 MB/s | 18969 MB/s
alice29.txt (128B) | 80 | 82 | 6691 MB/s | 15489 MB/s | 31883 MB/s | 38874 MB/s
alice29.txt (1000B) | 774 | 774 | 12204 MB/s | 13000 MB/s | 48056 MB/s | 52341 MB/s
alice29.txt (10000B) | 6648 | 6933 | 10044 MB/s | 12806 MB/s | 32378 MB/s | 46322 MB/s
alice29.txt (20000B) | 12686 | 13574 | 7733 MB/s | 11210 MB/s | 30566 MB/s | 58969 MB/s
Relative Perf | Snappy size | S2 size improved | S2 speed | S2 dec speed
------------- | ----------- | ---------------- | -------- | ------------
html | 22.31% | 7.58% | 1.07x | 1.20x
urls.10K | 47.78% | 14.36% | 1.22x | 1.18x
fireworks.jpeg | 99.95% | -0.05% | 0.78x | 1.15x
fireworks.jpeg (200B) | 73.00% | -6.16% | 2.00x | 1.56x
paper-100k.pdf | 83.30% | 0.99% | 0.60x | 0.89x
html_x_4 | 22.52% | 77.11% | 3.33x | 1.04x
alice29.txt | 57.88% | 2.34% | 1.03x | 1.56x
asyoulik.txt | 61.91% | -2.77% | 1.15x | 1.79x
lcet10.txt | 54.99% | 5.96% | 0.97x | 1.29x
plrabn12.txt | 66.26% | 0.40% | 1.11x | 1.67x
geo.protodata | 19.68% | 19.91% | 1.25x | 1.11x
kppkn.gtb | 37.72% | 6.06% | 0.88x | 1.03x
alice29.txt (128B) | 62.50% | -2.50% | 2.31x | 1.22x
alice29.txt (1000B) | 77.40% | 0.00% | 1.07x | 1.09x
alice29.txt (10000B) | 66.48% | -4.29% | 1.27x | 1.43x
alice29.txt (20000B) | 63.43% | -7.00% | 1.45x | 1.93x

Speed is generally at or above Snappy. Small blocks get a significant speedup, although at the expense of size.

Decompression speed is better than Snappy, except in one case.

Since payloads are very small the variance in terms of size is rather big, so they should only be seen as a general guideline.

Size is on average around Snappy, but varies with content type. In cases where compression is worse, it is usually compensated by a speed boost.

Better compression

Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns. So individual benchmarks should only be seen as a guideline and the overall picture is more important.

Absolute Perf | Snappy size | Better size | Snappy speed | Better speed | Snappy dec | Better dec
------------- | ----------- | ----------- | ------------ | ------------ | ---------- | ----------
html | 22843 | 19833 | 16246 MB/s | 7731 MB/s | 40972 MB/s | 40292 MB/s
urls.10K | 335492 | 253529 | 7943 MB/s | 3980 MB/s | 22523 MB/s | 20981 MB/s
fireworks.jpeg | 123034 | 123100 | 349544 MB/s | 9760 MB/s | 718321 MB/s | 823698 MB/s
fireworks.jpeg (200B) | 146 | 142 | 8869 MB/s | 594 MB/s | 33691 MB/s | 30101 MB/s
paper-100k.pdf | 85304 | 82915 | 167546 MB/s | 7470 MB/s | 326905 MB/s | 198869 MB/s
html_x_4 | 92234 | 19841 | 15194 MB/s | 23403 MB/s | 30843 MB/s | 30937 MB/s
alice29.txt | 88034 | 73218 | 5936 MB/s | 2945 MB/s | 12882 MB/s | 16611 MB/s
asyoulik.txt | 77503 | 66844 | 5517 MB/s | 2739 MB/s | 12735 MB/s | 14975 MB/s
lcet10.txt | 234661 | 190589 | 6235 MB/s | 3099 MB/s | 14519 MB/s | 16634 MB/s
plrabn12.txt | 319267 | 270828 | 5159 MB/s | 2600 MB/s | 11923 MB/s | 13382 MB/s
geo.protodata | 23335 | 18278 | 21220 MB/s | 11208 MB/s | 56271 MB/s | 57961 MB/s
kppkn.gtb | 69526 | 61851 | 9732 MB/s | 4556 MB/s | 18491 MB/s | 16524 MB/s
alice29.txt (128B) | 80 | 81 | 6691 MB/s | 529 MB/s | 31883 MB/s | 34225 MB/s
alice29.txt (1000B) | 774 | 748 | 12204 MB/s | 1943 MB/s | 48056 MB/s | 42068 MB/s
alice29.txt (10000B) | 6648 | 6234 | 10044 MB/s | 2949 MB/s | 32378 MB/s | 28813 MB/s
alice29.txt (20000B) | 12686 | 11584 | 7733 MB/s | 2822 MB/s | 30566 MB/s | 27315 MB/s
Relative Perf | Snappy size | Better size | Better speed | Better dec
------------- | ----------- | ----------- | ------------ | ----------
html | 22.31% | 13.18% | 0.48x | 0.98x
urls.10K | 47.78% | 24.43% | 0.50x | 0.93x
fireworks.jpeg | 99.95% | -0.05% | 0.03x | 1.15x
fireworks.jpeg (200B) | 73.00% | 2.74% | 0.07x | 0.89x
paper-100k.pdf | 83.30% | 2.80% | 0.07x | 0.61x
html_x_4 | 22.52% | 78.49% | 0.04x | 1.00x
alice29.txt | 57.88% | 16.83% | 1.54x | 1.29x
asyoulik.txt | 61.91% | 13.75% | 0.50x | 1.18x
lcet10.txt | 54.99% | 18.78% | 0.50x | 1.15x
plrabn12.txt | 66.26% | 15.17% | 0.50x | 1.12x
geo.protodata | 19.68% | 21.67% | 0.50x | 1.03x
kppkn.gtb | 37.72% | 11.04% | 0.53x | 0.89x
alice29.txt (128B) | 62.50% | -1.25% | 0.47x | 1.07x
alice29.txt (1000B) | 77.40% | 3.36% | 0.08x | 0.88x
alice29.txt (10000B) | 66.48% | 6.23% | 0.16x | 0.89x
alice29.txt (20000B) | 63.43% | 8.69% | 0.29x | 0.89x

Except for the mostly incompressible JPEG image, compression is better, usually by double-digit percentages over Snappy.

The PDF sample shows a significant slowdown compared to Snappy, as this mode tries harder to compress the data. Very small blocks are also not favorable for better compression, so throughput is way down.

This mode aims to provide better compression at the expense of performance and achieves that without a huge performance penalty, except on very small blocks.

Decompression speed suffers a little compared to the regular S2 mode, but still manages to be close to Snappy in spite of increased compression.

Best compression mode

S2 offers a “best” compression mode.

This will compress as much as possible with little regard to CPU usage.

It is mainly intended for offline compression where decompression speed should still be high, and the output remains compatible with other S2 compressed data.
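
A small sketch for single blocks, assuming the EncodeBest function that accompanies this mode (see the package documentation for the corresponding stream writer option):

func encodeBestBlock(src []byte) []byte {
    // EncodeBest trades CPU time for the smallest S2 output.
    return s2.EncodeBest(nil, src)
}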

Some examples compared on 16 core CPU:

* enwik10
Default... 10000000000 -> 4761467548 [47.61%]; 1.098s, 8685.6MB/s
Better... 10000000000 -> 4225922984 [42.26%]; 2.817s, 3385.4MB/s
Best... 10000000000 -> 3667646858 [36.68%]; 35.995s, 264.9MB/s

* github-june-2days-2019.json
Default... 6273951764 -> 1043196283 [16.63%]; 431ms, 13882.3MB/s
Better... 6273951764 -> 950079555 [15.14%]; 736ms, 8129.5MB/s
Best... 6273951764 -> 846260870 [13.49%]; 8.125s, 736.4MB/s

* nyc-taxi-data-10M.csv
Default... 3325605752 -> 1095998837 [32.96%]; 324ms, 9788.7MB/s
Better... 3325605752 -> 960330423 [28.88%]; 602ms, 5268.4MB/s
Best... 3325605752 -> 794873295 [23.90%]; 6.619s, 479.1MB/s

* 10gb.tar
Default... 10065157632 -> 5916578242 [58.78%]; 1.028s, 9337.4MB/s
Better... 10065157632 -> 5650133605 [56.14%]; 2.172s, 4419.4MB/s
Best... 10065157632 -> 5246578570 [52.13%]; 25.696s, 373.6MB/s

* consensus.db.10gb
Default... 10737418240 -> 4562648848 [42.49%]; 882ms, 11610.0MB/s
Better... 10737418240 -> 4542443833 [42.30%]; 3.3s, 3103.5MB/s
Best... 10737418240 -> 4272335558 [39.79%]; 38.955s, 262.9MB/s

Decompression speed should be around the same as using the ‘better’ compression mode.

Concatenating blocks and streams

Streams can be concatenated by simply concatenating their compressed output, without recompressing anything. While this is inefficient in terms of compression, it might be usable in certain scenarios. The 10 byte ‘stream identifier’ of the second stream can optionally be stripped, but it is not a requirement.

Blocks can be concatenated using the ConcatBlocks function.

Snappy blocks/streams can safely be concatenated with S2 blocks and streams.
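
A minimal sketch of joining two already compressed blocks, assuming ConcatBlocks takes an optional destination followed by the blocks to join (the helper name is made up):

func concat(a, b []byte) ([]byte, error) {
    // The result is a single block that decodes to the concatenation
    // of the two blocks' decompressed content.
    return s2.ConcatBlocks(nil, a, b)
}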

Format Extensions

Repeat offsets must be encoded as a “2.2.1. Copy with 1-byte offset (01)” tag, where the offset is 0.

The length is found by reading the 3-bit length field in the tag and decoding it using this table:

Length | Actual Length
------ | -------------
0 | 4
1 | 5
2 | 6
3 | 7
4 | 8
5 | 8 + read 1 byte
6 | 260 + read 2 bytes
7 | 65540 + read 3 bytes

This allows any repeat offset + length to be represented by 2 to 5 bytes.

Lengths are stored as little endian values.
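
To illustrate the table, a sketch (not the package's internal code) of expanding the 3-bit field plus any little-endian extra bytes into the actual length:

func repeatLength(field uint8, extra []byte) int {
    switch {
    case field <= 4: // values 0-4 map directly to lengths 4-8
        return int(field) + 4
    case field == 5: // 8 + one extra byte
        return 8 + int(extra[0])
    case field == 6: // 260 + two extra bytes, little endian
        return 260 + int(extra[0]) + int(extra[1])<<8
    default: // 7: 65540 + three extra bytes, little endian
        return 65540 + int(extra[0]) + int(extra[1])<<8 + int(extra[2])<<16
    }
}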

The first copy of a block cannot be a repeat offset and the offset is not carried across blocks in streams.

Default streaming block size is 1MB.

LICENSE

This code is based on the Snappy-Go implementation.

Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.