Jetson AGX Orin 64GB | JetPack 6.2.2 | CUDA 12.6 | MAXN mode | All results roundtrip verified
End-to-End Throughput (MB/s)
End-to-end performance including disk I/O.
| Method | Processor | Compress MB/s | Decompress MB/s | Ratio | Integrity |
|---|---|---|---|---|---|
| nvCOMP LZ4 | GPU | 517 | 4,258 | 1.98x | PASS |
| zstd-1 | CPU | 1,094 | 733 | 2.00x | PASS |
| zstd-3 | CPU | 1,014 | 741 | 2.00x | PASS |
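For reference, a minimal sketch of how the end-to-end numbers for the CPU rows could be reproduced, assuming the `zstandard` Python package. The GPU rows go through nvCOMP instead, and the file path is a placeholder, not part of the benchmark suite:

```python
# Sketch of the end-to-end measurement for the CPU rows: disk read/write
# is deliberately inside the timed regions. Assumes `pip install zstandard`.
import hashlib
import time
from pathlib import Path

import zstandard

def end_to_end_roundtrip(path: str, level: int = 1) -> None:
    t0 = time.perf_counter()
    data = Path(path).read_bytes()                # disk read counts toward compress time
    compressed = zstandard.ZstdCompressor(level=level).compress(data)
    Path(path + ".zst").write_bytes(compressed)   # disk write counts too
    t1 = time.perf_counter()

    blob = Path(path + ".zst").read_bytes()
    restored = zstandard.ZstdDecompressor().decompress(blob)
    t2 = time.perf_counter()

    # Roundtrip integrity check, mirroring the PASS column above.
    assert hashlib.sha256(restored).digest() == hashlib.sha256(data).digest()
    mb = len(data) / 1e6
    print(f"compress {mb / (t1 - t0):,.0f} MB/s | "
          f"decompress {mb / (t2 - t1):,.0f} MB/s | "
          f"ratio {len(data) / len(compressed):.2f}x")
```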
In-Memory Throughput (MB/s)
Raw algorithm throughput without the disk I/O bottleneck.
| Method | Processor | Compress MB/s | Decompress MB/s | Integrity |
|---|---|---|---|---|
| nvCOMP LZ4 | GPU | 705 | 8,537 | PASS |
| nvCOMP Snappy | GPU | 1,615 | 5,756 | PASS |
| zstd-1 | CPU | 1,747 | 2,001 | PASS |
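The in-memory numbers exclude disk I/O entirely: the buffer is loaded once and only the codec calls are timed. A sketch of that methodology for the CPU rows, again assuming `zstandard`; the iteration count is arbitrary:

```python
# Sketch of the in-memory measurement: compress/decompress are timed alone,
# so disk I/O never enters the number. Median over several runs.
import statistics
import time

import zstandard

def in_memory_throughput(data: bytes, level: int = 1, iters: int = 20) -> tuple[float, float]:
    cctx = zstandard.ZstdCompressor(level=level)
    dctx = zstandard.ZstdDecompressor()
    compressed = cctx.compress(data)              # warm-up pass
    assert dctx.decompress(compressed) == data    # roundtrip verified

    c_times, d_times = [], []
    for _ in range(iters):
        t0 = time.perf_counter(); cctx.compress(data);      t1 = time.perf_counter()
        dctx.decompress(compressed);                        t2 = time.perf_counter()
        c_times.append(t1 - t0)
        d_times.append(t2 - t1)

    mb = len(data) / 1e6  # decimal MB, matching the tables
    return mb / statistics.median(c_times), mb / statistics.median(d_times)
```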
A 1 GB model checkpoint restores in ~0.12 seconds in-memory. A forward-deployed AI node that reboots in the field is back to full inference in under a second.
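The restore time follows directly from the in-memory decompress rate in the table above:

```python
# Back-of-the-envelope check of the checkpoint-restore claim.
checkpoint_mb = 1024          # 1 GB model checkpoint
gpu_decompress_mbps = 8537    # nvCOMP LZ4 in-memory decompress, from the table
print(f"{checkpoint_mb / gpu_decompress_mbps:.2f} s")  # ~0.12 s
```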
Below 10 MB, kernel launch overhead dominates. Above that, GPU decompression is 4.3x faster than CPU zstd-1 (8,537 vs 2,001 MB/s in-memory). Large ML datasets and model weights see the biggest gains.
CPU zstd-1 still delivers 1,747 MB/s compress and 2,001 MB/s decompress in-memory, so HammerIO uses the GPU only when the payload is large enough for the speedup to outweigh the launch overhead.
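HammerIO's actual dispatch code isn't shown here; the sketch below just illustrates the size-threshold idea, with the 10 MB cutoff taken from the crossover observed above and placeholder callables standing in for the real GPU and CPU paths:

```python
# Hypothetical size-threshold dispatch. GPU_CUTOFF_BYTES reflects the point
# below which kernel launch overhead dominates in these benchmarks;
# gpu_decompress/cpu_decompress are placeholders, not HammerIO APIs.
GPU_CUTOFF_BYTES = 10 * 1024 * 1024

def decompress(payload: bytes, gpu_decompress, cpu_decompress) -> bytes:
    if len(payload) >= GPU_CUTOFF_BYTES:
        return gpu_decompress(payload)  # e.g. nvCOMP LZ4 on the Orin's GPU
    return cpu_decompress(payload)      # e.g. zstd on the CPU
```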