# Motivation
High-frequency trading platforms generate enormous volumes of market data in PCAP (Packet Capture) files. These files log activity such as order book updates, trades, and cancellations at the level of individual network packets. The format is designed for faithful capture and parsing, not storage efficiency. As a result, datasets are bloated with redundant protocol headers and repeated market events.
# Solution
I developed a high-performance C++ system that losslessly compresses IEX DEEP market data PCAPs. It is a CLI tool that takes a large PCAP file, restructures its contents, and compresses them, reducing storage space by over 70%.
# What Are PCAP Files?
PCAP (Packet Capture) files are the industry-standard format for recording raw network traffic. They are common in cybersecurity, networking, and finance. Each file stores complete packet-level data, including protocol headers and payloads, for later analysis and replay. In high-frequency trading systems, PCAPs log every market message (quotes, trades, cancellations) with microsecond or finer timestamps. This fidelity is invaluable for backtesting and forensics. However, the uncompressed files grow quickly, which makes compression essential for scalability.
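To make the record structure concrete, here is a minimal sketch that walks a classic libpcap capture already loaded into memory. It makes the following assumptions:

- It handles the classic format only (a 24-byte global header followed by 16-byte per-packet record headers), not pcapng.
- It accepts both byte orders and both the microsecond and nanosecond timestamp variants.
- `PcapRecord` and `read_pcap_records` are hypothetical names. The tool itself reads captures through libpcap, which does this work internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch only: hand-rolled reader for classic pcap (not pcapng); names are hypothetical.
// One captured packet: capture timestamp plus a view into the caller's buffer.
struct PcapRecord {
    uint64_t ts_ns;         // capture time, normalized to nanoseconds
    const uint8_t* data;    // first captured byte (points into the input buffer)
    uint32_t captured_len;  // incl_len: bytes actually stored in the file
    uint32_t original_len;  // orig_len: bytes seen on the wire
};

// Read a 32-bit field in the byte order the file was written in.
static uint32_t read_u32(const uint8_t* p, bool big_endian) {
    if (big_endian)
        return uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16 | uint32_t(p[2]) << 8 | uint32_t(p[3]);
    return uint32_t(p[3]) << 24 | uint32_t(p[2]) << 16 | uint32_t(p[1]) << 8 | uint32_t(p[0]);
}

// Split an in-memory classic pcap file into its packet records.
std::vector<PcapRecord> read_pcap_records(const std::vector<uint8_t>& buf) {
    if (buf.size() < 24) throw std::runtime_error("truncated global header");

    // The magic number reveals the writer's byte order and whether the
    // fractional timestamp in each record is in microseconds or nanoseconds.
    constexpr uint32_t kMicros = 0xa1b2c3d4, kNanos = 0xa1b23c4d;
    bool big_endian = false;
    uint32_t magic = read_u32(buf.data(), false);
    if (magic != kMicros && magic != kNanos) {
        big_endian = true;
        magic = read_u32(buf.data(), true);
        if (magic != kMicros && magic != kNanos) throw std::runtime_error("not a classic pcap file");
    }
    const bool nanos = (magic == kNanos);

    std::vector<PcapRecord> records;
    size_t off = 24;                          // skip the 24-byte global header
    while (off + 16 <= buf.size()) {          // each record starts with a 16-byte header
        const uint8_t* h = buf.data() + off;
        uint32_t ts_sec  = read_u32(h, big_endian);
        uint32_t ts_frac = read_u32(h + 4, big_endian);
        uint32_t incl    = read_u32(h + 8, big_endian);
        uint32_t orig    = read_u32(h + 12, big_endian);
        off += 16;
        if (incl > buf.size() - off) throw std::runtime_error("truncated packet data");
        uint64_t ts_ns = uint64_t(ts_sec) * 1000000000ULL + (nanos ? ts_frac : uint64_t(ts_frac) * 1000ULL);
        records.push_back({ts_ns, buf.data() + off, incl, orig});
        off += incl;
    }
    return records;                           // a trailing partial header is ignored
}
```

For an Ethernet capture, each record's `data` begins with an Ethernet frame. From there, a parser strips network headers to reach the exchange payload.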

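For this project, I used market data from the IEX exchange in its DEEP (Depth of Book) format. DEEP does not publish individual orders. Instead, it publishes the aggregated displayed size resting at each price level, along with trade reports and administrative events, so a consumer can maintain a real-time view of the order book.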
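Every DEEP message begins with a one-byte message type, so sorting messages into event types can start from a simple lookup. The sketch below maps those type bytes to directory-friendly names. The codes reflect my reading of the message types in IEX's DEEP specification and should be checked against the spec revision for a given capture. `deep_type_name` is a hypothetical helper.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper: map the first byte of a DEEP message to an event name.
// Codes follow my reading of IEX's DEEP specification; verify them against
// the spec revision that matches the capture date.
std::string_view deep_type_name(uint8_t type) {
    switch (type) {
        case 'S': return "system_event";
        case 'D': return "security_directory";
        case 'H': return "trading_status";
        case 'O': return "operational_halt_status";
        case 'P': return "short_sale_price_test_status";
        case 'E': return "security_event";
        case '8': return "price_level_update_buy";
        case '5': return "price_level_update_sell";
        case 'T': return "trade_report";
        case 'X': return "official_price";
        case 'B': return "trade_break";
        case 'A': return "auction_information";
        default:  return "unknown";  // keep unrecognized types rather than drop them
    }
}
```

Price level updates come in separate buy-side and sell-side variants. A layout keyed on this byte therefore separates the two sides of the book naturally.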
# System Architecture
- PCAP Ingestor: Reads PCAPs via libpcap and filters the UDP payloads that carry IEX-TP DEEP messages
- Packet Parser: Parses Ethernet/IP/UDP headers and extracts trading messages according to the IEX protocol specification, discarding the redundant network headers
- Event Separator: Groups messages by symbol and DEEP message type (e.g. Price Level Update, Trade Report)
- Compressor: Strips remaining redundant headers and applies per-field optimizations (delta encoding, LZ4, Zstd)
- File Store: Saves compressed binary streams in a structured directory layout by symbol and event type
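The Ingestor and Parser stages begin by peeling network headers off each captured frame. Below is a minimal sketch of that step. It assumes Ethernet II framing with at most one 802.1Q VLAN tag and unfragmented IPv4; IPv6 and fragment reassembly are left out. `extract_udp_payload` and `UdpPayload` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Network byte order (big-endian) 16-bit read.
static uint16_t be16(const uint8_t* p) { return uint16_t(p[0] << 8 | p[1]); }

// Hypothetical result type: a view of the UDP payload inside the frame.
struct UdpPayload {
    const uint8_t* data;
    size_t len;
    uint16_t dst_port;
};

// Strip Ethernet II (plus at most one 802.1Q tag), IPv4 and UDP headers.
// Returns nullopt for anything that is not an unfragmented IPv4/UDP frame.
std::optional<UdpPayload> extract_udp_payload(const uint8_t* frame, size_t len) {
    size_t off = 14;                                   // dst MAC, src MAC, EtherType
    if (len < off) return std::nullopt;
    uint16_t ether_type = be16(frame + 12);
    if (ether_type == 0x8100) {                        // VLAN tag: real EtherType is 4 bytes later
        if (len < off + 4) return std::nullopt;
        ether_type = be16(frame + 16);
        off += 4;
    }
    if (ether_type != 0x0800) return std::nullopt;     // IPv4 only in this sketch

    if (len < off + 20) return std::nullopt;           // minimum IPv4 header
    const uint8_t* ip = frame + off;
    size_t ihl = size_t(ip[0] & 0x0f) * 4;             // header length including options
    if (ihl < 20 || ip[9] != 17) return std::nullopt;  // protocol 17 = UDP
    if ((be16(ip + 6) & 0x3fff) != 0) return std::nullopt;  // MF flag or fragment offset set
    off += ihl;

    if (len < off + 8) return std::nullopt;            // UDP header
    const uint8_t* udp = frame + off;
    size_t udp_len = be16(udp + 4);                    // UDP header + payload
    if (udp_len < 8 || len < off + udp_len) return std::nullopt;
    return UdpPayload{udp + 8, udp_len - 8, be16(udp + 2)};
}
```

The UDP length field, not the frame length, bounds the payload. Any Ethernet padding after the datagram is therefore excluded.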
# Packet Parser / Decoder Details
At the core of the system is a high-performance packet parser designed specifically for IEX DEEP feeds. General-purpose tools such as tcpdump and Wireshark capture and display traffic generically. They are not optimized for batch-decoding domain-specific binary protocols such as IEX-TP. My parser first decodes the Ethernet, IP, and UDP headers to isolate valid IEX DEEP payloads. It then reads the IEX message format byte by byte to extract fields such as symbol, timestamp, event type, price, and size.
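Each UDP payload in the feed is an IEX-TP segment: a fixed header followed by length-prefixed message blocks. The sketch below walks one segment under my reading of the IEX-TP v1 layout:

- a 40-byte little-endian header, with the message protocol ID at offset 2, payload length at 12, message count at 14, first sequence number at 24, and send time at 32;
- each message preceded by a 2-byte length.

Verify those offsets against the IEX-TP specification. `walk_segment` and `SegmentInfo` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>

// IEX-TP integers are little-endian.
static uint16_t le16(const uint8_t* p) { return uint16_t(p[0] | p[1] << 8); }
static uint64_t le64(const uint8_t* p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = v << 8 | p[i];
    return v;
}

// Hypothetical summary of one segment header (offsets assumed from the spec).
struct SegmentInfo {
    uint16_t protocol_id;    // message protocol ID (0x8004 for DEEP, per my reading of the spec)
    uint16_t message_count;
    uint64_t first_seq;      // sequence number of the first message in the segment
    uint64_t send_time_ns;
};

using MessageFn = std::function<void(uint64_t seq, const uint8_t* msg, size_t len)>;

// Walk one IEX-TP segment (one UDP payload), calling on_message for each
// length-prefixed message block. Throws if the lengths are inconsistent.
SegmentInfo walk_segment(const uint8_t* p, size_t len, const MessageFn& on_message) {
    constexpr size_t kHeader = 40;
    if (len < kHeader || p[0] != 1) throw std::runtime_error("not an IEX-TP v1 segment");
    SegmentInfo info{le16(p + 2), le16(p + 14), le64(p + 24), le64(p + 32)};
    const size_t end = kHeader + le16(p + 12);         // header + payload length
    if (end > len) throw std::runtime_error("truncated segment");

    size_t off = kHeader;
    for (uint16_t i = 0; i < info.message_count; ++i) {
        if (off + 2 > end) throw std::runtime_error("missing message length");
        const size_t msg_len = le16(p + off);
        off += 2;
        if (off + msg_len > end) throw std::runtime_error("message overruns segment");
        on_message(info.first_seq + i, p + off, msg_len);
        off += msg_len;
    }
    return info;
}
```

Heartbeat segments carry no messages, so a message count of zero simply produces no callbacks. Numbering each message as `first_seq + i` gives it a feed-wide sequence number, which is a convenient ordering key to keep alongside the message.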
I built this parser from scratch so that downstream modules, such as the event separator and the compressor, can operate independently and efficiently. Each message is normalized into a structured binary record, which also makes the output easy for other tools to consume.
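To illustrate the normalization step, here is a sketch that decodes one DEEP Trade Report into a fixed-layout struct. The byte offsets follow my reading of the IEX DEEP specification and should be checked against it:

- a 38-byte message;
- little-endian integers;
- an 8-byte space-padded symbol;
- prices as 8-byte fixed-point values with four implied decimal places.

`TradeRecord` and `decode_trade_report` are hypothetical names, not the tool's actual record format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical normalized record for one trade. Integers are kept exact so
// the record can be re-encoded losslessly.
struct TradeRecord {
    uint64_t timestamp_ns;         // nanoseconds since the POSIX epoch
    uint64_t trade_id;
    int64_t  price_e4;             // price * 10^4 (four implied decimal places)
    uint32_t size;                 // number of shares
    uint8_t  sale_condition_flags;
    char     symbol[9];            // space padding stripped, NUL-terminated
};

// IEX integers are little-endian on the wire.
static uint32_t le32(const uint8_t* p) {
    return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}
static uint64_t le64(const uint8_t* p) { return uint64_t(le32(p)) | uint64_t(le32(p + 4)) << 32; }

// Decode a DEEP Trade Report (type 'T', 38 bytes); offsets assumed from the spec.
// Other message types return nullopt.
std::optional<TradeRecord> decode_trade_report(const uint8_t* m, size_t len) {
    if (len < 38 || m[0] != 'T') return std::nullopt;
    TradeRecord r{};
    r.sale_condition_flags = m[1];
    r.timestamp_ns = le64(m + 2);
    std::memcpy(r.symbol, m + 10, 8);
    // Strip the space padding; re-pad with spaces when re-encoding.
    for (int i = 7; i >= 0 && r.symbol[i] == ' '; --i) r.symbol[i] = '\0';
    r.size = le32(m + 18);
    r.price_e4 = static_cast<int64_t>(le64(m + 22));
    r.trade_id = le64(m + 30);
    return r;
}
```

Keeping prices as scaled integers rather than converting to `double` avoids rounding, which matters when the goal is bit-exact reconstruction.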
This specialization avoids the overhead of general-purpose dissectors such as Wireshark when batch-processing large captures. It also differs from commercial tick data tools, which often treat PCAP files as opaque blobs. By integrating domain-aware parsing directly into the pipeline, I preserve semantic meaning (e.g., trade vs. quote) while laying the groundwork for optimized compression and fast replay.
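I go from this hex blob:

[Figure: raw packet hex dump before parsing]

To this after parsing:

[Figure: the same packet's decoded message fields after parsing]
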
# File Splitting by Symbol and Event Type
After parsing, each message is routed into a hierarchical storage structure based on its symbol (e.g., AAPL, MSFT) and event type (e.g., price level update, trade report). This segmentation benefits both compression and downstream analytics. Market data is highly redundant within a single symbol and message type. Storing those messages together exposes patterns that compression algorithms can exploit, such as repeated prices, small timestamp deltas, and repeated flags.
Compared with monolithic PCAP files or streaming column stores, this design balances storage efficiency with operational flexibility. Each symbol/event pair is stored as a compact binary log that can be compressed, replayed, or analyzed independently of the rest of the dataset.
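Below is a minimal sketch of the routing step. It assumes a hypothetical `<root>/<symbol>/<event_type>.bin` layout. Each raw message is stored with a 2-byte little-endian length prefix so the log can be split back into messages. `PartitionWriter` is an illustrative name. A real layout would also need to keep whatever ordering metadata reconstruction relies on.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical partitioner: buffers raw messages per (symbol, event type) and
// appends each buffer to <root>/<symbol>/<event_type>.bin on flush().
class PartitionWriter {
public:
    explicit PartitionWriter(std::filesystem::path root) : root_(std::move(root)) {}

    static std::filesystem::path path_for(const std::filesystem::path& root,
                                          const std::string& symbol,
                                          const std::string& event_type) {
        return root / symbol / (event_type + ".bin");
    }

    // Store one message with a 2-byte little-endian length prefix so the log
    // can be split back into individual messages later.
    void add(const std::string& symbol, const std::string& event_type,
             const uint8_t* msg, size_t len) {
        if (len > 0xffff) throw std::length_error("message too long for 2-byte prefix");
        std::vector<uint8_t>& buf = buffers_[std::make_pair(symbol, event_type)];
        buf.push_back(static_cast<uint8_t>(len & 0xff));
        buf.push_back(static_cast<uint8_t>(len >> 8));
        buf.insert(buf.end(), msg, msg + len);
    }

    // Append every non-empty buffer to its file, creating directories as needed.
    void flush() {
        for (auto& [key, buf] : buffers_) {
            if (buf.empty()) continue;
            const std::filesystem::path path = path_for(root_, key.first, key.second);
            std::filesystem::create_directories(path.parent_path());
            std::ofstream out(path, std::ios::binary | std::ios::app);
            if (!out) throw std::runtime_error("cannot open " + path.string());
            out.write(reinterpret_cast<const char*>(buf.data()),
                      static_cast<std::streamsize>(buf.size()));
            buf.clear();
        }
    }

private:
    std::filesystem::path root_;
    std::map<std::pair<std::string, std::string>, std::vector<uint8_t>> buffers_;
};
```

Buffering in memory and appending on `flush()` means each symbol/event pair is opened once per flush rather than once per message.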

# Final Steps: Compression and Reconstruction
Once messages are split by symbol and event type, targeted compression is applied. Depending on the data characteristics, this includes delta encoding for timestamps and prices, followed by lightweight block compression (e.g., LZ4) to remove general redundancy. Delta encoding is exactly reversible and LZ4 is a lossless codec. These domain-aware techniques therefore significantly reduce file size while maintaining full fidelity.
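The timestamp and price columns can be delta encoded as sketched below. The sketch writes each delta as a ZigZag-mapped LEB128 varint, so small positive and negative changes take only one or two bytes. That serialization is my assumption: the text above only names delta encoding and block compression. `encode_deltas` and `decode_deltas` are hypothetical names. The resulting varint stream is what a codec such as LZ4 or Zstd would then compress.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// ZigZag: map signed deltas to unsigned so small magnitudes stay small (0,-1,1,-2 -> 0,1,2,3).
static uint64_t zigzag(int64_t v) { return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63); }
static int64_t unzigzag(uint64_t u) { return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1); }

// Hypothetical encoder: delta-encode a column (first value is a delta from 0),
// writing each ZigZag-mapped delta as a LEB128 varint (7 bits per byte, high
// bit set means "more bytes follow"). Assumes consecutive deltas fit in int64_t,
// which holds for timestamps and prices.
std::vector<uint8_t> encode_deltas(const std::vector<int64_t>& values) {
    std::vector<uint8_t> out;
    int64_t prev = 0;
    for (int64_t v : values) {
        uint64_t u = zigzag(v - prev);
        prev = v;
        while (u >= 0x80) {
            out.push_back(static_cast<uint8_t>((u & 0x7f) | 0x80));
            u >>= 7;
        }
        out.push_back(static_cast<uint8_t>(u));
    }
    return out;
}

// Exact inverse of encode_deltas.
std::vector<int64_t> decode_deltas(const std::vector<uint8_t>& in) {
    std::vector<int64_t> values;
    int64_t prev = 0;
    size_t i = 0;
    while (i < in.size()) {
        uint64_t u = 0;
        for (int shift = 0;; shift += 7) {
            if (i >= in.size() || shift > 63) throw std::runtime_error("corrupt varint");
            const uint8_t b = in[i++];
            u |= static_cast<uint64_t>(b & 0x7f) << shift;
            if (!(b & 0x80)) break;
        }
        prev += unzigzag(u);
        values.push_back(prev);
    }
    return values;
}
```

Decoding is the exact inverse of encoding, so this stage adds no loss. For consecutive timestamps a few hundred nanoseconds apart, each delta fits in two bytes instead of eight.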
To preserve the data in its original format, I also designed a reconstruction process that rebuilds the original packet capture file from the small, compressed per-symbol, per-event-type files. Every stage is therefore reversible, and the pipeline as a whole is lossless.
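One part of reconstruction is restoring the original interleaving of messages that were split across per-symbol files. The sketch below assumes each stored message kept its original feed sequence number, using a hypothetical `StoredMessage` type, and performs a k-way merge with a min-heap. Rebuilding byte-identical PCAP records would also require the original segment headers, network headers, and capture timestamps, which this sketch does not cover.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical stored message: raw bytes plus the feed sequence number it had
// in the original capture (assumed to be saved alongside it).
struct StoredMessage {
    uint64_t seq;
    std::vector<uint8_t> bytes;
};

// K-way merge of streams that are each sorted by seq, restoring feed order.
std::vector<StoredMessage> merge_by_sequence(const std::vector<std::vector<StoredMessage>>& streams) {
    using Head = std::tuple<uint64_t, size_t, size_t>;  // (seq, stream index, position in stream)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;  // smallest seq on top

    for (size_t s = 0; s < streams.size(); ++s)
        if (!streams[s].empty()) heap.emplace(streams[s].front().seq, s, size_t{0});

    std::vector<StoredMessage> merged;
    while (!heap.empty()) {
        const size_t s = std::get<1>(heap.top());
        const size_t i = std::get<2>(heap.top());
        heap.pop();
        merged.push_back(streams[s][i]);
        // Advance within the stream we just consumed from.
        if (i + 1 < streams[s].size()) heap.emplace(streams[s][i + 1].seq, s, i + 1);
    }
    return merged;
}
```

Each per-symbol file is already in sequence order, so the merge takes O(N log K) time for N messages spread over K files. The heap holds only one pending message per file.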
# Development Highlights
- Wrote high-performance C++ code to handle low-level packet decoding and binary I/O efficiently
- Architected a modular pipeline with clear stages: ingest, parse, partition, compress, and decompress
- Optimized for large-scale PCAP files common in financial systems, minimizing memory and disk usage
- Implemented custom binary formats for compact, structured storage of event-level trading data
- Tested against real-world IEX DEEP captures to ensure data consistency and replayability