PCI Express (PCIe) has been the backbone of high-speed interconnects for nearly two decades. While the Transaction Layer and Physical Layer often get attention, the Data Link Layer (DLL) is where the “invisible” reliability and flow control magic happens.
In this article, we’ll take a deep dive into how the DLL evolved from PCIe Gen4 → Gen7, and walk through a dummy payload as it travels from TX to RX.
PCIe communication is structured in 3 main layers:
- Transaction Layer – Generates high-level requests (Memory Read/Write, Config, I/O).
- Data Link Layer (DLL) – Ensures reliable, ordered delivery over an unreliable physical medium.
- Physical Layer (PHY) – Handles actual transmission (8b/10b, 128b/130b, PAM4, etc).
The DLL sits in the middle – hiding PHY errors from the Transaction Layer and guaranteeing data reliability. From Gen4 to Gen7, the DLL has undergone three major transitions:
- Gen4–5: Classic ACK/NAK + Replay system.
- Gen6: Transition to FLIT mode, removal of standalone ACK/NAK DLLPs (ACK/NAK is now embedded in the FLIT), use of FEC.
- Gen7: Optimized FLIT + stronger FEC for even higher PAM4 speeds.
PCIe Gen4 DLL (16 GT/s, NRZ)
Core DLL Features
- DLLPs (Data Link Layer Packets):
  - ACK/NAK DLLPs: Acknowledge successful or failed TLP reception.
  - Flow Control DLLPs: Advertise receive-buffer credits for TLPs.
  - Power Management DLLPs.
- Replay Buffer:
  - Stores every outgoing TLP until it is ACKed.
  - If a NAK is received, the unacknowledged TLPs are replayed from the buffer.
- Sequence Numbers:
  - Each TLP is tagged with a sequence number (SeqNum).
  - Ensures ordering and replay tracking.
- CRC / LCRC:
  - Each TLP carries a 32-bit LCRC (DLLPs use their own 16-bit CRC).
  - Protects against bit errors.
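To make the credit idea behind Flow Control DLLPs concrete, here is a minimal Python sketch of a transmitter that only sends when the receiver has advertised enough credits. The class and method names are hypothetical, and the update handling is simplified: it treats an update as "credits returned", whereas real Update_FC DLLPs carry running counters. One data credit covering 16 bytes follows the PCIe flow-control unit.

```python
from dataclasses import dataclass

DATA_CREDIT_BYTES = 16  # one data credit covers 4 DW (16 bytes) of payload


@dataclass
class CreditGate:
    """Transmit-side view of the credits advertised by the receiver (illustrative)."""
    header_credits: int
    data_credits: int

    @staticmethod
    def needed_data_credits(payload_bytes: int) -> int:
        # Round up to whole 16-byte credits.
        return -(-payload_bytes // DATA_CREDIT_BYTES)

    def try_send(self, payload_bytes: int) -> bool:
        need = self.needed_data_credits(payload_bytes)
        if self.header_credits < 1 or self.data_credits < need:
            return False  # stall until more credits arrive
        self.header_credits -= 1
        self.data_credits -= need
        return True

    def on_update_fc(self, header_credits: int, data_credits: int) -> None:
        # Simplification: treat the update as credits handed back by the receiver.
        self.header_credits += header_credits
        self.data_credits += data_credits


gate = CreditGate(header_credits=2, data_credits=4)
first = gate.try_send(64)    # uses 1 header credit and 4 data credits
second = gate.try_send(16)   # no data credits left, so this stalls
gate.on_update_fc(1, 4)      # receiver frees buffer space
third = gate.try_send(16)    # now succeeds
```

The point of the sketch is that the sender never overruns the receiver's buffers: a TLP that does not fit simply waits for the next credit update.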
Reliability Model
- ACK/NAK + Replay.
- Works well because NRZ BER (bit error rate) is ~1e-12 (low).
- Retransmissions are rare, so replay latency is minimal.
Overhead
- LCRC (4B) per TLP.
- Extra ACK/NAK DLLPs on the link.
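The detect-then-retry idea can be illustrated with a short Python sketch. It uses `zlib.crc32` as a stand-in for the LCRC; the real LCRC has its own bit-ordering rules, so this shows the principle rather than the exact PCIe computation. The helper names `protect` and `check` are hypothetical.

```python
import zlib


def protect(tlp: bytes) -> bytes:
    """Append a 4-byte CRC to the TLP (stand-in for the LCRC)."""
    return tlp + zlib.crc32(tlp).to_bytes(4, "little")


def check(frame: bytes) -> bool:
    """Recompute the CRC over the body and compare with the received value."""
    body, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(body) == received


frame = protect(bytes(range(64)))   # a dummy 64-byte TLP
corrupted = bytearray(frame)
corrupted[10] ^= 0x01               # flip a single bit on the "wire"

good_ok = check(frame)              # receiver would send ACK
bad_ok = check(bytes(corrupted))    # receiver would send NAK
```

A 4-byte CRC only detects the error. Recovering from it still depends on the NAK and replay path, which is why the overhead list above includes both the LCRC and the ACK/NAK traffic.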
PCIe Gen5 DLL (32 GT/s, NRZ, 128b/130b)
What Changes from Gen4
- Signaling doubles (16 → 32 GT/s).
- Encoding is unchanged (still 128b/130b like Gen3/4), so encoding efficiency stays the same.
- DLL structure stays the same → Still packet mode, ACK/NAK, Replay buffer.
DLL Functions
- Same as Gen4:
  - SeqNum, LCRC, Replay buffer.
  - ACK/NAK DLLPs.
  - Flow Control DLLPs.
Challenges Emerging
- At 32 GT/s, NRZ still works, but:
  - Retransmission penalties increase (each retry wastes more in-flight data).
  - ACK/NAK signaling and other DLL overhead consume more link bandwidth.
  - Replay buffers must be deeper, because more data is in flight before an ACK returns.
✅ Still manageable, but this design wouldn’t scale to Gen6.
PCIe Gen6 DLL (64 GT/s, PAM4, FLIT Mode)
Gen6 introduces the biggest shift in DLL history.
Why? PAM4 at 64 GT/s has much higher BER (~1e-6). Old “retry on error” model → constant replays.
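A quick back-of-the-envelope calculation shows why the jump in BER matters. The Python sketch below uses only the figures quoted in this article (1e-12 for NRZ, 1e-6 for PAM4) and assumes independent bit errors on a single lane, which is a simplification. The function names are hypothetical.

```python
def seconds_between_errors(rate_gtps: float, ber: float) -> float:
    """Average time between bit errors on one lane, assuming independent errors."""
    bits_per_second = rate_gtps * 1e9
    return 1.0 / (bits_per_second * ber)


def flit_error_probability(ber: float, flit_bytes: int = 256) -> float:
    """Probability that a block of flit_bytes contains at least one bit error."""
    return 1.0 - (1.0 - ber) ** (flit_bytes * 8)


nrz_gap = seconds_between_errors(16, 1e-12)   # Gen4: roughly one error per minute per lane
pam4_gap = seconds_between_errors(64, 1e-6)   # Gen6: an error every few microseconds per lane
p_flit = flit_error_probability(1e-6)         # about 0.2% of 256B blocks see an error

print(f"Gen4 NRZ : one bit error every {nrz_gap:.1f} s")
print(f"Gen6 PAM4: one bit error every {pam4_gap * 1e6:.1f} us")
print(f"256B block hit by an error: {p_flit:.2%}")
```

With errors arriving every few microseconds instead of every minute or so, a scheme that replays on every error would spend a visible share of the link on retries, which is the motivation for correcting most errors inline.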
When PCIe moved from packet mode (Gen4/5) to FLIT mode (Gen6/7), all control information that was previously sent as DLLPs (ACK, NAK, Flow Control, Sequence numbers, CRC, etc.) had to be folded into the FLIT header.
That’s where the 6-byte (48-bit) DLP header comes in.
With the move to 64 GT/s PAM4 signaling, error rates increased, so PCIe Gen6 redesigned the DLL:
- FLIT = 256B fixed size: 236B of TLP data + 6B DLP + 8B CRC + 6B FEC (that is, 242B of TLP and DLP bytes plus 14B of CRC and FEC).
- Each FLIT carries a 6-byte Data Link Payload (DLP), replacing external DLLPs.
- Frame Details (New with PCIe 6.0):
  - Flit Info & Sequence Number field added at the start of the DLP.
  - DLLP Payload (DLP Bytes 0–5) embedded directly into the FLIT.
  - Subfields include:
    - Flit Type (FT): Identifies payload, idle, or NOP FLIT.
    - PF (Prior Flit): Indicates if prior flit is NOP/Idle or Payload.
    - Replay Cmd (RC): ACK, NAK, or replay request.
    - Sequence Number (Seq#): Maintains replay order.
    - Optional Flow Control/Markers: Used for credit updates.
👉 Not all DLLP types from prior generations are valid anymore (e.g., ACK/NAK DLLPs are no longer standalone packets, they are embedded in the DLP field).
In Gen6, DLLP info is embedded inside the FLIT as a DLP, eliminating separate DLLPs.
- Error protection:
  - 8-byte CRC + Forward Error Correction (FEC) on every FLIT.
  - Replay only triggered when FEC cannot fix the error.
Byte-by-Byte Breakdown
DLP Bytes 0–1 (Bits 47:32)
- Contain the Flit Info fields and the Sequence Number.
- Identify the kind of Flit and carry the replay control for the link.
- Fields:
  - Type [47:46] – Identifies the Flit type (payload, idle, or NOP).
  - PF (Prior Flit) [45] – Indicates whether the previous Flit was a Payload Flit or a NOP/Idle Flit.
  - RC (Replay Cmd) [44:43] – Replay status:
    - 00 = ACK
    - 01 = NAK of all outstanding Flits
    - 10 = ACK for single Flit
    - 11 = NAK for single Flit
  - Sequence Number – Occupies the low-order bits of Bytes 0–1 (10 bits in PCIe 6.0, not a full 16, because the fields above share the same two bytes).
- Sequence numbers are critical for detecting missing Flits and enabling replay.
DLP Byte 2 (Bits 31:24)
- First byte of the DLLP Payload.
- Its meaning depends on the DLLP type being carried (for example Update_FC or Flit Marker).
DLP Byte 3 (Bits 23:16)
- Part of the DLLP Payload; reserved in some encodings.
- Can carry Flow Control Updates or Flit Marker (FM) info.
DLP Byte 4 (Bits 15:8)
- Part of the DLLP Payload.
- Used for credits update (Update_FC) or marker signals for boundaries.
DLP Byte 5 (Bits 7:0)
- Last byte of DLLP Payload.
- Example: For Update_FC DLLPs → may carry new VC/SC buffer credit values.
Key Notes
- Not all legacy DLLPs are supported in Flit Mode.
  - ACK/NAK DLLPs are now encoded within these 6 Bytes instead of being standalone packets.
- DLLP Payload (Bytes 2–5) varies depending on DLLP type:
  - Update_FC → Credit updates.
  - Flit Marker → Indicates start of new transaction boundary.
  - NAK/ACK → Replay control.
⚡ So in short:
- Bytes 0–1 → Flit Info (Type + PF + Replay Command) & Sequence Number
- Bytes 2–5 → DLLP Payload (Flow Control, Flit Marker, or Replay details)
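To see how such a 6-byte field could be packed and unpacked, here is a small Python sketch. It assumes the bit positions listed above for Type [47:46], PF [45] and RC [44:43], and additionally assumes that the sequence number sits in bits [41:32] (10 bits) and that Bytes 2–5 form a 32-bit DLLP payload. Bit 42 is simply left at zero here. The `Dlp`, `pack_dlp` and `unpack_dlp` names are hypothetical, and the layout should be checked against the specification before being relied on.

```python
from typing import NamedTuple


class Dlp(NamedTuple):
    flit_type: int   # 2 bits, [47:46]
    prior_flit: int  # 1 bit,  [45]
    replay_cmd: int  # 2 bits, [44:43]
    seq_num: int     # 10 bits, [41:32] (assumed position)
    payload: int     # 32 bits, [31:0]  (assumed: DLP Bytes 2-5)


def pack_dlp(d: Dlp) -> bytes:
    """Pack the fields into 6 bytes, with Byte 0 holding bits 47:40."""
    limits = (4, 2, 4, 1 << 10, 1 << 32)
    if any(not 0 <= value < limit for value, limit in zip(d, limits)):
        raise ValueError("field out of range")
    word = (
        (d.flit_type << 46)
        | (d.prior_flit << 45)
        | (d.replay_cmd << 43)
        | (d.seq_num << 32)
        | d.payload
    )
    return word.to_bytes(6, "big")


def unpack_dlp(raw: bytes) -> Dlp:
    """Inverse of pack_dlp."""
    if len(raw) != 6:
        raise ValueError("a DLP is exactly 6 bytes")
    word = int.from_bytes(raw, "big")
    return Dlp(
        flit_type=(word >> 46) & 0x3,
        prior_flit=(word >> 45) & 0x1,
        replay_cmd=(word >> 43) & 0x3,
        seq_num=(word >> 32) & 0x3FF,
        payload=word & 0xFFFFFFFF,
    )


example = Dlp(flit_type=1, prior_flit=0, replay_cmd=1, seq_num=0x155, payload=0xDEADBEEF)
raw = pack_dlp(example)
decoded = unpack_dlp(raw)
```

Round-tripping a value through `pack_dlp` and `unpack_dlp` is a cheap way to sanity-check a field map like this one before building anything on top of it.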
Key Innovations:
- FLIT Mode (Fixed Size Packets):
  - All data travels in 256B FLITs (236B TLP data + 6B DLP + 8B CRC + 6B FEC).
  - Simplifies buffer management and ensures predictable bandwidth.
- ACK/NAK Redesign:
  - Separate DLLP ACK/NAK removed.
  - Instead, each FLIT header carries embedded ACK/NAK sequence tracking fields.
  - Retransmission is at FLIT granularity.
- Reliability via FEC:
  - Forward Error Correction corrects most single-bit errors inline.
  - If still uncorrectable → NAK info in FLIT header triggers retransmission.
  - Replay buffer now stores FLITs, not TLPs.
- DLLPs Simplified:
  - Flow Control still present.
  - NOP FLITs for keep-alive.
  - ACK/NAK DLLPs eliminated → less traffic.
Overhead:
- ~20B per 256B FLIT (~7.8%).
- No separate ACK/NAK DLLPs.
✅ Deterministic latency.
✅ Higher throughput scaling.
✅ Foundation for CXL 3.0, which runs on the Gen6 PHY with 256B FLITs (CXL 2.0 is built on Gen5).
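The ~7.8% overhead figure can be reproduced with a few lines of Python. The byte split used here (236B TLP data, 6B DLP, 8B CRC, 6B FEC) is the PCIe 6.0 flit layout as I understand it; the dictionary and variable names are only for illustration.

```python
# Byte budget of one 256B FLIT (assumed split: TLP data, DLP, CRC, FEC).
FLIT_LAYOUT = {"tlp_data": 236, "dlp": 6, "crc": 8, "fec": 6}

flit_bytes = sum(FLIT_LAYOUT.values())
overhead_bytes = flit_bytes - FLIT_LAYOUT["tlp_data"]
overhead_pct = 100.0 * overhead_bytes / flit_bytes

print(f"FLIT size : {flit_bytes} B")
print(f"Overhead  : {overhead_bytes} B ({overhead_pct:.1f}%)")
```

Unlike the per-TLP LCRC of packet mode, this cost is the same for every FLIT regardless of how many TLPs it carries, which is what makes the overhead fixed and predictable.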
PCIe Gen7 DLL (128 GT/s, PAM4, Optimized FLIT + Advanced FEC)
Gen7 doubles signaling rate again (64 → 128 GT/s).
DLL Characteristics:
- FLIT Mode (256B fixed).
- Stronger FEC:
  - Likely LDPC-style improvements.
  - Pipeline-optimized to reduce decode latency.
- Low-latency path:
  - PHY can flag early errors, avoiding wasted FLIT decode.
- ACK/NAK Embedded:
  - Same FLIT header mechanism as Gen6.
  - Faster detection and retry of corrupted FLITs.
- DLLPs Minimal:
  - Flow Control, NOP.
  - No standalone ACK/NAK packets.
Overhead & Challenges:
- Slightly larger FEC parity.
- Balancing stronger correction with latency.
✅ Reliable scaling to 128 GT/s (~256 GB/s raw per direction on x16, ~512 GB/s bidirectional).
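The bandwidth numbers follow directly from the signaling rate. This small Python sketch computes raw x16 bandwidth per generation, ignoring encoding, FLIT and protocol overhead. The function name is hypothetical.

```python
RATES_GTPS = {"Gen4": 16, "Gen5": 32, "Gen6": 64, "Gen7": 128}


def raw_gbytes_per_s(rate_gtps: float, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s: one bit per transfer per lane."""
    return rate_gtps * lanes / 8


for gen, rate in RATES_GTPS.items():
    one_way = raw_gbytes_per_s(rate)
    print(f"{gen}: {one_way:.0f} GB/s per direction, {2 * one_way:.0f} GB/s bidirectional (x16)")
```

The often-quoted 512 GB/s for Gen7 is therefore the sum of both directions, and usable throughput is lower once FLIT overhead is subtracted.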
DLL Comparison (Gen4 → Gen7)
Feature | Gen4 (16 GT/s) | Gen5 (32 GT/s) | Gen6 (64 GT/s, PAM4) | Gen7 (128 GT/s, PAM4) |
---|---|---|---|---|
Mode | Packet mode | Packet mode | FLIT mode (256B) | FLIT mode (256B) |
Error Handling | ACK/NAK + Replay | ACK/NAK + Replay | FEC + embedded ACK/NAK + Replay (FLIT-level) | Stronger FEC + embedded ACK/NAK + Replay |
Replay Buffer | Yes (TLPs) | Yes (TLPs) | Yes (FLITs) | Yes (FLITs) |
DLLPs | ACK, NAK, FlowCtrl, PM | ACK, NAK, FlowCtrl, PM | FlowCtrl, NOP (ACK/NAK in header) | FlowCtrl, NOP (ACK/NAK in header) |
CRC | 32-bit LCRC | 32-bit LCRC | Per-FLIT CRC | Per-FLIT CRC |
Overhead | LCRC + DLLPs | LCRC + DLLPs | ~20B / 256B (~7.8%) | ~20B+ / 256B (~8–9%) |
BER Tolerance | ~1e-12 | ~1e-12 | ~1e-6 (via FEC) | ~1e-5 (stronger FEC) |
Latency | Variable (retries) | Higher (more retries) | Deterministic (FEC + FLIT retry) | Deterministic (optimized FEC) |
Scalability | OK | Stressful | Excellent | Excellent |
Key Takeaways
- Gen4 & Gen5 DLL: Classic model with ACK/NAK + Replay. Works fine with NRZ signaling and low error rates.
- Gen6 DLL: Breakthrough redesign → FLIT mode, FEC corrects most errors inline with FLIT-level replay as the fallback, more predictable latency.
- Gen7 DLL: Evolution of Gen6 → stronger FEC, tuned for 128 GT/s PAM4.
In short:
- Gen4–5 = “Detect & Retry”.
- Gen6–7 = “Correct Inline, Retry as Fallback”.
This shift in the Data Link Layer is what enables PCIe to scale from 16 GT/s → 128 GT/s while maintaining reliability and predictable performance.
Example Walkthrough: TLP Flow Across DLL in Gen4 → Gen6 → Gen7
We’ll assume the Transaction Layer generates a 64-byte TLP, for example a Memory Write Request whose header and payload total 64 bytes (a Memory Read Request carries no payload and would be smaller).
Now let’s see how it flows downward into the DLL, across the link, and upward.
🟢 PCIe Gen4 DLL Flow (DLLPs + Replay Buffer)
- Transaction Layer:
  - Creates the 64B TLP.
- Data Link Layer (Sender):
  - Prepends a Sequence Number and appends the Link CRC (LCRC) to the TLP.
  - Stores TLP in Replay Buffer (until ACK received).
  - Hands the TLP to the PHY for transmission.
- Receiver DLL:
  - Verifies the LCRC and the Sequence Number.
  - If correct → sends ACK DLLP.
  - If corrupted → sends NAK DLLP.
- Sender DLL:
  - On ACK → removes the acknowledged TLPs from Replay Buffer.
  - On NAK → retransmits the unacknowledged TLPs.
- Overhead:
  - Extra DLLPs inserted for ACK/NAK + Flow Control Updates.
  - Latency penalty if retransmission required.
✅ Reliability = Good
❌ Overhead = High (DLLPs + replay on errors)
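The sender-side bookkeeping in this flow can be sketched in Python as follows. This is a minimal model under stated assumptions, not the specification's state machine: the class and method names are hypothetical, timers and replay counters are omitted, and an ACK or NAK is taken to acknowledge every TLP up to and including the sequence number it carries. The 12-bit sequence space matches the classic (non-FLIT) DLL.

```python
from collections import OrderedDict

SEQ_MODULO = 4096  # 12-bit sequence numbers in the classic (non-FLIT) DLL


class ReplayBuffer:
    """Holds transmitted TLPs until the link partner acknowledges them."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending = OrderedDict()  # seq -> TLP bytes, oldest first

    def send(self, tlp: bytes) -> int:
        seq = self.next_seq
        self.pending[seq] = tlp
        self.next_seq = (seq + 1) % SEQ_MODULO
        return seq

    def _retire_through(self, seq: int) -> None:
        # Drop every pending TLP that is not newer than `seq` (modulo wrap).
        while self.pending:
            oldest = next(iter(self.pending))
            if (seq - oldest) % SEQ_MODULO >= SEQ_MODULO // 2:
                break  # `oldest` is newer than `seq`, keep it
            del self.pending[oldest]

    def on_ack(self, seq: int) -> None:
        self._retire_through(seq)

    def on_nak(self, seq: int) -> list:
        # `seq` is the last good TLP; everything after it must be replayed.
        self._retire_through(seq)
        return list(self.pending.values())


buf = ReplayBuffer()
for tlp in (b"A", b"B", b"C"):
    buf.send(tlp)

buf.on_ack(0)                 # TLP 0 is retired
to_replay = buf.on_nak(1)     # TLP 1 arrived fine, TLP 2 must be resent
buf.on_ack(2)                 # the replayed TLP is finally acknowledged
```

Note how a single NAK can force several TLPs back onto the link, which is the "latency penalty" listed above.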
🟡 PCIe Gen5 DLL Flow (Faster, Same Mechanism)
- Same flow as Gen4, but with higher signaling speed (32 GT/s).
Key Difference:
- Replay buffer must handle larger bursts.
- Retransmission hurts more (more bandwidth is wasted on every NAK).
✅ Same mechanism
❌ Even more sensitive to errors → replay storms possible
🔴 PCIe Gen6 DLL Flow (FLIT + FEC + Embedded ACK/NAK)
- Transaction Layer:
  - Creates a 64B TLP.
- Data Link Layer (Sender):
  - FLIT Packaging: Packs one or more TLPs into the 236B TLP area of a 256B FLIT.
  - Our 64B TLP fits inside one FLIT (with padding or other TLPs).
  - Adds the 6B DLP (sequence tracking + embedded ACK/NAK) and the 8B FLIT CRC.
  - Stores FLIT in Replay Buffer (until it is acknowledged through the FLIT header of a returning FLIT).
- FEC Processing:
  - Adds 6B of FEC parity to each FLIT, organized as three interleaved groups of 2B each.
  - FEC corrects most single-bit errors inline.
- Receiver DLL:
  - Applies FEC first to correct correctable errors.
  - Then performs the CRC check on the FLIT to catch anything FEC missed or miscorrected.
  - If uncorrectable → requests retransmission using embedded NAK field in FLIT header.
- Sender DLL:
  - On ACK → retires FLIT from Replay Buffer.
  - On NAK → retransmits entire FLIT (not just TLP).
- Overhead:
  - Fixed DLP + CRC + FEC overhead (20B of 256B, ~7.8%).
  - No standalone ACK/NAK DLLPs.
✅ Reliability = Higher (FEC corrects most errors, fewer replays)
❌ Efficiency = Fixed FLIT structure reduces usable payload
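How much the fixed FLIT structure costs depends on how well TLPs fill it. The Python sketch below estimates this for a burst of TLPs, assuming a 236B TLP area per 256B FLIT and assuming TLPs can be packed back to back across FLIT boundaries, with the rest of the last FLIT padded. The function names are hypothetical.

```python
FLIT_BYTES = 256
TLP_AREA_BYTES = 236  # assumed TLP area per FLIT; the rest is DLP, CRC and FEC


def flits_needed(tlp_sizes: list) -> int:
    """FLITs required to carry the TLPs when packed back to back."""
    total = sum(tlp_sizes)
    return -(-total // TLP_AREA_BYTES)  # ceiling division


def link_efficiency(tlp_sizes: list) -> float:
    """Fraction of transmitted bytes that are TLP bytes."""
    flits = flits_needed(tlp_sizes)
    if flits == 0:
        return 0.0
    return sum(tlp_sizes) / (flits * FLIT_BYTES)


lonely = link_efficiency([64])        # one 64B TLP alone in a FLIT
packed = link_efficiency([64] * 3)    # three TLPs share one FLIT
spill = flits_needed([64] * 4)        # the fourth TLP spills into a second FLIT

print(f"single 64B TLP: {lonely:.0%}, three packed: {packed:.0%}, FLITs for four: {spill}")
```

A lone 64B TLP uses only a quarter of its FLIT, while a busy link that keeps FLITs full approaches 236/256, roughly 92%. That is the trade-off behind the "fixed FLIT structure" remark above.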
🟣 PCIe Gen7 DLL Flow (FLIT + Optimized FEC + Fast Replay)
- Transaction Layer:
  - Creates a 64B TLP.
- Data Link Layer (Sender):
  - Same 256B FLIT packaging as Gen6.
  - Improved packing efficiency reduces wasted space.
  - Stores FLIT in Replay Buffer.
- FEC Processing:
  - Enhanced low-latency FEC (stronger coding, faster correction).
  - Corrects single-bit errors inline, detects multi-bit faster.
- Receiver DLL:
  - Performs FEC correction + FLIT CRC check.
  - If unrecoverable → embedded NAK in FLIT header triggers retransmission.
- Sender DLL:
  - On ACK → retires FLIT.
  - On NAK → retransmits FLIT with lower latency than Gen6.
✅ Reliability = Even better (stronger FEC + faster retries)
✅ Efficiency = Better packing, fewer wasted bytes
❌ Still has fixed FLIT overhead
📊 Example Comparison: 64B TLP Through DLL
Step | Gen4/Gen5 | Gen6 | Gen7 |
---|---|---|---|
Packaging | SeqNum + TLP + LCRC | TLP in 256B FLIT | TLP in 256B FLIT (better packing) |
Reliability | ACK/NAK DLLPs + Replay | FEC + embedded ACK/NAK + Replay | Stronger FEC + embedded ACK/NAK + Replay |
Replay Unit | TLP | FLIT | FLIT |
Control Signaling | DLLPs (ACK/NAK, FC) | Flow Control + FLIT header | Flow Control + FLIT header |
Error Handling | NAK → replay TLP | FEC fixes most, NAK → replay FLIT | FEC fixes most, NAK → faster replay |
Efficiency | High (no fixed FLIT) | Medium (FLIT overhead) | Higher (optimized packing) |
📝 Final Takeaway
- In Gen4/Gen5, reliability = ACK/NAK DLLPs + replay buffer, but at higher bandwidth this becomes inefficient.
- In Gen6, standalone ACK/NAK DLLPs are removed; their role moves into the FLIT, and FLIT mode + FEC absorbs most errors without replay.
- In Gen7, the same FLIT/FEC model is optimized for speed and packing, reducing overhead further.