# **FPGA-based Network Interface Cards Implementing Real-time Data Transport for HEP Experiments**

R. Ammendola<sup>1</sup>, A. Biagioni<sup>2</sup>, O. Frezza<sup>2</sup>, G. Lamanna<sup>3</sup>, F. Lo Cicero<sup>2</sup>, A. Lonardo<sup>2</sup>, M. Martinelli<sup>2</sup>, P. S. Paolucci<sup>2</sup>, E. Pastorelli<sup>2</sup>, L. Pontisso<sup>4</sup>, D. Rossetti<sup>5</sup>, F. Simula<sup>2</sup>, M. Sozzi<sup>4</sup>, P. Cretaro<sup>2</sup>, P. Vicini<sup>2</sup>

<sup>1</sup>Sezione di Tor Vergata, Istituto Nazionale di Fisica Nucleare, Rome, Italy, <sup>2</sup>Sezione di Roma, Istituto Nazionale di Fisica Nucleare, Rome, Italy, <sup>3</sup>Laboratori Nazionali di Frascati, Istituto Nazionale di Fisica Nucleare, Frascati (Rome), Italy, <sup>4</sup>Sezione di Pisa, Istituto Nazionale di Fisica Nucleare, Pisa, Italy, <sup>5</sup>nVIDIA Corp, Santa Clara, CA, USA

### Abstract

NaNet is a modular design of a family of FPGA-based PCIe Network Interface Cards implementing low-latency, real-time data transport between its network channels and the memories of the host CPU and GPU accelerators.

The design features a network stack protocol offloading module that, operating in conjunction with a high-performance PCIe Gen2/3 X8 core, yields a low and predictable communication latency, making NaNet suitable for real-time applications.

A reconfigurable processing module is also available to implement application-specific processing on inbound/outbound data streams with highly reproducible latency.

So far the NaNet design has been specialized in the NaNet-1 configuration (a single 1GbE port), employed in the GPU-based real-time trigger of the CERN NA62 experiment, and in the NaNet<sup>3</sup> configuration (four 2.5 Gbps optical channels), adopted in the data acquisition system of the KM3NeT-Italia underwater neutrino telescope.

### NaNet Design

### **Case Study: NA62 RICH Detector**



#### **GPUDirect P2P/RDMA**

GPUDirect allows direct data exchange on the PCIe bus with no CPU involvement (zero copy), which reduces latency for small messages.



- I/O Interface
  - Multiple links.
  - Multiple network protocols.
    - Off-the-shelf: 1GbE, 10GbE
    - Custom: APElink (34 Gbps/QSFP), KM3link
- Router
  - Dynamically interconnects I/O and NI ports.
- Network Interface
  - Manages packets TX/RX from and to CPU/GPU memory.
- TLB & Nios II Microcontroller
  - Virtual memory management.
- PCIe X8 Gen2 Core
  - CPU BW: 2.8 GB/s read, 2.5 GB/s write.
  - GPU BW: 2.5 GB/s read and write.
- Finalizing PCIe X8 Gen3 Core
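The measured bandwidths quoted for the Gen2 core can be put in context against the raw link capacity. The following is a rough sketch, assuming the standard PCIe line parameters (5 GT/s per lane with 8b/10b coding for Gen2, 8 GT/s per lane with 128b/130b coding for Gen3) and ignoring packet protocol overhead; the function and variable names are illustrative and not part of NaNet.

```python
def pcie_raw_bandwidth_gbytes(gt_per_s, lanes, enc_num, enc_den):
    """Raw per-direction bandwidth in GB/s after line coding,
    before any transaction-layer or data-link-layer overhead."""
    bits_per_s = gt_per_s * lanes * enc_num / enc_den  # payload Gbit/s
    return bits_per_s / 8                              # bits -> bytes

gen2_x8 = pcie_raw_bandwidth_gbytes(5.0, 8, 8, 10)     # 8b/10b coding
gen3_x8 = pcie_raw_bandwidth_gbytes(8.0, 8, 128, 130)  # 128b/130b coding

# Figures quoted on the poster for the Gen2 X8 core.
measured = {"CPU read": 2.8, "CPU write": 2.5, "GPU read/write": 2.5}
efficiency = {name: bw / gen2_x8 for name, bw in measured.items()}

for name, frac in efficiency.items():
    print(f"{name}: {measured[name]} GB/s is {frac:.0%} of the Gen2 X8 raw limit")
print(f"Gen3 X8 raw limit: {gen3_x8:.2f} GB/s")
```

Under these assumptions the quoted figures sit at roughly 60 to 70 percent of the raw Gen2 X8 capacity, and moving to Gen3 would nearly double the available headroom.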








- Pion-muon discrimination
- 70 ps time resolution
- 10 MHz event rate
- 20 photons detected on average per single-ring event (hits on photo-detectors)
- 40 bytes per event
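The detector figures above imply a raw data rate that the trigger path has to sustain. The arithmetic can be sketched as follows; the constants come from the list above, while the per-hit figure is only a derived average that ignores any header or framing bytes.

```python
EVENT_RATE_HZ = 10e6   # 10 MHz event rate
BYTES_PER_EVENT = 40   # 40 bytes per event
HITS_PER_EVENT = 20    # average photons (hits) per single-ring event

raw_rate_bytes = EVENT_RATE_HZ * BYTES_PER_EVENT   # bytes per second
raw_rate_gbit = raw_rate_bytes * 8 / 1e9           # Gbit per second
bytes_per_hit = BYTES_PER_EVENT / HITS_PER_EVENT   # derived average only

print(f"Raw RICH data rate: {raw_rate_bytes / 1e6:.0f} MB/s ({raw_rate_gbit:.1f} Gbit/s)")
print(f"Average payload per hit: {bytes_per_hit:.0f} bytes")
```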



#### 4 TEL62 boards for the RICH detector

- 8×1GbE links for data readout
- 4×1GbE links for trigger primitives
- 4×1GbE links for the GPU trigger
- Event rate: 10 MHz
- L0 trigger rate: 1 MHz
- Max latency: 1 ms
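These requirements translate into a simple budget for the low-level trigger. The sketch below derives the rejection factor, the number of events that can be in flight within the latency window, and the nominal aggregate capacity of the GPU trigger links; it uses only the figures listed above and ignores Ethernet and UDP framing overhead, which is an assumption of this example.

```python
EVENT_RATE_HZ = 10_000_000   # 10 MHz event rate
L0_RATE_HZ = 1_000_000       # 1 MHz L0 trigger output rate
MAX_LATENCY_S = 1e-3         # 1 ms maximum trigger latency
GPU_TRIGGER_LINKS = 4        # 4 x 1GbE links towards the GPU trigger
LINK_GBIT_PER_S = 1.0

# The L0 trigger must reject this fraction of events on average.
rejection_factor = EVENT_RATE_HZ / L0_RATE_HZ

# Events produced while one decision is still pending at maximum latency.
events_in_flight = round(EVENT_RATE_HZ * MAX_LATENCY_S)

# Nominal aggregate capacity of the GPU trigger links, in MB/s.
gpu_link_capacity_mbytes = GPU_TRIGGER_LINKS * LINK_GBIT_PER_S * 1e9 / 8 / 1e6

print(f"Rejection factor: {rejection_factor:.0f}")
print(f"Events in flight within the latency window: {events_in_flight}")
print(f"GPU trigger link capacity: {gpu_link_capacity_mbytes:.0f} MB/s")
```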

## NaNet-1 in the RICH low-level trigger processor

- Implemented on Altera Stratix IV dev board
- TTC daughtercard with HSMC connector for timing (clock, SOB/EOB) and trigger signals



- Linux Kernel Device Driver (kernel space, /dev/nane): CLOP management, ...
- NaNet Device
  - Nios II Microcontroller: single-process software (bare metal) performing system configuration and initialization tasks

# **KM3NeT-Italia** experiment




- European deep-sea research infrastructure hosting a new-generation neutrino telescope with a volume of several cubic kilometres, located at the bottom of the Mediterranean Sea (~100 km off-shore, ~3500 m below sea level).
- Data produced by optical modules (OMs), hydrophones, and instruments are collected by an electronic board (the FCM board) contained in a vessel at the centre of the floor.
- NaNet<sup>3</sup> manages communication between the on-shore lab and the underwater devices, and also distributes the timing information (GPS clock) and the signals received from the on-shore equipment.
- Deterministic latency links are required to obtain a common timing and a known delay for the spatially distributed readout.
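The role of a known, fixed link delay can be illustrated with a small sketch. If each link's one-way delay is constant, it can be calibrated once and subtracted from the on-shore arrival time to place hits from different floors on a common time base. The floor names and delay values below are purely hypothetical and are not taken from the KM3NeT-Italia system.

```python
# Hypothetical calibrated one-way link delays in nanoseconds
# (illustrative values only, not measured figures).
LINK_DELAY_NS = {"floor_a": 500_000, "floor_b": 500_120}

def emission_time_ns(arrival_ns, floor):
    """Map an on-shore arrival time back to the off-shore emission time,
    which is only possible because the link delay is deterministic."""
    return arrival_ns - LINK_DELAY_NS[floor]

# (floor, on-shore arrival time in ns)
arrivals = [("floor_a", 1_000_600), ("floor_b", 1_000_650)]

# Order the hits by reconstructed emission time on the common time base.
aligned = sorted((emission_time_ns(t, floor), floor) for floor, t in arrivals)

for t, floor in aligned:
    print(f"{floor}: emitted at {t} ns")
```

In this example the hit from `floor_b` arrives later on-shore but was emitted earlier, which the correction recovers; with a non-deterministic delay this ordering could not be reconstructed reliably.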



### NaNet<sup>3</sup>

NaNet<sup>3</sup> is the on-shore endpoint for four off-shore readout cards.

#### NaNet<sup>3</sup> On-shore (Stratix V)



- Merger time depends on data size. Work is ongoing to speed up this task:
  - Currently performed on the GPU.
  - FPGA implementation being finalized (tens of cycles latency).
- Computing time (K20c): ~1 µs per event.
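As an illustration of what a merging stage does, the sketch below gathers fragments of the same event arriving from several readout sources and emits only complete events. It assumes that fragments carry a shared event timestamp and a source identifier; this data layout and the function name are hypothetical, and the code is a plain CPU example rather than the GPU or FPGA merger described above.

```python
from collections import defaultdict

def merge_fragments(fragments, n_sources):
    """Group (timestamp, source_id, hits) fragments by timestamp and
    return complete events as (timestamp, merged_hits), time-ordered."""
    by_timestamp = defaultdict(dict)
    for timestamp, source_id, hits in fragments:
        by_timestamp[timestamp][source_id] = hits

    events = []
    for timestamp in sorted(by_timestamp):
        parts = by_timestamp[timestamp]
        if len(parts) == n_sources:  # keep only events seen by every source
            merged = [h for src in sorted(parts) for h in parts[src]]
            events.append((timestamp, merged))
    return events

# Two sources; the event at timestamp 200 is incomplete and is dropped.
fragments = [
    (100, 1, [7, 8]),
    (200, 0, [3]),
    (100, 0, [1, 2]),
]
merged_events = merge_fragments(fragments, n_sources=2)
print(merged_events)
```

The cost of this step grows with the number of hits per fragment, which is consistent with the remark that merger time depends on data size.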





Ring pattern recognition and fitting are also performed on the GPU:

- New algorithm ("Almagest") developed for trackless, fast, and high-resolution ring fitting.
- Rough detection of particle speed (radius) and direction (centre).
- 0.5 µs per event (on nVIDIA K20x).
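To make the link between ring parameters and hits concrete, here is a minimal sketch of a generic algebraic least-squares circle fit (the Kåsa method). It is not the Almagest algorithm, whose details are not given here, and it runs on the CPU in plain Python; it only shows how a centre and a radius can be recovered from about 20 hit coordinates.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_circle(points):
    """Algebraic least-squares fit of x^2 + y^2 + D*x + E*y + F = 0.
    Returns (centre_x, centre_y, radius)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    sz = sxx + syy

    # Normal equations A * [D, E, F] = rhs, solved with Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [-sxz, -syz, -sz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate hit pattern (collinear or too few points)")
    solution = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    D, E, F = solution
    cx, cy = -D / 2, -E / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - F)

# 20 synthetic hits on a ring of radius 5 centred at (3, -2).
hits = [(3 + 5 * math.cos(2 * math.pi * k / 20),
         -2 + 5 * math.sin(2 * math.pi * k / 20)) for k in range(20)]
cx, cy, radius = fit_circle(hits)
print(f"centre=({cx:.3f}, {cy:.3f}) radius={radius:.3f}")
```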

## NaNet-10 (four 10GbE SFP+ Ports)


- Implemented on the Terasic DE5-NET dev board (Altera Stratix V GX FPGA)
- 4 SFP+ ports (link speed up to 10 Gb/s)
- GPUDirect P2P/RDMA capability



NaNet<sup>3</sup> configuration:

- Implemented on the Terasic DE5-NET Stratix V FPGA dev board
- 4 custom 2.5 Gbps deterministic latency optical channels
- Link speed up to 10 Gb/s
- GPUDirect P2P/RDMA capability
- Deterministic latency link: employs Altera Deterministic Latency Transceivers with 8B/10B encoding as the physical link coding and a Time Division MultiPlexing (TDMP) data transmission protocol
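Two properties of such a link can be sketched in a few lines: the payload fraction left by 8B/10B coding, and the fixed slot position that time division multiplexing gives to each logical channel. The example assumes that 2.5 Gbps is the line rate (the text does not say whether it is the line or the payload rate) and uses a hypothetical frame layout with one slot per channel; it is not the actual NaNet<sup>3</sup> protocol.

```python
LINE_RATE_GBPS = 2.5  # assumed to be the line rate, before 8B/10B decoding
payload_rate_gbps = LINE_RATE_GBPS * 8 / 10  # 8 data bits per 10 line bits

IDLE = 0x00  # hypothetical filler word for an empty slot

def tdm_mux(channels, n_frames):
    """Build n_frames frames with exactly one fixed slot per channel.
    Slot i always belongs to channel i, so the delay seen by a channel
    does not depend on the traffic of the others."""
    queues = [list(c) for c in channels]
    return [[q.pop(0) if q else IDLE for q in queues] for _ in range(n_frames)]

def tdm_demux(frames, channel_index):
    """Recover one channel by reading its fixed slot in every frame."""
    return [frame[channel_index] for frame in frames]

frames = tdm_mux([[1, 2], [10], [7, 8]], n_frames=2)
print(f"Payload rate: {payload_rate_gbps} Gbps")
print(frames)
print(tdm_demux(frames, 2))
```

A fixed slot assignment wastes bandwidth when a channel is idle, but it keeps the transport delay constant, which is the property the deterministic latency requirement relies on.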












#### **Contacts:**

- NaNet project: http://apegate.roma1.infn.it/nanet
- APE project: http://apegate.roma1.infn.it/APE
- Presenter contact: michele.martinelli@roma1.infn.it
- NaNet coordinator: alessandro.lonardo@roma1.infn.it

