10G TCP/IP Offload Engine (TOE) IP Core


By integrating a 10 Gbps TOE, a 10 Gigabit Ethernet MAC (10 GEMAC), and PCIe, this highly flexible and customizable IP core can be used in layer 3 and layer 4-7 network infrastructure and network security systems. Typical applications include high-performance servers, NICs, SAN/NAS, and data center equipment. The IP core provides key building blocks for very high performance 10 Gigabit Ethernet implemented in ASICs, ASSPs, or FPGAs.

The IP core can process TCP/IP sessions as client and server in mixed session mode, along with other protocols for network equipment and in-line network security appliances, simultaneously and at a 10 Gbit rate. This relieves the host CPU of costly software TCP/IP tasks such as session setup and teardown, data copying, and session maintenance, delivering a 4x to 8x improvement in TCP/IP network performance compared with a software TCP/IP stack.

A wide range of TOE hardware cores is offered for 1 GbE to 10 GbE applications using PCI Express or embedded system interfaces. TOE products support full TCP offload as well as conventional NIC operation (in TCP Bypass Mode). Optional advanced software support lets applications take advantage of TOE acceleration without modification.

TOE design versions

1) Generic TOE for network infrastructure design applications:

     a) 4 sessions with a payload FIFO of 8/16 KB.
     b) 16 sessions with a payload FIFO of 16/32 KB.
     c) 64 sessions with a scalable FIFO of 64/256 KB.
     d) 65 or more sessions, depending on on-chip memory or a DDRx interface (available upon request).
     e) Optional very high performance DMA blocks are also available for integration with a high-performance PCIe Gen 2 interface.

 2) TOE with enhanced security features:

     a) All of the options available in the generic TOE, plus:

                i. Protocol filter block: can selectively direct traffic for any known application-level protocol to any selected MAC port; e.g. all IM/chat, SMTP (email), Web (HTTP), or VoIP traffic can be filtered and directed to selected ports.
                ii. IP and port number filter block.
                iii. Traffic matching specific IP and port filters is routed to optional selected MAC interface(s), the PCIe interface, or the memory interface directly at line rate without CPU involvement.
                iv. MAC filter block: traffic is routed to any of the selected interfaces.
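
The IP and port filter behavior listed above can be pictured with a small software model. This is only an illustrative sketch in Python, not the core's hardware logic or its register interface: the `FilterRule` type, the `route` function, the first-match rule order, and the target names ("mac1", "mac2", "pcie", "host") are all assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class FilterRule:
    """Hypothetical filter entry: match on destination network and port."""
    network: str              # CIDR block, e.g. "192.168.10.0/24"
    dst_port: Optional[int]   # None matches any port
    target: str               # assumed output name: a MAC port, "pcie", ...


def route(rules: List[FilterRule], dst_ip: str, dst_port: int,
          default: str = "host") -> str:
    """Return the target of the first matching rule, else the default path."""
    addr = ip_address(dst_ip)
    for rule in rules:
        port_matches = rule.dst_port in (None, dst_port)
        if port_matches and addr in ip_network(rule.network):
            return rule.target
    return default


# Example table: SMTP to one MAC port, HTTP to another, one subnet to PCIe.
RULES = [
    FilterRule("0.0.0.0/0", 25, "mac1"),
    FilterRule("0.0.0.0/0", 80, "mac2"),
    FilterRule("192.168.10.0/24", None, "pcie"),
]

if __name__ == "__main__":
    print(route(RULES, "10.1.2.3", 25))
    print(route(RULES, "192.168.10.7", 5060))
    print(route(RULES, "10.1.2.3", 443))
```

In this model, traffic that matches no rule falls through to the default "host" path, which is one plausible reading of how unfiltered traffic would be handled; the actual behavior depends on how the core is configured.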


Simplified Block Diagram

Specifications in brief:

• Original 1G TOE functionality proven in multiple IDS/IPS appliances
• Complete header and flag processing of TCP/IP sessions in hardware (accelerates by 10x – 20x)
• TCP Offload Engine: 20 Gb/s wire-speed performance (full duplex)
• Scalable to 40 Gb/s
• TCP and IP checksums computed in hardware
• TCP segmentation/reassembly in hardware (opt)
• Multiple ‘slot storage’ for fragmented packets (opt)
• Out-of-sequence packet detection/storage/reassembly (opt)
• TCP port address tracking/automatic DMA
• MAC address search logic/filter (opt)
• IP address search logic/filter (opt)
• Accelerates TCP-based security processing and storage networking
• RDMA: data placement in the application's buffer (reduces CPU utilization by 90%)
• Future-proof: flexible implementation of TCP offload
• Accommodates future specification changes
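
For readers unfamiliar with what the checksum offload replaces, the sketch below shows the standard 16-bit ones' complement Internet checksum (as described in RFC 1071) that software stacks compute over IP headers and TCP segments. It is a plain Python reference model for illustration, not the core's hardware implementation; the function name `internet_checksum` and the sample header bytes are chosen for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
HEADER = bytes.fromhex("450000730000400040110000c0a80001c0a800c7"[:20]
                       + "0000" + "c0a80001c0a800c7")

if __name__ == "__main__":
    csum = internet_checksum(HEADER)
    print(hex(csum))
    # A receiver checks a header by summing it with the checksum in place;
    # a valid header yields zero.
    filled = HEADER[:10] + csum.to_bytes(2, "big") + HEADER[12:]
    print(internet_checksum(filled))
```

The per-word loop is the work that a hardware checksum unit can perform in a few clock cycles while the packet streams through.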

APIs

Network applications use the Socket API. Typically, the OS implements the Socket API with a software stack. In its basic mode, the TOE core implements a standard Hardware API that lets higher-level applications take full advantage of the TOE's benefits. Optionally, to achieve higher performance, an equivalent Socket API (TOE Socket API) can be implemented to enable plug-and-play acceleration through a simple intercept of standard Socket calls.
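
To make the "standard Socket calls" concrete, the sketch below is an ordinary TCP round trip over the loopback interface using Python's standard `socket` module. It runs entirely on the OS software stack and does not touch any TOE hardware or TOE-specific API; the helper names `echo_once` and `round_trip` are made up for this example. Calls such as `connect`, `send`, and `recv` are the kind a socket-level intercept would redirect, with the application code left unchanged.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives first."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(4096)
        conn.sendall(data)


def round_trip(payload: bytes) -> bytes:
    """Send a small payload to a local echo server and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(payload)
            reply = client.recv(4096)
        worker.join()
    return reply


if __name__ == "__main__":
    print(round_trip(b"hello"))
```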

Hardware API: enables dedicated processing in the FPGA for application-specific acceleration

• Ideal for very high performance, specialized, differentiated ASICs or FPGAs for network security or network infrastructure applications
• Fully verified using a comprehensive verification methodology for ASIC ports; the core has also been tested in network systems.
• Small logic footprint: fewer than 20,000 Xilinx slices or Altera ALMs, or 250,000 ASIC gates, plus on-chip memory
• Fully integrated high-performance 10 Gigabit Ethernet MAC.
• Scalable MAC Rx and Tx FIFOs make it ideal for optimizing system performance.
• Hardware implementation of the TCP/IP stack's control plane and data plane.
• Hardware implementation of ARP protocol processing.
• Extended ARP table creation and deletion management (optional)
• Adheres to RFCs 793, 1500, 1700, 813, 791, and 2001
• ‘Sliding window’ mechanism implemented in hardware, providing full flow control
• ‘Slow start’ transfer control in hardware
• Non-TCP bypass mode sends all non-TCP/IP traffic directly to the host interface via user_fifo for the host's TCP/IP software to handle
• Can be deployed behind a gateway that responds to Gateway-IP requests rather than ARP requests (optional)
• On-chip DDR or SSRAM memory controller that can address from 4 KB to 4 MB of on-chip memory or 256 MB of off-chip memory (optional)
• Simple user-side interface for easy hardware integration, or a slightly more complex one for more powerful, controlled ‘streaming’ data transfers.
• Many trade-offs available for functions that can be performed in either hardware or software
• Configurable packet buffers and session table buffers in on-chip or off-chip memory, or via an attached DDR I/II interface, depending on system, performance, and ASIC/FPGA size requirements
• Interfaces directly to XGMII or a 10 Gbit serial interface
• Architecture can be scaled up to 40 Gb/s
• Customizable to handle jumbo frames
• Integrated PCIe x4 bus interface; x8 and x16 optional
• Integrated AMBA 2.0 interface or Xilinx PLB bus for local processor control.
• User-programmable, prioritizable interrupts
• Performs connection/session management
• Monitors, stores, maintains, and processes up to 1024 live TCP sessions. Customizable to implement more, depending on on-chip memory availability and other FPGA limitations.
• Extendable to 4K TCP sessions, depending on internal memory.
• Wire-speed 20 Gbps performance in full duplex
• Multiple TOEs can process up to 4K connections per second
• TCP and IP checksum generation and verification performed in hardware in fewer than 6 clock cycles (30 ns at 200 MHz), versus 1-2 µs for a typical software TCP stack
• Connection setup and teardown/termination without CPU involvement.
• User-programmable session table parameters
• Dedicated set of hardware timers for each TCP/IP session, or customizable to share one common set of timers for all stale sessions.
• Multiple ‘slot storage’ for fragmented packets; more slots are allocated when more on-chip memory is available. Includes logic that self-checks available memory (optional)
• Out-of-sequence packet detection/storage and reassembly/segmentation (optional)
• Direct data placement in the application's buffer at full wire speed without CPU involvement (reduces the CPU's buffer copy time and utilization by 95%)
• Supports VLAN bypass mode (optional)
• Easily customizable for filtering various IP and TCP traffic protocols and directing them towards any port or IP (ideal for security appliances)
• Implements full TCP/IP offload or bypass mode (optional)
• Basic mini API available for easy integration with Linux/Windows; other OSs/CPUs also available
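
The ‘slow start’ and ‘sliding window’ items in the list above refer to standard TCP behavior (RFC 2001 is among the RFCs cited). As a rough illustration of what that control logic does, the following Python sketch traces how many segments a sender may have in flight per round trip. It is a deliberately simplified textbook model, assuming no packet loss and counting in whole segments; the function name `window_trace` and its parameters are hypothetical and say nothing about how the core implements these mechanisms.

```python
from typing import List


def window_trace(rounds: int, ssthresh: int, receiver_window: int,
                 initial_cwnd: int = 1) -> List[int]:
    """Segments allowed in flight for each round trip, assuming no loss.

    The congestion window doubles per round trip during slow start until it
    reaches ssthresh, then grows by one segment per round trip. The sliding
    window limit is the smaller of the congestion window and the window
    advertised by the receiver.
    """
    cwnd = initial_cwnd
    trace = []
    for _ in range(rounds):
        trace.append(min(cwnd, receiver_window))
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: exponential growth
        else:
            cwnd += 1                        # congestion avoidance: linear
    return trace


if __name__ == "__main__":
    # Threshold of 8 segments, receiver advertises room for 10 segments.
    print(window_trace(rounds=8, ssthresh=8, receiver_window=10))
    # -> [1, 2, 4, 8, 9, 10, 10, 10]
```

Once the receiver's advertised window becomes the limit, the trace flattens: that is the flow-control side of the sliding window, independent of congestion control.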

Deliverables

• Verilog source code or netlist.
• Test bench, .vcd files, and configuration code/API for an easy Linux port
• Verilog models for various components, e.g. TCP/IP client and server models, transaction model (optional)
• 10-GEMAC
• External memory interface/model (optional).
• TCP Model (optional)
• Verification suite (optional)
• Test packet-traffic suite (optional)