TU Berlin

CluMPzy

With the new Zynq silicon, Xilinx offers powerful hardware for cutting-edge hardware/software codesign. Besides the large logic fabric and the integrated state-of-the-art embedded processors, the devices provide interfaces for attaching local peripherals and communication infrastructure. We want to use these capabilities to design and build a cluster node based on the most powerful device of the Xilinx Zynq family.

Node

Design sketch

As depicted in the design sketch, our goal is a reduced and focused computational cluster node that emphasizes the high-speed interconnect and the local memory capabilities. The sketch is not true to scale and omits some components; it should be read as a design proposal.

The Gigabit Ethernet port serves only as a configuration interface to set up or update the node, or as a monitoring interface to collect analytics data accumulated by the processor cores. Data exchange between nodes should happen over the SFP+ interconnections. Eight of them are enough to realize a variety of powerful network topologies.

For local memory we plan a two-level memory subsystem: DDR3 SDRAM on the one hand and QDR II+ SRAM on the other.

Interconnection

3D Torus

Eight of the FPGA's GTX high-speed serial links connect the node to data sources and sinks. Six of them are reserved for connecting nodes (and therefore FPGAs) directly in a 3D torus topology. The remaining two can be used for 10 Gb Ethernet (in the case of a hierarchical cluster with a top-level switch), Fibre Channel storage systems, or other purposes. To achieve this flexibility in the use of the high-speed serial links, we chose SFP+ infrastructure for the physical connectors. In total, 160 Gbps of bandwidth is at each node's disposal.
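To make the six-link torus wiring concrete, the following minimal Python sketch computes the direct neighbors of a node and the hop distance between two nodes. The function names, the coordinate scheme and the example grid size are illustrative assumptions, not part of the project. The bandwidth line likewise only shows one reading of the 160 Gbps figure, assuming 10 Gbps per SFP+ link and direction and counting both directions.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # hypothetical (x, y, z) node coordinate


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six direct neighbors of `node` in a 3D torus of size `dims`.

    One neighbor per direction (-/+) on each of the three axes, with
    wrap-around at the edges. If an axis has size 1 or 2, some of the
    returned coordinates coincide.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, +1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal number of link hops between nodes `a` and `b`."""
    # On each axis the shorter way around the ring is taken.
    return sum(min((x - y) % d, (y - x) % d) for x, y, d in zip(a, b, dims))


# Assumed reading of the aggregate bandwidth figure (not confirmed by the text):
SFP_PLUS_LINKS = 8
ASSUMED_GBPS_PER_DIRECTION = 10
AGGREGATE_GBPS = SFP_PLUS_LINKS * ASSUMED_GBPS_PER_DIRECTION * 2  # both directions
```

For example, in a hypothetical 4 x 4 x 4 cluster, `torus_neighbors((0, 0, 0), (4, 4, 4))` lists the six nodes that the six torus links of node (0, 0, 0) would be cabled to, and `torus_hops` gives the shortest path length used when reasoning about latency.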

Tasks

Several tasks have to be investigated and implemented:

  • logic resources spanning several nodes vs. per-node computing elements
  • data flow monitoring and profiling supported by hardware counters
  • unified address space models for local and global data flow
  • frame-based vs. streaming data access
  • low-latency interfaces
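As an illustration of what a unified address space model could look like, the sketch below packs a node's torus coordinate and a node-local offset into a single global address. This is purely a hypothetical layout for discussion: the bit widths, the field order and all function names are assumptions and do not describe a decided design.

```python
from typing import Tuple

Coord = Tuple[int, int, int]  # hypothetical (x, y, z) node coordinate

# Assumed field widths: 4 bits per torus axis, 40 bits of node-local offset.
AXIS_BITS = 4
OFFSET_BITS = 40
AXIS_MASK = (1 << AXIS_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def encode_global(node: Coord, offset: int) -> int:
    """Pack a node coordinate and a local offset into one global address."""
    if any(not 0 <= c <= AXIS_MASK for c in node):
        raise ValueError("node coordinate out of range")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("local offset out of range")
    x, y, z = node
    node_id = (x << (2 * AXIS_BITS)) | (y << AXIS_BITS) | z
    return (node_id << OFFSET_BITS) | offset


def decode_global(addr: int) -> Tuple[Coord, int]:
    """Split a global address back into (node coordinate, local offset)."""
    offset = addr & OFFSET_MASK
    node_id = addr >> OFFSET_BITS
    z = node_id & AXIS_MASK
    y = (node_id >> AXIS_BITS) & AXIS_MASK
    x = (node_id >> (2 * AXIS_BITS)) & AXIS_MASK
    return (x, y, z), offset


def is_local(addr: int, self_node: Coord) -> bool:
    """True if the address targets this node's own memory."""
    return decode_global(addr)[0] == self_node
```

With such a scheme, an access whose decoded coordinate matches the local node would be served from local memory, while any other access would be forwarded over the interconnect toward the decoded coordinate.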

Applications

  • HEVC encoder hardware
  • many core platform simulation
  • engineering (CFD/FEM)
  • neural networks
  • large-scale hardware/software codesign

People Involved
