Tesla Data Center Solutions
Based on the CUDA architecture codenamed "Fermi," the Tesla M-class GPU computing modules are the world's fastest parallel computing processors for high performance computing (HPC). The high performance of Tesla GPUs makes them ideal for seismic processing, biochemistry simulations, weather and climate modeling, signal processing, computational finance, computer-aided engineering (CAE), computational fluid dynamics (CFD), and data analytics.
Accelerate your science with NVIDIA Tesla 20-series GPUs. A companion processor to the CPU in a server, Tesla GPUs speed up HPC applications by up to 10x. Based on the NVIDIA CUDA GPU architecture codenamed "Fermi," Tesla 20-series GPUs feature up to 665 gigaflops of double-precision performance, more than 1 teraflop of single-precision performance, ECC memory error protection, and L1 and L2 caches.
FEATURES AND BENEFITS
HUNDREDS OF CUDA CORES
Delivers up to 665 gigaflops of peak double-precision performance per GPU, enabling servers from leading OEMs to deliver more than a teraflop of double-precision performance per 1U of rack space. Peak single-precision performance is over one teraflop per GPU.
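As a rough worked example of the per-rack-unit claim, the sketch below multiplies the 665 gigaflops peak figure by a GPU count. The function names and the idea of fitting two GPUs in a 1U server are assumptions made for illustration, not figures from this document, and peak numbers say nothing about sustained application performance.

```python
import math

# Peak double-precision throughput per GPU, taken from the text above.
DP_PEAK_GFLOPS_PER_GPU = 665.0


def aggregate_peak_gflops(gpus: int,
                          per_gpu_gflops: float = DP_PEAK_GFLOPS_PER_GPU) -> float:
    """Combined theoretical peak for `gpus` identical GPUs (hypothetical helper)."""
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    return gpus * per_gpu_gflops


def gpus_needed(target_gflops: float,
                per_gpu_gflops: float = DP_PEAK_GFLOPS_PER_GPU) -> int:
    """Smallest GPU count whose combined peak reaches `target_gflops`."""
    return math.ceil(target_gflops / per_gpu_gflops)


if __name__ == "__main__":
    # One teraflop is 1000 gigaflops.
    n = gpus_needed(1000.0)
    print(n, aggregate_peak_gflops(n))  # 2 1330.0
```

Under these assumptions, two GPUs in one rack unit are enough to pass the one-teraflop mark in peak double-precision terms.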
ECC MEMORY ERROR PROTECTION
Meets a critical requirement for computing accuracy and reliability for workstations. Protects data in memory to enhance data integrity and reliability for applications. Register files, L1/L2 caches, shared memory, and DRAM are all ECC protected.
UP TO 6GB OF GDDR5 MEMORY PER GPU
Maximizes performance and reduces data transfers by keeping larger data sets in local memory that is attached directly to the GPU.
SYSTEM MONITORING FEATURE
Integrates the GPU subsystem with the host system's monitoring and management capabilities, such as IPMI or OEM-proprietary tools. IT staff can thus manage the GPUs in the computing system with widely used cluster/grid management solutions.
L1 AND L2 CACHES AS PART OF THE NVIDIA PARALLEL DATACACHE
Accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand.
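To illustrate what "data addresses are not known beforehand" means, here is a minimal CPU-side sketch of sparse matrix-vector multiplication, one common form of sparse matrix multiplication, using the compressed sparse row (CSR) layout. It is plain Python for readability, not GPU code, and the function name and the small example matrix are hypothetical. The point is the read `x[col_idx[k]]`: which element of `x` is touched depends on index values loaded at run time, and this kind of irregular access is where a cache can help.

```python
from typing import List


def csr_matvec(row_ptr: List[int], col_idx: List[int],
               values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect, data-dependent read: the address into x comes
            # from col_idx and cannot be predicted from the loop counter.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # Hypothetical 3x3 matrix:
    # [[2, 0, 1],
    #  [0, 3, 0],
    #  [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [2.0, 1.0, 3.0, 4.0, 5.0]
    print(csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [5.0, 6.0, 19.0]
```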
NVIDIA GIGATHREAD ENGINE
Maximizes throughput with context switching that is 10x faster than in the previous architecture, concurrent kernel execution, and improved thread block scheduling.
ASYNCHRONOUS TRANSFER WITH DUAL DMA ENGINES
Turbocharges system performance by transferring data over the PCIe bus while the computing cores are crunching other data.
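A back-of-the-envelope model can show why overlapping transfers with computation helps. The sketch below assumes a workload split into equal chunks, each needing a host-to-device copy, a kernel run, and a device-to-host copy, and assumes that with two DMA engines all three stages can proceed at once on different chunks. The function names and timings are hypothetical; real gains depend on the application and on the PCIe link.

```python
def serial_time(chunks: int, h2d: float, kernel: float, d2h: float) -> float:
    """Total time when every copy and kernel runs one after another."""
    return chunks * (h2d + kernel + d2h)


def pipelined_time(chunks: int, h2d: float, kernel: float, d2h: float) -> float:
    """Total time for an idealized three-stage pipeline.

    The first chunk pays the full latency of all three stages; each later
    chunk adds only the duration of the slowest stage.
    """
    if chunks <= 0:
        return 0.0
    return (h2d + kernel + d2h) + (chunks - 1) * max(h2d, kernel, d2h)


if __name__ == "__main__":
    # Hypothetical per-chunk timings in milliseconds.
    print(serial_time(8, 2.0, 5.0, 2.0))     # 72.0
    print(pipelined_time(8, 2.0, 5.0, 2.0))  # 44.0
```

In this idealized model the kernel is the slowest stage, so once the pipeline is full the copies in both directions are hidden behind computation.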
FLEXIBLE PROGRAMMING ENVIRONMENT WITH BROAD SUPPORT OF PROGRAMMING LANGUAGES AND APIs
Choose C, C++, OpenCL, DirectCompute, or Fortran to express application parallelism and take advantage of the innovative "Fermi" architecture.