Technology

Market Needs

  • Rapidly growing computing demands for training and inference
  • Exponential growth in model data size and model complexity, creating a need to break the link between model size and raw compute power
  • Increasing need for highly programmable and efficient AI processors with the right computational density designed from the ground up
  • Evolution to Software 2.0

Tenstorrent has anticipated this growth and has designed an architecture inherently built to handle these massive workloads effectively and efficiently.

  • Fully programmable architecture that supports fine-grain
    conditional execution, dynamic sparsity handling, and an
    unprecedented ability to scale via tight integration of
    computation and networking
  • Difficult communication and synchronization bottlenecks
    addressed
  • Compiler-based flexible parallelization
  • Fine-grain adaptation of the compute graph at runtime
  • Consistently high performance and utilization, regardless of
    model type or batching
  • Facilitating machines to go beyond pattern recognition and into cause-and-effect learning

Grayskull – High Performance AI Processor

High performance AI processor for workloads of today and tomorrow
Array of powerful Tensix(TM) processing cores providing a total of 368 TOPS; each Tensix core contains:
  • Fully C++ programmable, multi-threaded front end
  • Highly area and power efficient matrix compute engine
  • Powerful and flexible SIMD engine
Custom network-on-chip (NOC) with unprecedented multi-cast flexibility and data transfer with low software overhead
Support for the largest models of today, and for upcoming larger models
  • A large 120MB on-chip SRAM
  • 8 channels of LPDDR4 DRAM
  • 16 lanes of PCI-e Gen 4 host interface
Flexible data format support, including INT8, and many flavours of FP
Dynamic data compression for over 8x improvement in bandwidth efficiency for sparse workloads
Full support for Tenstorrent's conditional execution and dynamic graph optimization technologies for over 10x improvement in throughput on optimized graphs
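As a rough illustration of why compressing sparse data saves bandwidth, the sketch below packs a mostly-zero tensor as a bitmask plus its non-zero values. This page does not describe Grayskull's actual compression scheme; the bitmask encoding, the function names, and the 2-byte value size are all assumptions chosen only to show how sparsity translates into fewer bytes moved.

```python
# Illustrative only: a generic bitmask encoding, NOT Grayskull's actual scheme.

def compress(values):
    """Pack a sequence as (bitmask bytes, list of non-zero values)."""
    mask = bytearray((len(values) + 7) // 8)  # one bit per element
    nonzeros = []
    for i, v in enumerate(values):
        if v != 0:
            mask[i // 8] |= 1 << (i % 8)      # mark position i as non-zero
            nonzeros.append(v)
    return bytes(mask), nonzeros


def decompress(mask, nonzeros, n):
    """Rebuild the original n-element sequence from mask and non-zeros."""
    it = iter(nonzeros)
    return [next(it) if mask[i // 8] & (1 << (i % 8)) else 0 for i in range(n)]


def compression_ratio(values, bytes_per_value=2):
    """Dense size divided by packed size (assumes 2-byte values by default)."""
    mask, nonzeros = compress(values)
    dense = len(values) * bytes_per_value
    packed = len(mask) + len(nonzeros) * bytes_per_value
    return dense / packed


# Hypothetical tensor: 1024 elements, 1 in 16 non-zero.
tensor = [0] * 1024
for i in range(0, 1024, 16):
    tensor[i] = 1.5

mask, nonzeros = compress(tensor)
assert decompress(mask, nonzeros, len(tensor)) == tensor  # lossless round trip
print(f"{compression_ratio(tensor):.1f}x fewer bytes")    # 2048 / 256 bytes
```

Under these assumptions the ratio depends on the density of non-zero values and on the value width, so sparser or wider data compresses further.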

Grayskull Performance

All numbers below are achieved with the TDP target set to 65W

Performance Benchmark                                    Tenstorrent Grayskull

Raw TOPS                                                 368 TOPS
ResNet-50, 224x224                                       22,431
BERT base, SQuAD                                         2,830 sentences/sec
BERT base, SQuAD + conditional features                  10,150 sentences/sec
BERT base, SQuAD + conditional features + low prec FP    23,345 sentences/sec
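The benchmark figures above can be turned into a few derived ratios. The short sketch below uses only the published numbers (368 TOPS, the 65W TDP target, and the three BERT base throughputs); the variable and function names are illustrative, and the results are simple arithmetic rather than additional measurements.

```python
# Derived ratios from the published Grayskull figures (arithmetic only).

TDP_WATTS = 65                      # stated TDP target
RAW_TOPS = 368                      # stated raw TOPS

BERT_BASELINE = 2830                # sentences/sec, BERT base, SQuAD
BERT_CONDITIONAL = 10150            # + conditional features
BERT_CONDITIONAL_LOW_PREC = 23345   # + conditional features + low prec FP


def speedup(new, baseline):
    """Throughput ratio of a configuration relative to a baseline."""
    return new / baseline


tops_per_watt = RAW_TOPS / TDP_WATTS

print(f"Raw efficiency: {tops_per_watt:.2f} TOPS/W")
print(f"Conditional features vs. baseline: "
      f"{speedup(BERT_CONDITIONAL, BERT_BASELINE):.2f}x")
print(f"Conditional + low precision vs. baseline: "
      f"{speedup(BERT_CONDITIONAL_LOW_PREC, BERT_BASELINE):.2f}x")
print(f"Low precision on top of conditional features: "
      f"{speedup(BERT_CONDITIONAL_LOW_PREC, BERT_CONDITIONAL):.2f}x")
```

The raw TOPS-per-watt value assumes the chip runs at the full 65W target, which the page states as a target rather than a measured draw.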

Grayskull Software Platform

[Diagram: Grayskull software platform]