The Chimera SDK

Develop, Simulate, Profile, and Deploy. The power of a code-driven processor with the convenience of no-code graph compilation.

Inference FrameworkPyTorch / TensorFlow

Graph CompilerOptimization

C++ CompilerLLVM

BinaryExecutable

SoCDeployment

Inference Framework

Graph Compiler

C++ Compiler

Binary

SoC

Try our SDK

Development Workflow

Complete Development Environment

From AI models to deployment-ready code in a single unified workflow

The Quadric Chimera Software Development Toolkit (SDK) is a comprehensive environment for the development of complex application code targeting the Chimera processor. The Quadric SDK enables the mixing and matching of any data parallel algorithm whether it is expressed as machine learning / AI graph or as traditional C++ code.

Preparing an AI model to run on a Chimera core is simple and tool-driven

No special network preparation
No substitution of operators
No removal of layers
No reliance on Quadric to prepare models

Quadric users can download a complete Docker image of the Chimera SDK for deployment on private clouds or on-premise systems. Once installed on your compute resources, you can compile ONNX graphs into C++ using the Chimera Graph Compiler, compile your own proprietary C++ code, combine them and profile on the cycle accurate ISS model of the Chimera GPNPU.

SDK Workflow Overview

Try DevStudio

Graph Compilation

Quadric Chimera Graph Compiler

Powerful graph conversion and code optimization for AI inference models

The Quadric Chimera Graph Compiler (CGC) is a powerful graph conversion and code optimization tool that inputs a machine learning inference model created from the leading AI training frameworks, performs numerous optimizations, and outputs an optimized C++ code representation of the AI graph utilizing the Chimera Compute Library (CCL) for later compilation by the Chimera LLVM C++ compiler.

Graph Import

CGC accepts input graphs in ONNX format for networks developed in PyTorch, TensorFlow or other frameworks. A number of optimizations are performed as part of the graph import phase:

Graph simplification / canonicalization
Constant propagation - removing operators with purely constant arguments, if possible
Operator legalization / conversion – converting to GPNPU-specific forms of AI operators

The shape and structure of the network graph is optimized to simplify operators where possible. Compatibility checks are performed to determine if all operators are supported. When an unsupported or custom operator is present in the source graph, the user has the option to partition the graph around the custom operator, write a C++ kernel for the unsupported operator, and reintegrate the new custom operator C++ kernel with code auto-generated by CGC.

Graph and Memory Optimization

CGC creates a full intermediate representation of the input graph and performs multiple passes of optimization with the twin aims of optimizing performance and memory bandwidth utilization.

Memory optimization techniques include:

Tensor format layout analysis (both L2 and LRM)
Fusion in Local Memory (FILM) - merging of operations to preserve data within LRM and avoid costly intermediate activation writeback.
Array-level memory adapters & tensor shape transformations
Bank conflict minimization, defragmentation
Predictive weight prefetching

OPERATOR FUSION

Automatic Graph Optimization

The compiler automatically identifies and merges compatible operators. Fused operations keep data in local PE registers, eliminating L2 memory round-trips and dramatically reducing power consumption.

OPTIMIZATION PASSES

Multi-Stage Transformation

Each pass analyzes and transforms the graph for the Chimera architecture.

Compilation & Simulation

Chimera LLVM C++ Compiler & Instruction Set Simulator

Industry-standard LLVM infrastructure with cycle-accurate simulation and profiling

The Chimera LLVM C++ compiler utilizes the industry state of the art LLVM compiler infrastructure with a Quadric-specific code generation back end that emits assembly code specific to the Chimera instruction set.

The Chimera Instruction Set Simulator (ISS) is a pipeline-accurate executable model of the Chimera processor core. The ISS can be utilized both in a standalone mode to profile and tune application code in isolation, or as a callable System C transaction-level model bundled into a more comprehensive virtual prototype of an SoC where more accurate memory model behavior can be used to fine-tune your Chimera code more precisely.

Comprehensive Profiling

The Chimera ISS provides extensive profiling of cycle counts, data bandwidth usage, and power profiles of each unique application workload.

Try DevStudio