Gateworks and NXP Semiconductors Unveil M.2 AI Acceleration Card

Updated Date: March 11, 2026
Written by Kapil Kumar
Gateworks and NXP Launch USA-Made M.2 AI Acceleration Card

At Embedded World, Gateworks and NXP announced the GW16168 AI acceleration card. Developing and deploying your own AI is an expensive undertaking, whether in time, manpower or money. Incorporating newer AI acceleration hardware commonly forces a redesign of the whole hardware stack, including single-board computers (SBCs) and even a custom cooling system. With the options currently on the market, it is normally a costly and complicated process, aggravated by the recurring need to change or upgrade hardware. This is the gap Gateworks has identified, and the company launched the GW16168 to remove some of the obstacles engineers and businesses face when they choose to run AI in-house.

The GW16168 is supported by the mature Ara240 SDK, which includes a complete compiler toolchain, support for TensorFlow, PyTorch and ONNX, and model-conversion utilities that ease the process of migrating existing AI models to edge deployment.

Thermal management is one of the largest problems in AI deployment. AI-based systems can draw substantial power, particularly when executing complex tensor workloads, so in many cases thermals become the limiting factor, especially in sealed, fanless industrial enclosures.

The days of being forced to choose your entire compute platform around the AI chip are over, the team at Gateworks believes. Gateworks engineered its M.2 card to avoid tying customers to particular hardware or environmental constraints: a modest power profile, the standard M.2 2280 M-Key form factor and a passively cooled Ara240 DNPU.

"The GW16168 by Gateworks is a perfect example of why decoupled AI architectures will become the future of edge computing. With the NXP Ara240 DNPU and industrial-grade design, customers will be able to scale AI performance without redesigning their entire hardware platform. This brings flexibility, longevity and cost efficiency to real-world AI deployments," said Ravi Annavajjhala, Vice President and General Manager, Neural Processing Units, NXP Semiconductors.

For space-constrained industrial designs, advanced cooling techniques can quickly prove expensive and impractical. Gateworks has paired the passively cooled Ara240 DNPU with carefully engineered power circuitry to keep typical power consumption to 6.6 W. This reduced power envelope minimises heat build-up, allowing the card to run reliably in sealed, fanless systems while retaining thermal characteristics comparable to industrial-grade AI hardware. Gateworks also reports a decade-long life cycle for GW16168 modules, with wear minimised by this thermal management.

Until now, high-performance AI hardware has rarely been an option for such systems. Designers had to choose between GPUs, which necessitated a complete redesign of the entire system, or running inference on embedded CPUs and NPUs at the cost of severe thermal constraints and high latency.

Performance. The GW16168 is meant to expand overall capability, not just rebalance it. The module delivers up to 40 eTOPS, reasonably close to GPU-class performance at a significantly smaller power consumption. In place of the conventional constraints of existing or older-generation edge accelerators, the design is built around sustained throughput, coupled with ruggedised power delivery that remains stable even under sustained peak inference loads.

The GW16168 and its development kit will be sold through DigiKey, Braemac, RoundSolutions and Avnet, with shipping beginning in late May.