NVIDIA’s Jetson Nano and Jetson Nano Development Kit.
Low cost yet very powerful, AI-optimized compute resources such as NVIDIA’s Jetson Nano bring machine learning to the masses, and also have the potential to replace the dominant paradigm of centralized machine learning training and inferencing architectures.
Synopsis
During his keynote session at the recent NVIDIA GTC event in San Jose, California, NVIDIA founder and CEO Jensen Huang introduced the NVIDIA Jetson Nano, a small, powerful, low-power edge computing platform for machine learning (ML) inferencing. The Nano is NVIDIA’s latest addition to the Jetson family of embedded computing boards, following the release of the Jetson TX1 (2015), the Jetson TX2 (2017), and the Jetson AGX Xavier (2018) platforms.
Salients
General Specs – The Jetson Nano is powered by a quad-core ARM Cortex-A57 processor running at 1.43 GHz, supported by a 128-core Maxwell GPU. The platform delivers 472 GFLOPS of compute performance while using as little as 5 W of power. HEVC video encode and decode are supported at resolutions up to 4K at 60 fps. The Nano also comes with 4 GB of low-power LPDDR4 memory.
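For a quick sanity check of those specs, the sketch below (assuming a JetPack image with a CUDA-enabled PyTorch build installed – an assumption, not part of the base image) queries the GPU that ML frameworks will actually see:

```python
# Minimal sketch: confirm the Nano's Maxwell GPU is visible to CUDA-enabled
# frameworks. Assumes a CUDA build of PyTorch is installed on the device.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # On a Jetson Nano this should report a single streaming multiprocessor
    # (128 CUDA cores) and roughly 4 GB of memory shared with the CPU.
    print(f"GPU: {props.name}, SMs: {props.multi_processor_count}, "
          f"memory: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA device visible")
```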
Two Versions – The Jetson Nano comes in two flavors. The first, the “Jetson Nano Development Kit”, includes a carrier board with a 40-pin general-purpose input/output (GPIO) header and provides a host of connectors, ports, and interfaces: HDMI, gigabit Ethernet, an M.2 slot for optional WiFi, four USB ports, and MIPI CSI for cameras. Storage comes by way of a microSD card. The production-ready (in NVIDIA parlance) “Jetson Nano” module ships without a carrier board but includes 16 GB of eMMC flash storage. The Jetson Nano Development Kit can be had for only US $99 and is available now, while the Jetson Nano module, priced at US $129 in quantities over 1,000, will become available in June 2019.
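As an illustration of that 40-pin header, the sketch below uses NVIDIA’s Jetson.GPIO Python library (bundled with JetPack) to blink an LED from the Development Kit. The choice of physical pin 12 is an assumption; consult the carrier board’s pinout before wiring anything.

```python
# Hedged sketch: toggle a pin on the Development Kit's 40-pin header using
# NVIDIA's Jetson.GPIO library. Pin 12 is an assumed choice for an LED.
import time
import Jetson.GPIO as GPIO

LED_PIN = 12  # physical pin number on the 40-pin header (assumption)

GPIO.setmode(GPIO.BOARD)  # address pins by their board position
GPIO.setup(LED_PIN, GPIO.OUT, initial=GPIO.LOW)
try:
    for _ in range(10):
        GPIO.output(LED_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()  # release the pins on exit
```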
AI on the Edge – With its small size and numerous connectivity options, the Jetson Nano is ideally suited as an IoT edge device. But like NVIDIA’s TX1 and TX2 platforms, the Jetson Nano is primarily engineered to support AI on the edge – more precisely, machine learning and deep learning on the edge – and NVIDIA specifically markets the Nano as such. The Nano has taken its place in NVIDIA’s Jetson machine learning solution continuum.
Analysis
Bringing the Edge AI Competition – Some industry pundits have characterized the Jetson Nano as a competitor to low-cost, single-board computers targeted at the maker community, such as the Raspberry Pi 3. But the real competition is with other high-performance edge machine learning inference enablers, such as Google’s Coral development board and Intel’s Up Squared AI Vision X Developer Kit. The Intel Up Squared development platform is based on the company’s Atom x7 processor and Movidius Myriad X accelerator, includes 8 GB of LPDDR4 RAM, and is priced at $420. The Google Coral board is built on an ARM Cortex-A53 processor and the Google Edge TPU; it ships with 1 GB of RAM and costs $149.
Seeding the Market – NVIDIA has a long history of developing and supporting a developer community for its products. The company repeatedly cites a figure of “more than 200,000” Jetson developers worldwide. The Nano is the perfect platform for enlarging the Jetson developer community: the system’s performance attributes approximate those of the higher-end Jetson models, making it appropriate for commercial work, while its low cost makes it attractive to researchers, educators, the maker community, and other technology enthusiasts seeking an entry-level AI platform.
An ‘Open’ Platform Play – The Nano is the latest entry in NVIDIA’s Jetson portfolio of embedded computing platforms, joining the Jetson TX2 and Jetson AGX Xavier. While each platform is optimized for certain application classes – low-cost edge ML for the Nano, vision processing for the Jetson TX2, and robotics and autonomous systems for the Xavier – all Jetson family members are enabled and supported by NVIDIA’s common programming model and solution stack. For example, all Jetson developers can utilize NVIDIA’s CUDA-X GPU acceleration libraries for data science and machine learning. The same holds for NVIDIA’s JetPack (including CUDA, cuDNN, and TensorRT) and DeepStream (soon) SDKs. The Jetson family is also agnostic regarding machine learning frameworks, supporting the most widely used frameworks such as TensorFlow, PyTorch, Caffe, and MXNet, their ‘lite’ equivalents, as well as less common libraries and tools.
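To make the shared-stack point concrete, here is a minimal, hedged sketch of accelerating a TensorFlow SavedModel with TensorRT via the TF-TRT converter. Converter class names and options vary across TensorFlow and JetPack releases (this follows the TensorFlow 2.x interface), and the “resnet50_saved_model” path is a placeholder, not a real artifact.

```python
# Sketch: convert a TensorFlow SavedModel so that supported subgraphs run
# through TensorRT. Paths are placeholders; API details vary by TF version.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="resnet50_saved_model")  # placeholder path
converter.convert()  # replaces supported subgraphs with TensorRT ops
converter.save("resnet50_trt_saved_model")  # deployable SavedModel
```

Precision (e.g., FP16, which suits Jetson GPUs) and workspace size can be set through the converter’s conversion parameters, whose exact spelling is version-dependent.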
Sun Sets on Jetson TX1 – The Jetson Nano is a pivot on the Jetson TX1. Apart from module size and GPU – a 128-core Maxwell in the case of the Jetson Nano versus a 256-core Maxwell for the TX1 – the platforms are remarkably similar. NVIDIA has confirmed as much and has indicated that developers should opt for the Jetson Nano or one of the three Jetson TX2 variants for future work (the TX1 will still be supported for the foreseeable future, however). The Jetson TX2 family includes the original Jetson TX2, the Jetson TX2i (for industrial temperature ranges), and the 4 GB Jetson TX2.
Disrupting Cloud Machine Learning – Market research company IoT Analytics pegs the number of connected IoT devices at 7 billion – a figure that excludes smartphones, tablets, and laptops – and expects it to rise to 22 billion by 2025. Computing power within these edge devices is also increasing dramatically. At the same time, connectivity, latency, and security concerns are driving the exploding interest, work, and investment in machine learning on the edge (Edge ML). Machine learning is expanding from the datacenter to the edge, and Edge ML will become the dominant ML architecture going forward. Even more importantly, that dominant role will include both inferencing AND training.
Currently, the primary edge machine learning training and execution paradigm is highly centralized: ML modeling, training, and optimization take place in the datacenter, on banks of servers employing arrays of graphics processing units (GPUs) or other AI-optimized processors. The resulting machine learning applications are distributed to cell phones, tablets, and other mobile devices, which either access software that runs off-device on distributed servers in the cloud or execute models locally, minimizing or eliminating the need to send data to remote servers for further processing. In either case, training takes place in the datacenter.
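The off-device half of that paradigm reduces to something like the sketch below: an edge client that ships its data to a cloud inference endpoint. The URL and payload schema here are hypothetical placeholders, loosely modeled on a generic REST prediction API rather than any specific service.

```python
# Hypothetical sketch of cloud-hosted inference: the edge device captures
# data and defers prediction to a remote model server. URL is a placeholder.
import requests

features = {"pixels": [0.1, 0.4, 0.9]}  # stand-in for sensor/camera data

resp = requests.post(
    "https://ml.example.com/v1/models/classifier:predict",  # hypothetical
    json={"instances": [features]},
    timeout=5.0,
)
print(resp.json())  # predictions computed in the datacenter, not on-device
```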
With the advent of small, low-cost, yet very powerful compute resources such as NVIDIA’s Jetson Nano, this centralized model can give way to a decentralized approach in which the training of machine learning models, too, takes place at the edge, using techniques such as Google’s Federated Learning architecture. With Federated Learning (and other similar approaches), models are still downloaded to edge devices for inferencing, but each local model ‘learns’ with experience and then sends its updates to the cloud, along with those of other, distributed devices. In this way, training is enhanced (scores of decentralized, collaborating devices; consistent updates) and users’ data privacy is preserved.
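The numpy-only sketch below illustrates the core of that idea – federated averaging over a toy linear model. It is an illustration of the technique under simplifying assumptions, not Google’s implementation: each simulated ‘device’ trains on private data, and only weight updates, never raw data, reach the aggregating server.

```python
# Toy federated-averaging sketch: devices refine a shared model locally;
# the server averages their weights. Data never leaves a device.
import numpy as np

def local_update(w_global, X, y, lr=0.1, epochs=5):
    """One device's local training: a few gradient-descent steps on a
    linear model, starting from the current global weights."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(w_global, devices):
    """Server step: collect each device's updated weights and average."""
    updates = [local_update(w_global, X, y) for X, y in devices]
    return np.mean(updates, axis=0)

# Three simulated devices, each holding private samples of y = 2x.
rng = np.random.default_rng(0)
devices = []
for _ in range(3):
    X = rng.normal(size=(32, 1))
    devices.append((X, 2.0 * X[:, 0]))

w = np.zeros(1)
for _ in range(30):
    w = federated_round(w, devices)
print(w)  # converges toward [2.0] without pooling any raw data
```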