AI’s New Onramp: Meet the Data Science PC
The trip to AI and big-data analytics is now just a click away. Starting today, three NVIDIA partners are selling a new class of computers online that we call data science PCs.
The systems bundle the hardware and software data scientists need to hit an “on” button and start managing datasets and models to make AI predictions. Data science PCs tap NVIDIA TITAN RTX GPUs and RAPIDS software to deliver 3-6x speed-ups compared to CPU-only desktops.
Three experts in building high-end PCs — Digital Storm, Maingear and Puget Systems — are offering the products now. They’re targeting an expanding class of independent data scientists to help them achieve better results faster.
Some of the world’s largest and most innovative organizations are already using GPU-accelerated servers and workstations to tackle their demanding data-science jobs.
For example, Walmart’s supermarket of the future can compute in real time more than 1.6 terabytes of data generated per second using NVIDIA’s EGX platform. The Summit system at Oak Ridge National Laboratory can tap its 27,648 NVIDIA V100 Tensor Core GPUs to drive 3.3 exaflops of mixed-precision horsepower on AI tasks.
But data science isn’t just for large enterprises. Startups, researchers, students and enthusiasts are jumping into this burgeoning field. They’re adding to the corporate momentum that is making data scientist one of the fastest-growing jobs in the U.S.
The data science PC aims to fuel this growing class of independent data science practitioners. The combination of powerful, pre-configured systems and a tested software stack can jumpstart their work.
The Speeds and Feeds
Under the hood, a data science PC includes one or two TITAN RTX GPUs, each with up to 24GB of memory. NVLink high-speed interconnect technology connects the two GPUs to tackle datasets that demand more GPU memory.
The systems can accommodate 48-128GB of main memory, and storage options include drives that range up to 10TB.
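For a rough sense of when the second GPU matters, here is a back-of-the-envelope sketch. It assumes a dense table of 8-byte values with no overhead, and the helper names `estimate_gb` and `gpus_needed` are hypothetical, made up for this illustration. Real workloads also need headroom for intermediate results, and how software actually spreads data across two GPUs is not covered here.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary convention, an assumption)


def estimate_gb(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Estimate the raw size of a dense table, ignoring any overhead."""
    return rows * cols * bytes_per_value / GB


def gpus_needed(dataset_gb: float, gpu_gb: float = 24.0, max_gpus: int = 2):
    """Return the smallest GPU count whose combined memory holds the data,
    or None if it does not fit even with max_gpus."""
    for n in range(1, max_gpus + 1):
        if dataset_gb <= n * gpu_gb:
            return n
    return None


# Hypothetical dataset: 500 million rows, 8 numeric columns.
size = estimate_gb(500_000_000, 8)
print(f"{size:.1f} GB -> GPUs needed: {gpus_needed(size)}")
```

By this estimate the sample table is larger than one 24GB GPU but within the 48GB that two TITAN RTX cards offer together.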
Each data science PC will ship with Linux and RAPIDS, NVIDIA’s data science software stack, powered by its popular CUDA-X AI programming libraries.
NVIDIA RAPIDS eases the job of porting existing code for GPU acceleration. Its APIs are modeled after popular libraries used in data science. In many cases, it’s only necessary to change a few lines of code in order to tap the potential of GPU acceleration.
Here are some of the key elements of RAPIDS:
- cuDF is a Python GPU data-frame library for loading, joining, aggregating, filtering and otherwise manipulating data. The API is designed to be similar to pandas, so existing code easily maps to the GPU.
- cuML accelerates popular machine learning algorithms, including XGBoost, PCA, K-means, k-Nearest Neighbors and more. It is closely aligned with scikit-learn.
- cuGraph is a library of graph algorithms, similar to NetworkX, that works with data stored in a GPU data frame.
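To illustrate the "few lines of code" idea, here is a minimal sketch of a filter-and-aggregate step written against the data-frame API that cuDF shares with pandas. The toy data and column names are hypothetical. The script uses cuDF when it can be imported and falls back to pandas otherwise, so only the import line differs between the CPU and GPU versions. It is not a benchmark and says nothing about the speed-ups a real workload would see.

```python
# Prefer cuDF (RAPIDS) when it is installed; otherwise fall back to pandas.
try:
    import cudf as xdf  # GPU data-frame library
except ImportError:
    import pandas as xdf  # CPU library with a similar API

# Hypothetical toy data, invented for this example.
sales = xdf.DataFrame({
    "store": ["north", "south", "north", "east", "south"],
    "units": [12, 7, 5, 9, 3],
})

# Filter rows, then aggregate per store; the same calls work in both libraries.
large = sales[sales["units"] >= 5]
totals = large.groupby("store")["units"].sum().sort_index()

# cuDF keeps results in GPU memory; copy them to the host before converting.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()

totals_by_store = totals.to_dict()
print(totals_by_store)
```

Existing pandas code will not always port this cleanly, since cuDF does not cover every pandas feature, but simple pipelines like this one are where the similarity is most visible.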
An ecosystem of startups in Inception, NVIDIA’s virtual accelerator program for startups focused on AI and data science, provides applications and services that run on top of RAPIDS. They include companies, such as Graphistry and OmniSci, that offer big-data visualization tools.
Data scientists can also use NVIDIA’s data science developer forum to ask questions and learn more about data science on GPUs.