
According to a recent announcement, MapR Technologies, provider of a data platform for Artificial Intelligence (AI) and Analytics, will now support data access and production deployments for data science through the NVIDIA RAPIDS open-source software.
MapR helps data scientists speed up access to training data by easing the on-boarding, cleansing, and cataloging of data and by feeding it at high performance to GPUs and NVIDIA DGX systems. The MapR solution also handles the deployment and management of multiple models in production to speed business impact.
“The challenge for most data scientists is the data logistics to locate, prep and access the right data for training. In many cases, 90 percent of the time is spent data wrangling,” said Anil Gadre, EVP and chief product officer, MapR Technologies. “MapR complements RAPIDS with a data management and logistics fabric to accelerate the high-scale processing and access of disparate data across geographies. The same fabric also speeds the deployment of models into production and coordinates the continuous deployment and updating of multiple models to impact business in real-time at scale.”
Central to the solution is the ability to coordinate data flows from across the enterprise and, through a pre-built MapR container for GPUs, integrate them easily into NVIDIA’s complete end-to-end data science training pipelines. The MapR Data Platform for RAPIDS is designed to enable data scientists to:
- Collect data at scale from a variety of sources and preserve raw data so that potentially valuable features are not lost
- Make input and output data available to many independent applications even across geographically distant locations, on premises, in the cloud or at the edge
- Manage multiple models during development and roll them into production easily
- Improve evaluation methods for comparing models during development and production, including use of a reference model for baseline successful performance
- Support rapid stream-based delivery of standard file formats, including Parquet, ORC, JSON, Avro, and CSV, directly into RAPIDS
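
As a rough illustration of the reference-model idea in the list above, the following Python sketch scores candidate models against a fixed baseline on the same held-out examples. It is not MapR or RAPIDS code: the function names, the toy data, and the choice of accuracy as the metric are all assumptions made for the example.

```python
from statistics import mean


def accuracy(predict, examples):
    """Fraction of (features, label) pairs the model labels correctly."""
    return mean(1.0 if predict(x) == y else 0.0 for x, y in examples)


def compare_to_reference(reference, candidates, examples):
    """Return each candidate's accuracy minus the reference model's accuracy."""
    baseline = accuracy(reference, examples)
    return {
        name: accuracy(model, examples) - baseline
        for name, model in candidates.items()
    }


# Hypothetical held-out data: the label is 1 when the single feature exceeds 5.
examples = [(x, int(x > 5)) for x in range(10)]


def reference(x):
    """Hypothetical reference model: a trivial baseline that always predicts 0."""
    return 0


# Hypothetical candidate models under evaluation.
candidates = {
    "threshold_5": lambda x: int(x > 5),
    "threshold_7": lambda x: int(x > 7),
}

deltas = compare_to_reference(reference, candidates, examples)
for name, delta in sorted(deltas.items()):
    print(f"{name}: {delta:+.2f} vs. reference")
```

A positive value means the candidate beats the reference on this data. The same comparison could be rerun on production traffic to check that a deployed model still outperforms the baseline.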
“MapR’s work with NVIDIA in the RAPIDS ecosystem is helping make broad adoption in the enterprise easy for the largest breadth of workloads,” said Clément Farabet, VP, AI infrastructure, NVIDIA. “MapR’s ability to span on-prem and cloud, from IoT edge to core with a scalable, high-performance common platform means that more data can be fed to GPUs and more innovative applications can be created by data scientists faster.”
Ken Briodagh is a writer and editor with more than a decade of experience under his belt. He is in love with technology and if he had his druthers would beta test everything from shoe phones to flying cars.