Technical white paper

Reference configuration for HPE AI Data Node

Optimizing storage for AI workloads

Artificial intelligence (AI) is moving from research into mainstream business use. In the Gartner 2019 CIO survey, 37% of respondents reported that their enterprises either had deployed AI or would do so shortly. Common examples of AI use are facial recognition, real-time translation from images, and voice recognition in cell phones. Many AI/machine learning workloads require storage solutions that have been optimized both for working on very large data sets and for very high IOPS and/or throughput with low-latency performance. The expectation is that AI compute will come to resemble high-performance computing (HPC): servers will not only scale up, that is, adding more GPUs per server, but also scale out, that is, using a distributed clustered server environment. This will require the use of shared storage file systems to avoid storage bottlenecks.
Flash storage technology can provide the necessary throughput performance but is quite costly for capacity storage. As companies go into production with AI, data sets will grow to tens and even hundreds of petabytes and will exceed the capacity of traditional storage appliances. To achieve scalability and performance while simultaneously controlling costs, storage system designers build separate tiers of storage for hot and cold data, utilizing archival object storage for the colder data. This dramatically lowers the total cost of ownership.
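As a rough illustration of the tiering economics, the sketch below compares an all-flash design with a two-tier design. All prices, capacities, and the hot-data fraction are hypothetical assumptions chosen for the example, not HPE, WekaIO, or Scality figures.

```python
# Back-of-the-envelope TCO comparison: all-flash vs. two-tier storage.
# Every number below is an illustrative assumption, not a vendor quote.

TOTAL_TB = 10_000          # assumed 10 PB data set
FLASH_COST_PER_TB = 300.0  # assumed $/TB for the NVMe flash tier
OBJECT_COST_PER_TB = 30.0  # assumed $/TB for the archival object tier
HOT_FRACTION = 0.10        # assume 10% of the data is actively "hot"

# Option 1: keep everything on flash.
all_flash = TOTAL_TB * FLASH_COST_PER_TB

# Option 2: hot data on flash, cold data on object storage.
two_tier = (TOTAL_TB * HOT_FRACTION * FLASH_COST_PER_TB
            + TOTAL_TB * (1 - HOT_FRACTION) * OBJECT_COST_PER_TB)

print(f"All-flash: ${all_flash:,.0f}")
print(f"Two-tier:  ${two_tier:,.0f}")
print(f"Savings:   {1 - two_tier / all_flash:.0%}")
```

Under these assumptions, the two-tier design costs a fraction of the all-flash design for the same total capacity, which is the effect the hot/cold split is designed to achieve; the exact savings depend entirely on the hot-data fraction and the per-tier pricing.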
Hewlett Packard Enterprise, in partnership with WekaIO and Scality, provides storage solutions tailored to HPC and AI workloads using software-defined storage applications deployed on HPE ProLiant and HPE Apollo servers. With these solutions, customers can have high-performance, petabyte-scale storage with integrated data lifecycle management, providing tiering management by file system and a single namespace. This solution can be implemented in a classic two-tier architecture, with one tier dedicated to high-performance flash while a second tier provides scalable object storage, typically as two separate clusters of storage servers. A second, hybrid approach combines both tier elements into a single scalable cluster, utilizing storage servers that are optimized for both NVMe flash capacity and scale-out bulk data storage. This is the concept behind the HPE AI Data Node, based on the HPE Apollo 4200 Gen10 storage server. HPE AI Data Node offers a building block for production AI that can scale in performance and capacity.