Institute of Aerospace Information, Chinese Academy of Sciences
project context
The National Space Science Center of the Chinese Academy of Sciences is a comprehensive research institution for China’s space science and satellite projects and serves as a national platform for space science innovation. It organizes and conducts research on national space science development plans and is specifically responsible for organizing and implementing the Chinese Academy of Sciences’ space science pioneering projects. It also conducts innovative scientific and technological research in space science and related applications, provides scientific and technological support for the pioneering projects and future development, leads the advancement of space science, and drives space technology innovation.
project requirements
● Meet the performance requirements of the HPC cluster
The computing layer is an HPC cluster of nearly 20 computing nodes that runs a large number of applications such as aerospace information analysis, calculation, and preprocessing. This places extremely high performance demands on the entire HPC cluster, including stable bandwidth, extremely low latency, and sufficient concurrency.
● Meet the capacity and scalability requirements
The initial build targets PB-level usable capacity. Because data is continuously generated by collection and computation, future capacity needs are hard to predict. The storage system must therefore scale well, with capacity and performance growing linearly to handle the load at every stage.
● Meet the requirements of openness and compatibility
Standard access protocols provide access interfaces for both the HPC cluster and the business platform, so that the front-end business platform can conveniently use the HPC calculation results and different calculation model interfaces can be connected in the future.
● Demonstrate technological leadership
The adopted technical solution should follow the future development direction of the industry and hold a leading position among similar systems, which helps enhance the computing and processing capabilities of the entire system.
solution
topology of networks
Overview of the Plan
The solution employs two XDFS distributed storage clusters: a high-performance SSD storage pool and a capacity-based HDD storage pool. Both use erasure coding for redundancy, which achieves higher storage utilization while meeting the performance requirements and keeping capacity and cost under control.
● High-performance SSD storage pool
The storage media are enterprise-grade SATA SSDs, providing approximately 1PB of high-performance pool capacity. The pool is mounted through the POSIX protocol, which offers better performance than NFS. Combined with a 100Gb IB link and RDMA technology, it sustains the heavy load and capacity demands of HPC computing, accelerating the computing process and improving overall HPC efficiency.
● Capacity-based HDD storage pool
The storage media are enterprise-grade SATA HDDs, providing approximately 5PB of elastic storage space. The pool is easy to expand, and capacity and performance grow together. A 40GE link combined with RDMA technology ensures fast data access. The pool is mounted through the NFS protocol, which is more widely applicable and simple to deploy, making it convenient for business servers to use the HPC calculation results.
Advantages of the plan
● The optimized erasure coding algorithm delivers both space utilization and performance
Both the SSD and HDD storage pools employ 4+1 erasure coding, which yields higher space utilization, and the optimized algorithm fully meets the access requirements of an HPC system with nearly 20 computing nodes.
The streamlined XDFS system kernel, the file data distribution method, the distributed metadata management strategy, and the xMate acceleration module significantly improve the hardware processing efficiency of the storage nodes, making better use of the SSDs and the 100Gb IB network. Through bonding of 10GE network ports, users can obtain performance comparable to that of a 100Gb IB network even in a 10GE network environment.
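The space utilization of the 4+1 erasure coding layout described above can be worked out with simple arithmetic. The following is a minimal sketch, not part of XDFS: the function names are hypothetical, and it assumes the quoted pool sizes (approximately 1PB and 5PB) are usable capacities, which the text does not state explicitly.

```python
def erasure_coding_utilization(data_fragments: int, parity_fragments: int) -> float:
    """Fraction of raw capacity that holds user data in a k+m erasure coding layout."""
    return data_fragments / (data_fragments + parity_fragments)


def raw_capacity_needed(usable_pb: float, data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity (PB) required to expose `usable_pb` of usable space."""
    return usable_pb * (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    k, m = 4, 1  # 4 data fragments + 1 parity fragment, as in the 4+1 layout
    print(f"utilization: {erasure_coding_utilization(k, m):.0%}")
    # Assumption: 1 PB (SSD pool) and 5 PB (HDD pool) are usable capacities.
    for name, usable in (("SSD pool", 1.0), ("HDD pool", 5.0)):
        print(f"{name}: {raw_capacity_needed(usable, k, m)} PB raw for {usable} PB usable")
```

Under these assumptions, a 4+1 layout keeps 80% of the raw capacity for data and tolerates the loss of one fragment per stripe.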
● Deeply integrated with the user’s application scenarios to enable data flow
application effect
Storage pools with different configurations serve different tiers of application systems, meeting the performance requirements of the different business units and fully meeting design expectations. The design also improves return on investment, helping the user build out capacity on demand. The connection between the data archiving module and the tape library backup system simplifies data management, improves the collaborative efficiency of the user’s overall business system, and lets computing tasks complete more efficiently.

