This paper presents the design and implementation of a cloud disk system built on Hadoop. The goal is to use the data processing capability and high scalability of Hadoop's distributed computing framework to address the performance bottlenecks that traditional cloud disk systems face when storing and accessing massive volumes of data. The system uses the Hadoop Distributed File System (HDFS) as its storage backend, applies the MapReduce model to improve read and write efficiency, and relies on YARN for resource management and task scheduling, enabling efficient and reliable management and sharing of massive data sets. The paper describes the system's architecture, the selection of key technologies, the implementation of its functional modules, and performance testing and analysis, which validate the advantages of the Hadoop-based cloud disk system for processing large-scale data sets.

Keywords: Hadoop, Distributed Storage, HDFS, MapReduce, YARN, Cloud Disk System, Data Mining, High-Performance Computing, Scalability, Cloud Computing
Table of Contents