Each chunk has multiple replicas (three by default), stored on different chunk servers, to reduce the damage caused by a ChunkServer failure. For each chunk, a write is considered successful only when every replica has been written successfully.
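The "all replicas must succeed" rule can be sketched as follows. This is a minimal illustration, not GFS's actual write protocol: `replica_stores`, `write_chunk`, and the in-memory dicts standing in for chunk servers are all invented for the example.

```python
def write_chunk(replica_stores, chunk_id, data):
    """Return True only if every replica accepts the write."""
    results = []
    for store in replica_stores:
        try:
            store[chunk_id] = data  # stand-in for a network write to one chunk server
            results.append(True)
        except Exception:
            results.append(False)
    # One failed replica means the whole write is treated as failed.
    return all(results)

servers = [{}, {}, {}]  # default of three replicas
ok = write_chunk(servers, "chunk-03", b"payload")
```

A write that reaches only two of the three stand-in servers would return `False`, mirroring the text: partial replication does not count as a successful write.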
Each file in GFS is split into multiple chunks. The default chunk size is 64 MB, which is much larger than a typical file system block size. Because the files handled by Google's applications are mostly large, splitting them in 64 MB units is a reasonable choice.
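With a fixed 64 MB chunk size, mapping a byte offset in a file to a chunk is simple arithmetic. A minimal sketch (the function name `locate` is ours, not from GFS):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, GFS's default chunk size

def locate(offset):
    """Map a byte offset within a file to (chunk index, offset inside that chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# Byte 100 MB of a file falls 36 MB into the second chunk (index 1).
print(locate(100 * 1024 * 1024))  # → (1, 37748736)
```

This is why a client needs only one request to the master per chunk: the chunk index is computed locally, and the master is asked only for that chunk's location.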
Each chunk is further divided into blocks of 64 KB, and each block has a 32-bit checksum (a value obtained by hashing the block's data). When reading a chunk replica, the chunk server compares the data it reads against the checksum (much like a login check that hashes the submitted password to decide whether it is correct); a mismatch means the data is corrupt, so an error is returned and the client switches to a replica on another chunk server (for example, if a ChunkServer finds that its copy of Chunk03 is corrupt, it asks the Master for the locations of other replicas). For a 1 TB file, 1 TB / 64 KB * 32 bits = 64 MB; that is, the checksums for a 1 TB file occupy only 64 MB, so they can be kept in memory.
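The per-block checksum scheme and the memory math above can be sketched like this. CRC32 is used here only as a convenient 32-bit checksum; the GFS paper does not specify which checksum function is used, and `block_checksums`/`verify` are names invented for the example.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, each guarded by a 32-bit checksum

def block_checksums(chunk_data):
    """Compute one 32-bit checksum per 64 KB block (CRC32 as a stand-in)."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify(chunk_data, checksums):
    """Re-hash every block on read; any mismatch means the data is corrupt."""
    return block_checksums(chunk_data) == checksums

# The memory math from the text: a 1 TB file contains 2**24 blocks of 64 KB,
# and at 4 bytes (32 bits) per checksum that is exactly 64 MB of checksums.
one_tb = 1024 ** 4
assert one_tb // BLOCK_SIZE * 4 == 64 * 1024 ** 2
```

Flipping a single byte in a block changes that block's checksum, so `verify` returns `False` and the read is redirected to another replica.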
The original paper describes the advantages and disadvantages of this chunk size as follows:
A large chunk size offers several important advantages. First, it reduces clients’ need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our work loads because applications mostly read and write large files sequentially. Even for small random reads, the client can comfortably cache all the chunk location information for a multi-TB working set. Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadata in memory, which in turn brings other advantages that we will discuss in Section 2.6.1.
On the other hand, a large chunk size, even with lazy space allocation, has its disadvantages. A small file consists of a small number of chunks, perhaps just one. The chunkservers storing those chunks may become hot spots if many clients are accessing the same file. In practice, hot spots have not been a major issue because our applications mostly read large multi-chunk files sequentially.
2.1.3 System Management Techniques