Hadoop Distributed File System as a System for Handling Big Data

Hadoop Distributed File System (HDFS) is a file system designed to store very large files with streaming data access, running on clusters of commodity hardware. Files in HDFS are divided into blocks whenever the file size exceeds the block size. The block size is configurable and is commonly 64, 128, or 256 MB (128 MB is the default in Hadoop 2 and later), and each block is replicated three times by default. This architecture allows HDFS to store and process substantial amounts of structured, semi-structured, and unstructured data.
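
As a rough illustration, both parameters can be set per file through Hadoop's Java API. The sketch below is illustrative only; the NameNode address and file path are hypothetical, and a real application would take them from configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical address
            FileSystem fs = FileSystem.get(conf);

            // Create a file with a 128 MB block size and 3 replicas.
            Path file = new Path("/data/example.log");        // hypothetical path
            long blockSize = 128L * 1024 * 1024;              // 128 MB
            short replication = 3;
            try (FSDataOutputStream out =
                     fs.create(file, true, 4096, replication, blockSize)) {
                out.writeUTF("sample record");
            }
        }
    }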

An HDFS cluster consists of a management node (NameNode) and data storage nodes (DataNodes). The NameNode is a dedicated server that manages the file system namespace: it maintains the directory tree of all files, as well as the metadata of files and directories (Kaur, Bagga, & Mann, 2017).
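
To make the NameNode's role concrete, the sketch below (assuming the same hypothetical cluster and file as above) asks it for a file's metadata and block locations. Only metadata travels here; no block data is transferred:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NameNodeMetadataExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical address
            FileSystem fs = FileSystem.get(conf);

            // The NameNode answers metadata queries about the file.
            FileStatus status = fs.getFileStatus(new Path("/data/example.log"));
            System.out.println("size=" + status.getLen()
                + " blockSize=" + status.getBlockSize()
                + " replication=" + status.getReplication());

            // For each block: its offset, length, and the DataNodes holding replicas.
            for (BlockLocation loc :
                     fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println(loc.getOffset() + "+" + loc.getLength()
                    + " -> " + String.join(",", loc.getHosts()));
            }
        }
    }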

A DataNode is one of the many cluster servers that store and serve the actual data blocks. DataNodes are required components of an HDFS cluster: they write and read block data and execute commands from the NameNode to create, delete, and replicate blocks (Kadam, Deshmukh, & Dhainje, 2015). Each DataNode also periodically sends status messages (heartbeats) to the NameNode and serves read and write requests from HDFS clients.
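
One way to observe this division of labor is to change a file's replication factor: the client call only updates metadata on the NameNode, which then instructs DataNodes to copy or delete block replicas in the background. A minimal sketch under the same hypothetical cluster assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical address
            FileSystem fs = FileSystem.get(conf);

            // Raise the replication factor; the NameNode schedules the extra
            // copies, and DataNodes replicate the blocks among themselves.
            boolean scheduled =
                fs.setReplication(new Path("/data/example.log"), (short) 5);
            System.out.println("replication change scheduled: " + scheduled);
        }
    }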

When reading a file from HDFS, the client obtains the block locations from the NameNode and then reads the blocks sequentially, directly from the DataNodes that hold them. For each block, the closest replica is selected. Because clients fetch data directly from DataNodes, read traffic is distributed across the cluster, which allows HDFS to serve many concurrent clients.
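
A minimal read sketch along these lines (hypothetical cluster and path as before): the open() call contacts the NameNode for block locations, while the bytes themselves stream directly from the DataNodes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical address
            FileSystem fs = FileSystem.get(conf);

            // open() fetches block locations from the NameNode; the data
            // then streams from the nearest DataNode holding each block.
            try (FSDataInputStream in = fs.open(new Path("/data/example.log"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }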

References

Kadam, A. M., Deshmukh, P. K., & Dhainje, P. B. (2015). A review on distributed file system in Hadoop. International Journal of Engineering Research & Technology (IJERT), 4(5), 14–18.

Kaur, G., Bagga, S., & Mann, K. S. (2017). Hadoop approach to cluster based cache oblivious Peano Curves. In 2017 IEEE 7th International Advance Computing Conference (IACC) (pp. 115–120). Hyderabad, India: Institute of Electrical and Electronics Engineers.
