Hadoop Distributed File System as a System for Handling Big Data

Hadoop Distributed File System (HDFS) is a file system designed to store large files with streaming data access, running on clusters of commodity hardware. Files in HDFS are split into blocks whenever the file size exceeds the block size. The block size is configurable and is commonly set to 64, 128, or 256 MB; each block is replicated three times by default. This architecture enables HDFS to store and process substantial amounts of data, whether structured, semi-structured, or unstructured.
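To make the block-size and replication settings concrete, the following is a minimal sketch using Hadoop's Java FileSystem API that queries the defaults a cluster reports. It assumes a reachable HDFS cluster whose address (fs.defaultFS) is supplied by a core-site.xml on the classpath; the path /user/demo/sample.txt is purely illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockDefaults {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and other settings from the
        // Hadoop configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/user/demo/sample.txt"); // hypothetical path
        System.out.println("Default block size: "
                + fs.getDefaultBlockSize(p) + " bytes");
        System.out.println("Default replication: "
                + fs.getDefaultReplication(p));
        fs.close();
    }
}
```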


An HDFS cluster consists of a management node (NameNode) and data storage nodes (DataNodes). The NameNode is a dedicated server that manages the file system namespace: it maintains the file system tree along with the metadata of all files and directories (Kaur, Bagga, & Mann, 2017).
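As an illustration of the namespace role, the sketch below (same cluster assumptions as above, hypothetical directory) lists a directory and prints per-file metadata. Every value printed is served from the NameNode's in-memory namespace; no DataNode is contacted for this query.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceInfo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Directory listing is pure metadata: the NameNode answers
        // from the file system tree it keeps in memory.
        for (FileStatus st : fs.listStatus(new Path("/user/demo"))) { // hypothetical dir
            System.out.printf("%s  size=%d  repl=%d  blockSize=%d%n",
                    st.getPath(), st.getLen(),
                    st.getReplication(), st.getBlockSize());
        }
        fs.close();
    }
}
```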

A DataNode is one of many cluster servers responsible for file operations and for handling data blocks. DataNodes are a required component of an HDFS cluster: they write and read data and execute commands from the NameNode to create, delete, and replicate blocks (Kadam, Deshmukh, & Dhainje, 2015). Each DataNode also periodically sends status messages (heartbeats) to the NameNode and serves read and write requests from HDFS clients.
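A hedged sketch of how this division of labor looks from the client side when writing: the client calls create() with a replication factor, the NameNode assigns DataNodes for each block, and the DataNodes then store and replicate the bytes among themselves. The path and payload below are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // The NameNode chooses target DataNodes for each block; the
        // DataNodes pipeline the bytes to one another to satisfy the
        // requested replication factor (here 3).
        Path out = new Path("/user/demo/out.txt"); // hypothetical path
        try (FSDataOutputStream os = fs.create(out, (short) 3)) {
            os.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```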

When reading a file from HDFS, the client first obtains the block locations from the NameNode and then reads the blocks sequentially from the DataNodes on its own, selecting the nearest node for each block. Because the client retrieves the data directly from the DataNodes, HDFS scales to many concurrent clients: the traffic is distributed across the DataNodes rather than funneled through the NameNode.
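This read path can be traced with the same API. The sketch below (assuming the hypothetical file from the earlier examples) first asks the NameNode which hosts hold each block, then opens the file and streams the bytes directly from the DataNodes, with the client library preferring the closest replica.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path in = new Path("/user/demo/sample.txt"); // hypothetical path

        // Step 1: ask the NameNode which DataNodes hold each block.
        FileStatus st = fs.getFileStatus(in);
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    loc.getOffset(), loc.getLength(),
                    String.join(",", loc.getHosts()));
        }

        // Step 2: open() streams the bytes directly from the DataNodes;
        // the NameNode is not on the data path.
        try (FSDataInputStream is = fs.open(in)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = is.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
        fs.close();
    }
}
```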

References

Kadam, A. M., Deshmukh, P. K., & Dhainje, P. B. (2015). A review on distributed file system in Hadoop. International Journal of Engineering Research & Technology (IJERT), 4(5), 14–18.

Kaur, G., Bagga, S., & Mann, K. S. (2017). Hadoop approach to cluster based cache oblivious Peano Curves. In 2017 IEEE 7th International Advance Computing Conference (IACC) (pp. 115–120). Hyderabad, India: Institute of Electrical and Electronics Engineers.
