Introduction
Fault management is one of the major areas of network management. With network capacity increasing as a result of the purchase of competitors, TechCorp needs a more reliable network management scheme that covers the needs of the entire network. This analysis develops a fault management template, supported by a documentation outline with relevant industry compliance. Two templates are developed: the first covers the fault capturing process, and the second covers the communication and response process.
Template: Fault Detection
Introduction
Fault management is a critical responsibility in network administration that demands urgency in fault detection, communication and resolution. Under current service level agreements between network service vendors and their clients, downtime is the least acceptable scenario for an organisation. For TechCorp, it would be detrimental to the company's strategy for operations to go down because of a network management failure. Like other commercial enterprises, TechCorp needs real-time, high-quality connections for effective strategic and tactical decision making.
TechCorp's robust network infrastructure needs proper administration focused on continuous network operations. The network management team must therefore put in place a proactive network monitoring system. This will speed up the capture of network errors and faults and, consequently, their resolution. The different local area networks of TechCorp will use this fault detection template to capture network errors effectively, which is essential for timely correction.
Purpose
This template captures critical information about faults in the network. This basic information is necessary for providing a relevant and timely solution. The source of the information will be the network monitoring system, which can be a server or the network administrators themselves. End users can also pass information to the administrators when they experience problems. The template will collect system information on failures in particular hardware devices, abnormal network operations and traffic, and intrusion detection. It also gives real-time information on network faults and the action centres required to respond. An action centre will be the local network team office or the TechCorp headquarters team.
Disclaimer
This template can be very informative, but network management may still fall short of industry standards because of other factors. Consequently, the network management team should pursue improvements not only in fault detection but also in fault diagnostics and final resolution.
Currency of the document
This template is a real-time database. A fault is captured as soon as it is detected by monitoring. For institutions like TechCorp, a textual message about the captured fault is communicated immediately.
Objective
The objective of the template is to enable the capture of timely and relevant information for network fault management. By recording the nature and source of the fault and properly allocating responsibility for its resolution, the template passes on critical information about the fault and assigns individual responsibility for resolving it (Lennox, 2004).
Notations
Example
If a router fails in the Texas office, the system will record the following information in the template. The date and fault number will be system generated. The monitoring system recorded can be either the network administrators or the network monitoring software. The fault source will usually indicate either a network software failure or a hardware failure. For a network hardware failure or malfunction the system will be specific: it will indicate the device number and the location. In addition, the exact nature of the fault is recorded. Finally, the office responsible for the resolution will be clearly indicated in the template. This assignment of responsibility is essential so that network management is transparent and traceable.
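The record described in this example could be sketched as a small data structure. This is only an illustration under stated assumptions: the field names, the device number "RTR-TX-01" and the in-memory fault counter are hypothetical and are not part of the TechCorp template itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

# Hypothetical stand-in for the system-generated fault number sequence.
_fault_numbers = count(1)


@dataclass
class FaultRecord:
    """One row of the fault detection template (illustrative field names)."""
    monitoring_system: str   # "monitoring software" or "administrator"
    fault_source: str        # "hardware" or "software"
    device_number: str       # identifies the failed device
    location: str            # office where the device sits
    nature_of_fault: str     # exact nature of the fault
    responsible_office: str  # office assigned to resolve the fault
    # The date and fault number are generated by the system, not typed in.
    fault_number: int = field(default_factory=lambda: next(_fault_numbers))
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def summary(self) -> str:
        """Return a one-line textual message about the captured fault."""
        return (
            f"Fault #{self.fault_number}: {self.nature_of_fault} on "
            f"{self.device_number} ({self.location}); "
            f"assigned to {self.responsible_office}"
        )


# The Texas router example, with an assumed device number.
record = FaultRecord(
    monitoring_system="monitoring software",
    fault_source="hardware",
    device_number="RTR-TX-01",
    location="Texas office",
    nature_of_fault="router failure",
    responsible_office="local network team office",
)
print(record.summary())
```

Keeping the responsible office as a required field means a record cannot be created without an owner, which is one way to support the traceability the template asks for.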
Development Approach
The network monitoring system, the administrators and the end users will provide this information either directly or indirectly. Information that end users pass to the support desk will have to be checked to confirm that it has not already been captured automatically by the system. For cases of abnormal traffic, the network team will have to carry out the necessary diagnostic measures to confirm the cause.
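The check that an end-user report has not already been captured automatically could look roughly like the sketch below. The matching rule (same device number within a 30-minute window) and the dictionary keys are assumptions made for illustration; the source does not specify how duplicates are recognised.

```python
from datetime import datetime, timedelta


def already_captured(report, captured_faults, window=timedelta(minutes=30)):
    """Return True if a system-captured fault matches the user report.

    Assumed rule: a match is the same device number recorded within
    `window` of the time the user reported the problem.
    """
    for fault in captured_faults:
        same_device = fault["device_number"] == report["device_number"]
        close_in_time = (
            abs(fault["recorded_at"] - report["reported_at"]) <= window
        )
        if same_device and close_in_time:
            return True
    return False


# Hypothetical data: one fault captured by the monitoring system.
captured = [
    {"device_number": "RTR-TX-01",
     "recorded_at": datetime(2024, 1, 1, 12, 0)},
]
user_report = {
    "device_number": "RTR-TX-01",
    "reported_at": datetime(2024, 1, 1, 12, 10),
}
if already_captured(user_report, captured):
    print("Report matches an existing fault; no new record needed.")
else:
    print("No match; the support desk should open a new fault record.")
```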
Verification and validation
The information on the fault source and the nature of the fault should be validated so that the correct solution is applied. Every network team member is responsible for having the skills needed to verify the different system error logs.
Template: Communication/response
Introduction
Communicating a fault in the system is an essential step in ensuring that the network problem is solved with minimal loss of time, given the implications of a network that is down. This template therefore explores the options available for communication and response across a wider network.
Purpose
The purpose of this proposed fault management template is to reduce the time taken for a fault to be detected and acted upon. The most fundamental way of ensuring that network problems are solved immediately is through alarms that signal conditions such as server degradation, router failures and link failures, among others. The response should produce an error log for the network and attempt to rectify the problem or, alternatively, engage human intervention to solve it. Once the problem is solved, the alarm system should report back to the staff, notifying them through their network communication log or by phone that the problem has been resolved (Bruno & Kim, 2003).
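The alarm and response flow described here (log the fault, attempt an automatic fix, otherwise escalate to staff, then notify) can be sketched as follows. The function name `handle_alarm`, the message formats and the use of plain callables for the fix and notification steps are all hypothetical; a real deployment would plug in its own monitoring and messaging tools.

```python
from typing import Callable, List


def handle_alarm(
    fault: str,
    auto_fix: Callable[[str], bool],
    notify: Callable[[str], None],
    error_log: List[str],
) -> str:
    """Log an alarm, try an automatic fix, and notify staff of the outcome."""
    error_log.append(f"ALARM: {fault}")
    if auto_fix(fault):
        # The system rectified the problem; tell staff it is solved.
        error_log.append(f"RESOLVED automatically: {fault}")
        notify(f"Resolved: {fault}")
        return "resolved"
    # Automatic repair failed; engage human intervention.
    error_log.append(f"ESCALATED: {fault}")
    notify(f"Action required: {fault}")
    return "escalated"


# Hypothetical usage: an automatic fix that only handles link failures.
log: List[str] = []
messages: List[str] = []
status = handle_alarm(
    "link failure",
    auto_fix=lambda f: f == "link failure",
    notify=messages.append,
    error_log=log,
)
print(status, messages)
```

Passing the fix and notification steps in as functions keeps the flow itself independent of whether staff are reached through a communication log or by phone.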
Disclaimer
TechCorp cannot accept liability for the consequences of actions taken on the basis of the information provided in this template; such actions will only be regarded as binding if presented in writing.
Currency of the document
This document is current but subject to modification as the system is upgraded or changed.
Objective
The primary objective of this template is to manage communication and response time when a fault occurs, in compliance with industry standards and with measures that aim to mitigate faults or, where that is not possible, to allow adequate response time for the corrective measures that the fault requires.
Notations
Example
Development Approach
The development process should follow the concepts identified in the notation model.
- Identify all the flaws in the fault detection system.
- Determine the probability of occurrence of each risk so that a system can be put in place to address these inherent problems. Statistical tools are important at this point.
- Identify the impact of each risk.
- Define the status of each risk and the process for communicating it.
- Improve the documentation of communication and crisis response so that unskilled TechCorp employees have the basics for handling network problems.
- Analyse the response time to improve the efficiency of the system.
- Analyse measures for mitigating fault occurrence so that the process of fault elimination is improved.
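The probability, impact and response-time steps above can be illustrated with a small calculation. Scoring a risk as probability multiplied by impact is a common convention assumed here, not something the template prescribes, and the fault names, probabilities, impact ratings and response times are made-up sample values.

```python
from statistics import mean


def risk_score(probability: float, impact: int) -> float:
    """Score a risk as probability of occurrence times impact rating."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


# Hypothetical faults: (probability of occurrence, impact on a 1-5 scale).
faults = {
    "router failure": (0.10, 5),
    "link failure": (0.30, 3),
    "server degradation": (0.20, 4),
}

# Rank faults so the highest-scoring risks are addressed first.
ranked = sorted(
    faults,
    key=lambda name: risk_score(*faults[name]),
    reverse=True,
)

# Hypothetical response times in minutes for past faults.
response_minutes = [12, 45, 30]
mean_response = mean(response_minutes)

print(ranked)
print(f"Mean response time: {mean_response} minutes")
```

A ranking like this gives the team a simple basis for deciding which faults most need documented communication and response procedures.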
Verification and validation
- The system to be put in place should be able to withstand diagnostic tests.
- The fault mitigation process should involve the stakeholders in order to establish a lasting solution for eliminating risks.
Conclusion
This paper has presented a fault management template, fault management being one area of network management. The template comprises two parts: the fault capturing process and the communication and response process. Both comply with industry rules and regulations. The objective of this fault management system is to mitigate network faults within a larger network system. The fault capturing process helps to identify the source of an error, while the communication process conveys the source of the problem to an automated solution or to human intervention. The two templates are thus related, and both are necessary in the fault mitigation process.
References
Bruno, A., & Kim, J., 2003. CCDA self-study: CCDA exam certification guide. 2nd ed. Texas: Cisco Press.
Lennox, B., 2004. Network management with smart systems. Michigan: McGraw-Hill.