Abstract
Security weaknesses in data warehouses, together with the risks of theft or leakage, pose a severe threat to organizations’ intellectual property and financial assets. Failure to address these risks can lead to significant reputational and financial losses, which explains the need to implement robust digital data security principles in warehouses. The purpose of this work is to identify privacy issues and ways to prevent them through effective protective practices. A qualitative design supplemented by a phenomenological approach has made it possible to draw on the experience of employees responsible for the security of data warehouses. A literature review has helped obtain relevant findings on this topic. Specific solutions are presented at three levels: physical, technical, and administrative. Among the most effective measures for dealing with privacy issues, hiring qualified personnel, controlling authentication, and implementing adequate governance and monitoring policies are proposed.
Introduction
Data warehouse security practices mean protecting both the infrastructure, including local and external data centers and clouds, and the information stored in it. The spectrum of potential problems is wide, ranging from accidental or deliberate damage to information to unauthorized access. Addressing these issues is critical because most data leaks ultimately boil down to data warehouse security issues. A well-designed system provides for compliance with the various regulations and frameworks designed to secure data warehouses. Increasingly, security companies are developing solutions to help meet these regulations. A reliable data warehouse security system minimizes the risk of theft, unauthorized disclosure, forgery, and accidental or deliberate damage to private information.
This research work is aimed at highlighting the real privacy issues associated with data warehousing and determining possible tools and algorithms to prevent these problems. A literature review will be conducted, and the results will be compiled by comparing the most common problems and how to solve them. In addition, qualitative research will be carried out with the involvement of qualified specialists serving data warehouses in large companies. A list of levels and solutions will be drawn up to address potential privacy issues. Through surveys as the main data collection tools, information from the involved participants will be processed, and the overall outcomes will be obtained. Effective information management in data warehouses is the key to securing assets and preventing cyber threats that can harm the intellectual property of organizations.
Background of the Problem
In the context of a massive transition to cloud services and other advanced methods of data storage, many companies have managed to optimize their algorithms of control over information and create convenient platforms for accessing it. Data warehousing is an approach utilized in many spheres, and in Figure 1, some areas are provided, including the complexity levels (Shahid et al., 2016). One can note that the business sector uses the advantages of this technology actively, and data warehouses are convenient information storage spaces that provide opportunities to manage specific data conveniently. However, according to Binjubeir et al. (2019), numerous operations over intellectual property are associated with the risks of losing valuable data caused not only by careless handling of information but also by external threats. The disadvantages of unstable protection, in turn, affect the credibility of companies in the market and can lead to great expenses associated with the loss of important information. Therefore, the assessment of hazards and methods of overcoming them is of great importance for modern organizations. A literature review can help identify the existing issues associated with privacy and highlight mechanisms to avoid these problems.
Literature Review
Before implementing a specific data warehouse security system, it is crucial to understand the types of threats that a company may face when establishing the infrastructure of these systems. In academic literature, these factors are usually divided into three categories: external, internal, and unforeseen. According to Geetu and Vats (2016), organizations and cloud service providers are responsible for ensuring the safety of their clients’ information. However, companies themselves are at risk, and any threat is fraught with the leakage of valuable information that can subsequently be used to the detriment of the organization by competitors or other interested parties. Therefore, many studies are devoted to researching the top threats to data warehousing with a focus on privacy issues and their causes.
External Threats
The list of external threats includes cyber fraudsters who conduct hacker activities and organize criminal groups on the world wide web. As Parikh et al. (2019) note, organizations often face individual cybercriminals who steal valuable information and sell it to interested parties. Another external threat is the terrorist activities of extremists who, using less sophisticated tools and applying active methods of pressure, steal private data (Shahid et al., 2016). This threat poses no less danger since organizations are vulnerable to direct aggression from extremists. Competitors engaged in industrial espionage are also capable of stealing data if they are interested in obtaining private information. This type of threat is common and represents a form of aggressive business competition. Finally, the intelligence services of individual countries can conduct targeted activities aimed at stealing valuable information. This threat is severe and fraught with foreign policy disagreements and struggles on the world stage, which, in turn, require vulnerable companies to create a highly effective defense mechanism. All these types of external threats are real and explain the need for organizations and companies that own private data to develop adequate solutions.
Internal Threats
With regard to internal threats, researchers agree that ineffective work with personnel is a cause of data leakage from corporate networks. Singh and Gurm (2016) argue that one of these dangers is employees’ insider activity. Employees with criminal intent collude with competitors and sell valuable information for money, thereby creating problems for their organizations. This form of fraud is hard to detect since many workers have access to private data, and it is challenging for managers and responsible employees to identify those who sell information to third parties. Another internal source of data leakage is poorly trained staff using corporate databases without proper preparation. As Alvi (2018) remarks, employees who do not understand the importance of securing information can make errors when working with passwords and unknowingly give attackers an opportunity to steal valuable data. Poor password management, sharing access keys with third parties, and other gaps in team training can cause inadvertent information leakage and lead to severe outcomes. Thus, communicating to employees the importance of compliance with data protection mechanisms is an important aspect of data warehouse security.
Unforeseen Threats
Unforeseen threats are risks that are not related to external or internal activity and concern situations caused by interruptions in the operation of networks due to accompanying circumstances. One possible cause is a power outage, which can lead to a security failure and, therefore, open vulnerabilities in data warehouses (Parikh et al., 2019). This threat is dangerous because such interruptions cannot always be identified in time, and even if appropriate measures are taken, information can be lost. The storage of digital data depends on the technical characteristics of networks, which may cause problems in case of a breakdown or failure. Another unforeseen threat is a natural force majeure caused, for instance, by a flood, fire, or another event beyond human control (Liang, 2021). In this case, there is a risk of physical damage to the hardware. Data warehouses, like any digital bank, rely on a secure system that is free from physical threats. In case of force majeure entailing the aforementioned failures, the information may be lost irretrievably. Therefore, protection against such threats should also be part of adequate data warehouse maintenance and control.
Data Warehouse Security Principles
At the highest level, the security of a data warehouse should comply with specific principles governing its resilience. If any of these conditions is not met, threats to privacy arise, and related dangers appear. According to Binjubeir et al. (2019), “the initial three main concepts of information security are confidentiality, integrity, and availability” (p. 20070). Each of these aspects forms a general algorithm of work and determines the convenience of operating with information. In Figure 2, the architecture of a common data warehouse is demonstrated (“Data warehouse architecture,” 2021). This scheme shows how the process of information transmission takes place.
Confidentiality implies creating a system in which unauthorized users cannot access a warehouse and download or modify data. As Binjubeir et al. (2019) state, unauthorized access to information should be prevented both over the network and locally. This is a key principle to prevent information leakage and avoid data theft by third parties. Integrity presupposes a system that cannot be modified without the knowledge of its administrators. In other words, the data in storage are to be protected from tampering and unauthorized changes. Otherwise, hackers or inexperienced users can disrupt the system by reconfiguring security options and introducing new information that the security system does not recognize. Finally, the principle of availability implies that responsible parties can access and manage the data warehouse whenever needed. Following this rule helps minimize the risk of storage failure or inaccessibility, whether deliberate, for instance, through a DDoS attack, or accidental, for example, through an unplanned action, power outage, or mechanical failure. These security aspects are mandatory conditions for excluding threats to privacy in data warehouses.
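The first two principles can be illustrated with a minimal Python sketch. It is not taken from the reviewed literature: the role names, the PERMISSIONS map, and the function names are hypothetical, and a real warehouse would rely on its own access-control system and key management. The sketch assumes confidentiality is enforced by a role check before any action and integrity by an HMAC-SHA256 tag stored alongside each record.

```python
import hashlib
import hmac

# Hypothetical role-to-permission map (an assumption for illustration only).
PERMISSIONS = {
    "analyst": {"read"},
    "administrator": {"read", "write"},
}


def is_allowed(role: str, action: str) -> bool:
    """Confidentiality: unknown roles and missing permissions are denied."""
    return action in PERMISSIONS.get(role, set())


def sign(record: bytes, key: bytes) -> str:
    """Integrity: compute an HMAC-SHA256 tag to store next to the record."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()


def is_untampered(record: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(record, key), tag)
```

A record whose content changes after signing no longer matches its stored tag, so tampering can be detected without comparing full copies of the data. Availability is not covered here because it depends mainly on infrastructure measures such as redundancy and backups rather than on application code.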
Research Method
To determine how to protect data warehouses and identify the security levels that need to be taken into account, qualitative research is used to evaluate information obtained from qualified employees. Professional administrators of such systems will be engaged as the target audience, and survey results compiled from their responses can help identify the key security principles. The study therefore has a phenomenological framework because the participants’ experience is the key source of evidence to analyze. The outcomes obtained are valuable for eliminating potential privacy issues associated with data warehousing and establishing a stable mechanism for storing digital information.
Sample
The target sample includes 20 specialists serving data warehouses of large organizations. The parameters of age, gender, and other demographic characteristics are ignored because the main focus is on the experience and professionalism of the participants. Accordingly, the sampling criteria are selective and concern the working practices of the members invited from different companies. The participants will be involved online for safety and social distancing purposes. All ethical conventions will be respected, and the study members will be made aware of the research goals and how their responses will be processed. Their distinctive experience can be a valuable factor in obtaining an accurate and unbiased assessment of the issue under consideration.
Data Collection
Surveys are the key tool for collecting data from the target audience. This mechanism is appropriate for the type of research claimed and involves obtaining information based on the research members’ experience. The surveys will be conducted online and will include questions regarding privacy issues associated with data warehousing and potentially effective ways to avoid them. This principle of data collection allows obtaining information from each participant individually, which increases the credibility of the study.
Instruments and Measurement
For conducting the study, a table will be compiled, including the participants’ responses. A special online program will contain the questions and record the answers as the survey is conducted remotely. By evaluating the response of each of the survey members, the most frequent opinions will be displayed in the table. For the convenience of the participants, special criteria will be proposed. The aim of the study is to clarify privacy issues associated with data warehousing. Therefore, the optimal solutions provided by specialists will be divided into groups. Specific safeguards for data warehouses will be categorized into three levels – physical, technical, and administrative. The most frequent solutions cited by the participants will be included in the results of the study after comparing all responses during one week.
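The tallying step described above can be sketched in Python as follows. This is only an illustration of the counting logic, not the program used in the study: the function name rank_solutions and the (level, solution) input format are assumptions, and no actual survey responses are reproduced.

```python
from collections import Counter

# The three predefined levels named in the study design.
LEVELS = ("physical", "technical", "administrative")


def rank_solutions(responses):
    """Rank solutions within each level by how often they were mentioned.

    responses: iterable of (level, solution) pairs, one pair per mention.
    Returns {level: [(solution, count), ...]} with the most frequent first.
    """
    counts = {level: Counter() for level in LEVELS}
    for level, solution in responses:
        if level not in counts:
            raise ValueError(f"unknown level: {level}")
        counts[level][solution] += 1
    return {level: counter.most_common() for level, counter in counts.items()}
```

Each participant's answers would be entered as pairs such as ("technical", "solution A"), where the solution labels are placeholders; the ranked lists returned for each level correspond to the rows of the summary table.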
Discussion
The principle of evaluating the methods of dealing with privacy issues associated with data warehousing by dividing potential solutions into categories allows optimizing the research results. In Table 1, the results of the surveys are summarized based on the participants’ responses. Specific solutions are presented in terms of their frequency of mention, ranging from the most popular answers to the rarest opinions. Each of the proposed interventions is relevant to any data warehouse and may be applied by IT specialists and managers of different organizations.
Table 1. Methods of combating privacy issues associated with data warehousing.
Based on the responses of the participants involved, one can point out that the correct maintenance of data warehouses is more important than the technical characteristics of these systems when privacy issues arise. The principles of reporting and monitoring occupy the last lines of the proposed lists. Hiring qualified personnel, effectively managing user authentication, and evaluating control practices are the key procedures to implement in organizations. The study outcomes confirm that the adequate maintenance of data warehouses involves creating an integrated infrastructure in which all nodes are protected at different levels and appropriate security standards are applied. The results of the work done are valuable in view of the massive transition to cloud storage systems as the main means of digital information management. Subsequent research on the topic of data warehouse privacy may include narrower issues regarding individual components of these systems.
Limitations
Despite the relevance of the research and its specific results, this study has some limitations. Firstly, the representatives of large companies have been involved as a sample, which narrows the assessment of the results and does not allow asserting the above safety principles in all organizations without exception. Future research may include a broader list of participants from different types and sizes of companies. Secondly, in this study, the levels of protection are predefined, and all information is grouped into ready-made categories. A wider range of areas involved in addressing privacy issues can increase analytical credibility and attract more procedures to be implemented. However, the stated goals have been achieved, and the results of the research confirm the importance of discussing security aspects of data warehouses.
Conclusion
Privacy issues associated with data warehousing are a critical reason to examine potential threats and how to avoid them in modern digital storage systems. The relevance of the problem is due to the growing interest in this form of data management and cases of theft and leakage of information. The conducted research has made it possible to identify the most frequent risks and highlight the range of potentially important solutions at the appropriate levels: physical, technical, and administrative. Hiring qualified employees, adequate algorithms for controlling user authentication, and relevant practices for assessing security standards are the components of secure data warehouse management. Despite some limitations, the research is relevant and offers valuable insight into the issues of digital data privacy.
References
Alvi, I. A. (2018). Best practices for ensuring impenetrable data warehouse security. Data Warehousing Information Center.
Binjubeir, M., Ahmed, A. A., Ismail, M. A. B., Sadiq, A. S., & Khan, M. K. (2019). Comprehensive survey on big data privacy protection. IEEE Access, 8, 20067-20079.
Data warehouse architecture. (2021). GeeksforGeeks.
Geetu, G., & Vats, S. (2016). A survey on issues of security in cloud computing. International Journal of Advanced Research in Computer Science, 7(6), 204-207.
Hayes, J. (2020). The security challenges of data warehousing in the cloud. Cloudera.
Liang, Y. (2021). Study on overseas warehouse location of manufacturing cross-border e-commerce enterprises based on multi-objective. Converter Magazine, 2021(3), 172-184.
Parikh, S., Dave, D., Patel, R., & Doshi, N. (2019). Security and privacy issues in cloud, fog and edge computing. Procedia Computer Science, 160, 734-739.
Shahid, M. B., Sheikh, U., Raza, B., Shah, M. A., Kamran, A., Anjum, A., & Javaid, Q. (2016). Application of data warehouse in real life: State-of-the-art survey from user preferences’ perspective. International Journal of Advanced Computer Science and Applications, 7(4), 415-426.
Singh, P., & Gurm, J. S. (2016). Detecting insider attacks sequences in cloud using freshness factors rules. International Journal of Advances in Engineering & Technology, 9(3), 278-288.
Zhang, H., & Zhu, Y. (2019). A method of sanitizing privacy-sensitive sequence pattern networks mined from trajectories released. International Journal of Data Warehousing and Mining (IJDWM), 15(3), 63-89.