The development of communication technology has allowed data analysis techniques to evolve, so that they can now examine not only quantifiable data but also other types and sources of information. Text mining helps answer complex questions, such as where ideas or innovations originate, which can improve management research. This paper presents a literature review of three articles that examine text mining methods for quantitative analysis.
Literature Review
Non-numeric data can be collected from a variety of sources, including social media and other websites. George et al. (2016) define text mining as a tool applicable when “seeking to answer questions such as where do ideas or innovations come from” (p. 1493). The methods for such research can include extraction, retrieval, categorization, clustering, and analysis. There are several approaches to collecting such data, including independent raters, Amazon MTurk, or automatic resources. Toolboxes for the Python and R programming languages allow researchers to build programs that collect and analyze textual data. George et al. (2016) state that both theory development and theory testing can be facilitated through text mining, which shows the broad applicability of this method. Once the data is collected, a researcher can decide on the analysis method through variable selection, which is crucial for large quantities of data.
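To illustrate the extraction step in Python, the following is a minimal sketch that uses only the standard library. It assumes the texts have already been collected; the sample documents, the stop-word list, and the function names are hypothetical and are not taken from George et al. (2016).

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "from"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each non-stop-word term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


# Hypothetical sample documents standing in for collected posts or pages.
documents = [
    "Innovation comes from the exchange of ideas.",
    "New ideas spread through social media.",
    "Social media data is a source for management research.",
]

frequencies = term_frequencies(documents)
print(frequencies.most_common(3))
```

Term counts of this kind could serve as candidate variables for the variable selection step described above.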
Machine learning, a set of methods that allow computers to recognize patterns in texts, is a useful approach to text mining. Bach et al. (2019) examined the text mining techniques that financial services use to locate structured and unstructured data from different sources. Mainly, linguistic, statistical, and machine learning techniques are used. Machine learning is among the essential techniques for text mining, and it supports supervised or unsupervised learning and the detection of patterns. It is therefore well suited to analyzing large amounts of information, because the software can be taught to detect patterns and extract them.
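As one illustration of supervised learning on text, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing in plain Python. Naive Bayes is chosen here only as a simple example and is not presented as the method used by Bach et al. (2019); the training sentences, labels, and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Collect per-label word counts, per-label document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples in a financial setting.
training_data = [
    ("loan approved with low interest rate", "positive"),
    ("profit growth exceeded forecast", "positive"),
    ("loan default and heavy losses", "negative"),
    ("fraud complaint and account losses", "negative"),
]

model = train(training_data)
print(classify("interest rate growth", model))
print(classify("heavy losses from fraud", model))
```

The program learns which words are typical of each label from the examples and then applies those patterns to new text, which is the sense in which the software is "taught" to detect patterns.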
Another notable method is the visualization of data, which is necessary to see the relationships between different words or combinations of words. Notably, Bach et al. (2019) point out that visualization can also be applied as the final stage of the text mining process, to present the results of the analysis. According to Liu et al. (2018), “visualizations, typographic visualizations, and graph visualizations” are simple and popular visualization techniques (p. 1077).
Visualization can also support the reporting process, in which researchers discuss the specifics of their work, by making its content easier to comprehend. A variety of online tools for data visualization have been developed in recent years, one example being Tableau. Therefore, through visualization, textual data can be analyzed using qualitative techniques.
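The relationships between words that a graph visualization displays can be derived from co-occurrence counts. The sketch below is a minimal example under stated assumptions: the sample sentences and function names are hypothetical, and a plain-text bar chart stands in for a dedicated tool such as Tableau.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_edges(documents, stop_words=frozenset()):
    """Count how many documents each unordered word pair appears in together."""
    edges = Counter()
    for doc in documents:
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        terms = sorted(set(tokenize(doc)) - stop_words)
        edges.update(combinations(terms, 2))
    return edges


# Hypothetical sample documents.
documents = [
    "text mining supports management research",
    "text mining uses machine learning",
    "machine learning detects patterns in text",
]

edges = cooccurrence_edges(documents, stop_words=frozenset({"in"}))

# Print the strongest links as a plain-text bar chart.
for (left, right), weight in edges.most_common(3):
    print(f"{left} -- {right}: {'#' * weight}")
```

Each counted pair corresponds to an edge in a word graph, and the count can be used as the edge weight when the graph is drawn.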
Conclusion
Overall, text mining is a relatively new analysis technique that allows collecting information, both structured and unstructured, from a variety of communication sources. This approach can be used to test a theory or to develop a hypothesis. The main steps include retrieving information, extracting the necessary parts of it, classifying, categorizing, and summarizing. For the purposes of text mining, programming languages such as R or Python can be used. Alternatively, manual selection by independent raters can be applied for small quantities of data.
References
Bach, M. P., Krstić, Ž., Seljan, S., & Turulja, L. (2019). Text mining for Big Data analysis in financial sector: A literature review. Sustainability, 11(5), 1277.
George, G., Osinga, E. C., Lavie, D., & Scott, B. A. (2016). Big data and data science methods for management research: From the Editors. Academy of Management Journal, 59(5), 1493-1507.
Liu, S., Wang, X., Collins, C., Dou, W., Ouyang, F., El-Assady, M., Jiang, L., & Keim, D. (2018). Bridging text visualization and mining: A task-driven survey. IEEE Transactions on Visualization and Computer Graphics, 1077-2626.