Distributional Semantics
Semantics is the discipline concerned with the study of meaning, including the meanings of words. Interest in the field persists not only because understanding the semantic mechanisms of the human mind offers deeper insight into how the brain works but also because the needs of machine learning are becoming increasingly urgent. Bodies of textual data, whether song lyrics or databases, contain an almost uncountable range of meanings encapsulated in lexical units (Xue et al., 2018).
Semantic folding is a computational technique for breaking large data sets into individual, semantically meaningful units, which makes managing and retrieving information in large databases easier. A central open question has been how to define the mechanism by which individual terms, whether isolated words or multiword expressions, are recognized as closely and logically related even though they are spelled differently. In fact, training computer systems for semantic retrieval follows a human model, since the basic mechanisms of semantic folding were modeled on the workings of the human brain (Stanford, 2009; Webber, 2015).
For example, for reasons not yet described here, a person readily understands that the words “computer” and “laptop” are related and have similar, but not identical, meanings. In an attempt to explain the mechanisms of word recognition and interpretation, the academic community has developed several approaches within computational semantics. Distributional and formal semantics stand out among them in computational linguistics.
In a sense, both branches of semantics are special cases of a general semantic structure, since each generalizes its own domain of knowledge. This, in turn, encourages researchers to try to combine distributional and formal semantics into a single artificial system in order to improve information retrieval (Boleda & Herbelot, 2016). However, a discussion of how such retrieval works should be preceded by a closer look at each approach. According to some authors, the semantic understanding of word relatedness rests on distributional semantics, which in simplified terms can be reduced to the unconscious use of all the contexts with which the terms under study are associated (Kuzmenko & Herbelot, 2019; Karlgren & Kanerva, 2019).
To put it another way, the human brain compares the contexts in which “laptop” and “computer” each occur and, through this comparison, concludes that the words are related. The same principle applies to computer systems: to train a machine to understand and operate on semantics autonomously, it is necessary to build a model of all varieties of contexts from an extensive text corpus (Cortical.io, 2017). Put simply, if two words in the corpus consistently have the same “neighbors,” one can state with high confidence that they mean approximately the same thing.
Traditional distributional semantics uses vector-based statistical models. More specifically, each lexical unit in the textual data is assigned a context vector, and the set of these vectors forms a word vector space (Schutze, 2019; Wang & Jin, 2006). Distributed representations, which encode each word as a vector with numerical coordinates, are standard in word2vec-style models (Church, 2017). The coordinates can be calculated, for example, from the frequency with which specific terms occur in a text corpus. This approach rests on the hypothesis that words with similar distributions of contexts are likely to have similar meanings (Webber, 2015).
It is essential to clarify what is meant by a similar distribution of context vectors. The remaining words in a text block serve as the measure of semantic relatedness, and the quantitative measure is the frequency of co-occurrence between a particular word and a reference word from the corpus (Cortical.io, 2017). In other words, the higher these frequencies and the more similar they are, the more likely the two words are semantically related. Notably, the rules of classical geometry apply in this vector space: for example, the closeness of two vectors can be determined using the cosine of the angle between them (Webber, 2015; Karlsson, 2017).
For non-negative frequency vectors, this cosine measure of lexical proximity lies between 0 and 1, where 1 means identity (Karlsson, 2017). Interestingly, one property of such a vector space is that measuring the distance between unrelated words is impossible or, more precisely, unjustified. In practice, such vectors are counted not in tens but in tens to hundreds of thousands (Boleda & Herbelot, 2016). Moreover, the smaller the distance between neighboring vectors in the space, the closer in meaning the linguistic units they represent.
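The co-occurrence idea can be sketched in a few lines of Python. This is a minimal illustration rather than the method of any cited author: the toy corpus, the window size of 2, and the function names context_vector and cosine_similarity are all assumptions made for the example.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-sentence corpus, invented for illustration only.
corpus = [
    "she opened the laptop and typed a report",
    "she opened the computer and typed a report",
    "he ate the banana and drank some juice",
]

laptop = context_vector("laptop", corpus)
computer = context_vector("computer", corpus)
banana = context_vector("banana", corpus)

print(cosine_similarity(laptop, computer))  # identical neighbors
print(cosine_similarity(laptop, banana))    # only function words shared
```

In this toy corpus, “laptop” and “computer” have exactly the same neighbors and therefore score higher than “laptop” and “banana,” which share only function words. A realistic model would need a far larger corpus and some weighting scheme to discount such frequent words.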
An alternative is the formal approach to the study of semantics, traditionally associated with natural language. Formal semantics is based on inductive logic, which derives the meaning of complex, multiword linguistic constructions from the meanings of their individual parts, a principle known as compositionality (Boleda & Herbelot, 2016). As with distributional semantics, the formal model retains continuity with human cognition, since the mechanism for recognizing previously unknown meanings was likewise modeled on the brain (Stanford, 2009). More specifically, an individual learns new concepts and notions by an inductive method, interpreting unknown words where possible. In terms of classical semantics, this induction is an example of extension, in which the meanings of words are scaled up into broader clusters.
Strictly speaking, formal semantics works by finding the meaning of each word in a text. To determine the overall meaning of the text, the model then combines the meanings of the individual words (Boleda & Herbelot, 2016).
This approach makes it possible to extend the scope of the compositions being studied through gradual scaling. At the same time, such semantics is not perfect: when the focus falls too heavily on individual lexical meanings, a deeper understanding of expressions can be lost, as Boleda and Herbelot (2016) have shown. Formal semantics thus has a serious gap between the interpretation of meaning, which is often accurate, and the explanation of that meaning at the subjective level. This, in turn, raises the problem of the psychological validity of the final interpretation. In other words, formal semantics is still imperfect, and its use in isolation therefore has limited practical value.
Given the goal of improving the available models of meaning, it is only natural that attempts continue to define an integrated approach combining formal and distributional semantics. Distributional semantics has made more progress in teaching machine intelligence the principles of linguistics (Boleda & Herbelot, 2016; Kuzmenko & Herbelot, 2019). However, the cognitive plausibility of each individual model is still imperfect, so attempts to combine them appear well founded. Formal distributional semantics is not yet a fully formed subfield of semantic analysis, and its study is therefore open to controversy. One of the most critical questions remains what the primary basis of formal distributional semantics is: it is not entirely clear whether formalism or distribution forms the core (Boleda & Herbelot, 2016).
Formal distributional semantics has wide application in fields related to natural language. Such models can be used to measure the semantic proximity of words, generate dictionaries of the terms used, and determine the subject matter and keywords of a document under analysis (Webber, 2015). Another interesting application is determining the tone of an utterance, that is, the message the writer may have intended. Such functionality, in turn, opens broad horizons for literary analysis and bridges the gap between the author’s semantics and the reader’s perception (Leavy et al., 2019).
In the industry under study, a combined formal and distributional approach to semantics allows ontological database schemas to be managed in ways that improve the quality of tax advice. For example, such semantics could be applied to predict the next steps once a tax-related problem has been identified. In addition, connecting foreign-language dictionaries to the semantic model allows more accurate management of a cross-national team and more reliable delivery of identical instructions to employees with different cultural and ethnic backgrounds.
Semantic Folding Theory
One particular example of distributional semantics is semantic folding. At the core of this theory is the need for an HTML page layout that accurately reflects the meanings of individual blocks. The experience of blind users illustrates this need well. A user with normal vision can usually explore different websites with ease, clearly understanding the boundaries of individual blocks such as the header, the main text box, or the comments under a post (Potluri et al., 2021; FCC, 2019).
The ease of distinguishing individual blocks is often a result of design features that separate each block visually. A blind user, in contrast, cannot become familiar with the posted information visually, so screen readers are used. These are specially designed programs that read the contents of web pages aloud so that a blind user can perceive the information by hearing (Yesilada et al., 2004). For such programs to read text blocks correctly, the blocks must be laid out intelligently, which is what semantic folding is used for. However, it is a mistake to assume that semantic folding serves only people with disabilities, although it works well in their case.
On the contrary, the functional spectrum of such a method is much broader and allows automated systems, whether search engines or computer intelligence, to perceive and process the information on a site (Webber, 2015; FCC, 2019). Consequently, the need for semantic folding was justified by the need for accessible site usage and increased relevance and integration with search engines.
Semantic folding emerged as a natural response to the outdated practice of structuring pages with the generic div (division) element. This generic container tag allows the entire HTML document to be split into separate blocks, but the quality of this splitting cannot compare to semantic differentiation (FCC, 2019). More specifically, heavy use of div noticeably complicates the code and does not allow the semantic linking of individual units that HTML5 makes possible. At the same time, semantic tags improve interaction with search engines, since the algorithms of Google and other search engines, having studied the layout of a particular site, can determine the role of specific lines of code and, therefore, of the visual elements of the page.
To describe the computational mechanism underlying semantic folding, it is necessary to return to the language of distributional semantics. The vectors assigned to each word in the text form a sparsely populated space with varying distances between them (Webber, 2015; Cortical.io, 2017). Notably, the literature often uses the term “semantic fingerprints” to refer to these word vectors (Webber, 2015; Han et al., 2017).
To put it another way, semantic folding describes the procedure of encoding textual data into machine-readable sparse distributed representations (SDRs), the format widely used by hierarchical temporal memory (HTM) networks. Cortical.io (2017), the company that created semantic folding theory (SFT), provides a handy analogy for understanding the theory’s essence. Information about a particular cat, such as its appearance, the shape of its muzzle and ears, and its coloring, is broken into individual traits and then recorded as a semantic fingerprint. Consequently, each cat with a different appearance will have a different SDR. Comparing the semantic maps of two objects expected to be close is useful for evaluating their proximity because each bit of the code has a strictly assigned meaning. In other words, such a system makes it possible to determine the semantic similarity between terms (Schutze, 2019).
A consequence of this approach is the ability to make quantitative statistical comparisons of textual variables once they have been translated into a suitable numerical format. As already noted, measuring the distance between semantically close vectors, for example with cosine similarity, makes it possible to estimate the lexical similarity of words (Karlsson, 2017). Semantic folding therefore helps researchers compare words, sentences, and texts with each other for specific applied tasks.
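A comparison of two semantic fingerprints can be sketched as follows. The example assumes that a fingerprint is stored as the set of positions of its active bits; the bit positions, the variable names, and the helper functions overlap and binary_cosine are hypothetical and do not reproduce any Cortical.io API.

```python
def overlap(fp_a, fp_b):
    """Number of active bits the two fingerprints have in common."""
    return len(fp_a & fp_b)


def binary_cosine(fp_a, fp_b):
    """Cosine similarity for binary vectors given as sets of active bits."""
    if not fp_a or not fp_b:
        return 0.0
    return overlap(fp_a, fp_b) / (len(fp_a) * len(fp_b)) ** 0.5


# Invented fingerprints: each number is the position of one active bit.
cat_a = frozenset({3, 17, 42, 88, 101, 256})
cat_b = frozenset({3, 17, 42, 90, 101, 300})
car = frozenset({5, 64, 88, 199, 777, 901})

print(overlap(cat_a, cat_b), binary_cosine(cat_a, cat_b))
print(overlap(cat_a, car), binary_cosine(cat_a, car))
```

Because every bit position carries a fixed meaning, counting shared positions is enough to rank the two cats as closer to each other than either is to the car.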
Notably, there is a clear connection between semantic folding theory and the principles of human neurobiology. In particular, Douglas Hofstadter, a prominent American physicist and computer scientist, proposed analogy as the core of human cognition. According to Hofstadter, the human brain uses similarity as an unorthodox premise for identifying unknown phenomena (Stanford, 2009).
Thus, each time an individual is introduced to a previously unexplored phenomenon or process, the brain automatically fragments it into unique but self-contained parts and compares them to the individual’s internal cognitive base. Such a process allows the nature of the entity being studied to be primarily determined through personal experience. An important consequence of this idea is that the broader the researcher’s initial knowledge base, the more likely it is to match the object being studied.
At this point, it is possible to summarize the theory of semantic folding in order to evaluate its practical and theoretical applicability critically. The entire theory is built on the principles of hierarchical temporal memory (HTM), borrowed directly from Hawkins’ work on the neocortex (Webber, 2015). The HTM model describes the ability of the neocortex to learn, to infer causes, and to form hypotheses about them (Luo & Tjahjadi, 2020). In this context, Bayesian network theory comes to mind, and there are indeed conceptual similarities between the two models (Ding et al., 2018).
Progress in mapping the brain’s neural networks has, in a sense, inspired machine learning experts to produce complete semantic maps that allow a computer to analyze and interpret written text at a level approaching human perception. However, while the brain can process large amounts of human memory relatively efficiently, large volumes of textual data cannot be analyzed with the same performance (Lutkevich, 2021).
To speed up processing, SDRs are used to translate verbal encoding into binary encoding. Stored objects are compressed by saving only the positions of the set bits, and the error introduced by dropping individual bits is negligible. In addition, SDRs can be combined, following the principles of formal semantics, into larger units that preserve the identity of the fragments included in the new semantic map (Webber, 2015). To put it another way, the use of SDR encoding in semantic folding networks dramatically simplifies work with the layout.
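The storage and combination ideas can be illustrated with a short sketch. It assumes that an SDR is kept as the set of indices of its set bits and that combination is a plain union of those sets; both choices, along with the 16-bit vector size and the names to_sparse, to_dense, and combine, are simplifying assumptions rather than the documented behavior of any SDR library.

```python
def to_sparse(dense_bits):
    """Keep only the positions of the set bits of a dense 0/1 vector."""
    return frozenset(i for i, bit in enumerate(dense_bits) if bit)


def to_dense(positions, size):
    """Rebuild the dense 0/1 vector from the stored positions."""
    return [1 if i in positions else 0 for i in range(size)]


def combine(*fingerprints):
    """Merge several SDRs by taking the union of their set bits."""
    return frozenset().union(*fingerprints)


# A hypothetical 16-bit vector with three set bits.
dense = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
tax = to_sparse(dense)           # three integers stored instead of 16 bits
law = frozenset({4, 12, 15})     # a second invented fingerprint

tax_law = combine(tax, law)
print(sorted(tax_law))
print(tax <= tax_law, law <= tax_law)  # both parts remain recoverable
```

With a union, every bit of each component is still present in the combined representation, which is one simple way to read the claim that the fragments keep their identity in the larger unit.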
The basic procedure for constructing a semantic map, as proposed by the author of the theory, is the gradual conversion of textual information into vector information with the help of numerical functions that assign unique coordinates. In the first stage of preparation, the source text is cleared of all non-text elements, whether visual or audio. The text is then divided into separate fragments, each with an independent meaning: by analogy with academic works, such fragments can be paragraphs. Each of these units, a semantically isolated fragment of the source text, receives its own pair of coordinates and is placed as a vector on the semantic map (Cortical.io, 2017).
The critical point of semantic folding is the method used to translate a word from its textual to its statistical representation. To do this, the system analyzes all fragments of the source text to identify those that contain a particular word. Each fragment that contains the term contributes its coordinates on the map. The aggregate of such points becomes the individual fingerprint of that word, and the set of fingerprints forms a dictionary (Webber, 2015; Cortical.io, 2017). On the map, fragments with closely related meanings are placed closer to each other, while less related fragments lie proportionally farther apart: clustering occurs.
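The construction of a word dictionary from positioned fragments can be sketched as below. The sketch skips the clustering step entirely: the map coordinates are assigned by hand so that related fragments sit near each other, and the fragments themselves, the whitespace tokenization, and the function name build_dictionary are assumptions made for illustration.

```python
from collections import defaultdict


def build_dictionary(snippets, positions):
    """Map each word to the set of map positions of the snippets containing it."""
    fingerprints = defaultdict(set)
    for snippet, position in zip(snippets, positions):
        for word in set(snippet.lower().split()):
            fingerprints[word].add(position)
    return dict(fingerprints)


# Invented fragments; positions are hand-picked (row, column) pairs that
# stand in for the output of a real clustering step.
snippets = [
    "the organ played in the church",
    "the church choir sang",
    "the liver is an organ",
    "the surgeon examined the liver",
]
positions = [(0, 0), (0, 1), (5, 5), (5, 6)]

dictionary = build_dictionary(snippets, positions)
print(sorted(dictionary["organ"]))
print(dictionary["organ"] & dictionary["church"])
print(dictionary["organ"] & dictionary["liver"])
```

Here the fingerprint of an ambiguous word ends up with points in two separate regions of the map, one shared with music-related words and one with anatomy-related words.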
There is no doubt about the connection between distributional semantics and semantic folding theory, since the latter follows from the former; it is therefore more intriguing to compare the theory under study with another model. This no less significant model is called knowledge graph embedding (KGE). Knowledge graphs are structured representations of the relationships between elements in a system that indicate the semantic nature of those relationships (Cai et al., 2018; Sabrina & Axel, 2021). Such a graph can, for example, describe the relationships and lines of authority in a large multinational company. KGE, in turn, is a method for transforming a graph into vectors while preserving the topology of the entire graph (Cai et al., 2018). Various functions can be used to make this transition, among which the representative TransE model is often used (Hogan et al., 2021).
Without going into the mathematical details, TransE assigns a vector to each entity and each relation and models a relation as a translation from one entity vector to another (Ying et al., 2018). The result is a single vector space that reflects the mutual relationships of the terms in the original data. It is easy to see that there are parallels between semantic folding and KGE. Both models use low-dimensional vector spaces to obtain information about the relationships among the components of the space. In both KGE and semantic folding, the distance between vectors reflects the relationship between the corresponding terms.
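The translation idea behind TransE, in which a true triple satisfies head + relation ≈ tail, can be shown with a toy scoring function. The two-dimensional embeddings below are set by hand instead of being learned, and the entity names, the relation name “supervises,” and the function transe_score are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """Euclidean distance between head + relation and tail (lower is better)."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hand-set embeddings for a small company hierarchy (illustration only).
entities = {
    "ceo": [0.0, 1.0],
    "manager": [1.0, 1.0],
    "analyst": [2.0, 1.0],
}
relations = {"supervises": [1.0, 0.0]}

plausible = transe_score(
    entities["ceo"], relations["supervises"], entities["manager"]
)
implausible = transe_score(
    entities["analyst"], relations["supervises"], entities["ceo"]
)
print(plausible, implausible)
```

In a trained model the embeddings would be fitted so that observed triples score low and corrupted triples score high; the hand-set values here only make that geometry visible.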
At the same time, both models are widely used to predict causal relationships between the processes being studied, and therefore their development has practical value for applied sciences (Webber, 2015; Cai et al., 2018). However, it is impossible not to notice a difference: while KGE uses graphs as “raw materials,” semantic folding can work with text. In other words, folding has a broader range of quick uses and requires almost no preprocessing of textual information into a graph.
In this context, it is worth discussing the well-known case of the American company Numenta, which develops its own approach to machine intelligence. Like Cortical.io, Numenta aims to reverse-engineer the human neocortex, with research focused on improving machine learning systems (Numenta, 2020). Numenta’s promising product, the open-source NuPIC software, is built on the same HTM principles that underlie SFT (Webber, 2015). Interestingly, a source word (N), encoded with Cortical.io’s Retina functions, can be processed with machine learning algorithms to improve the quality of prediction of the related (N+1) words. As a consequence, the two mechanisms can effectively complement each other.
Critical Appraisal of the Semantic Folding Theory
It is appropriate to begin evaluating the applicability and effectiveness of semantic folding theory by discussing the domain in which the algorithm is to be used. The company under observation is a multinational agency providing tax consulting and customer service, so its subject area is essentially financial management. It is a mistake to think that finance-oriented companies operate only with numerical data; in reality, the industry must also work with text. The responsibilities of tax companies that require working with text are varied: they include communicating with customers and conducting business correspondence, as well as reading and processing technical and legal documents written in specialized language (Taylor, 2017; Webber, 2016).
The situation becomes considerably more complicated when a company grows into an international organization hiring employees with different ethnic and cultural backgrounds. Conducting dialogues in different languages, using non-standard approaches to work, and maintaining a unified corporate culture impose significant constraints on information retrieval mechanisms (Sabrina & Axel, 2021). Employees may well have to perform complex searches on a single database for a particular document, whether a quarterly report or a municipal ordinance. Another practical task might be analyzing a text document to establish its topic, especially in electronic correspondence (Han et al., 2017).
Reading large volumes of text takes time, and information retrieval based on textual analysis is expected to reduce these costs. Consequently, an efficient, fast, accessible, and accurate search algorithm based on machine learning, with a modification of NuPIC, is a priority goal for the organization in question.
As a particular example of distributional semantics, semantic folding is a breakthrough tool that differs qualitatively from existing offerings. While most language processing mechanisms are based purely on statistical calculations, semantic folding operates on natural language terms. Because it simulates the interconnected structures of the human brain, as described by Webber (2015), a folding system can in principle be much more reliable than other artificial algorithms.
A consequence of such a system is a natural ability to self-learn and to find solutions to the problem posed. For example, classical methods of statistical language processing could count the occurrences of the word “organ” in all text fragments, and the result could indeed identify the theme and keywords of the analyzed text with high precision. However, such tools would hardly cope with a semantic analysis that distinguishes the several lexical meanings of that word. Semantic folding theory covers this need by comparing the binary vectors of each word in the source text. Thus, in textual analysis, the context of a particular word is prioritized (Webber, 2015).
Invisible to the human eye, the resulting semantic map determines the degree of relatedness between terms and delineates the themes of the text. In this case, “organ” receives a multifaceted description, and the analyst receives information about the meanings of the term found in the text. Another advantage of semantic folding theory is its comparatively high performance in analyzing large data sets (Webber, 2016; Luo & Tjahjadi, 2020). Because only the meaningful set bits of each vector are stored, the amount of memory required for computation is significantly reduced (Cortical.io, 2017). This makes it possible to process terabytes of text data at high speed.
At the same time, the core idea of the semantic folding system is highly adaptive. Such algorithms are successfully embedded in systems designed to perform specific tasks, whether document search or the processing and automatic sorting of business email (Taylor, 2017). If the potential of applying semantic folding to business processes is extrapolated to the coming years, improved content personalization algorithms can be expected. Matching targeting vectors against key user interest vectors may allow intelligent ad customization and increase its commercial effectiveness (Taylor, 2017).
Automatically processing customer inquiries on the basis of the question entered can also improve customer interaction and give customers a more complete and, importantly, correct response without engaging support services. Analyzing email content can identify possible fraud and data tampering with high accuracy, which is especially relevant for companies in the financial services sector (Levine, 2021). Finally, the possibilities of semantic folding are greatly enhanced when a translator is added as a modification (Webber, 2015; Cortical, 2021). The mapping of semantic meanings then occurs in parallel across several dictionaries, which increases the completeness of the response.
However, it is fair to admit that, despite all its apparent advantages for business, the theory of semantic folding is not without significant drawbacks. The disadvantages described below are hardly sufficient grounds for refusing to invest in this information retrieval technology, but their existence imposes demands on the financial, time, and technological resources of the enterprise. First, the system is still too young to be widely used in different markets and in a highly competitive environment. The algorithm is a product of the past decade, and the main difficulties stem from its relatively high cost of maintenance, potential imperfections, and the need for regular updates (Webber, 2015).
Second, semantic folding is still not a universal solution to semantic analysis since it aims to process only textual data. Thus, audio and visual content cannot be analyzed using this technology (Luo & Tjahjadi, 2020).
Another disadvantage concerns the processing of initial inputs (Webber, 2015). Although there is no need to manually remove pictures and other non-text material from the input, and although the semantic folding process is generally autonomous, the possibility of erroneous readings cannot be excluded. Hidden text encoded in images can be read accidentally during analysis, which affects the quality of the output. At the same time, the increasingly popular voice format, including in business correspondence, is not covered by semantic folding procedures because audio cannot be read (Prist, 2020). It is possible to extend the program with a speech-to-text system, but the quality of such transcription will not be perfectly accurate either.
In summary, semantic folding is a breakthrough technology that significantly expands the possibilities of textual analysis. The underlying idea of copying the functioning of the neocortex allows even the most detailed, highly specialized tasks to be addressed, using nature as an example to follow. Semantic folding solves most of the pressing issues associated with textual analysis and provides excellent opportunities for business processes.
Processing digital correspondence, searching for documents in a database, identifying the keywords and semantic units of large, complex texts, and identifying potential themes and message directions are just some of the promising applications of this technology. At the same time, the main disadvantages of its use are its restriction to text corpora and its relatively high cost. Nevertheless, semantic folding algorithms do an excellent job and are likely to evolve and expand.
References
Boleda, G., & Herbelot, A. (2016). Formal distributional semantics: Introduction to the special issue. Computational Linguistics, 42(4), 619-635.
Cai, H., Zheng, V. W., & Chang, K. C. C. (2018). A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering, 30(9), 1616-1637.
Church, K. W. (2017). Word2Vec. Natural Language Engineering, 23(1), 155-162.
Cortical.io. (2017). Semantic folding: A new model for natural language understanding [Video]. YouTube. Web.
Cortical. (2021). A new way of representing language. Web.
Ding, N., Gao, H., Bu, H., & Ma, H. (2018). RADM: Real-time anomaly detection in multivariate time series based on Bayesian network. Web.
FCC. (2019). Semantic HTML5 elements explained. Free Code Camp. Web.
Han, H., Yao, C., Fu, Y., Yu, Y., Zhang, Y., & Xu, S. (2017). Semantic fingerprints-based author name disambiguation in Chinese documents. Scientometrics, 111(3), 1879-1896.
Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C.,… & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.
Karlgren, J., & Kanerva, P. (2019). High-dimensional distributed semantic spaces for utterances. Natural Language Engineering, 25(4), 503-517.
Karlsson, S. (2017). Using semantic folding with TextRank for automatic summarization [PDF document]. Web.
Kuzmenko, E., & Herbelot, A. (2019). Distributional semantics in the real world: building word vector representations from a truth-theoretic model [PDF document]. Web.
Leavy, S., Meaney, G., Wade, K., & Greene, D. (2019). Curator: A platform for semantic analysis and curation of historical, literary texts [PDF document]. Web.
Levine, S. (2021). Semantic folding solves the problem of too many emails. Future of Sourcing. Web.
Luo, J., & Tjahjadi, T. (2020). Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding. Sensors, 20(6), 1646-1671.
Lutkevich, B. (2021). Natural language processing (NLP). Tech Target. Web.
Numenta. (2020). Advancing machine intelligence with neuroscience. Web.
Potluri, V., Grindeland, T. E., Froehlich, J. E., & Mankoff, J. (2021). Examining visual semantic understanding in blind and low-vision technology users [PDF document]. Web.
Prist, A. (2020). WhatsApp for business: A complete guide on the most popular messenger. Medium. Web.
Sabrina, K., & Axel, P. (2021). Knowledge graphs for analyzing and searching legal data [PDF document]. Web.
Schutze, H. (2019). Distributional semantics word embeddings [PDF document]. Web.
Stanford. (2009). Analogy as the core of cognition [Video]. YouTube. Web.
Taylor, A. (2017). Semantic fingerprinting: Natural language for the finance industry. Lending Times. Web.
Wang, X., & Jin, X. (2006). Understanding and enhancing the folding-in method in latent semantic indexing [PDF document]. Web.
Webber, F. D. S. (2015). Semantic folding theory and its application in semantic fingerprinting [PDF document]. Web.
Webber, F. D. S. (2016). Arguments for semantic folding and hierarchical temporal memory theory [PDF document]. Web.
Xue, C., Lu, S., & Zhan, F. (2018). Accurate scene text detection through border semantics awareness and bootstrapping [PDF document]. Web.
Yesilada, Y., Harper, S., Goble, C., & Stevens, R. (2004). Screen readers cannot see [PDF document]. Web.
Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., & Leskovec, J. (2018). Graph convolutional neural networks for web-scale recommender systems. Web.