Sentiment and emotion analysis are critical tools in knowledge aggregation and interfacing with people. As we move from the industrial age, where wealth is measured in capital, into the information age, Barbara Endicott-Popovsky suggests that knowledge will be the new measure of wealth. According to Addleson, knowledge management typically takes one of two approaches, focusing either on people as knowledge workers or on the tools and data. With the rapid development of neural networks, these two knowledge management foci can merge as machines become the knowledge workers. As machines take on this role, there will be an increased need for machines to recognize emotion as well as sentiment. Current state-of-the-art methods for machines to distinguish sentiment and emotions utilize artificial neural networks. This article discusses artificial neural networks and how they are used in emotion and sentiment analysis, and looks at how these technologies can allow machines to become a more integral part of knowledge management and the cyber domain.
In this article, the use of sentiment analysis is based on Scherer’s typology of affective states [3, 4, 5]. According to Scherer, sentiment analysis focuses on attitudes, which are enduring beliefs towards objects or persons. Due to the enduring nature of sentiment, written views are a common source for this analysis. Following Scherer’s typology, emotion is considered a brief, organically synchronized event; thus, emotion analysis is highly temporal and can be triggered by any stimulus. In terms of emotion analysis and detection, the focus will be on the seven universal emotions identified by Ekman: joy, surprise, fear, anger, sadness, disgust, and contempt. The data used as the basis for emotion analysis, as discussed in this article, consists of images or video capturing a specific emotion in time. Analysis of these two affective states requires different approaches due to the medium by which they are conveyed.
Modern sentiment and emotion analysis are built on decades of psychological research. Natural Language Processing (NLP) is critical for sentiment analysis, based on the use of statistics as discussed by Manning and Schütze. With an understanding of word usage frequency, various methods can be used to assign a sentiment by sentence, paragraph, or even larger portions of text. Supervised and unsupervised sentiment analysis are the two approaches used. These methods typically couple a sentiment lexicon with a machine learning algorithm such as bagging, K-Means, a support vector machine, a naive Bayes classifier, or some hybrid of these [8, 9, 10, 11].
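As a rough illustration of the lexicon side of these methods, the sketch below scores a sentence by summing word polarities. The lexicon, its weights, and the function name are hypothetical and chosen only for this example; a real system would use a published lexicon and typically feed such scores into one of the classifiers named above.

```python
# Toy, hypothetical lexicon: word -> polarity weight (not from any published resource).
TOY_LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}

def sentence_sentiment(sentence, lexicon=TOY_LEXICON):
    """Return (label, score) for a sentence by summing per-word polarities."""
    # Lowercase each token and strip surrounding punctuation before lookup.
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    score = sum(lexicon.get(w, 0) for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

if __name__ == "__main__":
    for text in ["I love this great phone!", "Terrible battery, bad screen.", "It is a phone."]:
        print(text, "->", sentence_sentiment(text))
```

A purely additive score like this ignores negation and context, which is one reason lexicons are usually paired with a learned model.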
Just as NLP is a critical stepping stone for work in sentiment analysis, computer vision is critical to the area of automated emotion detection [12, 13]. Ekman devised the Facial Action Coding System (FACS), which maps facial muscle movements, known as Action Units (AU), and combinations of AUs to the facial expressions related to the seven universal emotions. Various methods have been used to automate the detection of emotion, as well as other factors like attention, from video and images. Initially, basic facial landmark methods were used to establish an AU, and from there, a probability of the associated emotion was calculated [12, 15]. While the trend for emotion detection is moving towards artificial neural networks, the field is still young, and there has been considerable exploration of other methods for emotion detection from images and video. These other methods typically leverage computer vision techniques like Histogram of Oriented Gradients (HOG), Histogram of Image Gradient Orientation (HIGO), Histograms of Optical Flow (HOF), and Local Binary Patterns (LBP), coupled with Support Vector Machines (SVM) or Support Vector Regression (SVR) [16, 17].
Artificial Neural Networks
The computational model for an artificial neural network was proposed in 1943 by McCulloch and Pitts, when trying to understand how cats and monkeys process information from the eyes. Computational limitations greatly hampered widespread use, and simpler machine learning methods such as SVMs were favored instead. The increase in computational power and the need for more sophisticated machine learning solutions that are less sensitive to noise have resulted in a resurgence of interest in artificial neural networks.
The promise of Artificial Neural Networks (ANN) is to move beyond the von Neumann computer architecture. The von Neumann approach has resulted in computers that can outperform people in the numeric domain. However, there is a need for algorithms that can learn and adapt in order to solve new problems such as sentiment analysis and emotion detection.
Understanding ANNs starts with understanding the biological neural networks on which ANNs are modeled. Typical neurons have three components: inputs (dendrites), the cell body (soma), and the output (axon). Each neuron has multiple inputs which go into the soma. From there, a neural network is formed by a single axon branching out from the soma and connecting to other neurons via their dendrites. Figure 1 illustrates the structure of a neuron, showing the inputs via the dendrites on the left, where the signals travel through the soma and then, depending on the inputs, a signal may be sent out via the axon.
The axon will branch in order to connect to multiple other neurons. The connection from the axon to the dendrite of another neuron is called a synapse. The speed at which signals travel in a biological neural network (much slower than electrical signals), compared with the time it takes to respond to stimuli, suggests that signal processing takes fewer than 100 stages.
Figure 2 illustrates a Feed-Forward Neural Network (FNN), the first of two major types of ANNs, with multiple layers. The bottom neuron or node is highlighted to illustrate how a node processes its input signals. For each node, the inputs are multiplied by learned weights (wj) and summed. Weights can be positive or negative, consistent with exciting or inhibiting a neuron. To this sum, a learned bias value (b) is added. The result is passed to an activation function, of which there are many. Two of the most popular are the sigmoid curve and the Rectified Linear Unit (ReLU). Some of the first neural networks utilized a unit step function, acting as a binary neuron. For the majority of applications, the sigmoid or ReLU has replaced the unit step because they are differentiable (almost everywhere, in the case of ReLU), which makes learning methods like gradient descent easier. The nodes in the next layer that are connected to the highlighted node receive the output of the activation function and begin the process over again.
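The node computation described above (weighted sum, plus bias, through an activation function) can be sketched in a few lines. This is a minimal illustration rather than any particular library's implementation; the input values, weights, and bias are made-up numbers, whereas in a trained network the weights and bias would be learned.

```python
import math

def sigmoid(z):
    """Smooth activation that squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def unit_step(z):
    """Binary neuron used by some early networks: fires (1) or does not (0)."""
    return 1.0 if z >= 0.0 else 0.0

def node_output(inputs, weights, bias, activation):
    """Multiply each input by its weight, sum, add the bias, apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Illustrative (assumed) values; negative weights act as inhibitory connections.
inputs = [1.0, 0.5, -1.0]
weights = [0.8, -0.4, 0.3]
bias = 0.1

if __name__ == "__main__":
    for fn in (sigmoid, relu, unit_step):
        print(fn.__name__, node_output(inputs, weights, bias, fn))
```

A full feed-forward layer simply repeats this computation for every node, with the outputs of one layer serving as the inputs to the next.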
The feed-forward ANN, once trained, can be deployed and will not adapt or continue learning. The second major type of ANN is the Recurrent or Feedback Neural Network (R/FNN), illustrated in Figure 3, which shows a basic ANN with feedback. These types of networks continue to learn and adapt to changes; the complication is that training is slow and can stop if the gradient goes to zero. To address this, the long short-term memory (LSTM) unit was proposed. The LSTM maintains a constant error along with the ability to forget and reset its state [23, 24, 25]. LSTM units can continue to learn over 1,000 time steps. Figure 4 is an illustration of an LSTM unit. This unit uses weighted inputs summed with past inputs, which are then sent to an input activation function and to the activation functions that make up the input, output, and forget gates. The activation functions are typically sigmoid or tanh. The center contains the cell, which stores a continuous error of one multiplied by the output of the forget gate. The result of this multiplication is summed with the product of the input and the input gate. The sum of the forget and input products goes to an output activation function, the result of which is multiplied by the output gate.
While the LSTM unit is more complicated than the simple neuron/node seen in Figure 2, it outperforms a traditional RNN. The benefit of the RNN type of network (including the LSTM) is its ability to process temporal data or sequences, which is why these types of networks are typically used for sentiment analysis.
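The data flow through an LSTM unit can be made concrete with a small sketch. The version below is a scalar (single-unit) simplification of the commonly used LSTM formulation with input, forget, and output gates; the parameter layout, names, and values are assumptions made for illustration, and real implementations operate on vectors with learned weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM unit.

    params maps each of "input", "input_gate", "forget_gate", "output_gate"
    to a hypothetical (w_x, w_h, b) triple: input weight, recurrent weight, bias.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        # Weighted current input summed with the weighted previous output.
        return w_x * x + w_h * h_prev + b

    g = math.tanh(pre_activation("input"))       # input activation
    i = sigmoid(pre_activation("input_gate"))    # how much new input to admit
    f = sigmoid(pre_activation("forget_gate"))   # how much old state to keep
    o = sigmoid(pre_activation("output_gate"))   # how much state to expose

    c = f * c_prev + i * g      # cell: gated old state plus gated new input
    h = o * math.tanh(c)        # output activation scaled by the output gate
    return h, c

if __name__ == "__main__":
    # Assumed, untrained parameter values used only to show the mechanics.
    params = {
        "input": (0.5, 0.1, 0.0),
        "input_gate": (0.4, 0.2, 0.0),
        "forget_gate": (0.3, 0.1, 1.0),
        "output_gate": (0.6, 0.2, 0.0),
    }
    h, c = 0.0, 0.0
    for x in [1.0, -0.5, 0.25]:
        h, c = lstm_step(x, h, c, params)
        print(f"x={x:+.2f}  h={h:+.4f}  c={c:+.4f}")
```

When the forget gate is near one and the input gate is near zero, the cell state is carried forward essentially unchanged, which is what lets the unit retain information over many time steps.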
CNNs and Emotion Detection
Convolutional neural networks (CNN), a type of FNN, were inspired by studies of the visual cortex of cats and monkeys, which contains locally-sensitive, orientation-selective neurons [26, 27, 28]. This type of structure has proven to work well for visual analysis. CNNs are trained feature filters, which work well at identifying features that are related spatially. Figure 5 illustrates this, starting with an image on the left and moving to the right through the first set of filters in the convolutional neural network. The output of the first filter is shown as the image directly to the right of the original image in Figure 5, highlighting very basic edge/line detection. From there, additional filters are applied, each one adding a convolutional layer that is more abstract than the previous one (shapes, contours, objects). By the last layer, parts of a face can be identified, such as the eyes, mouth, and so on. The last layer in this CNN example shows portions of the faces used to train the filters. Neural networks require large quantities of training data to ensure that generic features are identified and that overfitting does not occur.
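The basic edge detection performed by a first-layer filter can be sketched by sliding a small kernel over an image. The example below is a simplified illustration: the kernel is hand-picked rather than learned, the image is a tiny made-up grid, and `convolve2d` is a hypothetical helper name. As is conventional in CNNs, the kernel is applied without flipping (strictly a cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy image: dark region (0) on the left, bright region (1) on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-picked vertical-edge kernel; a trained CNN would learn its own filters.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)

if __name__ == "__main__":
    for row in feature_map:
        print(row)
```

The feature map is zero over the uniform dark region and responds strongly where the kernel overlaps the dark-to-bright boundary. Deeper layers apply the same operation to the feature maps of earlier layers, which is how progressively more abstract features emerge.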
Modern emotion detection methods utilize CNNs for their utility in image identification [30, 31, 32, 33]. The CNN is used to classify an observed emotion in static images and relate it to the previously mentioned Action Units. As mentioned before, neural networks require large amounts of data and considerable computational power for training. However, once trained, a neural network classifier is very efficient and typically exhibits higher accuracy than other existing machine learning methods.
RNNs and Sentiment Analysis
While CNNs work well when information is related spatially, RNNs work well when looking at temporal or sequential information. RNNs have input, output, and hidden nodes, which are connected to create an internal memory. This allows for the processing of arbitrary sequences, but these types of networks are very susceptible to vanishing or exploding feedback. If the feedback vanishes, no learning occurs. On the other hand, if the feedback explodes, incorrect learning can occur. LSTM units, a type of RNN themselves, are typically used in modern RNNs, resulting in a stable learning network [23, 34]. The LSTM has been described as a low-pass filter, keeping high-frequency noise from confusing the answer. Figure 6 illustrates an LSTM-RNN network used for sentiment analysis. The LSTM units are the yellow-filled cross-hairs. In a standard RNN, these units would simply be removed and each word would go into the blue RNN layer. The outermost layer, known as the softmax layer, is used to calculate the sentiment probability of neutral, positive, or negative, along with the classification of the sentence as a whole.
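The softmax layer mentioned above converts the network's raw output scores into probabilities that sum to one. A minimal sketch follows; the three scores are invented for illustration and would, in practice, come from the layer feeding the softmax.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]
scores = [-1.0, 0.5, 2.0]  # assumed raw network outputs for one sentence

probs = softmax(scores)
predicted = labels[probs.index(max(probs))]

if __name__ == "__main__":
    for label, p in zip(labels, probs):
        print(f"{label}: {p:.3f}")
    print("classification:", predicted)
```

The class with the highest probability is taken as the sentence's overall sentiment, while the full distribution indicates how confident the network is in that choice.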
RNNs and CNNs are currently the most common neural networks, but there are others, and researchers are continuing to build deeper networks. Combining the benefits of a learned network found in CNNs with the ability to adapt and learn over time has resulted in ANNs that are combinations of both architectures [36, 37, 38]. This is typically done by starting with a trained CNN and connecting it to an RNN, providing both spatial and temporal reasoning.
Generative Adversarial Networks (GANs), which combine multiple NNs in a very different way to provide surprisingly effective optimizations, are showing benefit in several areas.
In terms of human-AI teaming and applications to the cyber domain, there is even work moving ahead on helping humans understand the AI’s “point of view” to better solve problems and meet complex goals.
With the development of more advanced ANNs, and as machines become integrated into the process, knowledge management will become diverse with people, tools, and data. Improving human-machine interactions through emotional intelligence is crucial in developing trust between people and machines. Advances in neural network algorithms and improved computational capability are enabling better emotion and sentiment detection systems. This improves the human-machine interface as well as the machine's ability to manage knowledge and interpret human interactions more effectively.
Companies from a variety of industries have been developing their own emotion detection systems or buying up other companies with experience in emotion detection [40, 41, 42, 43]. Most, if not all, of these companies are utilizing neural networks to understand emotions, and are developing automated sentiment analysis for text, as well as voice analysis, to improve human-machine interactions. In the cyber domain, there is much work going on in the area of combined human-machine teams, which require emotion and sentiment understanding to stand up to the sometimes complex scenarios of cybersecurity.
Neural networks are still in their infancy, and it will be a while before neural networks are able to think like a human, due to the limited complexity of the neural networks currently possible. Williams and Herrup have looked at the total number of neurons in the central nervous systems of different species. They found that small organisms, like metazoans, typically had fewer than 300 neurons, while the common octopus and small mammals, like mice, have between 30 and 100 million neurons. Larger mammals, like whales and elephants, have more than 200 billion neurons. Healthy adult humans of normal intelligence have an estimated 100 billion neurons. Estimates for the current number of neural units used in ANNs are in the millions for the most complex networks. However, with the continued increase in computing power and the introduction of new computational designs like neuromorphic computing, closing the gap is just a matter of time [47, 21].