It is widely accepted that many state and non-state adversaries are approaching technical parity with the United States military. This is especially the case where commercial research and development produces militarily useful technologies such as cyber, robotics, and drones (The Operational Environment and the Changing Character of Future Warfare, 2017). The global diffusion of technology has lowered the barrier to entry for technological warfare. In fact, in many areas, commercial research budgets far exceed DoD expenditures. When the DoD does develop technological “things,” the advantage often erodes quickly because technology is the easiest thing to copy.
Technological overmatch is still achievable in the time domain by creating a faster process to ingest and operationalize new technologies from anywhere. The hardest thing for U.S. adversaries to duplicate is the integration of advanced technologies with skilled soldiers and well-trained teams. Investing in an advanced process to operationalize technology will produce an enduring source of overmatch versus purely creating technological “things.” Succinctly stated: “Process over Platforms.” (Martin & FitzGerald, 2013)
Early Synthetic Prototyping
Early Synthetic Prototyping (ESP) is an effort to construct a physics-based game environment to rapidly assess how technologies might be employed on the battlefield. ESP is presently led and funded by the Army Capabilities Integration Center (ARCIC) and supported by U.S. Army Research, Development and Engineering Command (RDECOM) labs. The first effort is a small-unit first-person shooter entitled Operation Overmatch, which is the focus of the bulk of this paper. At the time of this writing, Operation Overmatch is in alpha and is expected to be production-ready by October 2019. Significant, challenging research remains to integrate ESP into systems engineering processes, especially in the area of data analytics.
ESP is envisioned to be a persistent game network that allows Soldiers to play scenarios and provide experiential feedback to concept and capability developers. Soldier assessment from the game environment will be used to inform materiel tradespace exploration while simultaneously assessing force employment and force design development. ESP will greatly enhance the communications between engineers and Soldiers. Engineers often lack a deep understanding of how new materiel may be used and what performance is needed. At the same time, Soldiers gain an early understanding of potential new technologies for the U.S. Army and how a future enemy might exploit the same. Here’s how the process might work:
First, concept and capability developers, as well as scientists and engineers from across the Army will postulate various force employment, force design, and materiel capability theses. These ideas are then modeled in the game environment with an appropriate amount of physics rigor. Scenarios are created that specifically address what the Army wants to learn about the postulated solutions. For example, the Army may want to explore how future platoons should be equipped and employed in an airfield seizure against a near-peer threat.
Next, the game is distributed to Soldiers across the Army (presently over Steam, a digital gaming distribution platform developed by Valve Corporation) and they are able to learn how to use and modify the equipment in single-player missions before engaging in multi-player scenarios against other Soldiers. Some Soldiers will play as an opposing force using emerging threat platforms and some will play as the U.S. player. Following each scenario, the players are able to provide feedback about what they liked/disliked and provide recommendations. Additionally, the game server will collect game data for analysis. This process is intended to repeat continuously with changing equipment, scenarios, organization, goals, rules, and objectives.
Where ESP Fits with Systems Engineering
The idea of ESP fits tightly with the latest Office of the Secretary of Defense (OSD) systems engineering initiatives: Digital Engineering (Gold, 2017). ESP also supports the OSD Mission Engineering concept (Gold, 2016), which treats the end-to-end mission as the system in the operational context to drive performance requirements for individual systems. Inherent in Mission Engineering is the use of an assessment framework to measure progress toward mission accomplishment through test and evaluation in the mission context. ESP creates a digital operational assessment loop and provides measurable data to systems engineers as shown in Figure 1.
Figure 1: Physics-Based Digital Warfighting Connection to Tradespace Decision Tools.
Starting in the upper left, technological solutions are analyzed using traditional higher-fidelity computer-aided engineering (CAE) simulations. These simulations are turned into real-time lookup tables inside the game to assure accurate game physics. Scenarios are simultaneously developed over some mission set. Next, players can use design mode to construct a vehicle (in this case) that they feel will best achieve the mission with a good score. Players are provided a limited virtual budget, which would allow them, for example, to up-armor the vehicle. Up-armoring will add weight and cause more rollovers and slower acceleration during the game. Budget constraints assure Soldiers do not simply pick the highest-tech solution and force them to make cost-constrained trades based on their evolving virtual experience.

The current process of developing a capability from concept to product is largely linear and seldom gets continuous feedback from Soldiers. According to Boehm (2010), “The weakest link in SE is often the link between what the warfighters need and what the development team thinks they need, together with a shared understanding of the operational environment and associated constraints and dependencies.” GEN Perkins stated when presenting Win in a Complex World (Perkins, 2015), “A CONEX full of electronic gear is not a capability…that is a property accountability nightmare…a capability is technology in the hands of Soldiers, who are trained to use it, and can apply it on the battlefield.” When Soldier feedback is captured, it is typically from small focus groups of Soldiers. The ESP process enables continuous feedback among all stakeholders as illustrated in Figure 2.
Figure 2: Early Synthetic Prototyping Enables Soldiers and Engineers to Co-Develop Solutions.
There are several advantages to incorporating ESP into the concept and capability development process. First, ESP allows Soldier feedback early in the development process, where design changes are significantly less expensive in terms of resources, time, and money. Second, ESP allows orders of magnitude more design options to be explored in a crowd-sourced game environment, because Soldiers can change model performance in the game in a short period of time, whereas changing a physical prototype’s characteristics could take weeks, months, or years. Third, the ESP process enables the Army to develop a greater understanding of the problem while developing a greater understanding of potential solutions that span materiel (capabilities), doctrine (force employment), and organization (force design) considerations.
Early Synthetic Prototyping might inform tradespace tools such as the Army’s WSTAT (Edwards, 2012) and the Marine Corps’ FACT (Browne, Ender, Yates, & O’Neal, 2012) as shown in area 3 of Figure 1. For example, ESP warfighting data could allow data-centric rank ordering of performance requirements instead of relying on subject matter expert (SME) opinions. Additionally, tactical utility functions may be computed on requirements to assess the mission-success value of exceeding threshold requirements toward objective requirements for various engineering solutions over multiple vignettes. Tactical utility may be loosely defined as: Probability of Mission Success / Total Burden.
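As a minimal sketch of how this metric might rank candidate designs, the snippet below computes tactical utility as defined above. All design names, success probabilities, and burden values are invented for illustration; real values would come from aggregated game runs and cost models.

```python
# Hypothetical sketch: rank candidate vehicle designs by tactical utility,
# loosely defined as P(mission success) / total burden.
# All names and numbers below are illustrative, not real ESP data.

def tactical_utility(p_success, burden):
    """Tactical utility = probability of mission success / total burden."""
    return p_success / burden

# Notional designs: (name, P(mission success) estimated from game runs,
# normalized burden aggregating cost, weight, and logistics load).
designs = [
    ("baseline",    0.62, 1.00),
    ("up-armored",  0.71, 1.35),
    ("lightweight", 0.58, 0.80),
]

ranked = sorted(designs, key=lambda d: tactical_utility(d[1], d[2]),
                reverse=True)
for name, p, b in ranked:
    print(f"{name}: utility = {tactical_utility(p, b):.3f}")
```

Note that with these notional numbers the up-armored variant ranks last despite its higher success probability, illustrating how the metric penalizes burden.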
Allowing soldiers to test-drive virtual systems in various operations will enable program managers to compare system resilience and tactical utility against cost, schedule, and risk. An example of how this might look for an analysis of alternatives is shown in Figure 3.
Figure 3: Assessing Maximum Warfighting Benefit at Minimum Burden. Source: Keena (2011)
Figure 3 was created from Keena (2011) game data consisting of 1,400 runs in MindRover on robotic ground vehicles. MindRover is a Defense Acquisition University (DAU) teaching game used in the capstone PM course. MindRover has limited physics, but the data is illustrative of what could be done with more rigorous efforts under ESP. The labels at the bottom of Figure 3 show s=acceptable survivability, S=enhanced survivability; l=acceptable lethality, L=enhanced lethality; m=acceptable mobility, M=enhanced mobility. All data were normalized and based on random trials with participants testing forced vehicle configurations. It is relatively easy to find the best tactical utility on this simplified, unweighted tradespace.
Seater (2016) demonstrated for the Air Force (contract FA8721-05-C-0002) that, within a game environment, players do discover novel and effective strategies. Seater created an unmanned aerial system (UAS) game and conducted game-based experiments using 36 participants over 5 trials. Participants were given a budget and chose technologies to test on their UAS. One result corroborated that gameplay significantly changes player opinions, as shown in Figure 4. In this figure, each bar is one capability. Bar height is the difference between the average survey-based utility rating (1=low, 5=high) that capability received before and after players used cost-constrained gaming. This shows that the act of playing the game, with in-game tradeoffs between capabilities and different strategies, changed players’ opinions of the utility of the proposed capabilities. In particular, gameplay nearly universally made players more critical about which capabilities would be useful to have in the field. This suggests that ESP will not just quantify a systems engineering analysis that is traditionally ad hoc, but will also improve the quality of qualitative feedback from participants by providing confidence that the preferences exhibited by players are not just wishful speculation.
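The before/after comparison underlying a chart like Figure 4 is straightforward to compute. The sketch below uses invented capability names and ratings purely to show the shape of the calculation; a negative change indicates gameplay made players more critical of that capability.

```python
# Illustrative sketch (hypothetical data): per-capability change in average
# survey utility rating (1=low, 5=high) before vs. after cost-constrained
# gameplay, mirroring the kind of bar chart shown in Figure 4.

from statistics import mean

# Capability names and ratings are assumptions for illustration only.
before = {"jammer": [4, 5, 4, 5], "extra_camera": [4, 4, 3, 4]}
after  = {"jammer": [3, 3, 4, 3], "extra_camera": [4, 5, 4, 4]}

change = {cap: mean(after[cap]) - mean(before[cap]) for cap in before}
for cap, delta in change.items():
    print(f"{cap}: {delta:+.2f}")
```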
Figure 4: Average Change in Survey Rating of Capability Utility. Source: Seater (2016)
Further, Seater showed that it is possible to discern mission-success correlations in combinations of technologies, as shown in Figure 5. The figure shows a synergistic effect between choosing the drone camera and the drone camera arm, whereas there is no advantage to choosing RF and IR sensors together. His work also found limits in the statistical significance of data from only 36 participants, which ESP’s crowdsourcing approach should eliminate. Since the goal of Mission Engineering is to treat the mission as the system, individual platform combinations and redundancies might also be isolated.
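One simple way to probe for this kind of pairwise synergy is an additive-model interaction term on mission success rates. The run logs below are fabricated for illustration, and the camera/camera-arm names are stand-ins; the point is the structure of the comparison, not the numbers.

```python
# Hedged sketch with made-up run logs: estimate whether two technologies
# interact synergistically by comparing mission success rates across loadouts.
# A positive interaction term suggests the pair is worth more together.

# Each run: (has_camera, has_camera_arm, mission_success) -- illustrative only.
runs = [
    (True,  True,  True), (True,  True,  True), (True,  True,  False),
    (True,  False, False), (True,  False, True),
    (False, True,  False), (False, True,  False),
    (False, False, False), (False, False, True),
]

def success_rate(a, b):
    matches = [s for (x, y, s) in runs if x == a and y == b]
    return sum(matches) / len(matches)

# Additive-model interaction term: success(both) - success(A only)
# - success(B only) + success(neither).
synergy = (success_rate(True, True) - success_rate(True, False)
           - success_rate(False, True) + success_rate(False, False))
print(f"camera+arm synergy estimate: {synergy:+.2f}")
```

In practice this estimate would need the larger sample sizes that crowdsourcing provides, since small per-loadout counts make the interaction term noisy.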
Figure 5: Assessing Utility of Combinations of Technologies from Game Data. Source: Seater (2016)
Operation Overmatch is a first-person shooter type game within the Early Synthetic Prototyping effort. Through a collaborative effort among TRADOC, RDECOM, and the Army Game Studio, Operation Overmatch is being government-developed using the commercial Unreal Engine 4. Initially, within Operation Overmatch, Soldiers will be able to play eight-versus-eight matches against other Soldiers, where they will fight advanced enemies with emerging capabilities in realistic scenarios. Players will also be able to experiment with weapons, vehicles, tactics, and team organization. Game analytics and Soldier feedback will be collected and used to evaluate new ideas and to inform areas for further study. A screenshot of the current alpha release is shown in Figure 6.
Figure 6: Operation Overmatch Alpha Version Screenshot
The game currently models a few future vehicles to include variants of manned armored vehicles, robotic vehicles, and UAVs. The scenarios are centered on manned/unmanned teaming at the squad and platoon level in an urban environment. Through game play, Soldiers will provide insights about platform capabilities and employment.
Operation Overmatch will have several defining features: 1) It will be physics-based. The fidelity will vary depending on the stage of acquisition, but this distinguishes it from commercial games such as Call of Duty. 2) It will be crowd-enabled. Survey data from an ESP pilot study at Ft. Bliss (Vogt, Megiveron, & Smith, 2015) indicates a potential of up to a million hours of game play a month. The Ft. Bliss test found more than 87% of Soldiers played video games and 50% of Soldiers played more than 10 hours of video games per week. 3) It will produce measurable data regarding warfighting theses on equipment. 4) Lastly, it will provide a leaderboard and discussion area so innovative ideas may build on one another.
Data Mining Challenge
Since observers will not be able to interact with players after ESP experiments, which are conducted at the leisure of participants, it is important to mine game telemetry to gain understanding. The volume of telemetry data collected in Operation Overmatch creates a challenging big-data, spatio-temporal data mining problem. Tactics and mission performance specifications are interrelated. For example, a heavy, slow tracked vehicle would be used completely differently than a light wheeled vehicle. It is important to discern the best tactics along with the design of the system corresponding to those tactics. This is complicated by the fact that people have tastes and preferences. Additionally, players may just be “playing around,” or simply learning.
Currently, researchers at the Tank Automotive Research, Development and Engineering Center (TARDEC) are doing internal research work and have sponsored two Phase II SBIRs on Tactical Behavior Mining. In lieu of Operation Overmatch data, the performers are using large public game data sets from the commercial game Defense of the Ancients 2 (DOTA2). DOTA2 is an excellent surrogate for Operation Overmatch since it is strategic and features different characters with different powers (similar to different platform configurations).
Figure 7 shows how low-level actions can be learned and inferred from telemetry, which then may be assessed for their ultimate contribution to mission success or failure. This unsupervised learning method clusters the telemetry data into unlabeled actions (which an expert manually labels later), such as standing or walking. Next, sequences of actions are grouped into behaviors, forming a hierarchical model of agent behavior. These groupings are context dependent, based on the state space.
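To make the unsupervised step concrete, the sketch below clusters one-dimensional speed telemetry with a tiny k-means loop. Real ESP mining would operate on much richer state features; the speed samples and the two-cluster choice here are assumptions for illustration, and the resulting clusters remain unlabeled until an expert names them (e.g., "standing" vs. "walking").

```python
# Minimal sketch of the unsupervised step described above: cluster raw
# telemetry (here, just speed samples in m/s) into unlabeled actions with a
# tiny 1-D k-means. All data is synthetic.

def kmeans_1d(values, centers, iters=20):
    """Lloyd's algorithm on scalars: assign to nearest center, recompute."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Synthetic speed telemetry: near-zero samples vs. ~1.5 m/s samples.
speeds = [0.0, 0.1, 0.05, 0.2, 1.4, 1.5, 1.6, 1.3]
centers, clusters = kmeans_1d(speeds, centers=[0.0, 1.0])
print("cluster centers:", [round(c, 2) for c in centers])
```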
Figure 7: Using Machine Learning to Discover Tactics in Game Telemetry Data. Credit: (Kooij, Englebienne, & Gavrila, 2012).
Once the data is organized into actions and behaviors, it is possible to further use machine learning to discover the behaviors that drive mission success and to understand the optimal actions that should be taken in a given scenario to accomplish a mission. The wealth of extracted data will provide sufficient coverage of possibilities and contexts to determine the combinations of technologies and tactics that are most appropriate to achieving an objective. The challenge of learning optimal behavior strategies requires first learning the relative importance of various reward factors. Using an Inverse Reinforcement Learning technique (Tastan & Sukthankar, 2011), it is possible to essentially generate the reward functions from observations of successful missions. This feedback allows analysts to develop an understanding of optimal tactics for specific battlefield missions and conditions, in the context of Soldier skill sets and equipment load-outs.
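A full Inverse Reinforcement Learning pipeline is beyond a short sketch, so the snippet below uses a deliberately crude stand-in (not IRL proper): comparing mean feature counts between successful and failed trajectories to suggest which reward factors matter. All trajectories, feature names, and counts are invented for illustration.

```python
# Crude illustrative proxy for reward-factor importance (NOT actual IRL):
# compare mean feature counts between successful and failed trajectories.
# Positive weight suggests the behavior is associated with mission success.

from statistics import mean

# Each trajectory: (feature counts, mission succeeded?) -- invented data.
trajectories = [
    ({"used_cover": 5, "sprinted_open": 1}, True),
    ({"used_cover": 4, "sprinted_open": 0}, True),
    ({"used_cover": 1, "sprinted_open": 4}, False),
    ({"used_cover": 0, "sprinted_open": 5}, False),
]

features = ["used_cover", "sprinted_open"]
weights = {}
for f in features:
    wins = mean(t[0][f] for t in trajectories if t[1])
    losses = mean(t[0][f] for t in trajectories if not t[1])
    weights[f] = wins - losses

print(weights)
```

A real IRL method would instead recover a reward function under which the observed successful trajectories are near-optimal, but the intuition, that of inferring what players are implicitly rewarded for from what they do, is the same.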
Figure 8: DOTA 2 Action Discovery Over Game Playfield. Credit: Decisive Analytics Corporation.
Figure 8 shows some of the initial data mining work from Decisive Analytics Corporation (contract W56HZV-15-C-190) on a DOTA 2 dataset. The clustered Actions are automatically identified, but require experts to label them. The Action labels are in white numbers, the ellipse shows the radius of influence for each Action, while the line with white * shows the mean direction of motion. Two example Actions are highlighted. Each shows a hypothetical Army analogy for what the label of the Action might be.
Figure 9: Glyph Visualization. Credit: Soar Technology, Inc. and Northeastern University.
Figure 9 shows some early results of data mining and visualization from SoarTech and Northeastern University (contract W56HZV-15-C-188). Glyph is an interactive visualization system designed for understanding behavior traces of user groups (Nguyen, Seif El-Nasr, & Canossa, 2015). This visualization shows a state transition diagram (left) and a cluster of behaviors (right) for 8 entities in a play session of ESP. The figure on the left shows behaviors of different units from start (blue) to end state (red). All discrete actions were visualized, such as InfantryMounEvent (infantry mounting a vehicle), InitialArmorEvent (initializing the armor configuration on a vehicle), DRE (Damage Received Event), UseOpticEvent (when a unit looks through an optic), WD (Weapon Discharge Event), DamageStateChangeEvent, and DestroyedEvent. Movement events were collapsed to Move_20, which is movement for 20 seconds. The figure on the right shows the patterns of behaviors (represented as nodes) and their popularity (encoded as node size), as well as their difference (encoded as distances between nodes). For example, pattern 5 was exhibited by few players but was very different from all other patterns. The big circle in the middle (labeled 0) is a popular pattern exhibited by many players.
Scoring Mechanism Research
One important aspect of ESP is that Soldiers act in a tactically sound manner to ensure that the data collected is accurate. Scoring is one method to drive realistic behavior in a game environment, and it may also increase player enjoyment. The scenario needs to be realistic, and an appropriate scoring mechanism should be developed. For example, it may be desirable for friendly force scoring to differ from opposing force scoring. For the friendly force, scoring might be weighted to discourage collateral damage and the death of non-combatants. The opposing force may gain points for collateral damage.
Ross (2016) investigated scoring mechanisms that ensure relevant data to answer the engineering design questions used to inform acquisition decisions. Ross suggests that metrics should maintain traceability to the research questions that the ESP study is seeking to address. This ensures that the scoring mechanisms encourage the intended behaviors. Ross also suggested that once a scenario has been determined, a study team would determine what outcomes would constitute mission success. The players would then be provided those outcomes as game objectives. Scoring algorithms would award points for successfully meeting rewarding mission objectives, or deduct points at a flat rate for violating punitive mission objectives. The value of completing an objective would be proportional to its overall significance in contributing to mission success. The challenge of this method is in not over-constraining tactics and reducing creativity. Additionally, for players attempting multiple scenarios or games over time, how to normalize scores remains an open research challenge.
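The scoring scheme Ross describes can be sketched in a few lines. The objective names, point values, and flat penalty rate below are invented; the structure, proportional rewards for mission objectives and a flat-rate deduction for punitive violations, follows the description above.

```python
# Sketch of an objective-based scoring algorithm per the description above.
# Objective names and values are assumptions for illustration only.

# Rewarding objectives: points proportional to contribution to mission success.
REWARDING = {"seize_airfield": 100, "secure_perimeter": 40}
# Punitive objectives deduct a flat rate per violation (e.g., collateral damage).
PUNITIVE_FLAT_RATE = 25

def score(completed, violations):
    """Total score = proportional rewards earned - flat-rate penalties."""
    earned = sum(REWARDING[obj] for obj in completed)
    return earned - PUNITIVE_FLAT_RATE * violations

print(score({"seize_airfield"}, violations=2))
```

Normalizing such scores across repeated, changing scenarios, as noted above, remains the open part of the problem.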
Seater (2016) investigated combining game theory with auction theory to drive players to think critically about customizing their platform with technology (the design area shown in section 2 of Figure 1). Seater set up a technology market with non-fixed prices. So, for example, choosing to up-armor a platform would have an initial cost. If that technique proved useful and more players started up-armoring, the price to up-armor would increase. Market-based costing for the customization shop can be shown to increase creativity by forcing players to explore other options. Additionally, it forces a quicker convergence to the true value of a technology versus other choices.
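A minimal sketch of such a non-fixed-price market appears below. The linear price rule (a fixed fractional bump per purchase), the technology names, and the base prices are all assumptions for illustration; Seater's actual market mechanism may differ.

```python
# Hedged sketch of a market-based customization shop: each purchase of a
# technology raises its price, nudging later players toward unexplored
# options. The linear price rule and all numbers are illustrative assumptions.

class TechMarket:
    def __init__(self, base_prices, bump=0.10):
        self.base = dict(base_prices)
        self.bump = bump  # fractional price increase per prior purchase
        self.purchases = {tech: 0 for tech in base_prices}

    def price(self, tech):
        """Current price grows linearly with how many players bought it."""
        return self.base[tech] * (1 + self.bump * self.purchases[tech])

    def buy(self, tech):
        paid = self.price(tech)
        self.purchases[tech] += 1
        return paid

market = TechMarket({"up_armor": 100, "active_protection": 150})
print(market.buy("up_armor"))                  # first buyer pays the base price
print(round(market.price("up_armor"), 2))      # price has risen for the next player
```

The design intuition is that popular choices become expensive, so a budget-constrained player converges faster on whether a technology is really worth its (rising) cost.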
Early Synthetic Prototyping is poised to help the DoD achieve an enduring time-domain overmatch even if U.S. adversaries achieve technical parity. ESP provides a rapid digital assessment framework to measure progress towards mission accomplishment through test and evaluation in the mission context. Combined with advanced manufacturing (Smith, 2016) (Martin & FitzGerald, 2013), ESP could enable the DoD to ingest technologies from anywhere, figure out how to use them on the battlefield, and rapidly place the output into the hands of Soldiers who are readily able to employ them on an evolving battlefield. The hardest thing for U.S. adversaries to duplicate is the integration of advanced technologies with skilled soldiers and well-trained teams.
ESP is not a turn-key software implementation. Many challenging research questions have yet to be addressed. Foremost is continuing research on data mining and data farming. Security considerations also present unique challenges. The DoD labs might help to address some of these and additional research questions, including:
- Are the results of analysis from Soldier feedback significantly different from the results of analysis from traditional experimentation?
- How do you begin to allow Soldiers an active role in the design of platforms?
- How can we perform autonomous interviewing to understand why Soldiers made tactical choices?
- How do you assure that the correct level of physics has been captured, or quantify the error?
- Can an AI be used for the opposing force, or must human-on-human play always be used?