One of the fundamental challenges with data integration is establishing a common set of definitions. In this case the term ‘data’ is defined as:
“Data is a general concept that refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.” (Data Characteristics, Wikipedia, retrieved May 31, 2015)
That represented information or knowledge also has a set of characteristics that underscore the definition. These characteristics generally describe data as relevant, complete, accurate, and current (Linthicum, 2009, p. 1). It is important to note that these characteristics qualify what should be called ‘useful data’: an item can meet the given definition yet, lacking these characteristics, be of little value. Utility and value point to another term frequently used when describing data: ‘quality’. Additional characteristics of quality, beyond the four already listed, are accessibility, consistency, and granularity (Characteristics, quizlet.com). These additional items are mentioned merely to show that data is a topic of great importance that has attracted considerable study. For the purposes of this paper, however, we will limit the list of characteristics to relevant, complete, accurate, and current.
As will be shown later, the crux of data integration revolves around format. Associated with format is the use of a variety of measures to represent a commonly named item, or representation term. A representation term is a word, or a combination of words, that semantically represents the data type (value domain) of a data element (Representation, Wikipedia). For instance, consider the concepts of Speed and Location. Each becomes more complex once one considers that it can be expressed in different units (Fig 1).
To ensure an accurate exchange of information between simulations that use different units, a conversion must take place. That conversion introduces complexity into the process, and the complexity continues to grow as multiple simulation architectures and their associated data components are added (Fig 2). The graph is a simplified representation of the concept and is not meant to imply a one-to-one relationship between the two variables.
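The kind of unit conversion described above can be sketched as follows. This is a minimal illustration rather than the method of any particular simulation architecture: the unit names and the choice of meters per second as a common intermediate unit are assumptions made for the example and are not taken from Fig 1.

```python
# Conversion factors from each unit to a common unit (meters per second).
# The set of units is an illustrative assumption.
TO_METERS_PER_SECOND = {
    "m/s": 1.0,
    "km/h": 1000.0 / 3600.0,
    "knots": 1852.0 / 3600.0,   # 1 international knot = 1852 m per hour
    "mph": 1609.344 / 3600.0,   # 1 statute mile = 1609.344 m
}


def convert_speed(value, from_unit, to_unit):
    """Convert a speed by passing through meters per second."""
    meters_per_second = value * TO_METERS_PER_SECOND[from_unit]
    return meters_per_second / TO_METERS_PER_SECOND[to_unit]


# One simulation reports speed in km/h, another expects m/s.
speed_in_mps = convert_speed(36.0, "km/h", "m/s")
```

Routing every conversion through one common unit keeps the table at one factor per unit instead of one factor per pair of units, which is one way to limit the growth in conversions as more representations are added.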
As previously mentioned, accounting for differences in data format is a key requirement for system interoperability and a significant source of complexity. A simple example is a data element that is described using six bits in one system and eight bits in another. Some form of conversion must take place for these databases to exchange information accurately. That conversion step is yet another source of complexity. The following example using DIS and HLA data elements shows how quickly the level of complexity can grow. Data elements in DIS are known as Protocol Data Units (PDUs). The current standard incorporates 72 different types of PDUs arranged in 13 families. Each PDU comprises 576 bits (IEEE, 2011, p. 67). The 13 families are:
- Entity information/interaction family – Entity State, Collision, Collision-Elastic, Entity State Update, Attribute
- Warfare family – Fire, Detonation, Directed Energy Fire, Entity Damage Status
- Logistics family – Service Request, Resupply Offer, Resupply Received, Resupply Cancel, Repair Complete, Repair Response
- Simulation management family – Start/Resume, Stop/Freeze, Acknowledge
- Distributed emission regeneration family – Designator, Electromagnetic Emission, IFF/ATC/NAVAIDS, Underwater Acoustic, Supplemental Emission/Entity State (SEES)
- Radio communications family – Transmitter, Signal, Receiver, Intercom Signal, Intercom Control
- Entity management family
- Minefield family
- Synthetic environment family
- Simulation management with reliability family
- Live entity family
- Non-real time family
- Information Operations family – Information Operations Action, Information Operations Report
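The six-bit versus eight-bit example mentioned earlier can be sketched in code. This is a hedged illustration only: the function name `rescale_field` is hypothetical, and the sketch assumes both fields encode the same quantity linearly over their full unsigned range. If the wider field simply stores the same integer with extra leading zeros, no rescaling is needed.

```python
def rescale_field(value, from_bits, to_bits):
    """Map an unsigned value from one field width onto another.

    Assumes (for illustration) that both fields cover the same
    physical range linearly, so the maximum code of one field
    corresponds to the maximum code of the other.
    """
    from_max = (1 << from_bits) - 1
    to_max = (1 << to_bits) - 1
    if not 0 <= value <= from_max:
        raise ValueError(f"value {value} does not fit in {from_bits} bits")
    # Integer arithmetic with rounding to the nearest code.
    return (value * to_max + from_max // 2) // from_max


# Widening keeps the end points of the range aligned.
widened = rescale_field(63, 6, 8)      # largest 6-bit code

# Narrowing loses precision: the round trip does not return the original.
narrowed = rescale_field(100, 8, 6)
round_trip = rescale_field(narrowed, 6, 8)
```

Under these assumptions the round trip from eight bits to six bits and back does not recover the starting value of 100, which shows how even a simple width conversion can affect the accuracy of exchanged data.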
Within a single-architecture network, complexity is fairly easy to manage. When the network design begins to commingle simulation architectures, the resulting incompatibility between data structures creates additional issues that in turn give rise to tertiary effects. Let’s look at the addition of HLA in order to better understand some of the issues involved.
The core data element for HLA is called a Basic Object Model (BOM). Like a DIS PDU, the BOM structure captures a number of variables that describe an entity, or Federate in HLA terminology (SISO BOM). Comparing the components of an HLA BOM with those of the DIS PDU gives ready insight into the data integration problem (Fig 3).
The complexity of exchanging data between DIS and HLA is further exacerbated in practice because each Federate is actually described by an expanded form of the BOM known as a Federation Object Model (FOM). The description of the fields that make up a FOM requires 27 pages of text and is therefore far too long to reproduce here (IEEE, 2010, p. 34). Such extensive detail certainly leads to greater fidelity in the simulation entities, but it also underlies the creation of tertiary effects that add further to network complexity. These effects primarily degrade performance and must be addressed. They raise questions such as (Lessmann, E-mail):
- How much data is being distributed from each entity?
- What is the update rate for this data for each entity?
- What is the packet size?
- Are the data exchanges simple transactions (like bank exchanges), or do they carry rich state data with a large amount of contextual data?
- Have all the entities joined the execution before publishing data, or do they join in bundles or ad hoc?
- Are there filters in the system managing data flow?
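Several of these questions combine into a rough estimate of the traffic a design places on the network. The sketch below is a back-of-the-envelope calculation, not a performance model. The function name, the scenario values (1,000 entities, 5 updates per second), and the treatment of filtering as a simple delivered fraction are all assumptions for illustration; the 576-bit packet size is the PDU size quoted earlier.

```python
def offered_load_bps(entity_count, updates_per_second, packet_bits,
                     delivered_fraction=1.0):
    """Estimate aggregate traffic in bits per second.

    delivered_fraction models filtering as the share of updates that
    pass the filters (1.0 means no filtering). Protocol overhead and
    retransmissions are ignored.
    """
    if not 0.0 <= delivered_fraction <= 1.0:
        raise ValueError("delivered_fraction must be between 0 and 1")
    return entity_count * updates_per_second * packet_bits * delivered_fraction


# Hypothetical scenario: 1,000 entities, 5 updates per second each,
# 576-bit packets, no filtering.
load = offered_load_bps(1000, 5, 576)
print(f"{load / 1e6:.2f} Mbit/s")
```

For this hypothetical scenario the estimate is about 2.88 Mbit/s. Because the terms multiply, a change in any one of them, such as a higher update rate or richer state data in each packet, scales the total load proportionally.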
The point here is that once designers invite complexity into the network design, the effects tend to spill over into other areas that may or may not have been anticipated.
Fortunately, the M&S community has a great deal of experience in addressing many of these issues and has developed ways to mitigate these challenges. The three primary areas are the use of standards, tools, and processes.