Abstract
Data sharing has become a given today, especially in cyberspace and social media. It is not entirely the case in the Intelligence Community (IC) due to security concerns and other architectural considerations, despite their quest for connecting the proverbial “dots.” This article will revisit several common data-sharing models and explore how the IC can take advantage of them, while taking security concerns and architectural differences into account. In other words, the discussion will focus on how IC members can mature individual stovepipe clouds into a community cloud where data will have a chance to become more widely sharable.
Section 1: Background/Introduction
In establishing the IC Common Operating Environment (COE), the Office of the Director of National Intelligence stated two major IC aims: (a) achieve IC savings through information technology efficiencies and (b) establish common IT architecture, but allow unique mission or specific capabilities. The ultimate objective is to share mission-relevant information efficiently and securely. Many initiatives have been taken to support the above aims and objectives across the community, such as IC Desktop Environment (DTE) and IC-Cloud. The Excel-Cylinder (EC) project supports these efforts with a data fusion platform that complies with Director of National Intelligence (DNI) standards and links virtual mission spaces into the wider COE.
Traditionally, IC information sharing has been achieved in many ways, such as through formal arrangements (e.g., liaison offices) or analysts “socializing” in mission-partnering situations. However, in modern warfare, including counter-terrorism, cyber operations and asymmetrical threats, “theaters” are dynamic and fluid. The elements of surprise and ingenuity, coupled with lethal force, are the main weapons of the bad guys. Therefore, the need for timely, online, on-demand information sharing beyond formal protocol or informal socializing is becoming more pressing than ever before. In this paper, we will use the EC project of a Special Access Required Agency (SARA) as an example, but this also may apply to other IC agencies’ cloud initiatives. As a member of the much-anticipated IC-Cloud, EC is building a framework for more dynamic and timely information sharing with mission-relevant information.
In this paper, we will (a) revisit some major information-sharing models and their architectural implications; (b) review the current EC architecture against the objective of efficient and secure information sharing among IC partners; and (c) explore the next logical steps for maturing EC architecture toward achieving an information-sharing framework—a framework designed to provide optimal usability to users of partnering agencies.
Section 2: Information-sharing Models
Information sharing ranges from (a) a fully integrated environment to (b) a common operating environment (hardware, software, toolsets, and at times, shared domains) to (c) a loosely coupled (federated) environment where information is shared, in most cases, through web services.
A fully integrated environment is an ideal setting for information sharing. However, due to special security or operational considerations, the reality is that such an environment rarely exists even within a single agency. In most cases, it is unfeasible because of differences in partners’ legacies, operating environments, cultures, and legal and budgetary concerns.
COE information sharing, on the other hand, is bound by specific interface protocols and aimed at supporting a number of missions. The EC program follows a Defense Intelligence Information Enterprise (DI2E) template for COE information sharing among mission partners and allies. IC-COE DTE is another example for this type of sharing. Though this model is effective when dealing with more stable conventional warfare, dependencies on prescribed hardware/software/applications suites can prevent the community from adopting more advanced technologies in a timely manner. This reduces overall mission effectiveness, especially when dealing with the dynamic nature of irregular warfare.
Lastly, a federated information-sharing model, such as Joint Worldwide Intelligence Communication System Open Search, is more flexible and dynamic. However, it lacks the richness of some tools. For example, due to certain technical and security concerns, analytics and exploitation tools usually available to each agency’s domain may not be included.
Section 3: EC Current Architecture in the Context of Information Sharing via IC-Cloud
IC-Cloud is another initiative, aimed at giving the community a richer information-sharing environment built by its partnering members. It is intended to let partners see more of the proverbial “dots” by drawing on sources held by other partnering members. At the same time, it allows for unique mission- or agency-specific capabilities. In other words, “share all you can share, and keep what you must keep.” EC Shared Cloud Machine (SCM), along with its Community Cloud Interface (CCI), facilitates sharing EC data with the rest of the IC (Figure 1) on a “need-to-share” basis. This is possible because EC shareable data is physically separated from native EC data. The same “need-to-share” principle applies to each and every agency partnering in the IC-Cloud. In other words, like other agencies’ private clouds, EC serves its Department of Defense Intelligence Information System (DoDIIS) enterprise, but it also makes its shareable data available to the rest of the IC by participating in the “public” IC-Cloud through its SCM.
This hybrid information-sharing model contains some elements of a COE model, because the IC-Cloud requires partnering clouds to adopt the SCM configuration-prescribed stack as a condition for participation. On the other hand, the same model fosters federated, inter-agency data sharing via web services. Ideally, users from any agency can go to any SCM on the IC-Cloud, security permitting, to obtain requisite shareable information to connect the “dots.” Unfortunately, it does not do this seamlessly.
For example, within the EC Private Cloud, EC users can obtain fused information from different sources managed by EC. This can be done in one single search via available ozone widgets or other means using discoverable data services specified/presented by DNI specifications. Using this “one-stop shopping” approach, the EC users then seamlessly compile fused results into their intelligence products (Figure 2). At this time, this is not the case with SCMs and the IC-Cloud.
Section 4: Challenges and Opportunities for Information Sharing on IC-Cloud
4.1 Seamless Sharing Barriers
EC SCM will allow IC partners to retrieve EC shareable information through IC-Cloud. EC users can also obtain information from other partners’ SCMs. At this time, however, there is no practical way to issue the same search across all participating SCMs to obtain fused results in one-stop shopping fashion. Notwithstanding legitimate security and other cultural concerns, this limitation dampens the usability of the well-intended information-sharing ideals of the IC-Cloud. For this reason, the IC COE Operational Model (Figure 3) with its two-way domain trust framework offers high hope to the IC, because it will allow users, on a need-to-know basis, to “surf” the IC-Cloud for a rich experience in IC one-stop shopping.
EC architecture (Figure 2), with a compatible architecture on its SCM, will be ready for such IC information sharing with relatively minimal changes. All shareable data in EC SCM is discoverable through RESTful services and retrievable through Ozone widgets or other means. Its data conforms to the DoDIIS Framework and supports DI2E. Its security will be Protection Level 3 (PL3)-accredited (at Initial Operational Capability) and will support need-to-know. Under IC-COE’s two-way domain trust paradigm (Figure 3), EC shareable data will become seamlessly discoverable and retrievable across the IC-Cloud. Consequently, one-stop shopping searches, fused results, shared widgets and other advanced analytics can be expanded to cover all the SCM nodes on the IC-Cloud, bringing an enriched experience to IC end users. Best of all, this much-anticipated intelligence-sharing scenario will enable IC analysts to connect the proverbial intelligence “dots.” In more ways than one, the IC-Cloud and its participating SCMs will become increasingly useful.
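The one-stop shopping idea can be illustrated with a minimal sketch. It assumes every SCM exposes the same search call and that need-to-know can be reduced to a numeric access level; the class names, fields and sample records are hypothetical and do not describe the actual EC or IC-Cloud interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doc_id: str
    source: str           # SCM the record came from (hypothetical label)
    text: str
    required_access: int  # hypothetical numeric need-to-know level


class SCMNode:
    """Stand-in for one agency's Shared Cloud Machine (illustrative only)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self, term, user_access):
        # Return only records that match and that the user is cleared to see.
        term = term.lower()
        return [
            r for r in self.records
            if term in r.text.lower() and r.required_access <= user_access
        ]


def federated_search(nodes, term, user_access):
    """Issue one query to every node and fuse the hits into a single list."""
    fused = []
    for node in nodes:
        fused.extend(node.search(term, user_access))
    # Sort for a stable presentation order that does not depend on node order.
    return sorted(fused, key=lambda r: (r.doc_id, r.source))


ec = SCMNode("EC", [
    Record("ec-1", "EC", "Convoy sighted near the border", 1),
    Record("ec-2", "EC", "Convoy route assessment", 3),
])
partner = SCMNode("PARTNER", [
    Record("p-1", "PARTNER", "Border convoy imagery", 2),
])

hits = federated_search([ec, partner], "convoy", user_access=2)
print([r.doc_id for r in hits])
```

In this sketch the user sees hits from both nodes in one result list, while the record that requires a higher access level is withheld at its own node. A real deployment would rely on the domain trust framework rather than a single integer.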
4.2 Cross-cultural Knowledge Fine-tuning and Enrichment
Historically, each IC agency developed and fine-tuned its intelligence tools with its own knowledge base. This allowed each IC agency to be in tune with its own culture and modus operandi, thus improving its workforce’s efficiency and effectiveness. Case in point: in semantic searches, analysts and technologists develop and augment ontologies that capture their knowledge on subject matter of interest. Integrating these ontologies into search engines then allows analysts to expand/fine-tune their intended “hits,” regardless of what data are involved.
Here is a simple example: In an ontology that an analyst uses to perform a semantic search, the concept or term “table” is associated with “desk,” “chair,” “furniture,” etc. The proximity/hierarchy of each of these concepts/terms in relation to other concepts/terms depends on the culture and modus operandi in which the analysts operate. In this case, a semantic search will use the ontology to retrieve more than just “table.” Results with “desk” or “chair” or “furniture” or all of the above may be returned depending upon the search specifications. These kinds of enriched searches have become common must-have tools in the analyst circle. The question then is: How can such ontologies travel with an analyst from one SCM to another SCM to help retrieve information pertinent to the analyst’s knowledge base?
In the current IC-Cloud model (Figure 1), there is no provision to allow such a knowledge base to be made automatically available to analysts when they venture out of their own agency’s data territory into another agency’s realm. If such ontologies cannot dynamically “follow” the analysts as they surf the IC-Cloud for information from sources that span the cloud, then their ability to perform rich semantic searches can be severely limited. Technically, at this time, sensible ontology portability or harmonization tools are not available. Presently, tools that attempt such portability are neither very useful nor easy nor practical.
Fortunately, EC architecture with its data fusion services layer may provide probable hooks for a dynamic extension from a simple list of registered tags to an expanded list of tags based upon the associations of ontology concepts to registered tags. The expanded list of tags then can be used as search criteria to semantically reach more data, yielding more enriched results than otherwise possible with only registered tags. Since the expanded lists of tags are based upon the analyst’s preferred ontologies, they may preserve the effectiveness of semantic searches that analysts have come to find effective.
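The tag expansion described above can be sketched as follows, reusing the “table” example. The ontology contents, the registered tag list and the function name are hypothetical, and restricting the result to registered tags is an assumption about how the fusion layer would consume the expanded list.

```python
# Hypothetical analyst ontology: concept -> directly associated concepts.
ONTOLOGY = {
    "table": {"desk", "furniture"},
    "desk": {"chair"},
    "furniture": {"chair"},
}

# Hypothetical registered tags known to the data fusion layer.
REGISTERED_TAGS = {"table", "desk", "chair", "lamp"}


def expand_tags(seed_tags, ontology, registered, max_hops=1):
    """Breadth-first expansion of seed tags through ontology associations."""
    expanded = set(seed_tags)
    frontier = set(seed_tags)
    for _ in range(max_hops):
        next_frontier = set()
        for concept in frontier:
            next_frontier |= ontology.get(concept, set())
        next_frontier -= expanded  # do not revisit concepts already collected
        expanded |= next_frontier
        frontier = next_frontier
    # Assumption: only registered tags are usable as search criteria.
    return expanded & registered


one_hop = expand_tags({"table"}, ONTOLOGY, REGISTERED_TAGS, max_hops=1)
two_hops = expand_tags({"table"}, ONTOLOGY, REGISTERED_TAGS, max_hops=2)
print(sorted(one_hop))   # ['desk', 'table']
print(sorted(two_hops))  # ['chair', 'desk', 'table']
```

The `max_hops` parameter stands in for the proximity setting in the search specifications: a wider radius reaches more distant concepts at the cost of less precise hits.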
In the same manner, other agencies may use their own ontologies to expand search terms to achieve similar results on data on EC SCM or any other agencies’ SCMs. All these scenarios, however, are predicated upon the assumptions that the IC-COE will become a reality and IC-Cloud surfing will be possible through the two-way domain trust scheme.
As agencies start sharing data, it would be reasonable to predict that they will start sharing knowledge encapsulated in their own ontologies. This extended knowledge-sharing scenario, a much more desirable scenario beyond information sharing, may not be far-fetched. It becomes credible when the level of trust between agencies increases through mutually positive experience with the IC-Cloud and its associated benefits. However, shared ontologies are hardly useful or practical on a machine-to-machine basis, unless they all subscribe to the same frameworks and standards. The contents of ontologies may differ, but with a common framework and syntax, it becomes entirely possible for one organization to navigate another organization’s ontology. Therefore, although the World Wide Web Consortium’s Resource Description Framework, Web Ontology Language and SPARQL Protocol and RDF Query Language may not constitute the most sophisticated ontology framework, they are widely used and will be improved as more people use them. It is common sense to adopt something that is already a standard.
4.3 Minimizing Data Duplication
In the current IC-Cloud model, certain subsets of EC native data and systems are somewhat duplicated on the EC SCM. This allows safe sharing with the rest of the IC, eliminating the risk of unauthorized network jumping into a provider’s non-shareable repository. The same is likely to be true for other partners’ SCMs. The amount of redundant data, however, can be staggeringly large. Over time, especially with the influx of a massive amount of non-structured data, the duplicated data volume can easily be in petabytes if not exabytes. This can be the case even within a single agency. As the IC sees the usability of the IC-Cloud, the demand for data sharing will likely increase, and the amount of data can become an increasingly heavy burden on facility, bandwidth and computing resources. In addition, the synchronization between native data and shareable data on SCMs can be problematic due to the ever-growing volume of data.
The burden of duplicating data across agencies can be even more acute. It is not unusual that many agencies are ingesting shareable data from other agencies for their own uses and then, in turn, making these available to other agencies to use. Case in point: EC is ingesting its own data and U.S. Army Intelligence and Security Command-Intelligence Community Data Layer data plus data from other outside sources, such as COMTEX®. In a stovepipe environment, this situation may not present itself as a problem for analysts, since largely they are limited to their own agency’s data. In a shared environment such as the IC-Cloud, however, duplication of data sources can become an annoyance or even a major distraction to analysts who surf beyond their agency’s enclave. Similar to the old Google search, receiving too many hits of duplicate data simply wastes analysts’ time, reduces their analytical efficiency and effectiveness, and increases their frustration. Thus, receiving too many hits would reduce the usefulness of the IC-Cloud.
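One way to keep duplicate hits from reaching the analyst is to collapse them at result-fusion time. The sketch below is a minimal illustration that assumes duplicates are exact copies up to case and whitespace; the field names and sample hits are hypothetical, and detecting near-duplicates would need a different technique.

```python
import hashlib


def fingerprint(text):
    """Hash of case- and whitespace-normalized text (exact duplicates only)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(hits):
    """Keep the first hit per fingerprint and list every source that held it."""
    unique = {}
    for hit in hits:
        key = fingerprint(hit["text"])
        if key not in unique:
            unique[key] = {"text": hit["text"], "sources": [hit["source"]]}
        elif hit["source"] not in unique[key]["sources"]:
            unique[key]["sources"].append(hit["source"])
    return list(unique.values())


hits = [
    {"source": "EC", "text": "Port activity report,  12 March"},
    {"source": "PARTNER_A", "text": "port activity report, 12 march"},
    {"source": "PARTNER_B", "text": "Harbor traffic summary"},
]
results = deduplicate(hits)
for r in results:
    print(r["sources"], r["text"])
```

Keeping the list of sources alongside each unique hit also surfaces which agencies hold the same data, which is the kind of discovery that could later inform provisioning agreements.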
There are, however, a few great opportunities, especially in the case of EC, for reducing such redundancy in the IC-Cloud within the IC-COE Operational Model:
• Under the current security constraints, physical separation of data to share and data not to share is a sensible approach. However, there is no reason, security permitting, why members of the same agency cannot access all data (shared and non-shared) through a virtual layer, as if the two sets of data were not separated. For external users, shareable data, discoverable and retrievable from the SCM, is nothing more than a virtual layer of the shareable data physically stored in the EC domain. The IC-COE two-way domain trust route will allow IC users to discover and access EC virtualized shareable data seamlessly. Consequently, data virtualization (Figure 4), security permitting, eliminates the need to duplicate shareable data from a legacy repository to SCM.
• Another fringe benefit of data sharing is the eventual discovery of duplicate data by users who can search shareable data across agencies. Politics and other concerns aside, this kind of discovery can reduce resources (storage, bandwidth, ingest/administration efforts, etc.) while increasing the efficiency and effectiveness of analysts. Of course, this can also help agencies manage their already reduced budgets without reducing the cloud’s usefulness to the end-users. Over time, hopefully, there will be mutually beneficial agreements to divide source data provisioning tasking equitably and logically to each agency, thus minimizing the need for data provisioning redundancy.
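The virtualization idea in the first bullet can be sketched in a few lines. This is an illustration only: it assumes each record carries a shareable flag and that a view enforces it at query time. The class names are hypothetical, and the sketch says nothing about whether such a layer would satisfy the accreditation requirements that motivate physical separation today.

```python
class PhysicalStore:
    """Stand-in for the single physical repository in the provider's domain."""

    def __init__(self):
        self._rows = {}

    def put(self, doc_id, text, shareable):
        self._rows[doc_id] = {"text": text, "shareable": shareable}

    def rows(self):
        return self._rows.items()


class VirtualView:
    """Read-only view over the store; no data is copied into the view."""

    def __init__(self, store, shareable_only):
        self._store = store
        self._shareable_only = shareable_only

    def search(self, term):
        term = term.lower()
        return sorted(
            doc_id
            for doc_id, row in self._store.rows()
            if term in row["text"].lower()
            and (row["shareable"] or not self._shareable_only)
        )


store = PhysicalStore()
store.put("d1", "Regional logistics summary", shareable=True)
store.put("d2", "Logistics source details", shareable=False)

internal_view = VirtualView(store, shareable_only=False)  # same-agency users
external_view = VirtualView(store, shareable_only=True)   # IC-Cloud users

print(internal_view.search("logistics"))  # ['d1', 'd2']
print(external_view.search("logistics"))  # ['d1']

# A new shareable record is visible externally at once, with no copy or sync.
store.put("d3", "Logistics update", shareable=True)
print(external_view.search("logistics"))  # ['d1', 'd3']
```

Because both views read the same store, the synchronization problem between native and shareable copies does not arise in this model; the cost is that the filter itself becomes the security boundary.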
Section 5: Summary and Recommendations
Through EC, SARA is building major stepping-stones toward better information sharing within SARA and with other sister agencies. Undertaking such an endeavor is a monumental task for SARA and for the entire IC. Taking down one barrier at a time, incrementally overcoming technical difficulties and operational concerns, SARA is building an architecture that satisfies the current IC-Cloud Framework, yet is adaptable to a more mature IC-Cloud that supports the IC-COE Operational Model. Yet the road to seamless data sharing will not be free of obstacles anytime soon.
There are three major challenges facing the IC-Cloud and, consequently, EC architecture. They are (1) seamless data sharing, (2) supporting cross-cultural knowledge fine-tuning and enrichment, and (3) minimizing data duplication.
1. The EC Team is building an infrastructure of tools for seamless data sharing within the SARA/EC space. Similar experience on the IC-Cloud will depend on the implementation of the IC-COE and its two-way domain trust framework. In the near future, however, it is recommended that SARA experiment with a bilateral two-way trust framework with another agency in the same fashion as the IC-COE DTE Memorandum of Understanding with the National Geospatial-Intelligence Agency.
2. Data knowledge preservation and enhancement are the cutting edge in intelligence. Rather easily, SARA can implement the first increment of this initiative by building several experimental ontologies for a couple of intelligence domains. Then they can use these ontologies to expand the search terms on EC data to simulate dynamic semantic searches. This experiment will not only benefit analysts at SARA, but can also serve as a reference implementation for other IC partners to adopt.
3. Within current security and technical constraints, data duplication reduction via virtualization probably should be one of the priorities SARA must tackle soon. This is necessary because redundancy can be a major drain on already scarce resources. It is advantageous for SARA to achieve these savings for its own benefit and as a reference implementation for other IC partners to adopt.
By doing the above, SARA, through EC, will place itself at the forefront of IC information sharing in the quest to connect the proverbial “dots.” Similarly, other agencies sharing cloud initiatives that use similar approaches will be able to make the community quest a reality much sooner.
Tables and Figures:
Figure 1: IC-Cloud Model
Figure 2: EC Private Cloud Model
Figure 3: IC COE Model
Figure 4: Virtual Data Layers
References and Notes
• DoDIIS Worldwide Conference Denver 2012, http://www.ncsi.com/dodiis12/index.html, April 2012
• Zielecki, Jeff, Intelligence Community – Common Operating Environment (IC-COE), 31 January 2012
• Mitchell, Scott, IC Core Reference Architecture, Overview and Cloud Discussion, 21 October 2011
• Kelly, M.M. and Ngo, P.X., Editors, Chair and Co-Chair, Intelligence Community Metadata Registry Requirements Panel, Final Report, submitted to the IC Metadata Working Group, 20 December 2002
• DoD, DoD Discovery Metadata Specification 4.0.1, http://metadata.dod.mil/mdr/irs/DDMS/, 11 November 2011
• W3C, Resource Description Framework, http://www.w3.org/RDF/, 10 February 2004
• W3C, Web Ontology Language (OWL), http://www.w3.org/2009/10/owl2-pr, 27 October 2009
• W3C, SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query, 15 January 2008
• Noy, N.F. and McGuinness, D.L., Ontology Development 101, Stanford University, March 2001
• Ngo, Phong, International Editor, ISO/IEC 11179-6, Information technology – Specification and standardization of data elements. Part 6 – Registration of data elements, 01 April 1997
Phong Ngo, assistant vice president and technical fellow with SAIC, has more than 30 years of software and systems engineering experience, from COBOL to Cloud. Ngo is a nationally and internationally recognized expert in data management and interchange at the American National Standards Institute and International Organization for Standardization levels; past chair of the ANSI-affiliated subcommittee for data engineering; past chair of the U.S. Technical Advisory Group to its ISO counterpart in data management and interchange; and chief architect of several systems engineering projects.
6909 Metro Park Drive, Suite 200
Alexandria, VA 22310
David Fado is an information management professional focused on intelligence system processing. He has a background in system modeling to support analytic workflows, specializing in Unified Modeling Language. He was the lead author on one of the early submissions of the UML profile for Department of Defense Architecture Framework and Ministry of Defence Unified Profile for DoDAF and MODAF. He has supported a number of classified and commercial systems.
3865 Wilson Blvd, Suite 600
Arlington, VA 22203
John Hoecker, a chief systems engineer with SAIC, has more than 30 years of software and hardware systems engineering and integration experience spanning the systems development life cycle, from concepts and requirements management through operations and maintenance.
Hoecker possesses a B.S. E.E. and an M.S. in systems engineering, both from Virginia Tech and has supported DoD, the IC, and civil agencies. In addition, he was an adjunct instructor of mathematics at Northern Virginia Community College.
4001 N. Fairfax Drive, Suite 785
Arlington, Va. 22203