SOA Forgot the Data: Composite Data Services
and Data Governance
by Ash Parikh, Ajay
Ramachandran and Premal Parikh
Published May 1, 2007
Summary: This is an introduction to composite
data services, a framework that works in combination with XML data
management, SOA registries and repositories.
The authors would like to thank Bob Albo, VP of Business
Solutions at Raining Data, and Murty Gurajada, software architect,
XML-Centric Applications and Platforms group at Raining Data, for
technically reviewing this article.
Many organizations today are moving steadily toward implementing
a service-oriented architecture (SOA) for standards-based software
interoperability and business flexibility. However, most forget
about the data integration, governance and management issues
associated with true interoperability until it's too late. As
loosely coupled systems based on SOA begin to interact, data
integration, quality and harmonization issues are exacerbated and
become significant barriers to successful integration efforts. These
problems stem from not treating “data” or “information” as a
critical asset, contrary to the way people, capital equipment and
inventory are viewed. In order for organizations to be successful
with their SOA implementation, they must first recognize data as a
business-critical asset.
According to Gartner, service-oriented business applications
(SOBAs) require a robust set of services that capture, manipulate,
transform and reconcile data and semantics. Data services that
accomplish detailed transaction manipulation and provide a
transparency of business rules, semantic mappings and metadata
management enable the necessary linkage and binding between process
and information when deploying composite applications via SOA
techniques.
The concept of data services is rapidly gaining interest as an
approach for addressing data integration and governance challenges
in SOA. Data services will increasingly become recognized as a
critical component of SOA initiatives.
This article introduces composite data services, a framework that,
combined with XML data management, SOA registries and repositories,
empowers SOA and maximizes the governance and accessibility of
information.
Problem: Current SOAs Overlooked Data and Metadata Integration
Take a step-by-step approach to understand the data duplication,
quality and consistency issues that come up when traditional
monolithic architectures evolve to loosely coupled
architectures.
As shown in Figure 1, a traditional monolithic architecture
involves an application interacting with data directly:
Figure 1: Traditional Monolithic
Architecture
As seen in Figure 2, a loosely coupled architecture based on SOA
principles involves an application interacting with data through
business services. Consider a real-world scenario in which an
application needs address data that is spread across many data
sources. The application would interact with a business service that
provides this data by accessing the various data sources.
Figure 2: Loosely Coupled Architecture
While this satisfies the requirement for a loosely coupled
architecture, it introduces several other issues such as performance
and scalability considerations in the SOA. Moreover, consider the
real-world scenario where customer address data is inconsistent
across the data sources. This is a fundamental issue that the
business service might try to resolve using complex business logic
and code. As we know, any business service implementation with even
the slightest amount of complexity can quickly become a nightmare to
manage. In addition, transparency of the business logic used by the
business service and the flexibility to customize that logic
typically fall by the wayside.
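To make the maintenance problem concrete, the following is a minimal Python sketch of a business service that reconciles inconsistent address data on its own. It is only an illustration: the two in-memory "sources" (CRM and BILLING), the customer ID and the reconciliation rule are all hypothetical and do not come from any real system.

```python
# Hypothetical in-memory stand-ins for two back-end systems that hold
# the same customer with inconsistent address data.
CRM = {"c42": {"street": "12 Main St.", "city": "Irvine", "zip": "92618"}}
BILLING = {"c42": {"street": "12 MAIN STREET", "city": "irvine", "zip": "92618-1234"}}


def get_customer_address(customer_id):
    """Business service that fetches AND reconciles address data itself."""
    crm = CRM[customer_id]
    billing = BILLING[customer_id]
    # The reconciliation rules are buried in service code: prefer the CRM
    # street and city, but take the longer (ZIP+4) postal code from
    # whichever source has it.
    zip_code = max(crm["zip"], billing["zip"], key=len)
    return {"street": crm["street"], "city": crm["city"], "zip": zip_code}
```

Every new source or rule change means editing and redeploying this function, and nothing outside the code documents why one value wins over another.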
As interest in SOA grows, questions remain about the role of data
when building composite applications. XML and simple object access
protocol (SOAP) merely lower the barriers to interoperability. It is
still a lot of hard work to adequately resolve issues around
information location, context, meaning and accuracy. A key to
successfully implementing an SOA is to use a flexible framework that
helps organizations resolve these issues and understand and define
the relationships between business-centric and data-centric
services.
Solution: Introducing XML-Centric Data Components and
Information Services
The solution lies in a specialized layer, shown in Figure 3 and
called composite data services, that takes care of data and
information inconsistencies, quality, accuracy and harmonization
across heterogeneous sources.
Figure 3: XML-Centric Composite Data
Services
Various classes of services exist in SOA, including
infrastructure services (which provide low-level functions, such as
messaging, registration and authentication) and business services
(which perform higher-level business functions, such as processing
an order).
A data service is a new class of service which performs
data-centric tasks (such as data access, integration,
transformation, analysis, monitoring, movement, profiling,
enrichment, validation, verification, quality, governance, etc.).
From a technical implementation perspective, data services, like
traditional business services, support the three important
principles of service orientation, namely modularity, well-defined
interfaces and loose coupling.
As shown in Figure 4, atomic low-level data services are combined
to create composite data services to support sophisticated
data-centric requirements such as data governance, master data
management, data cleansing and enrichment. Composite data services
are optimally developed using an XML-centric, data-driven workflow
designer and deployed and orchestrated on an XML-centric workflow
engine.
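The idea of combining atomic data services into a composite one can be sketched in a few lines of Python. This is a simplified illustration rather than an XML-centric workflow engine: the three atomic services (access, cleanse, validate), the compose helper and the required field names are all assumptions made for the example.

```python
from functools import reduce


def access(record):
    """Atomic service: stand-in for fetching a raw record from a source."""
    return dict(record)


def cleanse(record):
    """Atomic service: collapse whitespace and normalize casing."""
    return {k: " ".join(v.split()).title() for k, v in record.items()}


def validate(record):
    """Atomic service: reject records that lack required fields."""
    missing = [f for f in ("street", "city") if not record.get(f)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


def compose(*services):
    """Build a composite data service that runs atomic services in order."""
    return lambda record: reduce(lambda acc, svc: svc(acc), services, record)


# Composite data service assembled from the atomic ones above.
clean_address = compose(access, cleanse, validate)
```

Because each atomic service has the same simple interface (a record in, a record out), a new step such as enrichment could be added to the composition without changing the existing steps.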
Figure 4: Composite Data Services
XML-centric composite data services are a compelling feature for
any SOA because they natively handle SOA artifacts, which are
predominantly XML-based. When combined with a native XML database,
they can flexibly enable sophisticated data integration features for
accessing, cleansing, transforming, mapping, aggregating and moving
data. Additionally, when combined with a mid-tier data service that
provides a write-through cache in front of back-end data sources,
with XA transaction compliance and optimized refresh policies,
composite data services can address the scalability and performance
requirements that SOAs demand.
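As a rough illustration of the write-through cache idea, the Python sketch below keeps a cache and a back-end store in step on every write and refreshes stale entries on read. It makes several simplifying assumptions: the back end is a plain dictionary, the refresh policy is a single time-to-live value, and XA transaction handling is not modeled at all. The class and parameter names are hypothetical.

```python
import time


class WriteThroughCache:
    """Mid-tier cache: writes go to the back end and the cache together;
    reads are served from the cache until an entry is older than ttl seconds."""

    def __init__(self, backend, ttl=60.0, clock=time.monotonic):
        self.backend = backend      # hypothetical back-end store (a dict here)
        self.ttl = ttl              # simple refresh policy: time to live
        self.clock = clock          # injectable clock, useful for testing
        self._entries = {}          # key -> (value, time cached)

    def put(self, key, value):
        self.backend[key] = value                   # write through first
        self._entries[key] = (value, self.clock())  # then update the cache

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            return entry[0]                         # fresh hit
        value = self.backend[key]                   # miss or stale: refresh
        self._entries[key] = (value, self.clock())
        return value
```

Writing to the back end before updating the cache means the cache never holds a value that the system of record has not accepted, which is the property that distinguishes write-through from write-behind caching.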
As mentioned earlier, composite data services can be developed to
address a number of typical data-centric activities and challenges
in SOA, as outlined below in Figure 5.
Figure 5: Types of Data Services
If you applied composite data services to the aforementioned
real-world scenario, the address data spread across various data
sources and formats would be composed into clean, accurate
information delivered in a timely fashion to the consuming business
service or SOBA. The support for the graphical composition of the
composite data services affords ease of use and flexibility to even
nontechnical users.
Introducing Data Governance – Needs and Benefits
Data and information assets are currently overlooked in most
governance initiatives. SOA governance, a current buzzword, is about
the lifecycle management and access controls of SOA services. While
this is a critical aspect of governance, it is
equally important to implement data governance, which has to do with
lifecycle management and access controls for enterprise data.
In an analysis published in November 2006, Gartner analyst David
Newman states, “By 2010, more than 50 percent of early adopter
organizations migrating toward SOA will fail at their first
attempts, due to a lack of rigor in enforcing data governance and
information management policies.”
The complexity and amount of data that an enterprise needs to
harness and manage continue to grow. Information is a vital
enterprise asset and is critical to business success. Consistent and
accurate information is indispensable to enterprise resource
planning systems, customer relationship management applications,
business intelligence tools and certain classes of corporate
documents (e.g., regulatory compliance). Most enterprises make
significant investments in computer systems but limited investments
in ensuring information quality. Information, like a physical asset,
degrades with time and use. It is thus imperative to have a data
governance model that ensures information is leveraged and
consistent across the enterprise.
Some aspects of data governance may be achieved using business
rules; however, the governance model includes important roles and
responsibilities for human participants. Data stewards, data
captains and data governance committees play an important part in
governing processes (e.g., data change and approval process),
enterprise standards (e.g., metadata and policies) and technology
(data governance tools, data-centric workflow, XML databases,
XQuery). The roles played by human participants are often not
full-time jobs, so the enabling technologies must be extremely
flexible, have little to no learning curve and have intuitive and
information-rich user interfaces to support data governance
processes for high-value data.
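A data change and approval process of the kind mentioned above might be sketched as follows. This Python example is purely illustrative: the ChangeRequest and GovernedStore names, the steward list and the status values are assumptions, and a real governance tool would add persistence, notifications and a user interface.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed change to a governed record, pending steward review."""
    key: str
    new_value: str
    requested_by: str
    status: str = "pending"
    history: list = field(default_factory=list)


class GovernedStore:
    def __init__(self, stewards):
        self.stewards = set(stewards)  # users allowed to approve changes
        self.data = {}

    def propose(self, key, new_value, user):
        """Record a proposed change without touching the governed data."""
        req = ChangeRequest(key, new_value, user)
        req.history.append(("proposed", user))
        return req

    def approve(self, req, user):
        """Apply the change only if a data steward signs off on it."""
        if user not in self.stewards:
            raise PermissionError(f"{user} is not a data steward")
        if req.status != "pending":
            raise ValueError("request already decided")
        self.data[req.key] = req.new_value
        req.status = "approved"
        req.history.append(("approved", user))
```

The history list gives a minimal audit trail of who proposed and who approved each change, which is the kind of record a governance committee would review.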
Composite data services can alleviate the SOA migration risk
described by Gartner through enhanced governance, risk, compliance,
security and quality. Composite data services will have a dramatic
effect on improving enterprise data governance as indicated in
Figure 6.
Figure 6: Data Governance Practices
Comparison
Overview: XML and XQuery Data Components and Services
As shown in Figure 7, overarching business processes and
applications can consume the modular and reusable composite data
services as a part of the business logic by rapidly integrating with
open standard endpoints designed to provide loose coupling. The
figure also represents the recommended technology stack required to
deliver high-performance data management through composite data
services in an SOA. This stack comprises an XML data management
server for the natural persistence of SOA data, an SOA registry
repository for lifecycle management of all SOA artifacts and a
data-centric workflow infrastructure for collaboration and
governance. Further, the XML data management server enables a
metadata cache for optimized data retrieval, a master data
management repository for a single source of truth and a message
repository for auditing and lineage.
Figure 7: Recommended Technology Stack for Data-Centric
SOA
Most SOA data, artifacts, metadata and messages are XML, so an
XML database offers an ideal repository; XQuery is an ideal language
for SOA data access and manipulation. To support non-XML artifacts,
a good XQuery technology should provide support for querying both
XML databases and non-XML data sources such as RDBMS, file systems,
Web services and Java applications. Such XQuery implementations are
wrapped in Web services and orchestrated into composite data
services to solve the data-centric problems of SOA
implementations.
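The sketch below illustrates the general idea of one data service answering a question that spans an XML source and a relational source. It is not XQuery: Python's standard library (the limited XPath support in xml.etree.ElementTree, plus sqlite3) stands in for an XQuery engine, and the sample documents, table and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source: order documents keyed by customer ID.
ORDERS_XML = """
<orders>
  <order customer="c42"><total>120.50</total></order>
  <order customer="c7"><total>15.00</total></order>
  <order customer="c42"><total>30.00</total></order>
</orders>
"""


def make_customer_db():
    """Hypothetical relational source holding customer names."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO customer VALUES (?, ?)",
                   [("c42", "Acme Corp"), ("c7", "Globex")])
    return db


def customer_summary(customer_id, db, orders_xml=ORDERS_XML):
    """Data service: join a relational row with totals from an XML document."""
    row = db.execute("SELECT name FROM customer WHERE id = ?",
                     (customer_id,)).fetchone()
    root = ET.fromstring(orders_xml)
    # ElementTree supports a small XPath subset, including attribute predicates.
    orders = root.findall(f"./order[@customer='{customer_id}']")
    total = sum(float(o.findtext("total")) for o in orders)
    return {"id": customer_id, "name": row[0], "order_total": total}
```

A consumer calls customer_summary without knowing which values came from the database and which came from XML, which is the abstraction a data service is meant to provide.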
Composite data services should be easily organized into
taxonomies and persisted in an XML database along with all of the
technical, business and operational metadata that may describe such
services. XML databases fill a critical gap in SOA by providing an
enterprise grade persistence infrastructure that can be used for
developing flexible and highly nested taxonomy and ontology
meta-models as well as data governance, lineage and impact
analysis.
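As a small illustration of a nested taxonomy with service metadata, the Python sketch below stores a category tree as XML and looks up every service beneath a given category. The element names, attributes (owner, version) and service names are invented for the example, and an in-memory document stands in for an XML database.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: categories nest to any depth, and each service
# carries some descriptive metadata as attributes.
TAXONOMY = ET.fromstring("""
<taxonomy>
  <category name="customer">
    <category name="address">
      <service name="AddressCleanse" owner="data-stewards" version="1.2"/>
      <service name="AddressVerify" owner="data-stewards" version="0.9"/>
    </category>
    <service name="CustomerMerge" owner="mdm-team" version="2.0"/>
  </category>
</taxonomy>
""")


def services_under(category_name):
    """Return names of all services at any depth below the named category."""
    names = []
    for cat in TAXONOMY.iter("category"):
        if cat.get("name") == category_name:
            names.extend(s.get("name") for s in cat.iter("service"))
    return names
```

Because the hierarchy is just nested elements, adding a new level of categories requires no schema change, which is the flexibility the article attributes to XML persistence.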
Composite data services empower SOA and data governance by
eliminating the need to write code and abstracting out the high
degree of complexity of data-centric tasks from SOA business
processes. They provide coarse-grained orchestration of multiple
data sources, improved performance, sophisticated search and
discovery, flexibility and standards adherence. They improve
regulatory compliance and enhance visibility and control of
business-critical data and information assets.
References:
- Mark A. Beyer, David Newman, Daniel Sholler and Ted Friedman.
“The Emerging Vision for Data Services: Becoming
Information-Centric in an SOA World.” April 24, 2006.
- David Newman. “EIM Reference Architecture: An Essential
Building Block for Enterprise Information Management.” September
14, 2005.
- David Newman and Ted Friedman. “Data Integration Is Key to
Successful Service-Oriented Architecture Implementations.” October
12, 2005.
- Ivan Chong and Ashutosh Kulkarni. “Enterprise Data Integration:
A critical piece of a Service-Oriented Architecture.” February 24,
2006.
- Ash Parikh and Murty Gurajada. “SOA For the Real World.”
November 29, 2006.
Ash Parikh is the director of technology and development for
the XML-Centric Applications and Platform Group at Raining Data
Corporation. He can be reached at ash@rainingdata.com.
Ajay Ramachandran is CTO and vice president of
XML-Centric Applications and Platforms at Raining Data.
Premal Parikh is a lead architect/engineering manager at
Raining Data Corporation. He has more than 13 years of experience in
development of enterprise software products and solutions, which
includes designing and architecting products.