Introduction
Does an enterprise implementing SOA need a canonical data model? Considerable research has gone into implementing a Common Information Model (CIM), commonly called a Canonical Data Model (CDM), in an enterprise SOA.
CDM and its definition
A CDM is essential for an enterprise whose disparate applications require seamless integration. The difficulty arises when the enterprise keeps growing: the CDM must change constantly to include every granular business object and relationship the enterprise wants to integrate, so that it keeps pace with organizational change. This increases the effort and cost of maintaining the model. A CIM should be completely application independent and must be either domain oriented or enterprise scoped.
But what is CIM/CDM?
A simple definition is as follows:
A CIM (Common Information Model) is about semantic business process integration. It is the superset of information objects and relationships that define the complete business semantics of an enterprise in a granular, application-independent, neutral way, backed by industry standards. The applications in an Application-to-Application (A2A), Business-to-Business (B2B, with customers and partners) or Event-Driven (EDA) SOA solution need to agree on it in order to do business. A CIM is a fully controlled, centrally governed data model that defines the data flow through a mediator/BPEL. It should be application-independent, enterprise-specific metadata, and a complete analysis of the domain data should be done before implementing it. 'Common' is often replaced with 'Canonical' or 'Enterprise Canonical', and 'Information' with 'Data' or 'Domain'; CIM, CDM and ECDM all mean the same thing. For example, every major car company probably has its own data type representation of a car's tire that is used throughout the company.
Early adoption of a CIM is the ideal approach for any SOA initiative. Reusability, the main ingredient of SOA, can be achieved through a CIM. Make a time-cost analysis: if your enterprise integration vision includes new applications to be integrated, or a possible acquisition, merger or expansion that involves an ESB, it is advisable to start the CIM implementation as soon as you can and route the data flow for the new services through the CIM and the ESB. For enterprises starting an SOA initiative, it is probably better to establish the need for a CIM now rather than months later. Once the CIM is deployed, the enterprise can concentrate on service orchestration and assembly rather than data modeling. As the number of applications and the associated data increase, the CIM should grow with them, with granular objects clearly distinguished and maintained through proper namespacing. Maintaining the CIM through a central governance body (for example, a Center of Excellence in your enterprise) and a metadata management repository is a must. Because the number of applications to be integrated is expected to grow, early CIM adoption yields a return on investment.
However, in order to integrate the applications, either along the way or at the end of the data-generation processes, we need a particularly constrained definition of canonical schema: the Enterprise Canonical Message Schema. It is a subset of the Enterprise Canonical Data Model that represents the data we will pass between systems, the data that many people feel would be useful. By constraining our message schema to the elements in the Enterprise Canonical Data Model, we radically reduce the cost of producing good data "at the end" because we will not generate bad data along the way. When we pass a message from one application to another, over a Service Oriented Architecture, in EDI or in a batch file, we pass a set of data between applications. Both the sender and the receiver must have a shared understanding of each field's (a) data type, (b) range of values, and (c) semantic meaning. This is where most of the cost of point-to-point integration comes from: creating a consistent agreement between two applications on what the data MEANS and how it will be used.
The schema guidelines must be clearly defined, including versioning, namespaces, naming, and the usage of file comments and XSD tags. Entities shared across schemas must be defined in a common schema.
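The shared understanding of data type, range of values and semantic meaning can be written down per field. The following is a minimal sketch in Python, not a definitive implementation; the `FieldSpec` class, the `validate_message` function and the tire order fields with their ranges are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """One canonical field: (a) data type, (b) range of values, (c) meaning."""
    data_type: type
    in_range: Callable[[Any], bool]
    meaning: str


# Hypothetical canonical message schema for a tire order (illustrative only).
TIRE_ORDER_SCHEMA: Dict[str, FieldSpec] = {
    "tire_width_mm": FieldSpec(int, lambda v: 125 <= v <= 355, "Section width in millimetres"),
    "rim_diameter_in": FieldSpec(int, lambda v: 10 <= v <= 24, "Rim diameter in inches"),
    "quantity": FieldSpec(int, lambda v: v > 0, "Number of tires ordered"),
}


def validate_message(message: Dict[str, Any], schema: Dict[str, FieldSpec]) -> List[str]:
    """Return a list of problems; an empty list means the message conforms."""
    problems = []
    # Fields outside the canonical schema are rejected.
    for name in message:
        if name not in schema:
            problems.append(f"{name}: not part of the canonical schema")
    # Every canonical field must be present, well typed and in range.
    for name, spec in schema.items():
        if name not in message:
            problems.append(f"{name}: missing")
            continue
        value = message[name]
        if not isinstance(value, spec.data_type):
            problems.append(f"{name}: expected {spec.data_type.__name__}")
        elif not spec.in_range(value):
            problems.append(f"{name}: value {value!r} out of range")
    return problems
```

A sender and a receiver that both validate against the same schema object share one agreement instead of negotiating a new one for each pair of applications.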
Schema versioning is one of the greatest concerns when working with a canonical data model. Once developers start using your CDM, it becomes harder and harder to make changes without impacting the current services. Determine a standard for how major and minor version numbers will be used to support backward compatibility. We use minor version numbers for changes that are fully backward compatible and major version numbers for changes that require users to modify their code to use the new schema.
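The major/minor convention can be checked mechanically. The sketch below assumes version strings of the form "major.minor" and assumes that a consumer can read any schema with the same major number and an equal or higher minor number; the function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor' string into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def can_consume(consumer_built_for: str, schema_version: str) -> bool:
    """True if a consumer built against one schema version can use another
    version without code changes (assumed rule: same major, minor not lower)."""
    built_major, built_minor = parse_version(consumer_built_for)
    schema_major, schema_minor = parse_version(schema_version)
    return built_major == schema_major and schema_minor >= built_minor
```

Under this rule a consumer built for 2.1 keeps working with 2.3, while a move from 1.4 to 2.0 signals that code has to change.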
A Common Information Model (CIM) is needed in these cases:
1) When the integration involves a growing number of applications, acquisitions, mergers and business processes. Do a time-cost analysis first if not all of your applications are connected to each other and you do not need every possible two-way point-to-point interface.
2) When there is a need for centralized (governed) control of the data flowing through the ESB, for improved traceability and uniformity across the enterprise.
3) When there is a need for reusability in data modeling (schemas).
An ideal Common Information Model (CIM) should have the following characteristics:
1) The CIM should be application-independent, industry-specific or enterprise-specific metadata that follows industry standards. It should also be backward compatible and interoperable with applications integrated in the future.
2) You need to establish a governance model around the CDM. The governance model must ensure that:
- Any new version is backward compatible (never delete entities or attributes, any new element is always optional)
- Changes can be propagated in steps and do not require a big bang change
- Changes are documented and communicated to everyone who uses the model.
3) Before implementing the SOA-ESB methodology, perform a complete enterprise business (domain) data and cost analysis, and only then start creating services.
4) To achieve a loosely coupled Service Oriented Architecture, keep your data tightly coupled to the services through the CIM and allow loose coupling between the services.
5) Make sure all the data that flows in the ESB is accommodated in the CIM for improved traceability.
6) The CIM should accommodate future scope and leave room for asking questions (reply back).
7) Make your CIM objects (schemas) granular, application independent and reusable, separated by proper namespacing.
8) Pay attention to the operability of the model and how people make use of it. Architects should review and approve any implementation of the model, ensuring that its usage is consistent across domains and implementations.
9) Applications should consume the services in a standard way, and if you use a CDM, the services should use it on their interfaces. However, you should consider implementing service abstraction, which allows an application to use your service through a different interface; do not try to force all applications to conform to the same standard interface.
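The backward-compatibility rule in the governance model (never delete entities or attributes, any new element is always optional) lends itself to an automated check before a new version is published. The sketch below is only one possible form of such a check; it assumes a schema can be represented as a dictionary mapping each entity to its attributes and a required flag, and the function name is hypothetical.

```python
from typing import Dict, List

# Assumed representation: entity name -> attribute name -> is_required
Schema = Dict[str, Dict[str, bool]]


def compatibility_violations(old: Schema, new: Schema) -> List[str]:
    """List every way in which `new` breaks the governance rules against `old`."""
    violations = []
    for entity, old_attrs in old.items():
        if entity not in new:
            violations.append(f"entity removed: {entity}")
            continue
        new_attrs = new[entity]
        # Rule: never delete attributes.
        for attr in old_attrs:
            if attr not in new_attrs:
                violations.append(f"attribute removed: {entity}.{attr}")
        # Rule: any attribute added to an existing entity must be optional.
        for attr, required in new_attrs.items():
            if attr not in old_attrs and required:
                violations.append(f"new attribute must be optional: {entity}.{attr}")
    return violations
```

An empty result means the new version can be released as a minor version; entirely new entities are allowed because existing consumers never reference them.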
Disadvantages of CIM:
1) Additional translation (adapter) layer.
2) An improper CIM implementation (one lacking standards or based on an incomplete analysis of the domain) can have a ripple effect on your services when the domain data model changes.
3) Initial maintenance issues. If the application data models and the domain data model (CIM) change frequently, the CIM becomes a maintenance burden.
4) Longer time to market.
5) Heavy flux of business data and business system data.
6) A CIM alone cannot address other key data issues (e.g. data security, data synchronization, data aggregation, data mapping), which forces you to adopt other SOA solutions such as ESB/broker products.
Advantages of CIM:
1) Speed of Integration with introduction of new applications.
2) A CDM gives you a single language in which people can know the entities in your company, which facilitates conversations between different teams and sub-organizational units. It also reduces the mappings between applications.
3) Transformations are only necessary to and from the canonical form, which reduces the number of different transformations that have to be created.
4) Interoperability with future applications.
5) Reduces the data modeling and schema design effort for every interface to every other interface. Decouples the format of data from services, allowing a service to be replaced by one providing the same function but a different format of data.
6) Long-term maintenance is economical.
7) Granularity in Enterprise workflow.
8) Traceability of transactions across your Enterprise.
9) This level of abstraction can go as far as hiding the fact that we are using multiple services concurrently by allowing us to make routing decisions at runtime.
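The reduction in transformations claimed in the list above can be made concrete with simple arithmetic. The sketch below assumes the worst case for point-to-point integration, where every application exchanges data with every other application and each direction needs its own transformation; the function names are hypothetical.

```python
def point_to_point_transformations(n_apps: int) -> int:
    """Each application maps directly to every other one, in both directions."""
    return n_apps * (n_apps - 1)


def canonical_transformations(n_apps: int) -> int:
    """Each application maps only to and from the canonical form."""
    return 2 * n_apps


for n in (2, 3, 10):
    print(n, point_to_point_transformations(n), canonical_transformations(n))
```

With two applications the canonical form costs more (4 transformations instead of 2), with three the two approaches break even at 6, and with ten applications the canonical form needs 20 transformations instead of 90. This matches both the "additional translation layer" disadvantage for small landscapes and the speed-of-integration advantage for growing ones.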
Integration projects used to require a canonical data model for all messages exchanged between systems, to map between the formats and semantics of the disparate systems taking part in an SOA solution. The canonical data model was a common format for all messages. If a system wanted to send a message, it first had to transform it to the canonical form before it could be forwarded to the receiving system, which would then transform it from the canonical form to its own representation. In effect, this tries to enforce one canonical message schema across services.
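The two-step transformation can be sketched as a pair of adapter functions. This is an illustration under stated assumptions: the CRM and billing systems, their field names and the canonical field names are all hypothetical.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def crm_to_canonical(record: Record) -> Record:
    """Sender-side adapter: hypothetical CRM format to canonical form."""
    return {
        "customer_id": str(record["CustID"]),
        "full_name": f'{record["First"]} {record["Last"]}',
    }


def canonical_to_billing(message: Record) -> Record:
    """Receiver-side adapter: canonical form to hypothetical billing format."""
    return {
        "account": message["customer_id"],
        "name": message["full_name"].upper(),
    }


def send(record: Record,
         to_canonical: Callable[[Record], Record],
         from_canonical: Callable[[Record], Record]) -> Record:
    """Route a message through the canonical form; neither side knows the other's format."""
    return from_canonical(to_canonical(record))


billing_record = send(
    {"CustID": 42, "First": "Ada", "Last": "Lovelace"},
    crm_to_canonical,
    canonical_to_billing,
)
```

Replacing the billing system then only requires a new `canonical_to_...` adapter; the CRM adapter stays untouched.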
All these features are there to help us build service-oriented architectures that are resilient to change and can easily absorb new functionality and services.
Classifications of CIM
SOA in enterprise architecture leverages the canonical data model, which can be classified as follows:
->An enterprise canonical data model (abstract model): this is the model that encompasses the entire enterprise and covers all entities, their attributes and the various relationships they share. This data model does not limit the definition of an entity to a given business function or activity. On the contrary, it provides an exhaustive definition of what each entity signifies for the enterprise via its attributes, behaviors and relationships. The main advantage of this enterprise data model is that as it grows over time, it becomes the single point of reference across the organization: service designers refer to it for building and specifying their contracts; service consumers look at it to understand the "common meaning" of the entity and its attributes; the business refers to it to understand the complete view of the entity; all parties refer to this common vocabulary when communicating with each other. This model becomes the basic fabric on which all communication is based.
->A context-specific canonical data model (service canonical model): this model is specific to the business function in which it is used. It is a scaled-down version of the entity and its attributes as they appear in the enterprise model, seen in the light of the business function. Additionally, this model applies the appropriate constraints on the entities and their attributes as relevant to the context of the business function. The advantage of this model is that all parties to the service have a clear definition of what the function needs as input and what can be expected from it in return.
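The relationship between the two models can be sketched as a projection: the service model selects a subset of the enterprise entity's attributes and adds its own constraints. The class name, the Customer attributes and the credit-check example below are hypothetical and only meant to illustrate the idea.

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical attribute names of an enterprise-level Customer entity.
ENTERPRISE_CUSTOMER_ATTRIBUTES = frozenset({
    "customer_id", "full_name", "email",
    "date_of_birth", "credit_limit", "shipping_address",
})


class ServiceCanonicalModel:
    """A scaled-down, constrained view of the enterprise entity for one business function."""

    def __init__(self, attributes: Sequence[str],
                 constraints: Dict[str, Callable[[Any], bool]]):
        unknown = set(attributes) - ENTERPRISE_CUSTOMER_ATTRIBUTES
        if unknown:
            raise ValueError(f"not in the enterprise model: {sorted(unknown)}")
        outside_view = set(constraints) - set(attributes)
        if outside_view:
            raise ValueError(f"constraint on attribute outside the view: {sorted(outside_view)}")
        self.attributes = tuple(attributes)
        self.constraints = dict(constraints)

    def project(self, entity: Dict[str, Any]) -> Dict[str, Any]:
        """Return only the attributes this service needs, after checking its constraints."""
        view = {name: entity[name] for name in self.attributes}
        for name, is_valid in self.constraints.items():
            if not is_valid(view[name]):
                raise ValueError(f"constraint failed for {name}")
        return view


# Hypothetical credit-check service: needs two attributes, forbids negative limits.
credit_check_model = ServiceCanonicalModel(
    ["customer_id", "credit_limit"],
    {"credit_limit": lambda value: value >= 0},
)
```

Because the constructor rejects attributes that are missing from the enterprise model, a service model built this way cannot drift away from the common vocabulary.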
Conclusion
An enterprise implementing SOA should incorporate a canonical data model to make its services reusable. One of the most important things an architect also needs to do is to educate the enterprise on the use of the CDM and its long-term advantages. People often fail to see the long term and choose a more pragmatic approach to the problems they face without considering its impact down the road. The architect needs to ensure that stakeholders understand the benefits they will get from applying strong governance to the usage of the CDM, and that the technical people know how to make use of it. If you neglect either of these educational steps, you will meet resistance whenever you propose to use the CDM, and your CDM program will probably fail.