Content Integration

From Wikipedia, the free encyclopedia

Content integration is an emerging discipline (as of fall 2005) at the boundary of enterprise integration and content management. There are two major flavors of content integration. An enterprise-centric approach uses content from multiple sources within enterprise software systems, formal workflows, and business processes. This coexists with the world of social software and content syndication, where the desire to share content and create new contexts is pushing the adoption of RSS, folksonomies, and composite applications. To date there has been little interaction between these two streams, though that may change as organizations seek to leverage the power of new forms of software and social organization.

Until recently, content integration was a marginal activity, because the systems used to create and deliver content were either static (such as a book or film) or self-contained (such as a website that used content only from its own repository). Today, content syndication in the social world and enterprise integration in the corporate world are driving the adoption of content integration.

Current Drivers

Content syndication (primarily using RSS and Atom) and the closely associated phenomena of blogs (weblogs) and other social software have made content sharing much more common. As a result, content, or at least access to content, is likely to migrate across many different systems, with the same content appearing in many contexts.

Enterprise integration and the increasing use of content within business applications form the other trend driving content integration. As the focus of enterprise integration moves from data and applications to business processes and service-oriented architecture (SOA), content is becoming an important part of integration. Content created and hosted within one application frequently needs to be accessed from another application, with data about this use sent to yet other applications. For example, a learning resource hosted in a learning management system might be accessed from within a call center support application, and data about the use sent to a performance management system and to the human resources information system.
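
The learning-resource example above amounts to a fan-out of usage data from one application to several others. A minimal Python sketch of that flow, assuming a simple in-process publish/subscribe hub; the `UsageEvent` and `EventBus` names and the application identifiers are hypothetical and not taken from any particular product:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UsageEvent:
    """A record that a piece of content was used in some application."""
    content_id: str
    source_app: str   # application hosting the content, e.g. the LMS
    access_app: str   # application the content was accessed from
    user_id: str


class EventBus:
    """Minimal in-process publish/subscribe hub (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[UsageEvent], None]] = []

    def subscribe(self, handler: Callable[[UsageEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: UsageEvent) -> None:
        # Deliver the same event to every downstream system.
        for handler in self._subscribers:
            handler(event)


# Stand-ins for the performance management system and the HRIS.
performance_log: list[UsageEvent] = []
hris_log: list[UsageEvent] = []

bus = EventBus()
bus.subscribe(performance_log.append)
bus.subscribe(hris_log.append)

# A learner opens an LMS-hosted course from inside the call center application.
bus.publish(UsageEvent("course-101", "lms", "call-center", "agent-42"))
print(len(performance_log), len(hris_log))  # 1 1
```

In a real deployment the hub would usually be a message broker or an SOA service rather than an in-memory list of callbacks; the point is only that the hosting application does not need to know who consumes its usage data.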

Future Drivers

Ambient and mobile applications will extend content integration, as will the adoption of workflow and business process management software. Ambient and mobile applications will require content from many sources to be shared across many applications, some of which are yet to be imagined, let alone implemented. Workflow management applications, which are intended to provide a unifying interface (both application interface and user interface) for workflows that draw on multiple applications, will also require that content be accessed and provided across many different applications. A closely related development is that of composite applications, which combine functionality or information from two or more applications. There are many ways to build them: through a service-oriented architecture and web services, through direct hacks into websites and APIs, or through workflow management approaches.

What is ‘content’?

The above discussion relies on a distinction between ‘content’ and ‘data’. The distinction is problematic on many levels, but useful. For the purposes of discussing content integration, ‘content’ can be understood as ‘data in a format that can be understood by humans and that has enough internal organization to be meaningful without external context.’ Machine code is not ‘content.’ Relational data, without the database schema that defines it, is not ‘content.’ A text, an image, a diagram or model, and a video or audio recording are all ‘content.’ An XML file with all of its tags visible is ‘content,’ but it is also something else, perhaps ‘structure’ or ‘explicit structure.’ This is an area where a great deal of further thought and disambiguation is required.

When designing content integration systems, it is sometimes useful to think of content as lying along a continuum from static content, through dynamic content (content assembled when required from collections of content), to content applications such as simulations and games. Another useful dimension, especially for social software, is the multiplicity of authorship (how many people were engaged in creating and editing the content), ranging from a single identified individual to a small group to a whole community. The link structure also matters: the extent of internal and external linking is an important factor in shaping the best approach to content integration. Wikipedia itself is a good place to study these questions.

Relationship to content management

Content management vendors have often struggled to differentiate themselves from database vendors. Database vendors see content as just one more form of data, to be stored in databases as BLOBs (binary large objects) or, in more sophisticated approaches, as data structures in a markup language such as XML or as a collection of objects (although there is no well-developed approach to content objects). Content management vendors have nonetheless succeeded in differentiating themselves by providing explicit support for the content management lifecycle and for the different roles involved in content management. In many cases, content management vendors support several different underlying databases and have established a position independent of the database vendors.

In the same way, content integration is defining a value proposition distinct from content management by providing explicit support for the content integration cycle, for content integration roles, and by treating content management systems as one of many types of content sources.

Content Integration Cycle

Content integration depends on five key actions: Locate, Assemble, Package, Deliver, Communicate, all of which depend on Context.

Locate: To find relevant content.

Assemble: To arrange content, often from different sources, into meaningful organizations.

Package: To format content in the way required by the application that will be used for delivery, and frequently to provide a communication mechanism that allows the content package to send information to, and receive information from, other applications.

Deliver: To actually deliver the content within the context of a specific application or workflow.

Communicate: Content packages often need to communicate data. This data is sometimes integrated into the content or it may be used to control navigation paths through the content. The data communication sometimes concerns the use of the content and is sent on to other applications.
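
The five actions can be pictured as a pipeline in which each step reads from a shared context. The following is a minimal Python sketch under stated assumptions: the in-memory `REPOSITORY`, the tag-matching rule in `locate`, and the callback used for communication are all hypothetical simplifications, not a standard interface.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    user: str
    task: str
    events: list[str] = field(default_factory=list)  # data sent back by the package


# Hypothetical repository: content id -> (tags, body).
REPOSITORY = {
    "intro": ({"onboarding"}, "Welcome to the team."),
    "refunds": ({"billing", "support"}, "How to issue a refund."),
    "faq": ({"support"}, "Frequently asked questions."),
}


def locate(ctx: Context) -> list[str]:
    # Locate: find content whose tags match the user's current task.
    return [cid for cid, (tags, _) in REPOSITORY.items() if ctx.task in tags]


def assemble(ids: list[str]) -> list[str]:
    # Assemble: arrange the located items into an ordered sequence.
    return sorted(ids)


def package(ids: list[str], ctx: Context) -> dict:
    # Package: format for the delivering application and attach a
    # callback through which the package can report back.
    return {
        "items": [REPOSITORY[cid][1] for cid in ids],
        "report": lambda msg: ctx.events.append(msg),
    }


def deliver(pkg: dict) -> str:
    # Deliver: a stand-in for rendering inside the host application.
    return "\n".join(pkg["items"])


def communicate(pkg: dict, ctx: Context) -> None:
    # Communicate: send data about the use of the content.
    pkg["report"](f"{ctx.user} viewed {len(pkg['items'])} item(s)")


ctx = Context(user="agent-42", task="support")
pkg = package(assemble(locate(ctx)), ctx)
print(deliver(pkg))
communicate(pkg, ctx)
print(ctx.events)
```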

Note that there are two fundamental design patterns used in content integration. In the first, the content files are actually moved between applications. This could be called a content transfer pattern, and it is related to the Extract-Transform-Load (ETL) pattern found in enterprise integration. The second is a content connection pattern, in which the content is not moved but is left in its native format, with a content integration layer providing a virtual transform. This pattern is related to the REST (Representational State Transfer) architectural style.
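
The practical difference between the two patterns shows up when the source content changes. In this Python sketch (the store, function, and class names are hypothetical, and the HTML and JSON renderings stand in for whatever formats the target applications need), the transferred copy goes stale, while the connection layer transforms the native content on each request:

```python
import json

# Hypothetical native store: content stays in its original structure here.
NATIVE_STORE = {"doc-1": {"title": "Refund policy", "body": "Refunds within 30 days."}}


def to_html(record: dict) -> str:
    """Render a record in the format a hypothetical target application expects."""
    return f"<h1>{record['title']}</h1><p>{record['body']}</p>"


def transfer(doc_id: str, target: dict) -> None:
    """Content transfer pattern (ETL-like): extract, transform, load a copy."""
    record = NATIVE_STORE[doc_id]   # extract
    html = to_html(record)          # transform
    target[doc_id] = html           # load: the target now owns an independent copy


class ConnectionLayer:
    """Content connection pattern: nothing is copied; each request is
    resolved against the native store and transformed on the fly."""

    def __init__(self, store: dict) -> None:
        self._store = store

    def get(self, doc_id: str, fmt: str) -> str:
        record = self._store[doc_id]
        if fmt == "html":
            return to_html(record)
        if fmt == "json":
            return json.dumps(record)
        raise ValueError(f"unsupported format: {fmt}")


target_app: dict = {}
transfer("doc-1", target_app)
layer = ConnectionLayer(NATIVE_STORE)

NATIVE_STORE["doc-1"]["body"] = "Refunds within 14 days."  # the source is updated
print(target_app["doc-1"])         # the transferred copy still says 30 days
print(layer.get("doc-1", "html"))  # the virtual view reflects the update
```

The trade-off is a familiar one: transfer gives the target application an independent copy at the cost of keeping copies synchronized, while connection keeps a single source of truth at the cost of a runtime dependency on the integration layer.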

Context

Content integration depends on making context explicit. For content integration, context can be understood in terms of content structure, content semantics, applications and services, content use, the user, and the user’s current task.

Content structure refers to the organization of the content into sections and to the apparatus used to navigate through the content, such as the table of contents and the index. Wikipedia’s use of sections to generate a table of contents is an example of this. This kind of content structure is most often made explicit using XML.
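
As an illustration of structure made explicit in XML, the following sketch derives a table of contents from nested sections using Python's standard `xml.etree.ElementTree` module. The `<article>`, `<section>`, and `<title>` element names are an assumed vocabulary, not a particular schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML vocabulary: nested <section> elements, each with a <title>.
DOC = """
<article>
  <section><title>Current Drivers</title></section>
  <section><title>Content Integration Cycle</title>
    <section><title>Locate</title></section>
    <section><title>Assemble</title></section>
  </section>
</article>
"""


def table_of_contents(elem: ET.Element, depth: int = 0) -> list[str]:
    """Walk nested sections and return one indented line per section title."""
    lines = []
    for section in elem.findall("section"):  # direct child sections only
        lines.append("  " * depth + section.findtext("title", default=""))
        lines.extend(table_of_contents(section, depth + 1))
    return lines


root = ET.fromstring(DOC.strip())
print("\n".join(table_of_contents(root)))
```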

Content semantics can be made explicit using metadata (descriptions of the content for use by either humans or software), by indexing the content, or by representing the link structure: internal links, links out, and links in. The W3C’s Resource Description Framework (RDF) was designed for this kind of description, and many metadata standards are enjoying adoption, especially Dublin Core and the IEEE LTSC LOM (Learning Technology Standards Committee Learning Object Metadata). At this time (fall 2005), content indexing is either captured in the document’s own index or built and used by full-text search engines (an open-source example of which is Lucene).
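
Full-text search engines such as Lucene are built around an inverted index, which maps each term to the documents containing it. The following is a greatly simplified Python sketch of that data structure; it does not use or imitate Lucene's API, and the tokenizer and AND-only query semantics are assumptions made for brevity:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "a": "RSS and Atom support content syndication.",
    "b": "Dublin Core describes content with metadata.",
    "c": "Lucene builds a full text index.",
}
index = build_index(docs)
print(sorted(search(index, "content")))           # ['a', 'b']
print(sorted(search(index, "content metadata")))  # ['b']
```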

Applications and services semantics refers to the different applications implicated in content integration: the applications used to create, store, serve, locate, and assemble the content, the application in which the content is used, and the applications that communicate with the content. All of these should be described using metadata. There are no well-accepted, standard ways to do this, but for software as a service, web services are being developed to describe each service. Advanced content integration applications will no doubt require semantic context such as that proposed in OWL-S.

Use is an extremely valuable form of context for many content integration applications. Metadata can describe how a piece of content has been used in the past, who has used it, and what the outcome has been.

Information about the user is used to authorize access to content, to personalize the content, to track use, and so on. At this point content integration intersects with identity management and security systems. In the future we may well see people using rich personal ontologies (in some cases under the individual’s control, and in other cases created and controlled by third parties) as a way to control content integration systems.
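
A minimal Python sketch of two uses of user information named above, authorization and personalization, assuming a role-based access rule set by the content manager and language variants of each content item. The `User` and `ContentItem` types and their fields are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset[str]
    language: str  # preferred language code, e.g. "en"


@dataclass
class ContentItem:
    content_id: str
    allowed_roles: frozenset[str]  # business rule set by the content manager
    variants: dict[str, str] = field(default_factory=dict)  # language code -> text


def authorize(user: User, item: ContentItem) -> bool:
    """Grant access if the user holds at least one permitted role."""
    return bool(user.roles & item.allowed_roles)


def personalize(user: User, item: ContentItem, fallback: str = "en") -> str:
    """Serve the user's preferred language if available, else the fallback."""
    if user.language in item.variants:
        return item.variants[user.language]
    return item.variants[fallback]


policy = ContentItem(
    "leave-policy",
    frozenset({"manager", "hr"}),
    {"en": "Leave policy", "de": "Urlaubsregelung"},
)
alice = User("alice", frozenset({"hr"}), "de")
bob = User("bob", frozenset({"agent"}), "en")

for user in (alice, bob):
    if authorize(user, policy):
        print(user.user_id, "->", personalize(user, policy))
    else:
        print(user.user_id, "-> access denied")
```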

The user’s current task is often used to determine what content is relevant. This is an important part of workflow management and business processes and is also used in context-sensitive help systems and electronic performance support systems (EPSS).
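
One simple way to let the current task determine relevance is to describe both the task and the content with tags and rank content by overlap. The following Python sketch uses a hypothetical scoring rule and catalog for illustration; it is not the method of any specific help system or EPSS:

```python
def rank_for_task(task_tags: set[str], catalog: dict[str, set[str]], limit: int = 3) -> list[str]:
    """Rank content ids by how many tags they share with the current task."""
    scored = [(len(tags & task_tags), cid) for cid, tags in catalog.items()]
    # Drop items with no overlap; highest score first, ties broken by id.
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [cid for _, cid in scored[:limit]]


# Hypothetical help topics described by tags.
catalog = {
    "refund-policy": {"refund", "billing"},
    "refund-walkthrough": {"refund", "crm", "billing"},
    "address-change": {"crm", "customer-record"},
}

# The user is issuing a refund inside a CRM application.
print(rank_for_task({"refund", "billing", "crm"}, catalog))
```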

Context is often organized in terms of general frameworks or ontologies. For example, in learning content integration, competency models are often used to describe content resources, to conduct skills gap analysis, to align learning management systems with performance management systems, and so on.
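
Skills gap analysis against a competency model reduces to comparing required and assessed proficiency levels, then recommending learning resources described with the missing competencies. A Python sketch in which the competency names, the 1 to 5 scale, and the resource ids are all hypothetical:

```python
# Hypothetical competency model: competency -> required proficiency (1 to 5).
ROLE_REQUIREMENTS = {"product-knowledge": 4, "de-escalation": 3, "crm-navigation": 2}

# Learning resources described by the competency they develop (hypothetical ids).
RESOURCES = {
    "course-101": "product-knowledge",
    "video-7": "de-escalation",
    "quick-ref-3": "crm-navigation",
}


def skills_gap(required: dict[str, int], assessed: dict[str, int]) -> dict[str, int]:
    """Competencies where the assessed level falls short, with the size of the shortfall."""
    return {
        comp: level - assessed.get(comp, 0)
        for comp, level in required.items()
        if assessed.get(comp, 0) < level
    }


def recommend(gap: dict[str, int], resources: dict[str, str]) -> list[str]:
    """Resources tagged with a competency in the gap, largest gaps first."""
    ordered = sorted(gap, key=lambda comp: -gap[comp])
    return [rid for comp in ordered for rid, c in resources.items() if c == comp]


assessed = {"product-knowledge": 2, "de-escalation": 3, "crm-navigation": 1}
gap = skills_gap(ROLE_REQUIREMENTS, assessed)
print(gap)                        # {'product-knowledge': 2, 'crm-navigation': 1}
print(recommend(gap, RESOURCES))  # ['course-101', 'quick-ref-3']
```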

Roles

There are five explicit roles in content integration. Mature content integration systems will provide explicit support for all five of these roles.

Content Manager: The person responsible for the content. This role can be broken down further into the person who sets the business rules for accessing the content, the person who describes the content (applies metadata to it), and the person who recommends the content.

Application Manager: The person responsible for the application through which the content is being accessed. In most cases, the application manager is in a position to control what content is accessed through the application, and therefore requires explicit support in the content integration system.

User: The person who actually uses the content. This person is often not provided with an explicit role within the content integration system (their interaction is with the application through which the content is accessed), but it is important to understand their needs and how they will interact with the content and any communication (data) that use of the content generates. Perhaps content integration systems should provide explicit support for this role.

Integration Manager: The person responsible for integration or for the applications used in integration. As with the content manager role, this role can be broken down further into locating the content, assembling the content, implementing the packaging and the content communication scheme, and monitoring the performance of the content integration system and of the integrated content.

Analyst: Many content integration systems are implemented in order to better analyze data about content use, so there is a role in the system for an analyst. This role is sometimes supported by connecting the content integration system to a business information system that supports OLAP and other analytic tools.
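
The analyst role typically works with the use metadata described under Context (how content was used, by whom, and with what outcome). A minimal Python sketch of an OLAP-style roll-up over such records, assuming a hypothetical `UseRecord` shape with a boolean completion outcome; a production system would hand this aggregation to an analytic database rather than compute it in application code:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UseRecord:
    content_id: str
    user_id: str
    context: str      # application the content was used in
    completed: bool   # the outcome recorded for this use


def roll_up(records: list[UseRecord]) -> dict[tuple[str, str], dict[str, float]]:
    """Aggregate uses by (content, context): use count, distinct users, completion rate."""
    groups: dict[tuple[str, str], list[UseRecord]] = defaultdict(list)
    for r in records:
        groups[(r.content_id, r.context)].append(r)
    summary = {}
    for key, rs in groups.items():
        summary[key] = {
            "uses": len(rs),
            "users": len({r.user_id for r in rs}),
            "completion_rate": sum(r.completed for r in rs) / len(rs),
        }
    return summary


records = [
    UseRecord("course-101", "u1", "lms", True),
    UseRecord("course-101", "u2", "lms", False),
    UseRecord("course-101", "u1", "call-center", True),
    UseRecord("faq", "u3", "call-center", True),
]
for key, stats in sorted(roll_up(records).items()):
    print(key, stats)
```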