Enterprise Data Fabric (EDF)


An Enterprise Data Fabric (EDF) is a distributed, operational data platform that sits between application infrastructures (such as J2EE or .NET Framework) and back-end data sources. It offers data storage (caching), multiple APIs for data access, reliable data distribution and real-time data analysis. All these features are designed with scalability and performance in mind.

Forrester Research provides a broader definition of the features and functions of such a fabric in a report titled "Information Fabric".

Why it is relevant

An EDF is relevant in today's information architectures because traditional infrastructure tools such as databases, data warehouses, and enterprise messaging systems cannot handle the real-time needs of today's applications. Most of today's architectures suffer from:

  • High latency and lack of scalability under concurrent loads
  • Lack of effective state management in distributed environments
  • Expensive and inefficient data replication
  • Lack of flexibility in supporting event-driven as well as request/reply interaction styles.

Fundamental Tenets

1. It’s about operational data management: Unlike a data warehousing system, where terabytes (or petabytes) of data are consolidated from multiple databases for offline analysis, an EDF is a real-time data store optimized for the operational data subsets needed by real-time applications: the "right now" data accessed by many processes and applications. It is a layer of abstraction in the middle tier that co-locates frequently used data with the application and works with backend databases behind the scenes.
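
A minimal sketch of this pattern in Java is shown below. The class and method names are illustrative only, not any particular product's API: frequently used records are held in memory alongside the application, and the backend database is consulted only on a miss (a read-through style).

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Illustrative sketch of the "operational data subset" idea (not a vendor API):
    // hot records live in memory next to the application, and the backend database
    // is consulted only on a cache miss (read-through).
    public class OperationalDataLayer<K, V> {
        private final Map<K, V> hotData = new ConcurrentHashMap<>();
        private final Function<K, V> backendLoader; // e.g. a JDBC lookup

        public OperationalDataLayer(Function<K, V> backendLoader) {
            this.backendLoader = backendLoader;
        }

        public V get(K key) {
            // Serve the "right now" data from memory; fall back to the database behind the scenes.
            return hotData.computeIfAbsent(key, backendLoader);
        }
    }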

2. Distributed persistence via distributed caching: An EDF stores data in main-memory distributed caches, which makes it many times faster than a traditional disk-based DBMS. It harnesses the memory and disk across many clustered machines to co-locate data with consuming applications and provide very high data access rates and scalability. Highly concurrent main-memory data structures are used to avoid lock contention. Different policies can be applied to different data subsets in different locations, making the data model application-centric rather than forcing applications to conform to the storage layer, and shielding users from underlying technology details. Persistence becomes an attribute of all parts of the system rather than being concentrated in the database. High availability and consistency of data are not compromised: a configurable policy dictates the number of redundant in-memory copies to maintain, and failure-detection models built into the distribution system ensure data correctness. The in-memory data layer can be backed by a disk persistence layer that receives data synchronously or asynchronously, depending on the usage scenario.
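
The per-subset policies described above can be pictured with a small Java sketch. The RegionPolicy type and the example region names below are hypothetical; actual products expose similar knobs through their own configuration APIs or files.

    // Hypothetical configuration sketch: different data subsets ("regions") get
    // different redundancy and disk-persistence policies.
    public class RegionPolicyExample {

        enum DiskSync { SYNCHRONOUS, ASYNCHRONOUS }

        // Number of redundant in-memory copies kept across the cluster, and how
        // writes reach the optional disk persistence layer.
        record RegionPolicy(int redundantCopies, DiskSync diskSync) {}

        public static void main(String[] args) {
            // Frequently updated operational data: two extra in-memory copies,
            // asynchronous disk writes for throughput.
            RegionPolicy orders = new RegionPolicy(2, DiskSync.ASYNCHRONOUS);

            // Data that must survive any failure: synchronous disk writes.
            RegionPolicy referenceData = new RegionPolicy(1, DiskSync.SYNCHRONOUS);

            System.out.println(orders + " / " + referenceData);
        }
    }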

3. Key database semantics are retained: Much like in a database management system, distributed data in an EDF can be managed with transactional integrity, queried, and recovered from disk. This distinguishes it from simple distributed caching solutions that only cache serialized objects and key-value pairs in hashmaps replicated across cluster nodes. An EDF also supports multiple data models across several popular languages: data can be managed as objects, XML documents, or relational tables and accessed via programmatic APIs (such as Java, C++, or C#) or query languages such as OQL, XPath, and SQL. Unlike a DBMS, where all updates are persisted and fully transactional (ACID), an EDF relaxes these constraints, allowing applications to control when, and for which data, full ACID characteristics are required.
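
As an illustration of this query-style access, the Java sketch below assumes a hypothetical QueryService interface; the OQL-like query string and the /trades region name are invented for the example.

    import java.util.List;

    // Hypothetical query interface: the same distributed objects can be reached
    // through an OQL/SQL-like query language, much as rows in a relational table.
    public interface QueryService {
        <T> List<T> query(String oql, Object... params);
    }

    class TradeLookup {
        private final QueryService queries;

        TradeLookup(QueryService queries) {
            this.queries = queries;
        }

        List<Object> largeTrades(double threshold) {
            // Ad-hoc (request-reply) query against in-memory objects.
            return queries.query("SELECT * FROM /trades t WHERE t.amount > $1", threshold);
        }
    }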

4. Active data management: Data in an EDF is a dynamic entity that changes rapidly and is updated by many processes in a distributed environment. Thus, in addition to the request-reply paradigm familiar from databases, an EDF supports an event-driven model in which applications are notified when events of interest are generated in the fabric. Such a model is accommodated through a combination of ad-hoc querying (request-reply) and continuous querying (event-driven). In the continuous query model, applications can register queries representing complex patterns of interest. Unlike a database system, where queries have to be executed against resident data, in an EDF data (or events) are continuously evaluated by a query engine that is aware of the interests expressed by hundreds of distributed client processes.
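
A sketch of the continuous-query style follows; the ContinuousQueryService interface and the /quotes region are assumptions made for illustration, not a specific vendor API.

    import java.util.function.Consumer;

    // Hypothetical continuous-query interface: the application registers a standing
    // query plus a callback, and is notified as matching data changes in the fabric.
    interface ContinuousQueryService {
        AutoCloseable registerContinuousQuery(String oql, Consumer<Object> onMatch);
    }

    class PriceAlerts {
        void watch(ContinuousQueryService cq) throws Exception {
            AutoCloseable registration = cq.registerContinuousQuery(
                "SELECT * FROM /quotes q WHERE q.price > 100",
                event -> System.out.println("Matching quote changed: " + event));
            // ... later, deregister when the interest expires.
            registration.close();
        }
    }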

5. Messaging-like semantics for data distribution: When managing data across distributed applications, developers expect reliable, guaranteed publish-subscribe semantics, much like those offered by messaging systems on the market. An EDF incorporates these messaging-like data distribution features on top of what looks, from a data access and storage standpoint, like a database to the developer. The system has knowledge of active subscribers and provides different levels of message delivery guarantees to them. Unlike traditional messaging, where applications have to deal with piecemeal messages, message construction, embedding contextual information in messages, and managing data consistency across publishers and subscribers, an EDF enables a more intuitive approach: applications simply work with a data model (object or SQL) and subscribe to portions of it. When publishers update business objects or relationships, subscribers are notified of the changes to the underlying distributed data fabric and can access the relevant data instantaneously from the fabric.
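
The Java sketch below illustrates the publish-subscribe-on-a-data-model idea; FabricRegion and its callback are simplified stand-ins for the fabric's own distribution machinery, not a real product API.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.BiConsumer;

    // Simplified stand-in for a fabric region: publishers just update business
    // objects, and subscribers to that portion of the data model are notified.
    class FabricRegion<K, V> {
        private final Map<K, V> data = new ConcurrentHashMap<>();
        private final List<BiConsumer<K, V>> subscribers = new CopyOnWriteArrayList<>();

        void subscribe(BiConsumer<K, V> onUpdate) {   // express interest in this data
            subscribers.add(onUpdate);
        }

        void put(K key, V value) {                    // publish an update
            data.put(key, value);
            subscribers.forEach(s -> s.accept(key, value));
        }
    }

    class Demo {
        public static void main(String[] args) {
            FabricRegion<String, Double> quotes = new FabricRegion<>();
            quotes.subscribe((symbol, price) -> System.out.println(symbol + " -> " + price));
            quotes.put("ACME", 101.25); // subscriber is notified of the change
        }
    }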

External links