WinFS
From Wikipedia, the free encyclopedia
- Note: This article title may be easily confused with WinFX.
WinFS was a data storage and management system based on relational databases, developed by Microsoft from 2003 to 2006 for use as an advanced storage subsystem for the Microsoft Windows operating system. It has since been cancelled as a separate product, and some of its technologies will be moved into future versions of ADO.NET and Microsoft SQL Server.[1]
WinFS is the code name of the system, and is short for Windows Future Storage.
Motivation
Many filesystems found on common operating systems, including the NTFS filesystem which is used in modern versions of Microsoft Windows, store files and other objects only as a stream of bytes, and have little or no information about the data stored in the files. Such file systems also provide only a single way of organizing the files, namely via folders and file names.
Because a file system has no knowledge about the data it stores, applications tend to use specific, often proprietary, file formats, so that the data can be interpreted only by the application that created it. This leads to proliferation of application-specific file formats and hampers sharing of data between multiple applications. It becomes difficult to create an application which processes information from multiple file types, because the programmers have to understand the structure of all the files where the source data could reside and then filter out the relevant information. Also, data from multiple applications cannot be easily aggregated. Using common file formats is a workaround to this problem but not a universal solution; there is no guarantee that any given application will be able to access the data.
Also, a traditional file system can retrieve and search data based only on the filename, because the only knowledge it has about the data is the name of the files that store the data. A better solution is to tag files with attributes that describe them. Attributes are metadata about the files, such as the type of file (document, picture, music and so on) or its creator. This allows files to be searched for by their attributes, in ways not possible using a folder hierarchy, such as finding "pictures which have person X". The attributes can be recognized either by the file system natively, or via some extension. Desktop search applications take this concept a step further. They extract data, including attributes, from files and index it. To extract the data, they use a filter for each file format. This allows for searching based on both the file's attributes and the data in it.
However, this still does not promote data sharing, as the extracted data is stored in a format specific to the desktop search application, chosen to enable fast searching. Desktop search applications can only find information; they cannot help users do something with what they find. Also, this approach does not help to aggregate data from two or more applications. For example, it is nearly impossible to search for "the phone numbers of all persons who live in Acapulco and each have more than 100 appearances in my photo collection and with whom I have exchanged e-mail within the last month". Such a search encompasses data across three applications – address book for phone numbers and address, photo manager for information on who appears in which photo, and the e-mail application to know the e-mail acquaintances.
WinFS solves this problem by using attributes to describe the data in files and the relation of that data with other data. It does not use the artificial organization of file names and locations. By creating a unified datastore, it promotes sharing and re-use of data between different applications. Any application, such as the file browser, can understand files created by any application. Addition of attributes gives further meaning to the data, such as "which persons appear in which pictures" or "the person an e-mail was addressed to", as referred to in the example. But, instead of considering pictures, e-mails and files in isolation, WinFS recognizes picture and e-mail to be specific types of data, which are related to person using the relation "of some person". So, by following the relation, a picture can be used to aggregate e-mails from all the persons in the picture and, conversely, an e-mail can aggregate all pictures in which the addressee appears. WinFS extends this to understand any arbitrary types of data and the relations that hold them together. The types and relations have to be specified by the application that stores the data, or the user, and WinFS organizes the data accordingly.
Uses
Organizing the data that is in files by its relationships, as WinFS does, has various uses, including:
- Integrated storage - One example scenario is integrated storage, which helps to reuse data. This would be of particular use to businesses, allowing them to automatically aggregate data from different departments.
- Full text search - A second possible scenario is a full-text search that works with item fields - the rich filters feature. By making use of the fact that any application's data files can be used by any other application, searches can be made to encompass the contents of the file as well, rather than just its attributes.
- Advanced search and data aggregation - WinFS provides an opportunity to create rich and custom made search queries, such as to find “all persons whom I called last weekend”.
- Data mining - WinFS can also give more information about data, by using data mining techniques and applying rules to the data, thus helping to uncover new information. This scenario is intended for use in the development of expert systems.
Development
The development of WinFS is an extension to a feature which was initially planned in the early 1990s. Dubbed Object File System, it was supposed to be included as part of Cairo. OFS was supposed to have powerful data aggregation features. But the Cairo project was shelved, and with it OFS. However, later during the development of COM, a storage system, called Storage+, based on then-upcoming SQL Server 8.0, was planned, which was slated to offer similar aggregation features. This, too, never materialized, and a similar technology, Relational File System, was conceived to be launched with SQL Server 2000, but as SQL Server 2000 ended up being a minor upgrade to SQL Server 7.0, RFS was not implemented. But the concept was not scrapped. It just morphed into WinFS. WinFS was initially planned for inclusion in Windows Vista, and build 4051 of Windows Vista, then called "Longhorn", given to developers at the Microsoft Professional Developers Conference in 2003, included WinFS, but it suffered from significant performance issues. In August 2004, Microsoft announced that WinFS would not ship with Windows Vista; it would instead be available as a downloadable update after Vista's release.
On August 29, 2005, Microsoft quietly made Beta 1 of WinFS available to MSDN subscribers. It worked on Windows XP, and required the .NET Framework to run. It was refreshed on December 1, 2005 to be compatible with version 2.0 of the .NET Framework.
WinFS Beta 2 was planned for some time later in 2006, and was supposed to include integration with Windows Desktop Search, so that search results include results from both regular files and WinFS stores. However, on June 23, 2006, the WinFS team at Microsoft announced that WinFS would no longer be delivered as a separate product. Program Manager Quentin Clark wrote in a blog entry that, due to customer feedback, parts of the WinFS technology, specifically Entities and unstructured data, would be rolled into future versions of ADO.NET and Microsoft SQL Server.[1]
Architecture
WinFS stores data in virtual locations called stores. A WinFS store is a common repository where every application will store its data, along with its metadata, relationships and information on how to interpret the data. In this way, WinFS does away with the folder hierarchy, and allows searching across the entire repository of data.
A WinFS store is a relational store in which applications can keep both structured and unstructured data. Based on the metadata, the type of the data, and the relationships of the data with other data, as specified by the application or the user, WinFS assigns a relational structure to the data. By using the relationships, WinFS aggregates related data. WinFS provides unified storage but stops short of defining the format in which data is kept in the stores. Instead, it allows data to be written in application-specific formats, but applications must provide a schema that defines how the file format should be interpreted. For example, a schema could be added to allow WinFS to understand how to read, and thus be able to search and analyze, say, a PDF file. By using the schema, any application can read data from any other application, and by sharing the schema, different applications can also write in each other’s format.
Multiple WinFS stores can be created on a single machine. This allows different classes of data to be kept segregated; for example, official documents and personal documents can be kept in different stores. WinFS, by default, provides only one store, named "DefaultStore". WinFS stores are exposed as shell objects, akin to Virtual folders, which dynamically generate a list of all items present in the store and present them in a folder view. The shell object also allows searching information in the datastore.
WinFS is not a physical file system; rather, it provides rich data modeling capabilities on top of the NTFS file system. It still uses NTFS to store its data in physical files. WinFS uses a relational engine, which is derived from SQL Server 2005, to provide the data relations mechanism, as the relation system in WinFS is very similar to the relation system used in relational databases. WinFS stores are simply SQL Server database (.MDF) files with the FILESTREAM attribute set. These files are stored in a secured folder named "System Volume Information" in the volume root, in folders under the folder "WinFS" that are named after the GUIDs of the stores.
WinFS also allows programmatic access to its features, via a set of .NET application programming interfaces that enable applications to define custom-made data types, define relationships among data, store and retrieve information, and perform advanced searches. The applications can then use novel ways of aggregating data and presenting the aggregated data to the user. Beta 2 of WinFS was also planned to allow limited use of WinFS data via the ADO.NET API, which is used to access data in a relational database.
Data storage
A data unit that has to be stored in a WinFS store is called a WinFS item. A WinFS item, along with the core data item, also contains information on how the data item is related to other data. This relationship is stored in terms of logical links, which specify the other data items the current item is related to. Links are physically stored using a link identifier, which specifies the name and intent of the relationship, such as type of or consists of. The link identifier is stored as an attribute of the data item. All the objects which have the same link id are considered to be related.
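The link-identifier idea can be pictured with a small model. The following Python sketch is only an illustration under stated assumptions: the `Item` class, the `link_ids` attribute and the identifier strings are hypothetical names chosen here, not part of the WinFS API. It groups items that carry the same link identifier, which is how the text above describes related objects being recognized.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a stored item."""
    name: str
    # Link identifiers kept as an attribute of the item (assumed representation).
    link_ids: set = field(default_factory=set)


def related_groups(items):
    """Map each link identifier to the names of the items that carry it."""
    groups = defaultdict(list)
    for item in items:
        for link_id in sorted(item.link_ids):
            groups[link_id].append(item.name)
    return dict(groups)


items = [
    Item("report.doc", {"consists-of:project-x"}),
    Item("budget.xls", {"consists-of:project-x"}),
    Item("holiday.jpg", {"type-of:picture"}),
]
groups = related_groups(items)
```

Here `report.doc` and `budget.xls` end up in the same group because they share one identifier, while `holiday.jpg` is grouped on its own.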
WinFS helps to unify data and thus reduces redundancy. If different applications store data in a non-interoperable way, as is the present scenario, data has to be duplicated across applications which deal with the same data. For example, if more than one e-mail application is used, the list of contacts must be duplicated in each of them. So, when contact information needs updating, it must be updated in every application. If, by mistake, it is not updated in one of the applications, that application will continue to have outdated information. But with WinFS, an application can store all the contact information in a WinFS store, and supply the schema in which it is stored. Then other applications can use the stored data. By doing so, duplicate data is removed, and with it the hassle of manually synchronizing all instances of the data.
Data model
WinFS models data using the data items, along with their relationships, extensions and the rules governing their usage. WinFS needs to understand the type and structure of the data items, so that the information stored in the data item can be made available to any application that requests it. This is done by the use of schemas. For every type of data item that is to be stored in WinFS, a corresponding schema needs to be provided which will define the type, structure and associations of the data. These schemas are defined using XML.
Predefined WinFS schemas include schemas for documents, e-mail, appointments, tasks, media, audio and video, as well as system schemas that cover configuration, programs, and other system-related data. Custom schemas can be defined on a per-application basis, in situations where an application wants to store its data in WinFS, but not share the structure of that data with other applications, or they can be made available across the system.
Type system
The most important difference between a file system and WinFS is that WinFS knows the type of each data item that it stores. And the type specifies the properties of the data item. The WinFS type system is closely associated with the .NET framework’s concept of classes and inheritance. A new type can be created by extending and nesting any predefined types.
WinFS provides four predefined base types – Items, Relationships, ScalarTypes and NestedTypes. An Item is the fundamental data object, which can be stored, and a Relationship is the relation or link between two data items. Since all WinFS items must have a type, the type of item stored defines its properties. The properties of an Item may be a ScalarType, which defines the smallest unit of information a property can have, or a NestedType, which is a collection of more than one ScalarTypes and/or NestedTypes. All WinFS types are made available as .NET CLR classes.
Any object represented as a data unit, such as contact, image, video, document etc, can be stored in a WinFS store as a specialization of the Item type. By default, WinFS provides Item types for Files, Contact, Documents, Pictures, Audio, Video, Calendar, and Messages. The File Item can store any generic data, which is stored in file systems as files. But unless an advanced schema is provided for the file, by defining it to be a specialized Item, WinFS will not be able to access its data. Such a file Item can only support being related to other Items.
A developer can extend any of these types, or the base type Item, to provide a type for custom data. The data contained in an Item is defined in terms of properties, or fields, which hold the actual data. For example, an Item Contact may have a field Name which is a ScalarType, and one field Address, a NestedType, which is further composed of two ScalarTypes. To define this type, the base class Item is extended and the necessary fields are added to the class. A NestedType field can be defined as another class which contains the two ScalarType fields. Once the type is defined, a schema has to be defined, which denotes the primitive type of each field; for example, the Name field is a String, and the Address field is a custom-defined Address class, both fields of which are Strings. Other primitive types that WinFS supports are Integer, Byte, Decimal, Float, Double, Boolean and DateTime, among others. The schema will also define which fields are mandatory and which are optional. The Contact Item defined in this way will be used to store information regarding the Contact, by populating the property fields and storing it. Only those fields marked as mandatory need to be filled in during the initial save. Other fields may be populated later by the user, or not populated at all. If more property fields, such as "last conversed date", need to be added, this type can simply be extended to accommodate them. Item types for other data can be defined similarly.
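The Contact example can be mirrored with ordinary classes. The sketch below uses Python dataclasses purely as an analogy; the real WinFS types were .NET CLR classes described by XML schemas, and every name here (`Item`, `Address`, `Contact`, `ConversedContact`, the `mandatory` metadata key) is an assumption made for illustration. It shows a scalar field, a nested type made of two scalar fields, mandatory versus optional fields, and extension by inheritance.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Optional


@dataclass
class Address:
    """Nested type composed of two scalar (string) fields."""
    street: str
    city: str


@dataclass
class Item:
    """Hypothetical base type; real WinFS Items were .NET classes."""

    def missing_mandatory(self):
        # List the mandatory fields that have not been populated yet.
        return [
            f.name
            for f in fields(self)
            if f.metadata.get("mandatory") and getattr(self, f.name) is None
        ]


@dataclass
class Contact(Item):
    # Name is a mandatory scalar; Address is an optional nested type.
    name: Optional[str] = field(default=None, metadata={"mandatory": True})
    address: Optional[Address] = field(default=None, metadata={"mandatory": False})


@dataclass
class ConversedContact(Contact):
    # Extending the type adds a property without touching Contact itself.
    last_conversed: Optional[datetime] = None


draft = Contact()
saved = Contact(name="Ada", address=Address("12 Example Street", "London"))
extended = ConversedContact(name="Ada")
```

Calling `draft.missing_mandatory()` reports the unfilled Name field, whereas `saved` and `extended` can be stored because only the mandatory field is required at the initial save.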
WinFS creates tables for all defined Items. All the fields defined for the Item form the columns of the table, and all instances of the Item are stored as rows in the table for that Item. A Relation is stored as a reference to the particular row in the table of the Item which holds the instance of the target Item with which the current Item is related. All Items are exposed as .NET CLR objects, with a uniform interface providing access to the data stored in the fields. Thus any application can retrieve an object of any Item type and use the data in the object, without being concerned with the physical structure in which the data was stored.
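The table layout described above can be sketched with any relational engine. The example below uses Python's built-in `sqlite3` module rather than the SQL Server-derived engine WinFS used, and the table and column names are assumptions for illustration. Each Item type gets a table, fields become columns, instances become rows, and a relation is held as a reference to a row in the target Item's table.

```python
import sqlite3

# In-memory database standing in for a store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    );
    CREATE TABLE picture (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        -- The relation: a reference to a row in the contact table.
        contact_id INTEGER REFERENCES contact(id)
    );
    """
)
conn.execute("INSERT INTO contact (id, name, city) VALUES (1, 'Ada', 'London')")
conn.execute("INSERT INTO picture (id, title, contact_id) VALUES (10, 'Picnic', 1)")


def contact_for_picture(connection, picture_id):
    """Follow the picture's relation to the related contact row."""
    return connection.execute(
        "SELECT c.name, c.city FROM picture AS p "
        "JOIN contact AS c ON c.id = p.contact_id WHERE p.id = ?",
        (picture_id,),
    ).fetchone()


related = contact_for_picture(conn, 10)
```

The caller receives the contact's fields without needing to know how the rows are laid out, which is the point the paragraph makes about uniform access.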
Relationships
A data item can be related to one other item, giving rise to a one-to-one relationship, or to more than one item, resulting in a one-to-many relationship. The related items, in turn, may be related to other data items as well, resulting in a network of relationships, which is called a many-to-many relationship. Creating a relationship between two Items creates another field in the data of the Items concerned, which refers to the row in the other Item’s table where the related object is stored.
In WinFS, a Relationship is an instance of the base type Relationship, which is extended to signify a specialization of a relation. A Relationship is a mapping between two items, a Source and a Target. The source has an Outgoing Relationship, whereas the target gets an Incoming Relationship. WinFS provides three types of primitive relationships – Holding Relationship, Reference Relationship and Embedding Relationship.
Holding Relationships specify the lifetime of the Target Item. For example, the Relationship between a folder and a file, and between an Employee and his Salary record, is a Holding Relationship – the latter is to be removed when the former is removed. A Target Item can be a part of more than one Holding Relationship. In such a case, it is to be removed when all the Source Items are removed.
Reference Relationships provide linkage between two Items, but do not have any lifetime associated, i.e., each Item will continue to be stored even without the other.
Embedding Relationships give order to the two Items which are linked by the Relationship, such as the Relationship between a Parent Item and a Child Item.
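The lifetime rule for Holding Relationships behaves much like reference counting. The following Python sketch is a simplified model under assumed names (`Store`, `hold`, `remove` are hypothetical, not WinFS calls): a target stays in the store while at least one source still holds it and is removed once the last holder is gone. Reference and Embedding Relationships are not modelled.

```python
class Store:
    """Toy store that tracks which sources hold which targets."""

    def __init__(self):
        self.items = set()
        self.holders = {}  # target -> set of sources holding it

    def add(self, item):
        self.items.add(item)

    def hold(self, source, target):
        """Record a holding relationship from source to target."""
        self.holders.setdefault(target, set()).add(source)

    def remove(self, item):
        """Remove an item and any targets left without a holder."""
        self.items.discard(item)
        self.holders.pop(item, None)
        for target, sources in list(self.holders.items()):
            if item in sources:
                sources.discard(item)
                if not sources:
                    # Last holder is gone, so the target's lifetime ends too.
                    self.holders.pop(target, None)
                    self.remove(target)


store = Store()
for name in ("folder-a", "folder-b", "file"):
    store.add(name)
store.hold("folder-a", "file")
store.hold("folder-b", "file")

store.remove("folder-a")
still_there = "file" in store.items  # one holder remains

store.remove("folder-b")
gone = "file" not in store.items  # no holders remain
```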
Relationships between two Items can be set programmatically either when the Items are created, or the user can use the WinFS Item Browser to manually relate the Items. A WinFS item browser can also graphically display the items and how they are related, to enable the user to know how their data are organized.
Rules
WinFS includes Rules, which are executed when a certain condition is met. WinFS rules work on data and data relationships. For example, a rule can be created which states that whenever an Item is created which contains a field "Name", and the value of that field is some particular name, a relationship should be created which relates the Item to some other Item. WinFS rules can also access any external application. For example, a rule can be built which launches a Notify application whenever a mail is received from a particular contact. WinFS rules can also be used to add new property fields to existing data Items.
WinFS rules are also exposed as .NET CLR objects. As such any rule can be used for any purpose. A rule can even be extended by inheriting from it to form a new rule which consists of the condition and action of the parent rule plus something more.
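A rule of this kind pairs a condition with an action, and a derived rule reuses both and adds to them. The Python sketch below is an illustrative model only; the class names, the dictionary-based items and the log list are assumptions, not the WinFS rule classes. It models the mail-notification example and a child rule that inherits the parent's condition and action and extends each.

```python
class Rule:
    """Base rule: run the action only when the condition holds."""

    def condition(self, item):
        return False

    def action(self, item, log):
        pass

    def apply(self, item, log):
        if self.condition(item):
            self.action(item, log)


class MailFromContactRule(Rule):
    """Notify when a mail arrives from a particular contact."""

    def __init__(self, sender):
        self.sender = sender

    def condition(self, item):
        return item.get("type") == "mail" and item.get("from") == self.sender

    def action(self, item, log):
        log.append(f"notify: mail from {self.sender}")


class UrgentMailRule(MailFromContactRule):
    """Child rule: parent's condition and action, plus something more."""

    def condition(self, item):
        return super().condition(item) and item.get("urgent", False)

    def action(self, item, log):
        super().action(item, log)
        log.append("flag: urgent")


log = []
mail = {"type": "mail", "from": "ada@example.com", "urgent": True}
MailFromContactRule("ada@example.com").apply(mail, log)
UrgentMailRule("ada@example.com").apply(mail, log)
UrgentMailRule("bob@example.com").apply(mail, log)  # condition fails, no action
```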
Access control
Even though all data is shared, not everything is equally accessible. WinFS uses Windows’ authentication system to provide two data protection mechanisms. First, there is share-level security that controls access to a WinFS share. Second, there is item-level security that supports NT-compatible security descriptors. The process accessing the item must have enough privileges to access it. Also, in Vista there is the concept of an "integrity level" for an application: higher-integrity data cannot be accessed by a lower-integrity process.
Data retrieval
The primary mode of data retrieval from a WinFS store is searching for the required data and enumerating through the set of Items that have been returned. WinFS also supports retrieval of the entire collection of Items stored in the WinFS store, or of a subset of it which matches the criteria that have been queried for.
WinFS makes all data available as CLR objects. So the data retrieved, which is encapsulated as an object, has intrinsic awareness of itself. By using the abstraction provided by use of objects, it presents a uniform interface to hide its physical layout and still allow applications to retrieve the data in an application-independent format, or to get information about the data such as its author, type, and relations.
For each Item that has been returned, WinFS can also return a set of Relations which specify the Relations the Item is involved in. WinFS can return all the relations of the Item or can return Relations that conform to a queried criterion. For each pair of Item and Relation, WinFS can retrieve the Item which forms the other end of the Relation. Thus, by traversing the Relations of an Item, all the Items that are related with the Item can be retrieved.
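Retrieving everything related to an Item by walking its Relations is a graph traversal. The sketch below models relations as plain tuples and uses a breadth-first walk; the tuple layout, the relation names and the helper functions are assumptions for illustration and do not correspond to WinFS API members.

```python
from collections import deque

# Each relation is (source item, relation name, target item); sample data only.
relations = [
    ("picture:picnic", "PersonInPicture", "contact:ada"),
    ("mail:hello", "AddressedTo", "contact:ada"),
    ("contact:ada", "WorksFor", "org:analytical"),
]


def relations_of(item, kind=None):
    """Relations the item takes part in, optionally filtered by name."""
    return [
        r for r in relations
        if item in (r[0], r[2]) and (kind is None or r[1] == kind)
    ]


def other_end(item, relation):
    """The item at the opposite end of a relation."""
    source, _, target = relation
    return target if source == item else source


def reachable(start):
    """All items connected to start through any chain of relations."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for rel in relations_of(current):
            neighbour = other_end(current, rel)
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


connected = reachable("picture:picnic")
```

Starting from the picture, the walk reaches the contact, then the mail addressed to that contact and the contact's organization.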
Search
The WinFS API provides a class called the ItemContext class, which is used to search and retrieve WinFS Items. The criterion for the search is specified using an OPath query string, which is derived from SQL and XPath, and optionally the type of Item, such as a Picture, being searched for. All matching WinFS entries can either be searched one by one, or a collection of all matches can be retrieved. The advantage of using the former approach is that the search can be stopped when a required Item is found. The latter approach is useful when it is necessary to display all the matches, as in a Virtual folder or similar system.
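The difference between stepping through matches one by one and fetching the whole collection can be shown with a generator. This Python sketch does not reproduce the ItemContext class or OPath; `find_iter`, `find_all` and the sample data are hypothetical. It only illustrates why the one-by-one form lets a caller stop at the first suitable Item.

```python
def find_iter(items, predicate, item_type=None):
    """Yield matches lazily so the caller can stop early."""
    for item in items:
        if item_type is not None and item["type"] != item_type:
            continue
        if predicate(item):
            yield item


def find_all(items, predicate, item_type=None):
    """Collect every match, e.g. to populate a folder-like view."""
    return list(find_iter(items, predicate, item_type))


def title_starts_with_picnic(item):
    return item["title"].startswith("Picnic")


store = [
    {"type": "Picture", "title": "Picnic"},
    {"type": "Document", "title": "Picnic plan"},
    {"type": "Picture", "title": "Harbour"},
    {"type": "Picture", "title": "Picnic 2"},
]

# One by one: stops scanning as soon as the first match is produced.
first = next(find_iter(store, title_starts_with_picnic, "Picture"), None)

# Whole collection: scans everything and returns all matches.
everything = find_all(store, title_starts_with_picnic, "Picture")
```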
Any WinFS Item has two properties, the IncomingRelationships and the OutgoingRelationships properties, which return a collection of the Incoming and Outgoing Relationships of the Item. Through either Incoming Relationships or Outgoing, the other Item can be accessed. For example, when a Picture is retrieved, which has a PersonInPicture field and has Outgoing Relation to a Contact, the relationship can be used to retrieve the address of the contact:
picture.ContactRelation.Contact.address
An OPath search query can specify a single search condition, such as "title = 'Something'", or a compound condition such as "title = 'Title 1' || title = 'Title 2' && author = 'Someone'". It even supports wildcard conditions, such as "title LIKE 'any*'". OPath queries can also be used with relations to find related data. For example, for the query "find addresses of all people whose pictures I have and whose name starts with 'A'", first all pictures are searched whose subject has a name that starts with 'A'; then, traversing the relationship with a contact, the name of the contact is matched with the name of the person in the picture. The search retrieves two sets of results: a set of pictures which have people whose names start with 'A', and a set of contacts whose names start with 'A'. The two sets of results are joined according to a criterion which matches the name of a contact with the name of a person in a picture. The resulting set of contacts is what is required. For each contact in that set, the address is retrieved, as in the query:
Query = "PersonInPicture = 'A*' && Contact.Name = 'A*' && ContactRelation.Name = PersonInPicture"
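The join that this query describes can be spelled out step by step. The Python sketch below is not an OPath evaluator; the data, field names and function are assumptions made for illustration. It filters pictures by the pictured person's name, filters contacts by name, and joins the two result sets on matching names before reading the address.

```python
# Sample data standing in for Picture and Contact items (hypothetical).
pictures = [
    {"title": "Picnic", "person_in_picture": "Ada"},
    {"title": "Harbour", "person_in_picture": "Bob"},
    {"title": "Summit", "person_in_picture": "Alan"},
]
contacts = [
    {"name": "Ada", "address": "12 Example Street"},
    {"name": "Alan", "address": "7 Sample Road"},
    {"name": "Amy", "address": "3 Placeholder Lane"},
]


def addresses_of_pictured(prefix):
    """Addresses of contacts whose name starts with prefix and who appear in a picture."""
    # First result set: people in pictures whose name starts with the prefix.
    pictured = {
        p["person_in_picture"]
        for p in pictures
        if p["person_in_picture"].startswith(prefix)
    }
    # Second result set: contacts whose name starts with the prefix.
    candidates = [c for c in contacts if c["name"].startswith(prefix)]
    # Join: keep contacts whose name matches a pictured person.
    return sorted(c["address"] for c in candidates if c["name"] in pictured)


addresses = addresses_of_pictured("A")
```

Amy matches the name condition but appears in no picture, and Bob appears in a picture but fails the name condition, so neither contributes an address.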
Different relations specify different sets of data. So when a search is made which encompasses multiple relations, the different sets of data are retrieved individually and their intersection is computed. The resulting set will contain only those data items which correspond to all the relations.
Internally, the stored data is structured according to the relationships, using techniques such as sorting, hashing and indexing. The resulting structure is optimized for searching and accessing data items through their relationships, which is intended to make searches fast.
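Keeping only the items that correspond to every relation amounts to a set intersection. The short Python sketch below uses made-up result sets, one per relation, echoing the Acapulco example from the Motivation section; the names and data are assumptions for illustration.

```python
from functools import reduce


def items_matching_all(result_sets):
    """Return the items present in every per-relation result set."""
    if not result_sets:
        return set()
    return reduce(set.intersection, result_sets)


# One result set per relation (sample data only).
appears_in_photos = {"ada", "alan", "bob"}
exchanged_mail = {"ada", "bob", "carol"}
lives_in_acapulco = {"ada", "carol"}

matches = items_matching_all(
    [appears_in_photos, exchanged_mail, lives_in_acapulco]
)
```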
Data sharing
WinFS allows easy sharing of data between applications, and among multiple WinFS stores, which may reside on different computers, by copying to and from them. A WinFS item can also be copied to a non-WinFS file system, but unless that data item is put back into the WinFS store, it will not support the advanced services provided by WinFS.
The WinFS API also provides some support for sharing with non-WinFS applications. WinFS exposes a shell object to access WinFS stores. This object maps WinFS items to a virtual folder hierarchy, and can be accessed by any application. WinFS data can also be manually shared using network shares, by sharing the legacy shell object.
Non-WinFS file formats can be stored in WinFS stores, using the File Item, provided by WinFS. Importers can be written, to convert specific file formats to WinFS Item types.
In addition, WinFS provides services to automatically synchronize items in two or more WinFS stores, subject to some predefined condition, such as "share only photos" or "share photos which have an associated contact X". The stores may be on different computers. Synchronization is done in a peer-to-peer way; there is no central authority. A synchronization can be either manual or automatic or scheduled. During synchronization, WinFS finds the new and modified Items, and updates accordingly. If two or more changes conflict, WinFS can either resort to automatic resolution based on predefined rules, or defer the synchronization for manual resolution. WinFS also updates the schemas, if required.
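The synchronization steps can be sketched as a two-way merge between peers. The Python model below rests on several assumptions that are not WinFS behaviour: each store is a dictionary mapping an item id to a `(modified_time, value)` pair, a single `last_sync` timestamp marks the previous synchronization, deletions are not modelled, and the `share` and `resolve` callbacks are hypothetical stand-ins for the predefined sharing condition and the conflict-resolution rules.

```python
def synchronize(store_a, store_b, last_sync, share=None, resolve=None):
    """Bring two peer stores in line; return ids left for manual resolution."""
    deferred = []
    for item_id in sorted(set(store_a) | set(store_b)):
        a = store_a.get(item_id)
        b = store_b.get(item_id)
        if a == b:
            continue  # already identical in both stores
        record = a if a is not None else b
        if share is not None and not share(record):
            continue  # outside the sharing condition
        a_changed = a is not None and a[0] > last_sync
        b_changed = b is not None and b[0] > last_sync
        if a_changed and b_changed:
            if resolve is None:
                deferred.append(item_id)  # defer for manual resolution
                continue
            winner = resolve(a, b)  # automatic resolution by rule
        else:
            # Take the side that changed, or the only side that has the item.
            winner = a if (a_changed or b is None) else b
        store_a[item_id] = winner
        store_b[item_id] = winner
    return deferred


store_a = {"p1": (5, "photo v2"), "p2": (1, "old"), "p3": (6, "A edit")}
store_b = {"p1": (1, "photo v1"), "p2": (1, "old"),
           "p3": (7, "B edit"), "p4": (4, "new in B")}

# First pass: no automatic rule, so the conflicting item is deferred.
conflicts = synchronize(store_a, store_b, last_sync=2)

# Second pass: resolve conflicts by keeping the most recently modified copy.
remaining = synchronize(store_a, store_b, last_sync=2, resolve=max)
```

There is no central authority in this sketch: either store can be passed as the first argument, and both end up holding the same records.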
References
- Sean Grimaldi (December 2005). The WinFS Files: Divide et Impera. MSDN. Microsoft. Retrieved on 2006-05-22.
- Thomas Rizzo, Sean Grimaldi (October 18, 2004). An Introduction to "WinFS" OPath. MSDN. Microsoft. Retrieved on 2006-05-22.
- Thomas Rizzo (March 17, 2004). WinFS 101: Introducing the New Windows File System. MSDN. Microsoft. Retrieved on 2006-05-22.
- Shawn Wildermuth (March 2004). A Developer's Perspective on WinFS: Part 1. MSDN. Microsoft. Retrieved on 2006-05-22.
- Shawn Wildermuth (July 2004). A Developer's Perspective on WinFS: Part 2. MSDN. Microsoft. Retrieved on 2006-05-22.
- Shishir Mehrotra (September 2005). "WinFS" Future Directions: An Overview. Professional Developers Conference 2005 presentations. Microsoft. Retrieved on 2006-05-22.
See also
- Desktop organizer
- Relational Database Management System
- Storage, a storage management system for GNOME desktop