Language Integrated Query (LINQ, pronounced "link") is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages.
LINQ defines a set of query operators that can be used to query, project and filter data in arrays, enumerable classes, XML, relational databases, and third-party data sources. While it allows any data source to be queried, it requires that the data be encapsulated as objects. So, if the data source does not natively store data as objects, the data must be mapped to the object domain. Queries written using the query operators are either executed by the LINQ query processing engine or, via an extension mechanism, handed over to LINQ providers, which either implement a separate query processing engine or translate the query to a different format to be executed on a separate data store (such as on a database server as SQL queries). The results of a query are returned as a collection of in-memory objects that can be enumerated using a standard iterator construct such as C#'s foreach.
Many of the concepts that LINQ has introduced were originally tested in Microsoft's Cω research project. LINQ was released as a part of .NET Framework 3.5 on November 19, 2007.
The set of query operators defined by LINQ is exposed to the user as the Standard Query Operator API. The query operators supported by the API are:[1]
The Select operator is used to perform a projection on the collection to select either all the data members that make up the object or a subset of them. The SelectMany operator is used to perform a one-to-many projection, i.e., if the objects in the collection contain another collection as a data member, SelectMany can be used to select the entire sub-collection. The user supplies a function, as a delegate, which projects the data members. Selection creates an object of a different type, which has either some or as many data members as the original class. The class must be already defined for the code to be compilable.
The Where operator allows the definition of a set of predicate rules which are evaluated for each object in the collection, and objects which do not match the rule are filtered away. The predicate is supplied to the operator as a delegate.
The Sum, Min, Max, Average and Aggregate operators take a function that retrieves a certain numeric value from each element in the collection and use it to find the sum, minimum, maximum, average or aggregate value of all the elements in the collection, respectively.
The GroupBy operator takes a function that extracts a key value from each element and returns a collection of IGrouping<Key, Values> objects, one for each distinct key value. The IGrouping objects can then be used to enumerate all the objects for a particular key value.
The Standard Query Operator API also specifies certain operators which convert a collection into other types:[1]
AsEnumerable: converts the collection to the IEnumerable<T> type.
AsQueryable: converts the collection to the IQueryable<T> type.
ToList: converts the collection to the IList<T> type.
ToDictionary: converts the collection to the IDictionary<K, T> type, indexed by the key K.
ToLookup: converts the collection to the ILookup<K, T> type, indexed by the key K.
Cast: converts a non-generic IEnumerable collection to one of IEnumerable<T> by casting each element to type T. Throws an exception for incompatible types.
OfType: converts a non-generic IEnumerable collection to one of IEnumerable<T>. Only elements of type T are included.
The query operators are defined as generic extension methods on the IEnumerable<T> interface, and a concrete implementation is provided in the Enumerable class (named Sequence in pre-release versions). As a result, any class which implements the IEnumerable<T> interface has access to these methods and is queryable. LINQ also defines a set of generic Func delegates, which define the type of delegates handled by the LINQ query methods. Any function wrapped in a Func delegate can be used by LINQ. Most of these methods return an IEnumerable<T>, so the output of one can be used as input to another, resulting in query composability. The operators are, however, lazily evaluated, i.e., the source collection is enumerated only when a result is retrieved. The enumeration is halted as soon as a match is found, and the delegates are evaluated on it. When a subsequent object in the resultant collection is retrieved, the enumeration of the source collection is continued beyond the element already evaluated. However, ordering and grouping operations, like OrderBy and GroupBy, as well as Sum, Min, Max, Average and Aggregate, require data from all elements in the collection, and so must read the entire source before producing a result. LINQ does not feature a query optimizer and the query operators are evaluated in the order they are invoked. The LINQ methods are compilable in .NET Framework 2.0, as well.[1]
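The composability and lazy evaluation described above can be sketched in C# as follows. This is a minimal illustration rather than a reference implementation: the class StandardOperatorsDemo, its methods and the PredicateCalls counter are hypothetical names introduced only for this example, while Where, Select, GroupBy, ToDictionary, First and Sum are the standard operators discussed in the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; none of these names come from the LINQ API itself.
public static class StandardOperatorsDemo
{
    // Counts how many times the Where predicate has been evaluated.
    public static int PredicateCalls = 0;

    // Builds a lazy pipeline: keep even numbers, then project their squares.
    // Nothing is enumerated until the caller reads from the result.
    public static IEnumerable<int> EvenSquares(IEnumerable<int> source)
    {
        return source
            .Where(n => { PredicateCalls++; return n % 2 == 0; })
            .Select(n => n * n);
    }

    // Groups words by their first letter and counts the words in each group.
    public static Dictionary<char, int> CountByInitial(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => w[0])
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        IEnumerable<int> query = EvenSquares(numbers);
        Console.WriteLine(PredicateCalls);  // 0: the query has only been defined

        Console.WriteLine(query.First());   // 4: enumeration stops at the first match
        Console.WriteLine(PredicateCalls);  // 2: only the elements 1 and 2 were tested

        Console.WriteLine(query.Sum());     // 56: Sum has to visit every element

        string[] words = { "apple", "avocado", "banana" };
        foreach (KeyValuePair<char, int> pair in CountByInitial(words))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Because the query is re-enumerated from the start each time a result is requested, the call to Sum evaluates the predicate again for all six elements.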
While LINQ is primarily implemented as a library for .NET Framework 3.5, it also defines a set of language extensions that can be optionally implemented by languages to make queries a first-class language construct and provide syntactic sugar for writing queries. These language extensions were initially implemented in C# 3.0, VB 9.0 and Oxygene, with other languages like F# and Nemerle having announced preliminary support. The language extensions include:[2]
Implicitly typed variables: variables can be declared without specifying their types. C# 3.0 declares them with the var keyword. In VB 9.0, the use of the Dim keyword without a type declaration accomplishes the same. Such variables are still strongly typed; the compiler uses type inference to determine their types. This allows the results of queries to be used without declaring the types of the intermediate variables.
For example, in the query to select all the objects in a collection with SomeProperty less than 10,
    int someValue = 5;
    var results = from c in someCollection
                  let x = someValue * 2
                  where c.SomeProperty < x
                  select new { c.SomeProperty, c.OtherProperty };
    foreach (var result in results)
    {
        Console.WriteLine(result);
    }
the types of the variables result, c and results are all inferred by the compiler. Assuming someCollection is IEnumerable<SomeClass>, c will be SomeClass, results will be IEnumerable<SomeOtherClass> and result will be SomeOtherClass, where SomeOtherClass is a compiler-generated class with only the SomeProperty and OtherProperty properties, whose values are set from the corresponding properties of the source objects. The operators are then translated into method calls as:
    // SomeOtherClass stands for the compiler-generated anonymous type.
    IEnumerable<SomeOtherClass> results =
        someCollection
            .Where(c => c.SomeProperty < (someValue * 2))
            .Select(c => new { c.SomeProperty, c.OtherProperty });
    foreach (SomeOtherClass result in results)
    {
        Console.WriteLine(result.ToString());
    }
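The equivalence of the two forms can be checked with a small, compilable C# sketch. The Item class and the QuerySyntaxDemo methods are hypothetical stand-ins for SomeClass and the snippets above. Because an anonymous type cannot be named in a method signature, each method returns only the projected SomeProperty values. The method-call form mirrors the simplified translation given in the text; the compiler's actual translation of the let clause goes through an additional intermediate projection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for SomeClass in the text.
public class Item
{
    public int SomeProperty { get; set; }
    public string OtherProperty { get; set; }
}

public static class QuerySyntaxDemo
{
    // Query-expression form, using var and an anonymous type.
    public static List<int> WithQuerySyntax(IEnumerable<Item> items, int someValue)
    {
        var results = from c in items
                      let x = someValue * 2
                      where c.SomeProperty < x
                      select new { c.SomeProperty, c.OtherProperty };
        return results.Select(r => r.SomeProperty).ToList();
    }

    // The same query written directly as extension-method calls.
    public static List<int> WithMethodSyntax(IEnumerable<Item> items, int someValue)
    {
        var results = items
            .Where(c => c.SomeProperty < someValue * 2)
            .Select(c => new { c.SomeProperty, c.OtherProperty });
        return results.Select(r => r.SomeProperty).ToList();
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { SomeProperty = 3,  OtherProperty = "a" },
            new Item { SomeProperty = 12, OtherProperty = "b" },
            new Item { SomeProperty = 7,  OtherProperty = "c" }
        };
        Console.WriteLine(string.Join(", ", WithQuerySyntax(items, 5)));   // 3, 7
        Console.WriteLine(string.Join(", ", WithMethodSyntax(items, 5)));  // 3, 7
    }
}
```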
LINQ also defines another interface, IQueryable<T>, which exposes the same Standard Query Operators as IEnumerable<T>. However, the concrete implementation of the interface, instead of evaluating the query, converts the query expression, with all the operators and predicates, into an expression tree.[3] The expression tree preserves the high-level structure of the query and can be examined at runtime. The type of the source collection determines which implementation runs: if the collection type implements only IEnumerable<T>, the local LINQ query execution engine is used, and if it implements IQueryable<T>, the expression tree-based implementation is invoked. An extension method, AsQueryable, is also defined for IEnumerable<T> collections to be wrapped inside an IQueryable<T> collection, to force the latter implementation.
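As a rough illustration of how a query can be held as data and examined at runtime, the following C# sketch captures a lambda as an expression tree and wraps an array with AsQueryable. The ExpressionTreeDemo class and its members are hypothetical names chosen for this example; Expression<TDelegate>, ExpressionType, Compile and AsQueryable are part of System.Linq and System.Linq.Expressions.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical demo class illustrating expression trees.
public static class ExpressionTreeDemo
{
    // Assigning a lambda to Expression<...> stores it as a tree, not as code.
    public static readonly Expression<Func<int, bool>> IsSmall = n => n < 10;

    // Returns the kind of node at the root of the lambda body.
    public static ExpressionType RootNodeType()
    {
        return IsSmall.Body.NodeType;
    }

    // Wraps an in-memory array in IQueryable<T>, so that Where records the
    // operator call in an expression tree instead of filtering directly.
    public static IQueryable<int> SmallNumbers(int[] source)
    {
        return source.AsQueryable().Where(IsSmall);
    }

    public static void Main()
    {
        Console.WriteLine(RootNodeType());          // LessThan
        Console.WriteLine(IsSmall.Compile()(3));    // True

        IQueryable<int> query = SmallNumbers(new[] { 4, 15, 8 });
        Console.WriteLine(query.Expression.NodeType);  // Call: the recorded Where call
        foreach (int n in query)
            Console.WriteLine(n);                   // prints 4, then 8
    }
}
```

A provider for a remote data source would walk such a tree and translate it, whereas the in-memory wrapper used here simply executes it locally when the query is enumerated.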
The expression trees are at the core of the LINQ extensibility mechanism, by which LINQ can be adapted for any data source. The expression trees are handed over to LINQ Providers, which are data source-specific implementations that adapt the LINQ queries to be used with the data source. The LINQ Providers analyze the expression trees representing the query ("query trees") and generate a DynamicMethod (a method generated at runtime) by using the reflection APIs to emit CIL code. These methods are executed when the query is run.[3] LINQ comes with LINQ Providers for in-memory object collections, SQL Server databases, ADO.NET datasets and XML documents. These different providers define the different flavors of LINQ:
LINQ to Objects: this provider uses the implementation of the standard query operators defined in the Enumerable class (named Sequence in pre-release versions) and allows IEnumerable<T> collections to be queried locally. The current implementation of LINQ to Objects uses, for example, O(n) linear search for simple lookups, and is not optimised for complex queries.[4]
LINQ to XML: this provider converts an XML document to a collection of XElement objects, which are then queried using the local execution engine that is provided as a part of the implementation of the standard query operators.[5]
LINQ to SQL: this provider requires the relational data to be mapped to classes, with the correspondence specified using attributes. For example,
    // Maps the class to the Customers table.
    [Table(Name="Customers")]
    public class Customer
    {
        // Each attributed member maps to a column.
        [Column(IsPrimaryKey = true)]
        public int CustID;
        [Column]
        public string CustName;
    }
This class definition maps to a table named Customers and the two data members correspond to two columns. The classes must be defined before LINQ to SQL can be used. Visual Studio 2008 includes a mapping designer which can be used to create the mapping between the data schemas in the object as well as the relational domain. It can automatically create the corresponding classes from a database schema, as well as allow manual editing to create a different view by using only a subset of the tables or columns in a table.[7]
The DataContext class takes a connection string to the server, and can be used to generate a Table<T> where T is the type that the database table will be mapped to. The Table<T> encapsulates the data in the table, and implements the IQueryable<T> interface, so that the expression tree is created, which the LINQ to SQL provider handles. It converts the query into T-SQL and retrieves the result set from the database server. Since the processing happens at the database server, local methods, which are not defined as a part of the lambda expressions representing the predicates, cannot be used. However, it can use the stored procedures on the server. Any changes to the result set are tracked and can be submitted back to the database server.[7]
PLINQ: Microsoft, as a part of the Parallel FX Library, is developing PLINQ, or Parallel LINQ, a parallel execution engine for LINQ queries. It defines the IParallelEnumerable<T> interface. If the source collection implements this interface, the parallel execution engine is invoked. The PLINQ engine executes a query in a distributed manner on a multi-core or multi-processor system.[15]
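Of the flavors listed above, LINQ to XML is the simplest to try without external infrastructure, since its queries run on the local execution engine. The following C# sketch assumes a small, made-up XML document. The LinqToXmlDemo class, the CustomerNames method and the element and attribute names are hypothetical, while XElement.Parse, Elements, Attribute and Element belong to System.Xml.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class; the XML layout is invented for this example.
public static class LinqToXmlDemo
{
    // Returns the names of customers whose "id" attribute is greater than minId.
    public static List<string> CustomerNames(string xml, int minId)
    {
        XElement root = XElement.Parse(xml);

        // The XElement objects are queried with the standard query operators.
        var names = from c in root.Elements("Customer")
                    where (int)c.Attribute("id") > minId
                    select (string)c.Element("Name");
        return names.ToList();
    }

    public static void Main()
    {
        string xml =
            "<Customers>" +
            "<Customer id=\"1\"><Name>Ada</Name></Customer>" +
            "<Customer id=\"2\"><Name>Grace</Name></Customer>" +
            "</Customers>";

        foreach (string name in CustomerNames(xml, 1))
            Console.WriteLine(name);   // Grace
    }
}
```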