EMC ScaleIO
EMC ScaleIO is a software-defined storage product from EMC Corporation that creates a server-based storage area network (SAN) from local application server storage, converting direct-attached storage into shared block storage.[1][2] It uses the existing internal storage of hosts to build a scalable, high-performance, low-cost server SAN. EMC promotes ScaleIO as a way to converge computing resources and commodity storage into a “single-layer architecture.”[3]
ScaleIO can scale from three compute/storage nodes to over 1,000 nodes that can drive up to 240 million IOPS. Developers can deploy the ScaleIO software on on-premises commodity infrastructure or in the cloud and then port their applications into a production ScaleIO instance.[4] As of September 2015, ScaleIO is also available from the company bundled with EMC commodity computing servers (officially called EMC ScaleIO Node).[5][6]
ScaleIO can be deployed as storage only or as converged infrastructure that combines storage, compute, and networking resources into a single block. It is available free of charge for testing (with community support[7]) or as a paid product supported by EMC.
History
ScaleIO was founded in 2011 by five veteran storage technologists: Boaz Palgi, Erez Webman, Lior Bahat, Eran Borovik, and Erez Ungar.[8] The founders stressed the emerging ubiquity of elastic storage, comparing it to RAM prices, and said it should be able to scale to thousands of nodes. To achieve this scalability, they separated the functions of the ScaleIO Data Client (SDC) from those of the ScaleIO Data Server (SDS).[9] The company was backed by venture capital firms including Greylock Partners and Norwest Venture Partners (NVP).[10]
EMC Corporation bought ScaleIO in June 2013, only about six months after the company emerged from stealth mode.[11] After initially paying the product little attention, EMC began promoting ScaleIO more aggressively in 2014 and 2015, marketing it in competition with EMC’s own storage arrays. Also in 2015, EMC introduced a model of its VCE hyper-converged server SAN appliance that supports ScaleIO storage.[12]
At EMC World 2015, EMC announced that ScaleIO would be made freely available to developers for testing purposes. As of May 2015, developers can download the ScaleIO software with no time or capacity limits and with all the features available in the commercial product.[4]
In September 2015, EMC announced that ScaleIO, previously sold only as software, was available pre-bundled on EMC commodity hardware as EMC ScaleIO Node.[13]
Technology and Architecture
EMC ScaleIO converges the storage and compute resources of commodity hardware into a single-layer architecture, aggregating capacity and performance and scaling to thousands of nodes. It combines HDDs, SSDs, and PCIe flash cards to create a virtual pool of block storage with varying performance tiers. It features on-demand performance and storage scalability, as well as enterprise-grade data protection, multi-tenant capabilities, and add-on enterprise features such as QoS, thin provisioning, and snapshots. ScaleIO operates on multiple hardware platforms and supports both physical and virtual application servers.[2]
ScaleIO works by installing lightweight software components on application hosts. Hosts contribute their internal disks and other direct-attached storage (DAS) to the ScaleIO cluster by running the SDS software, and hosts running the SDC software can then be presented with volumes from the cluster. These components can run alongside other applications on any server (physical, virtual, or cloud) using any type of storage media (disk drives, flash drives, PCIe flash cards, or cloud storage).[14]
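The division of roles can be made concrete with a small model: hosts running the SDS add device capacity to a shared pool, and hosts running the SDC are mapped volumes carved from that pool. The Python sketch below is a hypothetical toy, not ScaleIO's management interface; the class and method names are invented for illustration, and it assumes two-copy mirroring (so usable capacity is half of raw capacity) and thick provisioning (a volume cannot exceed the remaining usable capacity).

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model of a ScaleIO-style storage pool; not a real ScaleIO API."""
    sds_devices: dict[str, list[int]] = field(default_factory=dict)  # SDS host -> device sizes (GB)
    volumes: dict[str, int] = field(default_factory=dict)            # volume name -> size (GB)
    mappings: dict[str, set[str]] = field(default_factory=dict)      # SDC host -> mapped volumes

    def add_sds(self, host: str, device_sizes_gb: list[int]) -> None:
        # A host running the SDS contributes its local devices to the pool.
        self.sds_devices.setdefault(host, []).extend(device_sizes_gb)

    def raw_capacity_gb(self) -> int:
        return sum(sum(sizes) for sizes in self.sds_devices.values())

    def usable_capacity_gb(self) -> int:
        # Assumption: every chunk is stored twice (two-copy mirroring).
        return self.raw_capacity_gb() // 2

    def create_volume(self, name: str, size_gb: int) -> None:
        # Assumption: thick provisioning, so allocations may not exceed usable capacity.
        allocated = sum(self.volumes.values())
        if allocated + size_gb > self.usable_capacity_gb():
            raise ValueError("not enough usable capacity in the pool")
        self.volumes[name] = size_gb

    def map_volume(self, name: str, sdc_host: str) -> None:
        # A host running the SDC would see each mapped volume as a local block device.
        if name not in self.volumes:
            raise KeyError(name)
        self.mappings.setdefault(sdc_host, set()).add(name)


cluster = ToyCluster()
for host in ("node1", "node2", "node3"):
    cluster.add_sds(host, [960, 960])  # two hypothetical 960 GB drives per host
cluster.create_volume("db01", 1000)
cluster.map_volume("db01", "app1")
print(cluster.raw_capacity_gb(), cluster.usable_capacity_gb(), cluster.mappings)
```

In this example three hosts contribute 5,760 GB of raw capacity, of which 2,880 GB is usable under the mirroring assumption; the 1,000 GB volume fits and is mapped to the application host `app1`.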
The ScaleIO architecture is built on two components: a data client and a data server. The ScaleIO Data Client (SDC) is a lightweight device driver installed on each host whose applications or file systems need access to the ScaleIO virtual SAN block devices. The SDC exposes block devices representing the ScaleIO volumes currently mapped to that host. Each SDC keeps a small in-memory map that can track petabytes of data using only megabytes of RAM. The inter-node protocol used by SDCs is simpler than iSCSI and uses fewer network resources.[9]
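A map that covers petabytes with only megabytes of memory is plausible if it records placement per large region of a volume rather than per block. The back-of-the-envelope Python calculation below illustrates this under assumed values: the 1 GB region size and 16-byte entry size are hypothetical choices, not ScaleIO's documented internal format.

```python
def map_size_bytes(capacity_bytes: int, region_bytes: int, entry_bytes: int) -> int:
    """Size of a placement map holding one fixed-size entry per region."""
    regions = -(-capacity_bytes // region_bytes)  # ceiling division: a partial region still needs an entry
    return regions * entry_bytes


PB = 10**15
GB = 10**9
MB = 10**6

# Hypothetical parameters: one 16-byte entry per 1 GB region.
size = map_size_bytes(1 * PB, 1 * GB, 16)
print(f"{size / MB:.0f} MB of map per petabyte")  # prints "16 MB of map per petabyte"
```

With the same formula, a map that kept one 16-byte entry per 4 KB block would need about 3.9 TB per petabyte, which is why coarse-grained placement matters for keeping the client map small.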
The ScaleIO Data Server (SDS) runs on each host that contributes local storage to the ScaleIO virtual SAN. Each node is part of a loosely coupled cluster.[9]
Throughput and IOPS scale in direct proportion to the number of servers and local storage devices added to the system; performance scales linearly with deployment size up to at least 128 nodes.[15][14] Additional storage and compute resources (that is, additional servers and drives) can be added modularly. Every server in the ScaleIO cluster takes part in processing I/O operations, so the cluster's full I/O capacity and throughput are available to any application within it. Any needed rebuilds and rebalances are processed in the background, and a parallel I/O architecture spreads workloads evenly across the cluster.[16]
ScaleIO can be deployed either as a “two-layer” multi-server cluster, in which the application and storage are installed on separate servers, or as a “hyper-converged” cluster, in which the application and storage are installed on the same servers, creating a low-footprint, low-cost, scalable single-layer architecture. Capacity and performance of all available resources are aggregated and made available to every participating ScaleIO server and application. Storage tiers can be created with media and drive types whose performance or capacity characteristics best suit the application's needs.[14]
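The difference between the two layouts comes down to which hosts run which component. The Python sketch below classifies a cluster from a hypothetical description of the roles on each host; the function name and input format are invented for illustration, and the "mixed" label for partially overlapping layouts is an assumption of this sketch rather than a ScaleIO term.

```python
def deployment_type(roles_by_host: dict[str, set[str]]) -> str:
    """Classify a cluster layout from the ScaleIO components each host runs.

    roles_by_host maps a host name to the subset of {"SDS", "SDC"} it runs.
    """
    sds_hosts = {h for h, roles in roles_by_host.items() if "SDS" in roles}
    sdc_hosts = {h for h, roles in roles_by_host.items() if "SDC" in roles}
    if not sds_hosts or not sdc_hosts:
        raise ValueError("need at least one SDS host and one SDC host")
    if sds_hosts == sdc_hosts:
        return "hyper-converged"  # applications and storage share the same servers
    if sds_hosts.isdisjoint(sdc_hosts):
        return "two-layer"        # storage servers and application servers are separate
    return "mixed"                # hypothetical label for partially overlapping layouts


print(deployment_type({"s1": {"SDS"}, "s2": {"SDS"}, "s3": {"SDS"}, "app1": {"SDC"}}))
print(deployment_type({"n1": {"SDS", "SDC"}, "n2": {"SDS", "SDC"}, "n3": {"SDS", "SDC"}}))
```

The first call describes a two-layer cluster and the second a hyper-converged one.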
Storage and compute resources can be added to or removed from the ScaleIO cluster as needed, with no downtime and minimal impact on application performance. The self-healing, auto-balancing capability of the ScaleIO cluster ensures that data is automatically rebuilt and rebalanced across resources when components are added, removed, or fail. Because every server and local storage device in the cluster is used in parallel to process I/O operations and protect data, system performance scales linearly as additional servers and storage devices are added to the configuration.[14]
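Auto-balancing after a node is added can be pictured as moving chunks from the fullest nodes onto the new, empty one until every node holds about the same share. The Python sketch below is a simplified greedy model of that idea, not ScaleIO's actual rebalancing algorithm; node names and chunk counts are hypothetical.

```python
def rebalance(load: dict[str, int], new_node: str) -> tuple[dict[str, int], int]:
    """Add an empty node, then move chunks one at a time from the most-loaded
    node to the least-loaded node until no two nodes differ by more than one chunk.

    Returns the new per-node chunk counts and the number of chunk moves made.
    """
    load = dict(load)  # work on a copy so the caller's dict is not mutated
    load[new_node] = 0
    moves = 0
    while True:
        busiest = max(load, key=lambda n: load[n])
        idlest = min(load, key=lambda n: load[n])
        if load[busiest] - load[idlest] <= 1:
            return load, moves
        load[busiest] -= 1
        load[idlest] += 1
        moves += 1


balanced, moves = rebalance({"node1": 30, "node2": 30, "node3": 30}, "node4")
print(balanced, moves)
```

Here 90 chunks end up split 22/23/23/22, and all 22 moves go onto the new node, so existing nodes shed only the data needed to even out the load.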
In tests of a 128-node cluster using NULL devices (which discard data instead of storing it, so the storage media is not a bottleneck), each ScaleIO node provided roughly 240K IOPS, for a total cluster throughput of over 31 million IOPS. Populating each node with a single Micron P320 PCIe flash device or with SSDs would bring the 128-node system to roughly 28.3 million IOPS.
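Under the linear scaling described above, cluster IOPS is simply per-node IOPS multiplied by the number of nodes, and the model can be inverted to find the average per-node contribution. The Python sketch below applies that ideal model to the figures quoted in this section; the function names are illustrative.

```python
def linear_cluster_iops(per_node_iops: float, nodes: int) -> float:
    """Ideal linear scaling: every node adds the same number of IOPS."""
    return per_node_iops * nodes


def implied_per_node_iops(total_iops: float, nodes: int) -> float:
    """Invert the linear model to get the average contribution of each node."""
    return total_iops / nodes


NODES = 128
print(linear_cluster_iops(240_000, NODES))       # NULL-device case
print(implied_per_node_iops(28_300_000, NODES))  # flash-populated case
```

The model gives 30,720,000 IOPS for 240K IOPS per node, consistent with the reported total of over 31 million given that 240K is a rounded figure, and an average of about 221K IOPS per node for the 28.3 million figure.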
ScaleIO spreads the data chunks it writes across many nodes and mirrors each chunk. This makes rebuilds after a disk failure very fast, because many nodes each rebuild a small share of the lost data in parallel. ScaleIO supports the VMware, Hyper-V, Xen, and KVM hypervisors. It also supports OpenStack, Windows, Red Hat, SLES, CentOS, and CoreOS (Docker). Any application that needs block storage can use it, including Oracle and other databases. Although ScaleIO is not as closely integrated with VMware as VMware's Virtual SAN, its SDC functionality has moved into the VMware kernel.[9]
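The rebuild benefit follows from where the mirror copies live. In the simplified Python model below, each chunk gets a primary and a mirror copy on two different nodes, cycling through every ordered pair of nodes so copies are spread evenly; when one node fails, every surviving node holds some of the surviving copies and shares the re-copy work. This placement rule is a stand-in chosen for illustration, not ScaleIO's actual data layout, and the choice of rebuild destination is omitted.

```python
from collections import Counter
from itertools import permutations


def place_mirrored(num_chunks: int, nodes: list[str]) -> dict[int, tuple[str, str]]:
    """Give each chunk a (primary, mirror) pair of distinct nodes.

    Cycling through every ordered pair of nodes spreads both copies evenly
    across the cluster (a simplified stand-in for a real placement policy).
    """
    pairs = list(permutations(nodes, 2))
    return {chunk: pairs[chunk % len(pairs)] for chunk in range(num_chunks)}


def rebuild_sources(placement: dict[int, tuple[str, str]], failed: str) -> Counter:
    """Count, per surviving node, how many chunks it must re-copy after `failed` is lost."""
    work: Counter = Counter()
    for primary, mirror in placement.values():
        if primary == failed:
            work[mirror] += 1   # only the mirror copy survives
        elif mirror == failed:
            work[primary] += 1  # only the primary copy survives
    return work


nodes = [f"node{i}" for i in range(8)]
placement = place_mirrored(560, nodes)
print(rebuild_sources(placement, "node0"))
```

With 8 nodes and 560 chunks, losing `node0` leaves 140 chunks with a single copy, and each of the seven surviving nodes re-copies 20 of them, so no single disk has to carry the whole rebuild.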
Test Results and Performance Data
- ESG Lab claimed that ScaleIO’s performance of 28.3 million IOPS, achievable with commodity hardware, was orders of magnitude higher than the best IOPS performance demonstrated by storage vendors to date.[16]
- An eight-node Oracle RAC ScaleIO cluster achieved over 800K SLOB IOPS on a query-only workload and 565K SLOB IOPS on a mixed query-and-update workload. It also sustained over 21 GB/sec during parallel table scans of the database.[16]
- In addition to high IOPS, ESG noted that the response times seen in the tests were low as well: under 1.2 milliseconds for the mixed query and update testing, and under 0.7 milliseconds for the read-only tests.[16]
- A 53-node cluster performed over 8.5 million IOPS and 114 GB/sec.
References
- [1] "Introducing ScaleIO Node". EMC Corporation.
- [2] "EMC ScaleIO: Software-Defined, Scale-Out SAN" (PDF). EMC Corporation.
- [3] "EMC Bundles ScaleIO Software, Servers". EnterpriseTech. 2015-09-16.
- [4] "ESG Blog: EMC Embracing Freemium and Open Source with ScaleIo and ViPR". Enterprise Strategy Group, Inc. 2015-05-05.
- [5] "Introducing EMC ScaleIO Node: Loving what’s INSIDE the Box". EMC Corporation. 2015-09-16.
- [6] "ScaleIO". EMC Corporation.
- [7] "ScaleIO Node - what's the scoop, and what's up?". Virtual Geek. 2015-09-16.
- [8] "EMC Buys Israeli Start-Up ScaleIO For About $200 Million In Cash". Jewish Business News. 2013-07-11.
- [9] "How to storm the virtual heights of SAN". The Register. 2015-06-03.
- [10] "ScaleIO". TechCrunch. 2015-11-04.
- [11] "ScaleIO joins the pack of pooled storage startups with $12M". GigaOM. 2012-12-04.
- [12] "Server SAN 2012-2026" (PDF). Wikibon. 2015-07-15.
- [13] "EMC Introduces ScaleIO Nodes". StorageReview.com, Inc. 2015-09-15.
- [14] "EMC ScaleIO: Proven Performance and Scalability". Enterprise Strategy Group, Inc. 2015-09-01.
- [15] "EMC Announces "No Restrictions" Download For ScaleIO And Future Enhancements". 2015-05-05.
- [16] "EMC ScaleIO: Transforming Commodity Hardware into Simple, Scalable, High-performance, Shared Storage". Enterprise Strategy Group, Inc. 2014-04-01.