Introduction to Data Tiering
Not many people would disagree that after cash, data is the lifeblood of a business - and if it is disturbed, so is the ability of the business to function properly. However, many administrators and IT managers fail to recognize the fact that the value of data to an organization is not a constant. In fact, the value of data decreases over time as it loses its relevance, freshness and “popularity”. One question that administrators should be asking themselves is: why should data that is decreasing in value remain in expensive front-line storage, subject to the same backup, replication, and recovery policies and procedures as key data? Would it not be useful to have a system or methodology in place for analyzing and tracking data freshness, so that storage space could be freed for fresher and more relevant data, and time- and bandwidth-consuming data protection policies relaxed as data loses its value? It is here that the concept of Data Tiering - sometimes known as Information Lifecycle Management or ILM - steps into the gap to resolve some of these questions.
First, let us set the stage for this discussion, and ask: what does Data Tiering mean? Data Tiering is a concept that encompasses the discovery, classification, analysis, and maintenance of data, across the entire period of its useful life. It adds structure and context to data, marking the transition from data to information. Data Tiering is a part of the larger concept of Business Continuity Planning, but has become increasingly prominent in the storage arena in recent years thanks to several factors, including advancements in data storage management techniques and the technology that underpins them, and evolution in the storage environment, including:
Coexistence of Fibre Channel and iSCSI (IP Storage) in the data center
SAS and SATA storage coexisting in storage systems
Storage consolidation practices, for reducing the use of solitary “islands of data” in direct attached storage (DAS)
Regulatory requirements for data archiving and recall (SOX, etc.)
Though many vendors offer Data Tiering services or modules as a part of their products, Data Tiering is above all a concept or a strategy, rather than a product. For a practical explanation of what the concept embodies, however, we can safely generalize that typical Data Tiering implementations encompass components such as:
Storage System Performance and Monitoring
Storage Capacity Planning and Management
Business Controls for Data Degradation and EOL
In a Data Tiering configuration, data is analyzed for its value, and stored accordingly. At the peak of its popularity, it is stored in the fastest, most responsive top-tier storage on hand and subject to the most stringent replication and backup controls. Since the Data Tiering system is constantly monitoring the data’s value in comparison to other data, as it loses value it is migrated down the chain to less expensive, less powerful storage where it may not be accessed as frequently, or protected as carefully. In the final stage, it is migrated out of the storage system completely. Data of the lowest value is either purged from the system or transferred to other media (e.g., written to tape and delivered to offsite storage) depending upon the organization’s policy and regulatory requirements for data end-of-life.
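The migration logic described above can be sketched in a few lines. This is a minimal illustration only: the tier names and the 30-day and 180-day freshness thresholds are assumptions for the example, not values prescribed by any particular product.

```python
from datetime import datetime, timedelta

def assign_tier(last_access: datetime, now: datetime) -> str:
    """Pick a storage tier from the age of the data's last access.
    Tier names and age thresholds are illustrative assumptions."""
    age = now - last_access
    if age < timedelta(days=30):
        return "ssd_array"      # fresh, popular data stays on top-tier storage
    if age < timedelta(days=180):
        return "sata_array"     # cooling data migrates down to cheaper disk
    return "tape_archive"       # lowest-value data leaves the online system
```

In practice the policy engine would evaluate such a rule periodically and move files whose computed tier no longer matches their current location.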
Figure 1: Data Stored and Migrated According to “Freshness”
Reasons for Data Tiering
Having examined how a Data Tiering system can be implemented, we should next look more closely at the reasons why more and more organizations are accepting the need for a comprehensive Data Tiering strategy. While the needs of each organization can be wildly different, typically Data Tiering solutions are intended to solve the problems of rapid data growth, cost (initial and operating costs) of data storage, changes in data accessibility and freshness, and increasingly stringent data backup and retrieval requirements.
In practical terms, the goals of Data Tiering are to manage data growth and accessibility, reduce cost, improve recovery, and reduce hardware and software risk/exposure by introducing new technologies into the storage mix. Let us take a closer look at each of these different problems, to understand why Data Tiering is needed to satisfy each of these concerns.
Exponential Growth of Data
With data growth averaging near 80% to 100% per year, managing storage effectively has become a challenging uphill task. Storage administrators face limited budgets, and are charged with not only expanding capacity by purchasing new hardware wisely to meet projected storage needs, but also optimizing the use of existing capacity, in order to maximize the investment in current storage hardware. Moreover, any changes or additions need to be considered carefully - as the downstream effects of new hardware are often unforeseen and can quickly wipe out any short-term cost gains.
Data Accessibility / Freshness
As mentioned at the beginning of this paper, data does not maintain a constant value throughout its lifecycle. Instead, that value is constantly changing, due to time, relevance, security, or popularity. Policies and procedures must therefore be put in place to continuously monitor and shift the location (and therefore the accessibility) of data so that information that is in highest demand is in the most accessible location.
Cost of Data Storage
The overall cost of a storage system is measured not just in the initial price paid for the hardware and its commissioning. While there is a fixed dollar-per-gigabyte cost in acquiring the hardware, the total operating cost (TOC) includes maintenance, power and cooling expenses, together with the cost to staff and train administrators. As storage arrays grow, power usage (for server operation and cooling) is just one factor that has an enormous impact on the TOC of a storage solution.
If less expensive solutions are available, it is imperative for administrators to devise a careful plan to incorporate these components, within certain restrictions. When possible, additional storage technology should be adopted that does not require significant investment of time and resources to learn its operation. New solutions that are more power- or space-efficient should be integrated into the mix. And when promising, more economical technologies or vendors emerge into the market, sometimes the lower tier of storage can be a good proving ground for the testing of new solutions first before jumping on a new bandwagon.
Ability to Protect and Recover Lost Data
The term Continuous Data Protection (CDP) has come into widespread use to describe various strategies of protecting key data against loss to ensure business continuity in the face of disasters such as power/network outages and natural catastrophes. CDP strategies often incorporate techniques such as data backup, data snapshots and remote replication to do so. Adding to the challenges surrounding data protection are growing regulatory requirements for the preservation and archiving of many different types of corporate data. Data of a particularly sensitive or critical nature must be available for recall within clearly established time limits if circumstances demand it and kept secure as well. Therefore a successful Data Tiering implementation integrates well with the organization’s backup and recovery solutions along several touch points. For example, Data Tiering dictates that as items age they can be taken offline completely and migrated to tape storage, yet due to regulatory requirements some data must still be available for recall, even at this point. Since only a percentage of data has to be protected in this manner, the Data Tiering solution must be flexible enough to manage varying CDP requirements.
The Role of Data Tiering in the Larger Picture of Storage Management
As noted above, the Data Tiering policies that an organization devises must be flexible, in order to integrate itself with the other competing priorities such as data protection, cost control, capacity expansion, and the like. To get a better sense of this interplay, we will look at how Data Tiering functions within the larger perspective of the overall storage management architecture.
From a wider storage management architecture perspective, Data Tiering has an integrated relationship with the practices of Disaster Recovery Planning and System Resource Management. Data Tiering is concerned with organizing data based on its popularity or demand, while Disaster Recovery Planning (DR) addresses a storage system’s ability to restore lost data as quickly and completely as possible. System Resource Management (SRM) encompasses a whole range of diverse storage system status and performance analysis metrics to ensure that storage resources are optimized via a number of manual and automatic controls. These three elements interact with each other, with changes in one element influencing the other two.
Let’s next look more closely at the tight interaction that exists between Data Tiering, Disaster Recovery, and SRM.
Data Tiering
Data Tiering refers to the separation of data into different classes according to its value. If we consider that data can generally be divided into groups of high, medium, and low value, then we can understand how and where this data should be stored, migrated, and protected. The illustration below shows us a sample stratification of data into five classes based on its worth to the organization:
Figure 2: Data Value Hierarchy
From this diagram, it is clear that storage should be tiered to reflect the hierarchy of data. For example, data in Classes 5 through 3 would be kept on a redundant, high-performance array, while data in Classes 2 and 1 would most likely be stored on a mid-range storage array with lower performance, or possibly even on tape.
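The class-to-tier split just described can be expressed as a simple mapping. The tier names below are illustrative placeholders, not product terminology; the only substance carried over from the text is the division at Class 3.

```python
def storage_tier_for_class(value_class: int) -> str:
    """Map a data value class (1 = lowest, 5 = highest) to a storage tier,
    following the split described above: Classes 5-3 on a redundant,
    high-performance array; Classes 2-1 on mid-range disk or tape."""
    if not 1 <= value_class <= 5:
        raise ValueError("data value class must be between 1 and 5")
    return "high_performance_array" if value_class >= 3 else "mid_range_or_tape"
```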
Tiered storage can work in several different ways. From the physical perspective, hardware power and sophistication cascades downward, migrating data from more expensive to less expensive storage, as the data loses its freshness and relevance. From a data protection perspective, it could mean decreasing the amount of coverage or the frequency of snapshots taken of this data set as it moves down the chain.
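The data protection side of this cascade can be made concrete with a per-tier snapshot schedule. The intervals here are assumptions chosen only to show the pattern of decreasing coverage down the chain.

```python
# Illustrative protection schedule: snapshot frequency decreases as data
# moves down the tiers; archived data gets no online snapshot coverage.
SNAPSHOT_INTERVAL_HOURS = {
    "ssd_array": 1,        # hourly snapshots for top-tier data
    "sata_array": 24,      # daily snapshots once data has cooled
    "tape_archive": None,  # offline media: no snapshots taken
}

def snapshots_per_week(tier: str) -> int:
    """How many snapshots a data set on the given tier receives per week."""
    interval = SNAPSHOT_INTERVAL_HOURS[tier]
    return 0 if interval is None else (7 * 24) // interval
```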
Note, however, that the data hierarchy shown above does not necessarily have any connection with access. In other words, data that is not accessed frequently could in fact be mission critical (such as data containing system configuration instructions), while data that is not essential to operation is often frequently accessed (corporate PowerPoint slides).
Therefore a combination of objective and subjective measures is necessary when evaluating the value of data in this hierarchy. The value of data could be based on the level of protection it requires, the frequency of its use or access, or whether or not it is required to be maintained for archival or regulatory purposes, and Data Tiering must be sensitive to these sliding, objective and subjective judgments about the value of data in storage.
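One way to combine such measures is a weighted score. The function below is a sketch under stated assumptions: the weights, the normalization cap, and the 0-3 protection scale are all invented for illustration, not taken from any standard.

```python
def data_value_score(accesses_per_month: int,
                     protection_level: int,
                     regulatory_hold: bool) -> float:
    """Combine an objective measure (access frequency) with subjective ones
    (required protection level 0-3, regulatory retention) into one value
    score. Weights and normalization are illustrative assumptions."""
    score = min(accesses_per_month / 100.0, 1.0)  # cap the access-frequency contribution
    score += protection_level / 3.0               # higher required protection raises value
    if regulatory_hold:
        score += 1.0                              # retained data must never be purged early
    return score
```

A tiering policy would then bucket these scores into the value classes of the hierarchy above.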
Disaster Recovery Planning
Once data is separated into these separate hierarchies, its recovery priority can be established. Data at the top of the pyramid should receive the highest level of protection, by being made redundant and marked for immediate recovery, for example. Since lower-end data is not nearly as critical to the organization, it may be determined that days or weeks are an acceptable amount of time before data is restored after a disaster. In fact, low-level data may not even be restored, but simply preserved for regulatory or archival requirements. As Data Tiering migrates data, the shift in disaster recovery priority has to be synchronized with the data’s new location and status.
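The synchronization between tier and recovery priority can be represented as a lookup that is consulted whenever data moves. The recovery objectives shown are hypothetical examples of the protection levels described above.

```python
# Hypothetical recovery objectives keyed by tier: when the tiering system
# migrates data, its disaster-recovery priority moves with it.
RECOVERY_OBJECTIVE = {
    "ssd_array":    "immediate",      # redundant, restored first after a disaster
    "sata_array":   "within days",    # acceptable delay for cooler data
    "tape_archive": "archive only",   # may never be restored, only preserved
}

def dr_priority_after_migration(new_tier: str) -> str:
    """Resynchronize the DR plan with the data's new location."""
    return RECOVERY_OBJECTIVE[new_tier]
```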
System Resource Management (SRM)
System Resource Management (SRM) is a term used to describe the mechanism and process of data collection, data usage analysis, and trending in a storage appliance. In brief, SRM performs the following tasks in a storage system:
Discovery of storage nodes
Storage provisioning and capacity management
Performance monitoring and characterization
Backup and recovery management
Presentation and reporting mechanisms
Depending on the hardware and software involved, SRM can be anything from an automatic process managed on homogeneous hardware to one that is completely manual and left to the administrator. However, at a minimum the SRM mechanism should provide information on storage capacity and data access patterns, so that informed Data Tiering decisions can be made based on this relevant information.
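The minimum SRM feed described above, capacity plus access patterns, can be sketched as a small data structure and one policy query. Field names, thresholds, and the "nearly full but rarely read" heuristic are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class VolumeStats:
    """Minimal SRM sample: the capacity and access-pattern information a
    tiering policy needs at minimum. Field names are assumptions."""
    name: str
    capacity_gb: float
    used_gb: float
    reads_last_30d: int

def tiering_candidates(stats, min_used_pct=80.0, max_reads=10):
    """Flag volumes that are nearly full yet rarely read -- prime
    candidates for migrating their data down a tier."""
    return [v.name for v in stats
            if 100.0 * v.used_gb / v.capacity_gb >= min_used_pct
            and v.reads_last_30d <= max_reads]
```

Whether this query runs automatically or is reviewed by an administrator depends, as noted, on how automated the SRM mechanism is.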
Having explained the concepts of Data Tiering, Disaster Recovery, and System Resource Management, it is easy to see their interdependence. Since Data Tiering analyzes and sets the policies for moving data, migrating it as it changes in value from high to medium to low along the data hierarchy, data relocations made as a result of Data Tiering policies cause shifts in the recovery priorities built around the data.
When expanding capacity, one of the biggest challenges is balancing cost and performance with complexity. A good Data Tiering plan should balance the introduction of more economical, or newer technology (like SATA hard drives) with warnings against making a storage network more complex, with additional vendors, standards, and protocols to support. A good maxim to keep in mind is that the more complex a system becomes, the more expensive it is likely to be to deploy, integrate, optimize, and manage. Rather than thinking simply in terms of adding new, cheaper components, administrators should carefully consider how to use existing capacity more effectively, by improving bandwidth, capacity, and reliability.
We have also seen how Data Tiering has to fit neatly into the larger storage solution, since there are dependent priorities at work. A Data Tiering solution must therefore incorporate and leverage the tight interaction that exists between Data Tiering, Disaster Recovery, and SRM to be effective. A measured approach such as this will make all the difference in achieving the goals set out for the Data Tiering solution.