
Selecting a VTL/“De-dupe” solution for your business


 

Introduction


As data volumes have grown aggressively over the last few years, companies have faced major challenges. The key challenges revolve around data protection, regulatory compliance, data management, infrastructure costs, limited data center space, global operations support, and more.

Two major technologies have emerged in recent years to address these challenges: Virtual Tape Libraries (VTL) and De-Duplication (De-dup). These technologies can be deployed separately or jointly to deliver measurable value to your organization. As the need to resolve these challenges grows, the storage market is becoming increasingly crowded with solutions that range from hardware-based appliances, to storage devices with an integrated software solution, to software-only solutions. The significant increase in VTL and de-duplication offerings makes it ever more challenging for customers to identify and select the appropriate solution for their enterprise. Customers who try to make a decision based solely on technology will more often than not find themselves confused by the VTL and De-dup nomenclature and have difficulty rationalizing their decision.



The logical approach we propose in this document takes you through the following three steps:

 

  1. Define the problem statement: what problem are you trying to solve, why, and what elements are critical for resolving it?
  2. Leverage a non-partisan analysis of the available VTL and De-dup technologies to categorize the vendors by offering and feature set.
  3. Correlate your organization’s functional requirements to the solutions being offered.

 

Define Key Objectives


Below is a sample matrix of common enterprise goals, the associated business drivers, and the quantitative and qualitative results for the IT environment. This matrix will help you define the key objectives for your VTL/De-duplication project. It should also help you translate business needs into functional requirements, which can then be correlated to a specific solution.

Goal: Reduce cost
Drivers:
  • Backup hardware cost
  • Maintenance cost
  • Tape media cost
  • Labor cost
  • Power and cooling cost
  • Real estate cost
Quantitative & Qualitative Result:
  • Reduce tape footprint
  • Scalability
  • Simple to manage (low complexity)
  • Reduce backup volumes
  • Small form factor
  • Low power consumption
  • Investment protection

Goal: Reliability and availability
Drivers:
  • Compliance
  • Reduce downtime
  • Improve backup & restore success rate
  • Disaster recovery
Quantitative & Qualitative Result:
  • Replication
  • Reporting
  • Highly reliable backup media
  • Long-term vendor strategy

Goal: Customer satisfaction
Drivers:
  • Improve backup & restore performance
  • Improve Service Level Agreements (SLAs)
Quantitative & Qualitative Result:
  • Sustain line rate performance
  • Faster backups and restores
  • Tight integration with the backup software

 

Technology Overview

 

Virtual Tape Library (VTL)


VTL technology presents a disk-based storage device as a logical tape library that the backup software can use to perform backups and restores. A typical VTL solution has a storage component and a software component that performs the tape library emulation. It can be configured as a tape target, accessed through a Fibre Channel connection much like a conventional tape solution. Alternatively, it can be configured as a disk target for the backup software, accessed via CIFS or NFS over the local area network.

The main benefit of a VTL solution is the ability to use disk media, which is faster, more reliable, more scalable, and easier to manage than a conventional tape solution. Additional features, such as built-in replication support and tight integration with mainstream backup software, make it very appealing to organizations that want to reduce their tape footprint in order to improve SLAs and Recovery Point Objectives (RPOs) while reducing backup operation costs.

The disadvantage of VTL is that it replaces the conventional tape infrastructure with yet another tape management system. Although it is much faster and more reliable, you still have to deal with tape barcodes, the creation and scaling of tape drives, and virtual tapes and slots.
Configuring the VTL device as a disk target accessed via NFS or CIFS is difficult to scale in large environments that have multiple master/media servers and strong security policies.

VTL and Disk target topology

[Figure: VTL topology]

De-duplication Overview


The term de-duplication refers to eliminating redundant copies of data on the storage device. In essence, de-duplication is single-instance storage. The main benefit of this technology is that it increases the available space on the storage device and enables longer data retention policies. At the core of every de-duplication technology is an algorithm that identifies duplicate objects and redirects reference pointers. Pointing several identical objects to a single master copy saves the storage space that the duplicate instances would otherwise occupy. The amount of storage space that can be reclaimed is a function of the data change rate and the efficiency of the de-duplication algorithm. When implemented in a backup environment, de-duplication rates average around 10-20x and can be significantly higher in environments with low change rates.

A variety of designs can be implemented for de-duplication, including hashing, indexing, fixed and variable object length, and inline and post-process de-duplication. Each aspect of the design can determine the resiliency of the solution as well as its speed and efficiency. How do you choose the right technology, then? The key is to understand how each design aspect can impact your storage environment and how it will help you achieve your key objectives. For example, in some environments performance is not as big a concern as storage space reduction. In that case, inline versus post-process considerations may not apply to the environment.
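To make the link between change rate and de-duplication ratio concrete, here is a minimal Python sketch. It relies on a deliberately simplified model that is an assumption made for illustration, not a vendor formula: the first full backup is stored whole, and every later full backup adds only the fraction of data that changed since the previous one. The function name and parameters are hypothetical, and the model ignores redundancy inside a single backup as well as the efficiency of the algorithm.

```python
def estimated_dedup_ratio(num_fulls: int, change_rate: float) -> float:
    """Estimate logical data protected divided by physical data stored.

    Simplified, assumed model: the first full backup is stored whole and
    each later full backup adds only `change_rate` of new data.
    """
    if num_fulls < 1:
        raise ValueError("num_fulls must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")

    # Sizes are expressed in units of one full backup.
    logical = float(num_fulls)
    physical = 1.0 + (num_fulls - 1) * change_rate
    return logical / physical


if __name__ == "__main__":
    # Hypothetical retention of 30 full backups at several change rates.
    for rate in (0.10, 0.05, 0.02):
        ratio = estimated_dedup_ratio(30, rate)
        print(f"change rate {rate:.0%}: about {ratio:.1f}x")
```

Under this model the ratio rises as the change rate falls and as more backups are retained, which matches the qualitative statement above. Real results depend on the data and the product.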

De-duplication Approaches

 

  • Hash-based Comparison

The hash-based approach breaks data into chunks and computes an identifier (called a hash) for each chunk. It keeps a record of all the hashes in an index. To find duplicate data, it compares each new incoming hash to the hashes already stored in the index. If a new hash is not in the index, its corresponding data is backed up and the hash is added to the index. If a new hash matches one in the index, the corresponding data is not backed up again.


  • ContentAware Comparison
The ContentAware approach reads the data in the backup and identifies commonalities and relationships between objects or documents (e.g., Microsoft Word document to Word document, or Oracle database to Oracle database) to narrow the search for duplicate data. It then examines that data at the most granular (byte) level.

  • Client-Based/Source-side De-dupe
Each backup client eliminates common data and sends only changed data to the backup server or disk. This reduces data movement between clients, backup servers and storage, but it can cause resource contention on the client.

  • Inline De-dupe
Backup data is de-duplicated inline, before it is written to the storage device. This approach aligns well with hash-based comparison technology and provides a cost-effective way to reduce data center capacity needs. Extra processing power is needed to scale performance in large environments.

  • Post-process De-dupe
Data de-duplication runs as a process after the backup has been written. Backup data is first written to temporary disk space; the de-duplication process then starts, and the de-duplicated data is copied to the final disk stage. The post-process approach requires additional storage capacity to maintain the staging area for the de-duplication process.
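The hash-based, inline and post-process approaches described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the chunks are fixed-size, the hash is SHA-256, the index is an in-memory dictionary, and hash collisions are ignored. The names `DedupStore`, `inline_backup` and `post_process_backup` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


class DedupStore:
    """Toy hash-based de-duplicating store (assumed fixed-size chunks)."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.index: Dict[str, bytes] = {}  # hash -> the single stored copy

    def write(self, data: bytes) -> List[str]:
        """Store data and return its recipe (the ordered list of hashes)."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.index:
                # New hash: back up the chunk and add it to the index.
                self.index[digest] = chunk
            # Known hash: keep only a reference to the existing chunk.
            recipe.append(digest)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.index[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.index.values())


def inline_backup(store: DedupStore, streams: List[bytes]) -> List[List[str]]:
    """Inline: de-duplicate each stream before anything is written."""
    return [store.write(stream) for stream in streams]


def post_process_backup(
    store: DedupStore, streams: List[bytes]
) -> Tuple[List[List[str]], int]:
    """Post-process: land full copies in staging, then de-duplicate."""
    staging = list(streams)  # temporary disk space holds the raw backups
    staging_bytes = sum(len(stream) for stream in staging)
    recipes = [store.write(stream) for stream in staging]
    staging.clear()  # the staging area is released after processing
    return recipes, staging_bytes


if __name__ == "__main__":
    monday = b"AAAABBBBCCCC"
    tuesday = b"AAAABBBBDDDD"  # only the last chunk differs
    store = DedupStore()
    recipes, staged = post_process_backup(store, [monday, tuesday])
    print("staging bytes needed:", staged)
    print("bytes kept after de-duplication:", store.stored_bytes())
    print("restore matches:", store.read(recipes[1]) == tuesday)
```

Both paths end with the same stored chunks. The difference in this sketch is that the post-process path first needs staging space for the full, un-de-duplicated backups, while the inline path spends its hashing effort during the backup itself.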

 

Vendor Solutions Matrix (1)

 

Symantec
  • Solution Type: De-dupe only.
  • De-dupe Technology: Can support inline on the client side and also post-process on the media server.
  • Investment Protection: Can be used with all the major disk hardware solutions in the market.
  • Complexity: Software-only solution that needs to be augmented with storage and servers. High complexity to deploy and maintain; can lead to finger pointing and high solution cost.
  • Tight integration with the backup application: Newer version is very well integrated with NetBackup. Supports OST.
  • Long-Term Vendor Strategy: No issues.

Data Domain
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Inline in the de-dupe device.
  • Investment Protection: Yes. Can augment any existing tape library solution as a disk target for short to medium retention requirements. Can also be used as a proxy front-end with some of the major hardware storage solutions.
  • Complexity: Single box solution.
  • Tight integration with the backup application: OST support.
  • Long-Term Vendor Strategy: No issues.

Copan
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Post-process in the VTL device.
  • Investment Protection: No. Replaces existing tape infrastructure with virtual tape infrastructure.
  • Complexity: Hybrid solution. De-dupe functionality is provided by a separate appliance. Requires many servers to support a large environment.
  • Tight integration with the backup application: No OST support.
  • Long-Term Vendor Strategy: No issues.

EMC
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Inline and post-process in the VTL device.
  • Investment Protection: No. Can only work with an EMC storage backend.
  • Complexity: Quantum appliance and EMC backend. Separate appliance and storage backend can lead to support and compatibility issues.
  • Tight integration with the backup application: No OST support.
  • Long-Term Vendor Strategy: Currently supports multiple product lines in this space. The long-term strategy is unclear.

Sepaton
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Post-process in the VTL device.
  • Investment Protection: No. Replaces existing tape infrastructure with virtual tape infrastructure.
  • Complexity: Hybrid solution.
  • Tight integration with the backup application: No OST support.
  • Long-Term Vendor Strategy: No issues.

NetApp
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Post-process in the VTL device.
  • Investment Protection: No. Replaces existing tape infrastructure with virtual tape infrastructure.
  • Complexity: Low. Single box solution.
  • Tight integration with the backup application: No OST support.
  • Long-Term Vendor Strategy: NetApp currently supports de-duplication as a feature in the filers and also in their VTL offerings. It is not clear whether they will keep this strategy down the road.

HP
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Post-process in the VTL device.
  • Investment Protection: No. Replaces existing tape infrastructure with virtual tape infrastructure.
  • Complexity: Low. Single box solution.
  • Tight integration with the backup application: No OST support.
  • Long-Term Vendor Strategy: New to this market space. Many features are not G.A. yet.

IBM
  • Solution Type: De-dupe & VTL.
  • De-dupe Technology: Inline in the de-dupe device.
  • Investment Protection: Yes. Can augment any existing tape library solution as a disk target for short to medium retention requirements. Can also be used as a proxy front-end with some of the major hardware storage solutions.
  • Complexity: Hybrid solution. Diligent appliance with IBM backend. Can lead to compatibility and support issues.
  • Tight integration with the backup application: No OST support.
  • Long-Term Vendor Strategy: The IBM solution offers software and hardware that IBM recently acquired from two different companies. There are many question marks regarding the integration level and the support for these products down the road.




(1) Data is based on evaluations performed in October 2008.

Mapping vendor solutions to the functional requirements


[Image: excel-moshe]
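One simple way to map solutions to functional requirements is a weighted scoring sheet. The Python sketch below is only an illustration: the requirement names are taken from the matrix columns above, while the weights, the 1-5 ratings, the solution names and the function `score_solutions` are all hypothetical placeholders that you would replace with your own priorities and evaluation results.

```python
from typing import Dict, List, Tuple


def score_solutions(
    weights: Dict[str, float],
    ratings: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Return (solution, weighted average rating) pairs, best first."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    scores = {}
    for solution, solution_ratings in ratings.items():
        missing = set(weights) - set(solution_ratings)
        if missing:
            raise ValueError(f"{solution} has no rating for {sorted(missing)}")
        weighted_sum = sum(
            weights[req] * solution_ratings[req] for req in weights
        )
        scores[solution] = weighted_sum / total_weight

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical priorities: a higher weight means a more important requirement.
    weights = {
        "investment_protection": 3,
        "low_complexity": 2,
        "backup_software_integration": 3,
        "long_term_vendor_strategy": 1,
    }
    # Hypothetical 1-5 ratings for two placeholder solutions.
    ratings = {
        "Solution A": {
            "investment_protection": 5,
            "low_complexity": 4,
            "backup_software_integration": 5,
            "long_term_vendor_strategy": 4,
        },
        "Solution B": {
            "investment_protection": 2,
            "low_complexity": 3,
            "backup_software_integration": 1,
            "long_term_vendor_strategy": 4,
        },
    }
    for name, score in score_solutions(weights, ratings):
        print(f"{name}: {score:.2f}")
```

The value of such a sheet lies less in the final number than in forcing the team to agree on the weights before looking at vendor features.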

Conclusion


When you evaluate VTL and De-dupe solutions, it’s crucial to first clearly evaluate, define and prioritize your needs. Whether you choose the method suggested in this article or any other method, the goal is to translate your needs into functional requirements. Once you have done that, it should be a straightforward exercise to correlate those requirements to a specific solution offering. This approach will help you focus on the important design aspects and avoid getting sidetracked by all the bells and whistles that one solution might offer.

In the current market economy it can be very challenging to justify any new investment. When it comes to VTL and De-dupe strategy the challenge can be even bigger, as the ROI mostly revolves around soft cost savings and cost avoidance. As IT departments face greater demands to do more with less, a good solution will need to have the following attributes: it provides investment protection for the current infrastructure; it maintains a flexible architecture that lets the customer easily adopt the product in different types of environments and re-architect easily if the environment is dynamic; it delivers a black box solution that reduces management overhead and keeps hardware, power and space costs manageable; and it integrates with backup software that provides a rich feature set and helps to consolidate the environment.
Replacing a physical tape environment with a virtual tape environment can certainly increase performance and reliability, but it’s still a “tape-like” environment with much of the management overhead of traditional tapes. The traditional tape environment will still have to be kept in place alongside the virtual one if the customer has a requirement to send tapes offsite. Even with the added benefit of post-process de-duplication, the complexity factor becomes obvious, as the customer must now manage two separate tape environments, one traditional and one virtual. Based on our evaluation of the market trends, we strongly believe that there will be more justification to move towards smart disk target systems for backup rather than Virtual Tape Libraries. This became more apparent with the release of Symantec’s NetBackup 6.5 software with Open Storage (OST) support.

Based on our evaluation, we strongly believe that Data Domain offers a simple turnkey solution, a proven record, and a flexible architecture that can easily adapt to different deployments, such as a VTL device or a disk target over NFS and CIFS. Additionally, the Data Domain solution offers OST support and support for a proxy device. For these reasons, Data Domain possesses all the important attributes necessary to continue its leadership in this market space. Finally, Data Domain should continue to provide a solid return on investment to customers, even in the most challenging market conditions.
