Wednesday, May 23, 2018

NBU Duplication and SLP


Things to know about Netbackup duplication
------------------------------------------

- you can duplicate a backup image from the command line (bpduplicate) or from
  the GUI
- by default, restores are performed from the primary copy
- duplication jobs don't show "KB per second" in the Java console
- from experience, a 35 GB backup took 2 hours and a 32 KB backup took 16
  minutes to duplicate to a DR facility (the destination system is a Data Domain
  with un-aggregated links)
- Duplicating data generally takes longer than backing it up
- Duplication also consumes twice as much storage-device bandwidth as a backup,
  because a duplication job must read from one storage device and write to
  another
- Duplication taxes the NetBackup resource broker (nbrb) twice as much as
  backups
- If nbrb is overtaxed, it can slow the rate at which all types of new jobs are
  able to acquire resources and begin to move data
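The Java console does not report a duplication rate, but you can work out a rough average from the job size and the elapsed time. Below is a minimal Python sketch. It assumes binary units (1 GB = 1024 * 1024 KB) and uses the two figures quoted above. The function name `kb_per_second` is only illustrative.

```python
# Rough effective-throughput estimate for a duplication job, since the
# Java console does not show "KB per second" for duplication jobs.
# Assumption: 1 GB = 1024 * 1024 KB (binary units).

KB_PER_GB = 1024 * 1024


def kb_per_second(size_kb: float, elapsed_minutes: float) -> float:
    """Average KB/s over the whole job, including any fixed per-job overhead."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed time must be positive")
    return size_kb / (elapsed_minutes * 60)


if __name__ == "__main__":
    # Figures quoted above: 35 GB in 2 hours, 32 KB in 16 minutes.
    big = kb_per_second(35 * KB_PER_GB, 120)
    tiny = kb_per_second(32, 16)
    print(f"35 GB job: {big:,.0f} KB/s")
    print(f"32 KB job: {tiny:.3f} KB/s")
```

The 35 GB job averages roughly 5,100 KB/s, which makes a usable baseline. The 32 KB figure is essentially meaningless: those 16 minutes were presumably spent on per-job overhead rather than on moving data.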

How are duplication jobs triggered?
-----------------------------------

NetBackup starts a duplication session every five minutes to copy data from a
backup destination to a duplication destination. If a duplication job fails, the
next three duplication sessions retry the job if necessary. If the job fails all
three times, the job is retried every 24 hours until it succeeds.
Duplication occurs as soon as possible after the backup completes.
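The retry timing above can be sketched as a list of minute offsets measured from the first failed attempt. The sketch makes three assumptions:

- the "next three sessions" are the three 5-minute sessions that follow the failure;
- the extended retries are counted from the last of those three;
- `extended_count` is a hypothetical cap, added only so that the list ends.

The extended interval is a parameter because the SLP notes further down give 2 hours as the LIFECYCLE_PARAMETERS default.

```python
# Sketch of the duplication retry timing described above, assuming:
# - duplication sessions run every `session_interval_min` minutes;
# - after a failure, the next `quick_retries` sessions each retry the job;
# - after that, retries happen every `extended_retry_hours` hours,
#   counted from the last quick retry.
# Offsets are minutes after the first failed attempt.

from typing import List


def retry_times(session_interval_min: int = 5,
                quick_retries: int = 3,
                extended_retry_hours: int = 24,
                extended_count: int = 2) -> List[int]:
    """Return the minute offsets of the first few retries after a failure."""
    times = [session_interval_min * i for i in range(1, quick_retries + 1)]
    last = times[-1] if times else 0
    for _ in range(extended_count):
        last += extended_retry_hours * 60
        times.append(last)
    return times
```

With the defaults, this gives retries at 5, 10 and 15 minutes, then roughly one and two days later. Passing `extended_retry_hours=2` models the 2-hour SLP default instead.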

Concepts about backup service levels
------------------------------------

- service level is based on recovery capability
- Recovery point objective (RPO) is the point in time of the most recent backup,
  i.e., how much recent data could be lost
- Recovery time objective (RTO) is the time required to recover from the backup
- RTO of a given backup becomes less critical as the backup ages
- Backup data is at its most valuable immediately after the backup has been made
- Platinum service level = RPO and RTO of 1 or 2 hours --> mission-critical
  applications such as order processing systems and transaction processing
  systems
- Gold service level = RPO and RTO of 12 hours or less --> non-critical
  applications such as e-mail, CRM, and HR systems
- Silver service level = RPO and RTO of 1 or 2 days --> non-critical
  applications such as user file and print data, relatively static data
- high-cost storage devices include disk, SSDs, etc.
- low-cost storage devices include tape, virtual tape libraries, etc.
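The tiers above can be made concrete with a small lookup that picks the cheapest tier still meeting a required RPO/RTO. This is a hypothetical helper built on two assumptions: each tier is treated at the upper bound of its range (2 h, 12 h, 48 h), and lower tiers are cheaper because they can sit on lower-cost storage.

```python
# Hypothetical helper: choose the cheapest service level whose RPO/RTO
# target still satisfies a requirement. Upper bounds are taken from the
# list above: Platinum 1-2 h, Gold <= 12 h, Silver 1-2 days.
# Assumption: the list is ordered from cheapest to most expensive.

from typing import Optional

SERVICE_LEVELS = [
    ("Silver", 48),
    ("Gold", 12),
    ("Platinum", 2),
]


def cheapest_level(required_hours: float) -> Optional[str]:
    """Return the cheapest level whose RPO/RTO fits within required_hours."""
    for name, max_hours in SERVICE_LEVELS:
        if max_hours <= required_hours:
            return name
    return None  # no tier in the list is fast enough
```

For example, a 24-hour requirement maps to Gold, and a 4-hour requirement maps to Platinum.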

Things to know about Netbackup Storage Lifecycle Policy (SLP)
-------------------------------------------------------------

- It was introduced in NBU 6.5
- a Storage Lifecycle Policy is a plan or map of where backup data will be
  stored and for how long
- it automates the duplication process and determines how long the backup data
  will reside in each location that it is duplicated to
- when a storage plan changes (e.g., if a new regulation is imposed on your
  business requiring changes to retention periods or the number of copies
  created), you simply need to change a small number of Storage Lifecycle
  Policies, and all associated backups will take the changes into account
  automatically
- after the original backup completes, the Storage Lifecycle Policy process
  creates copies of the image, retrying as necessary until all required copies
  are successfully created
- a backup policy may use one or more SLPs; in practice it is likely to have two
  or three, covering different types of backup (e.g., one for the daily
  incremental schedule, one for the weekly full, and one for the monthly full)
- SLP scheduling is built into NBU 7.6
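The points above suggest a hypothetical, simplified model of an SLP. Each SLP is an ordered list of destinations, each with its own retention, and each schedule of a backup policy can point at a different SLP. The class names, storage unit names and retention values are invented for illustration. They are not NetBackup's own schema.

```python
# Hypothetical model of a Storage Lifecycle Policy: an ordered list of
# destinations, each keeping its copy for its own retention period.
# All names and numbers are illustrative only.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Destination:
    storage_unit: str    # a Storage Unit or Storage Unit Group name
    operation: str       # "backup" or "duplication"
    retention_days: int


@dataclass
class StorageLifecyclePolicy:
    name: str
    destinations: List[Destination] = field(default_factory=list)

    def copies(self) -> int:
        """Number of copies the SLP creates for each image."""
        return len(self.destinations)

    def longest_retention_days(self) -> int:
        """Longest retention of any copy (raises ValueError if empty)."""
        return max(d.retention_days for d in self.destinations)


# One backup policy can point different schedules at different SLPs.
schedule_to_slp: Dict[str, StorageLifecyclePolicy] = {
    "Daily_Incr": StorageLifecyclePolicy("SLP_Daily", [
        Destination("disk_stu", "backup", 14),
        Destination("dr_datadomain_stu", "duplication", 30),
    ]),
    "Weekly_Full": StorageLifecyclePolicy("SLP_Weekly", [
        Destination("disk_stu", "backup", 30),
        Destination("tape_stu", "duplication", 365),
    ]),
}
```

Changing a retention or adding a destination in one `StorageLifecyclePolicy` affects every schedule that points at it. This mirrors the "change a small number of SLPs" point above.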

SLP Operations
--------------

- duplication jobs will start as soon as the backup completes (backup then
  duplication)
- by default, SLP checks every 5 minutes for backup images that have recently
  completed and require duplication jobs
- SLP groups batches of similar images together for each duplication job, to
  optimize the performance of duplication (when there is enough data, 8 GB by
  default, to warrant a duplication job, duplication is started)
  -> as an example, see "first_duplication_batch_job.jpg"
- the default settings of 5 minutes and 8 GB can be changed by setting values in
  /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS
- if a duplication job fails to make a copy of an image, that image will be
  added to a subsequent batch of images to be duplicated with the next
  five-minute sweep of images that need to be copied (up to three times for a
  single image)
- after three failures, the SLP will wait two hours (by default) before trying
  to create that copy of that image again (this retry will continue once every
  two hours (by default) until either the user intervenes or the time of the
  longest retention specified for the image comes to pass)
- if at least one copy fails to duplicate, the existing copies will not be
  deleted until that copy has been created successfully
- In practice, I notice that SLP duplication starts 30 minutes after a daily
  incremental finishes (for both triggered and scheduled backups)
  -> this is because we don't have a
     /usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS file on our master
     server
  -> so SLP is using the default value of
     MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB, which is 30 minutes
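The batching rules above can be sketched in Python. This is a simplified model under stated assumptions, not nbstserv's actual algorithm:

- pending images are grouped oldest first;
- each batch is capped at 25 GB;
- a batch starts once it holds 8 GB, or once its oldest image has waited 30 minutes;
- grouping by "similar" images is left out.

`plan_batches` and `first_start_minute` are hypothetical names.

```python
# Simplified sketch of SLP duplication batching (assumptions, not the
# real nbstserv logic):
# - pending images are grouped, oldest first, into batches of at most MAX_GB;
# - a batch starts once it holds at least MIN_GB of data, or once its
#   oldest image has waited FORCE_AFTER_MIN minutes;
# - grouping by "similar" images (same source/destination) is left out.

from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_GB = 8            # minimum batch size (default quoted above)
MAX_GB = 25           # maximum batch size (default quoted above)
FORCE_AFTER_MIN = 30  # MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB default


@dataclass
class Image:
    name: str
    size_gb: float
    completed_at_min: int  # minute at which the backup finished


def plan_batches(pending: List[Image],
                 now_min: int) -> Tuple[List[List[Image]], List[Image]]:
    """Return (batches to start now, images that keep waiting)."""
    ordered = sorted(pending, key=lambda img: img.completed_at_min)
    batches: List[List[Image]] = []
    current: List[Image] = []
    current_gb = 0.0
    for img in ordered:
        # Close the current batch if this image would push it past MAX_GB.
        if current and current_gb + img.size_gb > MAX_GB:
            batches.append(current)
            current, current_gb = [], 0.0
        current.append(img)
        current_gb += img.size_gb
    if current:
        batches.append(current)

    start: List[List[Image]] = []
    waiting: List[Image] = []
    for batch in batches:
        size = sum(img.size_gb for img in batch)
        oldest_wait = now_min - batch[0].completed_at_min
        if size >= MIN_GB or oldest_wait >= FORCE_AFTER_MIN:
            start.append(batch)
        else:
            waiting.extend(batch)
    return start, waiting


def first_start_minute(images: List[Image], session_interval: int = 5,
                       horizon: int = 120) -> Optional[int]:
    """Minute of the first session (every `session_interval`) that starts a job."""
    for now in range(0, horizon + 1, session_interval):
        started, _ = plan_batches(images, now)
        if started:
            return now
    return None
```

Take a single 1 GB incremental that finished at minute 0: `first_start_minute` returns 30, which matches the 30-minute delay observed above. A 10 GB image would be picked up by the first session.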

Considerations in setting up Storage Lifecycle Policy (SLP)
-----------------------------------------------------------

1.) It is important to remember that this is not a hierarchical model: the
    backup is duplicated at the first possible opportunity, and the image
    occupies all of its storage locations simultaneously.
2.) In most cases the primary (first) Backup Storage Destination will be a
    high-speed storage device that allows fast restores.
3.) It is not possible to specify the use of the Media Server Encryption Option
    on specific Storage Destinations within a Storage Lifecycle Policy.
4.) A storage destination within a Storage Lifecycle Policy may use either a
    specific Storage Unit or a Storage Unit Group.
5.) All duplications use the primary (first) Backup Storage Destination as their
    source. It is important to remember this when defining Duplication Storage
    Destinations, as poor design may lead to excessive network traffic and other
    resource contention.
6.) The “Alternate Read Server” setting for a storage destination applies on the
    source destination, not the target destination. This means that the only
    Storage Destination on which the “Alternate Read Server” setting has any
    effect is the first Backup Destination (as this is the source used for all
    duplication).

Setup/Configuration
-------------------

The LIFECYCLE_PARAMETERS file:
/usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS

MIN_KB_SIZE_PER_DUPLICATION_JOB
This is the size of the minimum duplication batch (default 8 GB).

MAX_KB_SIZE_PER_DUPLICATION_JOB
This is the size of the maximum duplication batch (default 25 GB).

MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB
This represents the time interval between forcing duplication sessions for
small batches (default 30 minutes).

IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS
After duplication of an image fails three times, this is the time interval
between subsequent retries (default 2 hours).

DUPLICATION_SESSION_INTERVAL_MINUTES
This is how often the Storage Lifecycle Policy service (nbstserv) checks whether
it is time to start new duplication jobs (default 5 minutes).

- if this file does not exist, the default values will be used
- not all parameters are required in the file, and there is no order dependency
  in the file
- any parameters omitted from the file will use default values

The syntax of the LIFECYCLE_PARAMETERS file, using default values, is as
follows:
MIN_KB_SIZE_PER_DUPLICATION_JOB 8192
MAX_KB_SIZE_PER_DUPLICATION_JOB 25600
MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB 30
IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS 2
DUPLICATION_SESSION_INTERVAL_MINUTES 5
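The rules above are that the file is optional, that any parameter may be omitted, and that order doesn't matter. A minimal Python sketch can read the file under those rules. The function names are hypothetical. The sketch also assumes that blank, malformed, non-numeric and unknown lines can simply be skipped.

```python
# Minimal sketch of reading LIFECYCLE_PARAMETERS: one "NAME value" pair per
# line, in any order; anything missing falls back to the defaults.
# Skipping blank, malformed or unknown lines is an assumption.

from pathlib import Path
from typing import Dict

DEFAULTS: Dict[str, int] = {
    "MIN_KB_SIZE_PER_DUPLICATION_JOB": 8192,
    "MAX_KB_SIZE_PER_DUPLICATION_JOB": 25600,
    "MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB": 30,
    "IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS": 2,
    "DUPLICATION_SESSION_INTERVAL_MINUTES": 5,
}


def parse_lifecycle_parameters(text: str) -> Dict[str, int]:
    """Merge the settings found in `text` over the defaults."""
    params = dict(DEFAULTS)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0] not in DEFAULTS or not parts[1].isdigit():
            continue  # skip blank, malformed, non-numeric or unknown lines
        params[parts[0]] = int(parts[1])
    return params


def load(path: str = "/usr/openv/netbackup/db/config/LIFECYCLE_PARAMETERS") -> Dict[str, int]:
    """If the file does not exist, every value is the default."""
    p = Path(path)
    return parse_lifecycle_parameters(p.read_text()) if p.exists() else dict(DEFAULTS)
```

Say a master server's file contains only `MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB 10`. Then only that value changes, and every other parameter keeps its default.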
