
Monday, August 6, 2018

Disk Replacement for NetApp filer (FAS3170, FAS3240)


1. Login to filer and verify the device address of the failed disk via command line

filer02> vol status -f

Note: the device address has the form <adapter>.<id>, e.g. 0a.18, 0a.20

2. Physically identify the failed disk by locating an amber light blinking on and off on the lower part of the disk.

3. If you cannot see any amber light, go to advanced mode and turn on the LED by issuing the commands below:

filer02> priv set advanced
filer02*> led_on 0a.19

4. If you still cannot see any amber light blinking, manually turn on the leds on the neighboring disks to locate the faulty disk:

filer02*> led_on 0a.18
filer02*> led_on 0a.20

5. When the location of the failed disk has been verified, remove it by releasing
the latch and pulling the disk partially out. Wait 2 minutes to let it spin down,
then remove it completely. During this process, you may see the messages below
on the terminal.

filer02*> Wed Mar  5 20:21:23 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 1a link    online.
Wed Mar  5 20:21:23 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 0a link online.
Wed Mar  5 20:21:30 EST [filer02:raid.disk.missing:info]: Disk 0a.19 Shelf 1 Bay 3 [NETAPP          X291_S15K6420F15 NA01] S/N [3QQ0MSBX00009922S2N5] is missing from the system
Wed Mar  5 20:21:54 EST [filer02:config.BadPoolAssign:warning]: Disk 1a.27 is in Pool1 and other disks on this loop/domain are in Pool0. Disks/Interfaces need to be in separate pools for SyncMirror.

6. Insert the new disk and push the latch up until it snaps into place. Wait another
2 minutes to let the disk come online and synchronization finish. Messages like the
ones below will appear during this process.

filer02*> Wed Mar  5 20:23:16 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 1a link online.
Wed Mar  5 20:23:16 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 0a link online.

NOTE: sometimes these messages don't appear

7. The amber LED should be off by now. If not, turn it off manually:

filer02*> led_off 0a.19

8. Go back to admin mode (exit advanced mode) and assign the disk to the filer.

filer02*> priv set
filer02> disk assign 0a.19  --> use the device address from the "... is missing from the system" message in step 5

9. Verify the new disk. You can see below that the new disk replaced the spare
disk that was used during the rebuild of data after the original disk failed.

filer02> aggr status -s

Spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block checksum
spare           1a.19           1a    1   3   FC:B   -  FCAL 15000 418000/856064000  420584/861357448
spare           2d.03.8         2d    3   8   SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.9         2d    3   9   SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.10        2d    3   10  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.11        2d    3   11  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.13        2d    3   13  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.16        2d    3   16  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.17        2d    3   17  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.18        2d    3   18  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.19        2d    3   19  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.20        2d    3   20  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.21        2d    3   21  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.22        2d    3   22  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
filer02>
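When many disks are involved, it can help to pull the failed-disk details out of the console messages programmatically. A minimal Python sketch (the helper function is my own, not a NetApp tool) that extracts the device address, shelf, bay, and serial from a raid.disk.missing line like the one in step 5:

```python
import re

# Example syslog line from the disk-removal step above
LINE = ("Wed Mar  5 20:21:30 EST [filer02:raid.disk.missing:info]: "
        "Disk 0a.19 Shelf 1 Bay 3 [NETAPP          X291_S15K6420F15 NA01] "
        "S/N [3QQ0MSBX00009922S2N5] is missing from the system")

def parse_missing_disk(line):
    """Extract device address, shelf, bay and serial from a
    raid.disk.missing message (returns None if it doesn't match)."""
    m = re.search(r"Disk (\S+) Shelf (\d+) Bay (\d+) .*S/N \[(\w+)\] "
                  r"is missing from the system", line)
    if not m:
        return None
    return {"device": m.group(1), "shelf": int(m.group(2)),
            "bay": int(m.group(3)), "serial": m.group(4)}

print(parse_missing_disk(LINE))
# {'device': '0a.19', 'shelf': 1, 'bay': 3, 'serial': '3QQ0MSBX00009922S2N5'}
```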


Wednesday, June 20, 2018

Netapp LUNs (7-Mode)


Introduction
------------

- by default, the block size is 4096 bytes (4 KB)
- the max size that can be created is 16 TB (but in practice, I was only
  able to create a 15.9 TB LUN in Data ONTAP 8.2.2RC2 7-mode)
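The 16 TB ceiling lines up with a 32-bit count of the default 4 KB blocks; that may explain the limit, though the source doesn't state the reason. Quick arithmetic:

```python
BLOCK = 4096                 # default LUN block size (4 KB)
limit_bytes = 16 * 2**40     # 16 TB (binary)
blocks = limit_bytes // BLOCK
print(blocks)                # 4294967296, i.e. exactly 2**32 blocks
```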

Commands
--------

Displaying
# quick check of status
lun show

# detailed info
lun show -v

# lun mappings
lun show -m

Creating
# w/ space reservation
lun create -s <size>[k|m|g|t] -t <ostype> <lun_path>
lun create -s 6t -t vmware /vol/sql30_vol/lun0

# w/o space reservation
lun create -s <size>[k|m|g|t] -t <ostype> -o noreserve <lun_path>
lun create -s 6t -t vmware -o noreserve /vol/sql30_vol/lun0

** ostypes: vmware, windows, windows_2008, linux, solaris, etc
** NOTE: use the "vmware" ostype when provisioning a LUN for ESXi

Mapping & Controlling availability
# mapping
lun map <lun_path> <igroup> [<lun_id>]
lun map /vol/devLuns/lun16 DEVCLUSTER 16
** the next available LUN ID is assigned if you don't enter a value

# unmapping
lun unmap <lun_path> <igroup> <lun_id>
lun unmap /vol/devLuns/lun16 DEVCLUSTER 16

# availability
lun online <lun_path>
lun offline <lun_path>
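The automatic LUN ID assignment can be thought of as picking the lowest unused ID; that is an assumption for illustration, as the filer's real allocation logic isn't documented here:

```python
def next_lun_id(used_ids):
    """Return the lowest LUN ID not in use (a sketch of what `lun map`
    without an explicit ID appears to do; actual behaviour may differ)."""
    used = set(used_ids)
    i = 0
    while i in used:
        i += 1
    return i

print(next_lun_id([0, 1, 2, 16]))  # 3
```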


Resizing
# incremental
lun resize <lun_path> +<size>[k|m|g|t]

# absolute (BE CAUTIOUS OF THIS COMMAND!!!)
lun resize <lun_path> <size>

Modifying
# setting space reservation
lun set reservation <lun_path> [enable|disable]

Deleting
# deletes a LUN
lun destroy <lun_path>
** this automatically reclaims space inside the containing volume


Tutorials / Tips and Tricks
---------------------------

to verify if a LUN is thin or thick provisioned
1. check its footprint on its containing aggregate
2. see the "guarantee" option under "vol status -v"

unmapping a LUN in a live environment
1. stop all host side applications accessing the LUN
2. bring the LUN offline
3. unmap the LUN

How to identify the correct LUN in vSphere?
1. Go to ESX host Configuration > Storage Adapters >
   select the LUN > CTRL+C to copy the identifier
2. Login to the Netapp device and run "lun show -v /path/to/lun"
3. Compare the serial numbers from #1 and #2 and see if they match
   (just ignore some odd ascii characters in #1)
Actual commands in destroying a LUN
LUN name: /vol/devsql13_vol/lun0

df -h /vol/devsql13_vol/
lun offline /vol/devsql13_vol/lun0
lun unmap /vol/devsql13_vol/lun0 ESX_NON_PROD
lun destroy /vol/devsql13_vol/lun0
df -h /vol/devsql13_vol/

expanding a windows drive by 250 GB
1. create lun in netapp: lun create -s 250g -t windows /vol/devluns/lun11
2. resize filesystem on host side: use diskpart or diskmgmt.msc
Checking for WWN
1. check serial
filer> lun show -v
/vol/vol1/lun0  40m (41943040)
Serial#: "OdCvFnbbKzih"
---truncated---

2. check wwn
filer>  igroup show
shpc (FCP) (ostype: windows2000):
10:00:00:00:c9:2b:fd:8e

Troubleshooting
---------------

VMWare VMs are inaccessible after reboot of Netapp filers (inconsistent state)
- rescan/refresh the storage on the ESX clusters to resolve the issue
netapp slow lun


Monday, June 18, 2018

Qtrees


Overview
--------

Basic Details:

- partition volumes into smaller segments
- properties are: quotas, backups, security style, CIFS oplocks settings
- can be used in creating CIFS shares
- you can create several qtrees inside a volume and each of them can have
  different quotas

Properties:

Opportunistic and Lease oplocks

  Traditional oplocks (opportunistic locks) and lease oplocks enable a CIFS
client in certain file-sharing scenarios to perform client-side caching of
read-ahead, write-behind, and lock information

  A client can then read from or write to a file without regularly reminding the
server that it needs access to the file in question. This improves performance
by reducing network traffic.

Commands
--------

Displaying
# quick view of status
qtree status [-i|-v]

# displays statistics
qtree stats
** stats are reset upon system reboot
** stats are reset when the volume containing the qtree is brought online
** stats are reset when you trigger this command: qtree stats -z

Creating
# creates a qtree under a specified volume
qtree create /vol/<volname>/<qtree_name> -m <mode>
  ** mode is the permission
  ** you can see the default mode under the wafl.default_qtree_mode option

# creates a qtree under the root volume (/vol/vol0)
qtree create <qtree_name>

Modifying
# Enabling/Disabling for entire storage
cifs.oplocks.enable on
cifs.oplocks.enable off

# Enabling/Disabling per qtree
qtree oplocks /vol/vol1/qtree enable
qtree oplocks /vol/vol1/qtree disable

Renaming
1. volume that contains the qtree must be available
  on a UNIX client: mount filer:/vol/my_vol/ /mnt
  on a Windows client: map the qtree into windows explorer

2. find the qtree and rename it
  on a UNIX client: mv /mnt/qtree_old /mnt/qtree_new
  on a Windows client: rename using windows explorer
** whether you can rename a qtree depends on the qtree permissions

Deleting
# basic
qtree delete /path/to/qtree

# force (if for some reason the directory is not empty)
qtree delete -f /path/to/qtree
  * you might need to go to "priv set advanced"

Tutorials
---------

Actual commands on CIFS creation
qtree create /vol/data_share_vol/data_share_qtree
qtree security /vol/data_share_vol/data_share_qtree ntfs
qtree oplocks /vol/data_share_vol/data_share_qtree enable
cifs shares -add data$ /vol/data_share_vol/data_share_qtree


Sunday, June 17, 2018

Igroups


What are igroups?
-----------------

--> contain the initiators that are allowed to access your storage
--> tables of WWPNs (for FCP) or IQN node names (for iSCSI) that are
    allowed to access a LUN
--> igroups can have multiple initiators
--> multiple igroups can have the same initiator
--> a LUN cannot be mapped to multiple igroups that contain the same initiator
--> an initiator can be a member of igroups of different ostypes
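The mapping rule above is easy to trip over; a small sketch (hypothetical data model, not NetApp code) that rejects a map when the target igroup shares an initiator with an igroup the LUN is already mapped to:

```python
def can_map(lun_maps, igroups, lun, new_igroup):
    """lun_maps: {lun_path: set of igroup names};
    igroups: {igroup name: set of initiators}.
    A LUN cannot be mapped to two igroups sharing an initiator."""
    new_inits = igroups[new_igroup]
    for g in lun_maps.get(lun, set()):
        if igroups[g] & new_inits:   # shared initiator -> conflict
            return False
    return True

igroups = {"grpA": {"10:00:00:00:c9:2b:fd:8e"},
           "grpB": {"10:00:00:00:c9:2b:fd:8e", "10:00:00:00:c9:5e:ca:5e"}}
maps = {"/vol/vol1/lun0": {"grpA"}}
print(can_map(maps, igroups, "/vol/vol1/lun0", "grpB"))  # False
```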

Commands
--------

Displaying
# prints igroups
igroup show
igroup show -v

Creating
# creates an iSCSI igroup
igroup create -i -t <ostype> <igroup_name> <iqn>
igroup create -i -t windows_2008 win_host5_group2 iqn.1991-05.com.microsoft:host5.domain.com
** os types: solaris, solaris_efi, windows, windows_gpt,
             windows_2008, hpux, aix, linux, netware, vmware, xen, hyper_v

# creates an FCP igroup
igroup create -f -t <ostype> <igroup_name> <wwpn>
igroup create -f -t aix aix-igroup3 10:00:00:00:0c:2b:cc:92
** NOTE: you must use the WWPN (port name) and not the WWNN

Managing
# adding an initiator
igroup add <igroup_name> <initiator>
igroup add INTDB01 10:00:00:00:c9:5e:ca:5e

# removing an initiator
igroup remove <igroup_name> <initiator>
igroup remove INTDB01 10:00:00:00:c9:5e:ca:5e

Deleting
# LUNs must be unmapped first
igroup destroy <igroup_name>

# this will remove all LUN maps and destroy the igroup
igroup destroy -f <igroup_name>

Modifying
# renaming
igroup rename <current_name> <new_name>
** this will not impact access to LUNs that are inside the igroup

changing the WWNN (World Wide Node Name) of a system
----------------------------------------------------

- both nodes of an HA pair must have the same WWNN
- to change the WWNN, use this command: fcp nodename 50:0a:09:80:82:02:8d:ff
- for the change to take effect, both nodes of the HA pair must be rebooted

Troubleshooting
---------------

different WWNN on HA pairs
message:

from autosupport:
HA Group Notification from filer01 (FILER SCSI TARGET MISCONFIGURED) ERROR

from /etc/messages:
Tue Feb 17 17:12:39 EST [filer01:scsitarget.cluster.misconfigured:notice]: Filer SCSI Target Misconfigured. Run 'lun config_check'.

resolution / fix:
1. match the WWNNs of both nodes (they must be the same)
2. reboot both nodes

Friday, June 15, 2018

Netapp Volumes (7-Mode)


Root Volume: /vol/vol0
----------------------

--> this is where Data ONTAP is installed and booted
--> minimum size depends on the hardware model (consult the HWU)
--> fractional reserve must be 100%
--> can be a traditional volume or a FlexVol
--> default RAID type is RAID-DP (starting from Data ONTAP 7.3)
--> you can change the RAID type with: vol options vol0 raidtype raid4
--> you can designate another root volume: vol options <volname> root

Commands
--------

Displaying        
# quick view of size and snapshots
df [-g|-h]

# displays block size
vol status -b

# detailed breakdown of space consumed inside a volume
vol status -S

# amount of space a volume is using within the aggregate (footprint)
vol status -F

# displays language used on each volumes
vol status -l

Creating
# basic
vol create <volname> <aggrname> <size>[k|m|g|t]
  -> if -l is not specified, the language will be the same as the root volume's

# thick provisioned (w/ space reservation, enabled by default)
vol create <volname> <aggrname> <size>[k|m|g|t]
vol create myvolume myaggregate 1g
** you can also add the "-s volume" option

# thin provisioned (w/o space reservation)
vol create -s none <volname> <aggrname> <size>[k|m|g|t]
vol create -s none myvolume myaggregate 1g

# w/ language specified --> not sure if this is required when creating ESX LUNs???
vol create <volname> -l en_US <aggrname> <size>[k|m|g|t]
vol create devsql13_vol -l en_US aggr2 4t

TIPS:
- it is better to specify sizes in smaller units to get finer-grained control
  over the resulting size: e.g. use 15360g instead of 15t

Modifying
# turns off snap reserve
snap reserve <volname> 0

# disables scheduled snapshots (either command works)
vol options <volname> nosnap on
vol options <volname> nosnap 1

# renames a volume (non-disruptive)
vol rename <volname> <new_volname>

Deleting
# do the steps in order
filer> vol offline <volname>
filer> vol destroy <volname>

NOTES:
  - deleting a large volume will not reclaim aggregate space right away;
    it takes some time to reclaim all the space
  - as an example, a 34 TB volume can take around 24 hours to reclaim

Resizing
# increase
vol size <volname> +<size>[k|m|g|t]

# reduce
vol size <volname> -<size>[k|m|g|t]

Tutorials
---------

Actual commands on Volume Creation
vol create data_share_vol -l en_US -s volume aggr2 30720g
vol options data_share_vol nosnap 1
snap reserve data_share_vol 0

Netapp share creation
1. Locate an aggregate with enough space for the new share(s)

aggr show_space -g
df -g -A

2. Create volumes - create the volume, turn off automatic snapshots
   & remove the snapshot reserve space

vol create <volname> -l en_US -s volume <aggrname> <size>g
vol options <volname> nosnap 1
snap reserve <volname> 0

note: <volname> is usually images[0-9][0-9] & <size> is 250. As of this
      writing filer01 has 2 available: aggr0 & aggr1

3. Create QTrees - create the QTree, set security mode to NTFS and
   enable "opportunistic locks"

qtree create /vol/<volname>/<qtreename>
qtree security /vol/<volname>/<qtreename> ntfs
qtree oplocks /vol/<volname>/<qtreename> enable

note: <qtreename> is usually the same as <volname>

4. Create CIFS shares
cifs shares -add <sharename> /vol/<volname>/<qtreename>

note: <sharename> is usually "<qtreename>$"


Troubleshooting
---------------

running out of inodes?
Related message:

Fri Nov 20 14:52:54 EST [filer01:wafl.vol.outOfInodes:notice]: file system on Volume data_share_vol is out of inodes

Solution:

1. Check the current inode value of the volume
maxfiles data_share_vol
df -i data_share_vol

2. Increase the max inodes
maxfiles data_share_vol 35000000
  -> 35000000 is a value greater than the current max inodes

3. Verify
maxfiles data_share_vol
df -i data_share_vol
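Before bumping maxfiles, it helps to know how close the volume is to the limit. A sketch computing usage from the iused/ifree columns that `df -i` reports (the numbers below are illustrative, not from a real filer):

```python
def inode_usage_pct(iused, ifree):
    """Percent of the volume's inodes consumed, from df -i columns."""
    return 100.0 * iused / (iused + ifree)

# illustrative values: 31M used, 1M free -> well past a 90% warning line
pct = inode_usage_pct(31_000_000, 1_000_000)
print(round(pct, 1))  # 96.9
```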
create_ucode config error
Log message:
Fri Apr 15 14:25:26 GMT [filer02:cmds.sysconf.logErr:error]: sysconfig: Unless directed by NetApp Global Services volumes vol0, backups_vol, backups2_vol, and vdi_vol should have the volume option create_ucode set to On. . 
Fri Apr 15 14:25:26 GMT [filer02:callhome.sys.config:error]: Call home for SYSTEM CONFIGURATION WARNING

Solution:
For each volume, run the following command:
filer> vol options [volname] create_ucode on

Root cause:
Clustered filers in a NetApp Storage Area Network (SAN) environment require the following options to be enabled to guarantee that failover and giveback occur quickly enough to not interfere with host requests to the LUNs. These options are automatically enabled when FCP/iSCSI service is turned on:
  - volume option create_ucode set to on
  - coredump.timeout.enable set to on
  - coredump.timeout.seconds set to 60 or less

Wednesday, June 13, 2018

Netapp CDot Administration


Cluster vs SVM Admins
---------------------

SVM Admins
 - can only administer their own SVM
 - SVM is short for Storage Virtual Machine (formerly called vserver)
 - manage the resources assigned to that SVM (volumes, protocols, LIFs, etc.)

Cluster Admins
 - can administer both the cluster and all SVMs underneath
 - can setup SVMs and delegate roles to SVM admins


How to manage DataONTAP?
------------------------

Ways:

1. command line (tcsh shell)
  cluster admins:
    a. serial port (default admin account: admin)
    b. ssh
        - enabled by default
        - the account must be permitted to log in via ssh
          (`security login` with `-application ssh`)
        - if using AD, the access method must be "domain"
        - if using ipv6, ipv6 must be configured on the cluster
    c. rsh/telnet
        - disabled by default since they are insecure protocols
        - to enable, see tutorial below

2. url

Some notes on SSH
-----------------

 - SSHv1 is not supported, only SSHv2 (cDOT 8.3)
 - DOT supports 64 concurrent SSH connections per node
 - if the rate of incoming connections is higher than 10 per second,
   the service is temporarily disabled for 60 seconds
 - if using AD, use the same username and domain that were configured in DOT

Privilege Levels
----------------

Levels:

admin
  cluster_name::>
  - most commands and parameters are available
  - used for common routine tasks

advanced
  cluster_name::*>
  - commands here are infrequently used
  - requires advanced knowledge

diagnostic
  (what does the prompt look like??)
  - commands here are potentially disruptive
  - used by support personnel to diagnose and fix problems

note:
  - a command preceded by `*` can only be executed under the advanced
    privilege level or higher

Different Shells
----------------

* for cluster admins only *

1. clustershell
    - default shell when you log in
    - used to manage the cluster

2. nodeshell
    - shell for a specific node
    - many commands from nodeshell can be accessed from clustershell

3. systemshell
    - used for diagnostics/troubleshooting purposes
    - requires the diag privilege level
    - intended for technical support use

Display Preferences
-------------------

What preferences can I set?

- privilege level of the command session
- whether confirmations are issued for potentially disruptive commands
- whether show commands display all fields
- the character or characters to use as the field separator
- the default unit when reporting data sizes
- the number of rows the screen displays in the current cli session
  before the interface pauses output (if the preferred number of rows
  is not specified, it is automatically adjusted based on the actual
  height of the terminal. if the actual height is undefined, the default
  number of rows is 24)
- the default storage virtual machine (svm) or node
- whether a continuing command should stop if it encounters an error

Ways of executing Commands
--------------------------

1. full path

cluster1::> storage disk show

2. per directory

cluster1::> storage
cluster1::storage> disk
cluster1::storage disk> show

  * use `top` to go to top level
  * use `up` or `..` to go one level higher

3. abbreviating commands

cluster1::> st d sh
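Abbreviation works by prefix-matching each token against the command directory. A sketch of the idea (toy command tree; ONTAP's exact disambiguation rules may differ):

```python
def resolve(tokens, tree):
    """Resolve abbreviated CLI tokens against a nested command tree by
    unambiguous prefix match; raises if a token is ambiguous/unknown."""
    path, node = [], tree
    for t in tokens:
        hits = [k for k in node if k.startswith(t)]
        if len(hits) != 1:
            raise ValueError(f"ambiguous or unknown token: {t!r}")
        path.append(hits[0])
        node = node[hits[0]]
    return " ".join(path)

# toy command tree; the real directory is much larger
tree = {"storage": {"disk": {"show": {}}}, "volume": {"show": {}}}
print(resolve(["st", "d", "sh"], tree))  # storage disk show
```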

Rules for specifying values in CLI
----------------------------------

- a value can be a number, a string, or a boolean specifier
- some parameters accept a comma-separated list (no "" needed)
- enclose values containing spaces in ""
- `?` is interpreted as help
- command names are case-insensitive, e.g. `vserver cifs`
- nodenames, volumes, aggregates, LIFs, etc. are case-sensitive
- to clear a value, use "" or -
- lines starting with # are comments

some examples:

# sets a comment then deletes it
cluster1::> vserver create -vserver vs0 -subtype default -rootvolume root_vs0
-aggregate aggr1 -rootvolume-security-style unix -language C.UTF-8 -is-repository
false -ipspace ipspaceA -comment "My SVM"
cluster1::> vserver modify -vserver vs0 -comment ""

# a trailing comment stating what the command does
cluster1::> security login create -vserver vs0 -user-or-group-name new-admin
-application ssh -authmethod password #This command creates a new user account

Query operators
---------------

*    match all entries

     # list all volumes with "tmp" in their name
     volume show -volume *tmp*

!    NOT operator

     # match anything except vs0
     !vs0

|    OR operator

     # vs0 or vs1
     vs0 | vs1

     # matches a, anything that starts with b, or anything containing c
     a | b* | *c*

..   range operator

     # any value from 5 to 10
     5..10

<    less than operator
>    greater than operator
<=   less than or equal to
>=   greater than or equal to

{query}   extended query
  - must be specified as the 1st argument after the command name,
    before any other parameters
  - can only be used in `modify` and `delete` commands
  - not applicable to `create` or `show` commands
  - example of a confusing extended query: p.22 of
    "ONTAP 9 System Administration Reference"

  # offlines all volumes whose names contain "tmp"
  volume modify {-volume *tmp*} -state offline

"string literal"
  you may also query any character as a literal by enclosing it in "",
  e.g. "^" or "*"

using multiple query operators
# displays all volumes whose size is greater than 1GB,
# percent used is less than 50%, and not in SVM vs1
volume show -size >1GB -percent-used <50 -vserver !vs1
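The string-field operators above can be approximated with the standard library, which is a handy way to test a query before running a disruptive `modify` with it. A rough Python sketch (my own simplification, not ONTAP's actual matcher):

```python
import fnmatch
import re

def match(value, query):
    """Approximate clustershell query matching for string fields:
    `|` = OR, leading `!` = NOT, `*`/`?` = wildcards, `a..b` = range."""
    for alt in query.split("|"):
        alt = alt.strip()
        neg = alt.startswith("!")
        if neg:
            alt = alt[1:]
        if re.fullmatch(r"\d+\.\.\d+", alt):            # numeric range
            lo, hi = map(int, alt.split(".."))
            hit = value.isdigit() and lo <= int(value) <= hi
        else:
            hit = fnmatch.fnmatchcase(value, alt)       # glob match
        if hit != neg:
            return True
    return False

print(match("tmp_vol", "*tmp*"))   # True
print(match("vs0", "!vs0"))        # False
print(match("7", "5..10"))         # True
```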

Commands
--------

nodeshell/clustershell
# querying clustershell cli options
vserver options -vserver <vserver_name> -option-name ?

# accessing the vserver man page
man vserver options

# clustershell help
help
[?|help]

# nodeshell help
help
[?|help]

# accessing the nodeshell
system node run -node <nodename|local>
  * local - the node you used to access the cluster
  * `system node run` has the alias `run`

# exits/returns to the previous shell (if there is any)
exit
CTRL+D

ssh
# connecting using local ssh account
ssh joe@cluster.ip

# connecting using AD account
ssh DOMAIN\\joe@cluster.ip
ssh "DOMAIN\joe"@cluster.ip

# executing remote command via ssh
ssh joe@cluster.ip cluster show

history/redo/reissue
# prints history
history

# redo the nth command in history
redo <n>

# redo the command executed N commands ago
redo -<n>

privilege levels
# changes privilege level
set -privilege <admin|advanced|diagnostic>

setting display preferences
# key command
set

# sets the number of rows for the current session
rows <number>

# changes the separator and units used
set -showseparator "," -units GB

displaying
# displays full details
cluster1::> volume show -instance
Vserver Name: cluster1-1
Volume Name: vol0
Aggregate Name: aggr0
...
Space Guarantee Style: volume
Space Guarantee in Effect: true
...
Press <space> to page down, <return> for next line, or 'q' to quit...
...
cluster1::>

# displays only the fields you specify
cluster1::> volume show -fields space-guarantee,space-guarantee-enabled
vserver volume space-guarantee space-guarantee-enabled
-------- ------ --------------- -----------------------
cluster1-1 vol0 volume true
cluster1-2 vol0 volume true
...
cluster1::>

# show valid fields
show -fields ?

Command Shortcuts
-----------------

- DataOntap shell is based on unix tcsh
- below are copy pasted from the pdf

Shortcut                      Action
--------                      ------
Ctrl-B / Back arrow           move the cursor back by one character
Ctrl-F / Forward arrow        move the cursor forward by one character
Esc-B                         move the cursor back by one word
Esc-F                         move the cursor forward by one word
Ctrl-A                        move the cursor to the beginning of the line
Ctrl-E                        move the cursor to the end of the line
Ctrl-U                        remove the content from the beginning of the line
                              to the cursor, and save it in the cut buffer
                              (the cut buffer acts like temporary memory,
                              similar to a clipboard in some programs)
Ctrl-K                        remove the content from the cursor to the end of
                              the line, and save it in the cut buffer
Esc-D                         remove the content from the cursor to the end of
                              the following word, and save it in the cut buffer
Ctrl-W                        remove the word before the cursor, and save it
                              in the cut buffer
Ctrl-Y                        yank the content of the cut buffer into the
                              command line at the cursor
Ctrl-H / Backspace            delete the character before the cursor
Ctrl-D                        delete the character where the cursor is
Ctrl-C                        clear the line
Ctrl-L                        clear the screen
Ctrl-P / Esc-P / Up arrow     replace the command line with the previous entry
                              on the history list (repeat to move further back)
Ctrl-N / Esc-N / Down arrow   replace the command line with the next entry on
                              the history list (repeat to move forward)
Tab / Ctrl-I                  expand a partially entered command or list valid
                              input from the current editing position
?                             display context-sensitive help
Esc-?                         escape the special mapping for the "?" character
                              (e.g. press Esc and then "?" to enter a literal
                              question mark into a command's argument)
Ctrl-Q                        start TTY output
Ctrl-S                        stop TTY output

Tutorials
---------

Enabling rsh/telnet
1. Use the `system services firewall policy clone` command to create
   a new management firewall policy based on the default "mgmt"
   firewall policy

2. Use `system services firewall policy create` command to enable
   telnet or rsh on the new firewall policy

3. Use `network interfaces modify` command to associate the new
   policy with the cluster management LIF

4. Then to access your cluster:
     telnet cluster.ip
     rsh cluster.ip -l username:password