
Monday, August 6, 2018

Disk Replacement for NetApp filer (FAS3170, FAS3240)


1. Login to filer and verify the device address of the failed disk via command line

filer02> vol status -f

Note: the device address has the form <adapter>.<id>, e.g. 0a.18, 0a.20

2. Physically identify the failed disk by locating an amber light blinking on and off on the lower part of the disk.

3. If you cannot see any amber light, go to advanced mode and turn on the LED by issuing the commands below:

filer02> priv set advanced
filer02*> led_on 0a.19

4. If you still cannot see any amber light blinking, manually turn on the leds on the neighboring disks to locate the faulty disk:

filer02*> led_on 0a.18
filer02*> led_on 0a.20

5. When the location of the failed disk has been verified, remove it by releasing
the latch and pulling the disk partially out. Wait 2 minutes to let it spin down,
then remove it completely. During this process, you may see the messages below
on the terminal.

filer02*> Wed Mar  5 20:21:23 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 1a link    online.
Wed Mar  5 20:21:23 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 0a link online.
Wed Mar  5 20:21:30 EST [filer02:raid.disk.missing:info]: Disk 0a.19 Shelf 1 Bay 3 [NETAPP          X291_S15K6420F15 NA01] S/N [3QQ0MSBX00009922S2N5] is missing from the system
Wed Mar  5 20:21:54 EST [filer02:config.BadPoolAssign:warning]: Disk 1a.27 is in Pool1 and other disks on this loop/domain are in Pool0. Disks/Interfaces need to be in separate pools for SyncMirror.

6. Insert the new disk and push the latch up until it snaps into place. Wait another
2 minutes to let the disk come online and synchronization finish. Messages like the
ones below will appear during this process.

filer02*> Wed Mar  5 20:23:16 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 1a link online.
Wed Mar  5 20:23:16 EST [filer02:fci.adapter.link.online:info]: Fibre Channel adapter 0a link online.

NOTE: sometimes these messages don't appear

7. The amber LED should be off by now. If not, turn it off manually:

filer02*> led_off 0a.19

8. Go back to admin mode (exit advanced mode) and assign the disk to the filer.

filer02*> priv set
filer02> disk assign 0a.19  --> use the device address from the "... is missing from the system" message in step 5

9. Verify the new disk. You can see below that the new disk replaced the spare
disk that was used during the rebuild of data after the original disk failed.

filer02> aggr status -s

Spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block checksum
spare           1a.19           1a    1   3   FC:B   -  FCAL 15000 418000/856064000  420584/861357448
spare           2d.03.8         2d    3   8   SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.9         2d    3   9   SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.10        2d    3   10  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.11        2d    3   11  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.13        2d    3   13  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.16        2d    3   16  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.17        2d    3   17  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.18        2d    3   18  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.19        2d    3   19  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.20        2d    3   20  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.21        2d    3   21  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
spare           2d.03.22        2d    3   22  SA:B   -   SAS 15000 560000/1146880000 560879/1148681096
filer02>
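When many disks are involved, it can help to pull the failed-disk details out of the console messages programmatically. A minimal Python sketch (the helper function is my own, not a NetApp tool) that extracts the device address, shelf, bay, and serial from a raid.disk.missing line like the one in step 5:

```python
import re

# Example syslog line from the disk-removal step above
LINE = ("Wed Mar  5 20:21:30 EST [filer02:raid.disk.missing:info]: "
        "Disk 0a.19 Shelf 1 Bay 3 [NETAPP          X291_S15K6420F15 NA01] "
        "S/N [3QQ0MSBX00009922S2N5] is missing from the system")

def parse_missing_disk(line):
    """Extract device address, shelf, bay and serial from a
    raid.disk.missing message (returns None if it doesn't match)."""
    m = re.search(r"Disk (\S+) Shelf (\d+) Bay (\d+) .*S/N \[(\w+)\] "
                  r"is missing from the system", line)
    if not m:
        return None
    return {"device": m.group(1), "shelf": int(m.group(2)),
            "bay": int(m.group(3)), "serial": m.group(4)}

print(parse_missing_disk(LINE))
# {'device': '0a.19', 'shelf': 1, 'bay': 3, 'serial': '3QQ0MSBX00009922S2N5'}
```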


Wednesday, June 20, 2018

Netapp LUNs (7-Mode)


Introduction
------------

- by default, the block size is 4096 bytes (4 KB)
- the max size that can be created is 16 TB (but in practice, I was only
  able to create a 15.9 TB LUN in Data ONTAP 8.2.2RC2 7-mode)
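The 16 TB ceiling lines up with a 32-bit count of the default 4 KB blocks; that may explain the limit, though the source doesn't state the reason. Quick arithmetic:

```python
BLOCK = 4096                 # default LUN block size (4 KB)
limit_bytes = 16 * 2**40     # 16 TB (binary)
blocks = limit_bytes // BLOCK
print(blocks)                # 4294967296, i.e. exactly 2**32 blocks
```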

Commands
--------

Displaying
# quick check of status
lun show

# detailed info
lun show -v

# lun mappings
lun show -m

Creating
# w/ space reservation
lun create -s <size>[k|m|g|t] -t <ostype> <lun_path>
lun create -s 6t -t vmware /vol/sql30_vol/lun0

# w/o space reservation
lun create -s <size>[k|m|g|t] -t <ostype> -o noreserve <lun_path>
lun create -s 6t -t vmware -o noreserve /vol/sql30_vol/lun0

** ostypes: vmware, windows, windows_2008, linux, solaris, etc
** NOTE: use the "vmware" ostype when provisioning a LUN for ESXi

Mapping & Controlling availability
# mapping
lun map <lun_path> <igroup> [<lun_id>]
lun map /vol/devLuns/lun16 DEVCLUSTER 16
** the next available LUN ID is assigned if you don't enter a value

# unmapping
lun unmap <lun_path> <igroup> <lun_id>
lun unmap /vol/devLuns/lun16 DEVCLUSTER 16

# availability
lun online <lun_path>
lun offline <lun_path>
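The automatic LUN ID assignment can be thought of as picking the lowest unused ID; that is an assumption for illustration, as the filer's real allocation logic isn't documented here:

```python
def next_lun_id(used_ids):
    """Return the lowest LUN ID not in use (a sketch of what `lun map`
    without an explicit ID appears to do; actual behaviour may differ)."""
    used = set(used_ids)
    i = 0
    while i in used:
        i += 1
    return i

print(next_lun_id([0, 1, 2, 16]))  # 3
```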


Resizing
# incremental
lun resize <lun_path> +<size>[k|m|g|t]

# absolute (BE CAUTIOUS OF THIS COMMAND!!!)
lun resize <lun_path> <size>

Modifying
# setting space reservation
lun set reservation <lun_path> [enable|disable]

Deleting
# deletes a LUN
lun destroy <lun_path>
** this automatically reclaims space inside the containing volume


Tutorials / Tips and Tricks
---------------------------

to verify if a LUN is thin or thick provisioned
1. check its footprint on its containing aggregate
2. see the "guarantee" option under "vol status -v"

unmapping a LUN in a live environment
1. stop all host side applications accessing the LUN
2. bring the LUN offline
3. unmap the LUN

How to identify the correct LUN in vSphere?
1. Go to ESX host Configuration > Storage Adapters >
   select the LUN > CTRL+C to copy the identifier
2. Login to the Netapp device and run "lun show -v /path/to/lun"
3. Compare the serial numbers from #1 and #2 and see if they match
   (just ignore some odd ascii characters in #1)
Actual commands in destroying a LUN
LUN name: /vol/devsql13_vol/lun0

df -h /vol/devsql13_vol/
lun offline /vol/devsql13_vol/lun0
lun unmap /vol/devsql13_vol/lun0 ESX_NON_PROD
lun destroy /vol/devsql13_vol/lun0
df -h /vol/devsql13_vol/

expanding a windows drive by 250 GB
1. create lun in netapp: lun create -s 250g -t windows /vol/devluns/lun11
2. resize filesystem on host side: use diskpart or diskmgmt.msc
Checking for WWN
1. check serial
filer> lun show -v
/vol/vol1/lun0  40m (41943040)
Serial#: "OdCvFnbbKzih"
---truncated---

2. check wwn
filer>  igroup show
shpc (FCP) (ostype: windows2000):
10:00:00:00:c9:2b:fd:8e

Troubleshooting
---------------

VMWare VMs are inaccessible after reboot of Netapp filers (inconsistent state)
- rescan/refresh the storage on the ESX clusters to resolve the issue
netapp slow lun


Monday, June 18, 2018

Qtrees


Overview
--------

Basic Details:

- partition volumes into smaller segments
- properties are: quotas, backups, security style, CIFS oplocks settings
- can be used in creating CIFS shares
- you can create several qtrees inside a volume and each of them can have
  different quotas

Properties:

Opportunistic and Lease oplocks

  Traditional oplocks (opportunistic locks) and lease oplocks enable a CIFS
client in certain file-sharing scenarios to perform client-side caching of
read-ahead, write-behind, and lock information

  A client can then read from or write to a file without regularly reminding the
server that it needs access to the file in question. This improves performance
by reducing network traffic.

Commands
--------

Displaying
# quick view of status
qtree status [-i|-v]

# displays statistics
qtree stats
** stats are reset upon system reboot
** stats are reset when the volume containing the qtree is brought online
** stats are reset when you trigger this command: qtree stats -z

Creating
# creates a qtree under a specified volume
qtree create /vol/<volname>/<qtree_name> -m <mode>
  ** mode is the permission
  ** you can see the default mode under the wafl.default_qtree_mode option

# creates a qtree under the root volume (/vol/vol0)
qtree create <qtree_name>

Modifying
# Enabling/Disabling for entire storage
cifs.oplocks.enable on
cifs.oplocks.enable off

# Enabling/Disabling per qtree
qtree oplocks /vol/vol1/qtree enable
qtree oplocks /vol/vol1/qtree disable

Renaming
1. volume that contains the qtree must be available
  on a UNIX client: mount filer:/vol/my_vol/ /mnt
  on a Windows client: map the qtree into windows explorer

2. find the qtree and rename it
  on a UNIX client: mv /mnt/qtree_old /mnt/qtree_new
  on a Windows client: rename using windows explorer
** whether you can rename a qtree depends on the qtree permissions

Deleting
# basic
qtree delete /path/to/qtree

# force (if for some reason the directory is not empty)
qtree delete -f /path/to/qtree
  * you might need to go to "priv set advanced"

Tutorials
---------

Actual commands on CIFS creation
qtree create /vol/data_share_vol/data_share_qtree
qtree security /vol/data_share_vol/data_share_qtree ntfs
qtree oplocks /vol/data_share_vol/data_share_qtree enable
cifs shares -add data$ /vol/data_share_vol/data_share_qtree


Sunday, June 17, 2018

Igroups


What are igroups?
-----------------

--> contain the initiators that are allowed to access your storage
--> tables of WWPNs (for FCP) or IQN node names (for iSCSI) that are
    allowed to access a LUN
--> igroups can have multiple initiators
--> multiple igroups can have the same initiator
--> a LUN cannot be mapped to multiple igroups that contain the same initiator
--> an initiator can be a member of igroups of different ostypes
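The mapping rule above is easy to trip over; a small sketch (hypothetical data model, not NetApp code) that rejects a map when the target igroup shares an initiator with an igroup the LUN is already mapped to:

```python
def can_map(lun_maps, igroups, lun, new_igroup):
    """lun_maps: {lun_path: set of igroup names};
    igroups: {igroup name: set of initiators}.
    A LUN cannot be mapped to two igroups sharing an initiator."""
    new_inits = igroups[new_igroup]
    for g in lun_maps.get(lun, set()):
        if igroups[g] & new_inits:   # shared initiator -> conflict
            return False
    return True

igroups = {"grpA": {"10:00:00:00:c9:2b:fd:8e"},
           "grpB": {"10:00:00:00:c9:2b:fd:8e", "10:00:00:00:c9:5e:ca:5e"}}
maps = {"/vol/vol1/lun0": {"grpA"}}
print(can_map(maps, igroups, "/vol/vol1/lun0", "grpB"))  # False
```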

Commands
--------

Displaying
# prints igroups
igroup show
igroup show -v

Creating
# creates an iSCSI igroup
igroup create -i -t <ostype> <igroup_name> <iqn>
igroup create -i -t windows_2008 win_host5_group2 iqn.1991-05.com.microsoft:host5.domain.com
** os types: solaris, solaris_efi, windows, windows_gpt,
             windows_2008, hpux, aix, linux, netware, vmware, xen, hyper_v

# creates an FCP igroup
igroup create -f -t <ostype> <igroup_name> <wwpn>
igroup create -f -t aix aix-igroup3 10:00:00:00:0c:2b:cc:92
** NOTE: you must use the WWPN (port name) and not the WWNN

Managing
# adding an initiator
igroup add <igroup_name> <initiator>
igroup add INTDB01 10:00:00:00:c9:5e:ca:5e

# removing an initiator
igroup remove <igroup_name> <initiator>
igroup remove INTDB01 10:00:00:00:c9:5e:ca:5e

Deleting
# LUNs must be unmapped first
igroup destroy <igroup_name>

# this will remove all LUN maps and destroy the igroup
igroup destroy -f <igroup_name>

Modifying
# renaming
igroup rename <current_name> <new_name>
** this will not impact access to LUNs that are inside the igroup

changing the WWNN (World Wide Node Name) of a system
----------------------------------------------------

- both nodes of an HA pair must have the same WWNN
- to change the WWNN, use this command: fcp nodename 50:0a:09:80:82:02:8d:ff
- for the change to take effect, both nodes of the HA pair must be rebooted

Troubleshooting
---------------

different WWNN on HA pairs
message:

from autosupport:
HA Group Notification from filer01 (FILER SCSI TARGET MISCONFIGURED) ERROR

from /etc/messages:
Tue Feb 17 17:12:39 EST [filer01:scsitarget.cluster.misconfigured:notice]: Filer SCSI Target Misconfigured. Run 'lun config_check'.

resolution / fix:
1. match the WWNNs of both nodes (they must be the same)
2. reboot both nodes

Friday, June 15, 2018

Netapp Volumes (7-Mode)


Root Volume: /vol/vol0
----------------------

--> this is where Data ONTAP is installed and booted
--> minimum size depends on the hardware model (consult the HWU)
--> fractional reserve must be 100%
--> can be a traditional volume or a FlexVol
--> default RAID type is RAID-DP (starting from Data ONTAP 7.3)
--> you can change the RAID type with: vol options vol0 raidtype raid4
--> you can designate another root volume: vol options <volname> root

Commands
--------

Displaying        
# quick view of size and snapshots
df [-g|-h]

# displays block size
vol status -b

# detailed breakdown of space consumed inside a volume
vol status -S

# amount of space a volume is using within the aggregate (footprint)
vol status -F

# displays language used on each volumes
vol status -l

Creating
# basic
vol create <volname> <aggrname> <size>[k|m|g|t]
  -> if -l is not specified, the language will be the same as the root volume's

# thick provisioned (w/ space reservation, enabled by default)
vol create <volname> <aggrname> <size>[k|m|g|t]
vol create myvolume myaggregate 1g
** you can also add the "-s volume" option

# thin provisioned (w/o space reservation)
vol create -s none <volname> <aggrname> <size>[k|m|g|t]
vol create -s none myvolume myaggregate 1g

# w/ language specified --> not sure if this is required when creating ESX LUNs???
vol create <volname> -l en_US <aggrname> <size>[k|m|g|t]
vol create devsql13_vol -l en_US aggr2 4t

TIPS:
- it is better to specify sizes in smaller units to get finer-grained control
  over the resulting size: e.g. use 15360g instead of 15t

Modifying
# turns off snap reserve
snap reserve <volname> 0

# disables scheduled snapshots (either command works)
vol options <volname> nosnap on
vol options <volname> nosnap 1

# renames a volume (non-disruptive)
vol rename <volname> <new_volname>

Deleting
# do the steps in order
filer> vol offline <volname>
filer> vol destroy <volname>

NOTES:
  - deleting a large volume will not reclaim aggregate space right away;
    it takes some time to reclaim all the space
  - as an example, a 34 TB volume can take around 24 hours to reclaim

Resizing
# increase
vol size <volname> +<size>[k|m|g|t]

# reduce
vol size <volname> -<size>[k|m|g|t]

Tutorials
---------

Actual commands on Volume Creation
vol create data_share_vol -l en_US -s volume aggr2 30720g
vol options data_share_vol nosnap 1
snap reserve data_share_vol 0

Netapp share creation
1. Locate an aggregate with enough space for the new share(s)

aggr show_space -g
df -g -A

2. Create volumes - create the volume, turn off automatic snapshots
   & remove the snapshot reserve space

vol create <volname> -l en_US -s volume <aggrname> <size>g
vol options <volname> nosnap 1
snap reserve <volname> 0

note: <volname> is usually images[0-9][0-9] & <size> is 250. As of this
      writing filer01 has 2 available: aggr0 & aggr1

3. Create QTrees - create the QTree, set security mode to NTFS and
   enable "opportunistic locks"

qtree create /vol/<volname>/<qtreename>
qtree security /vol/<volname>/<qtreename> ntfs
qtree oplocks /vol/<volname>/<qtreename> enable

note: <qtreename> is usually the same as <volname>

4. Create CIFS shares
cifs shares -add <sharename> /vol/<volname>/<qtreename>

note: <sharename> is usually "<qtreename>$"


Troubleshooting
---------------

running out of inodes?
Related message:

Fri Nov 20 14:52:54 EST [filer01:wafl.vol.outOfInodes:notice]: file system on Volume data_share_vol is out of inodes

Solution:

1. Check the current inode value of the volume
maxfiles data_share_vol
df -i data_share_vol

2. Increase the max inodes
maxfiles data_share_vol 35000000
  -> 35000000 is a value greater than the current max inodes

3. Verify
maxfiles data_share_vol
df -i data_share_vol
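Before bumping maxfiles, it helps to know how close the volume is to the limit. A sketch computing usage from the iused/ifree columns that `df -i` reports (the numbers below are illustrative, not from a real filer):

```python
def inode_usage_pct(iused, ifree):
    """Percent of the volume's inodes consumed, from df -i columns."""
    return 100.0 * iused / (iused + ifree)

# illustrative values: 31M used, 1M free -> well past a 90% warning line
pct = inode_usage_pct(31_000_000, 1_000_000)
print(round(pct, 1))  # 96.9
```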
create_ucode config error
Log message:
Fri Apr 15 14:25:26 GMT [filer02:cmds.sysconf.logErr:error]: sysconfig: Unless directed by NetApp Global Services volumes vol0, backups_vol, backups2_vol, and vdi_vol should have the volume option create_ucode set to On. . 
Fri Apr 15 14:25:26 GMT [filer02:callhome.sys.config:error]: Call home for SYSTEM CONFIGURATION WARNING

Solution:
For each volume, run the following command:
filer> vol options [volname] create_ucode on

Root cause:
Clustered filers in a NetApp Storage Area Network (SAN) environment require the following options to be enabled to guarantee that failover and giveback occur quickly enough to not interfere with host requests to the LUNs. These options are automatically enabled when FCP/iSCSI service is turned on:
  - volume option create_ucode set to on
  - coredump.timeout.enable set to on
  - coredump.timeout.seconds set to 60 or less

Wednesday, June 13, 2018

Netapp CDot Administration


Cluster vs SVM Admins
---------------------

SVM Admins
 - can only administer their own SVM
 - SVM is short for Storage Virtual Machine (formerly called vserver)
 - manage the resources assigned to that SVM (volumes, protocols, LIFs, etc.)

Cluster Admins
 - can administer both the cluster and all SVMs underneath
 - can setup SVMs and delegate roles to SVM admins


How to manage DataONTAP?
------------------------

Ways:

1. command line (tcsh shell)
  cluster admins:
    a. serial port (default admin account: admin)
    b. ssh
        - enabled by default
        - the account must be permitted to log in via ssh
          (`security login` with `-application ssh`)
        - if using AD, the access method must be "domain"
        - if using ipv6, ipv6 must be configured on the cluster
    c. rsh/telnet
        - disabled by default since they are insecure protocols
        - to enable, see tutorial below

2. url

Some notes on SSH
-----------------

 - SSHv1 is not supported, only SSHv2 (cDOT 8.3)
 - DOT supports 64 concurrent SSH connections per node
 - if the rate of incoming connections is higher than 10 per second,
   the service is temporarily disabled for 60 seconds
 - if using AD, use the same username and domain that were configured in DOT

Privilege Levels
----------------

Levels:

admin
  cluster_name::>
  - most commands and parameters are available
  - used for common routine tasks

advanced
  cluster_name::*>
  - commands here are infrequently used
  - requires advanced knowledge

diagnostic
  (what does the prompt look like??)
  - commands here are potentially disruptive
  - used by support personnel to diagnose and fix problems

note:
  - a command preceded by `*` can only be executed under the advanced
    privilege level or higher

Different Shells
----------------

* for cluster admins only *

1. clustershell
    - default shell when you log in
    - used to manage the cluster

2. nodeshell
    - shell for a specific node
    - many commands from nodeshell can be accessed from clustershell

3. systemshell
    - used for diagnostics/troubleshooting purposes
    - requires the diag privilege level
    - intended for technical support use

Display Preferences
-------------------

What preferences can I set?

- privilege level of the command session
- whether confirmations are issued for potentially disruptive commands
- whether show commands display all fields
- the character or characters to use as the field separator
- the default unit when reporting data sizes
- the number of rows the screen displays in the current cli session
  before the interface pauses output (if the preferred number of rows
  is not specified, it is automatically adjusted based on the actual
  height of the terminal. if the actual height is undefined, the default
  number of rows is 24)
- the default storage virtual machine (svm) or node
- whether a continuing command should stop if it encounters an error

Ways of executing Commands
--------------------------

1. full path

cluster1::> storage disk show

2. per directory

cluster1::> storage
cluster1::storage> disk
cluster1::storage disk> show

  * use `top` to go to top level
  * use `up` or `..` to go one level higher

3. abbreviating commands

cluster1::> st d sh
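Abbreviation works by prefix-matching each token against the command directory. A sketch of the idea (toy command tree; ONTAP's exact disambiguation rules may differ):

```python
def resolve(tokens, tree):
    """Resolve abbreviated CLI tokens against a nested command tree by
    unambiguous prefix match; raises if a token is ambiguous/unknown."""
    path, node = [], tree
    for t in tokens:
        hits = [k for k in node if k.startswith(t)]
        if len(hits) != 1:
            raise ValueError(f"ambiguous or unknown token: {t!r}")
        path.append(hits[0])
        node = node[hits[0]]
    return " ".join(path)

# toy command tree; the real directory is much larger
tree = {"storage": {"disk": {"show": {}}}, "volume": {"show": {}}}
print(resolve(["st", "d", "sh"], tree))  # storage disk show
```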

Rules for specifying values in CLI
----------------------------------

- a value can be a number, a string, or a boolean specifier
- some parameters accept a comma-separated list (no "" needed)
- enclose values containing spaces in ""
- `?` is interpreted as help
- command names are case-insensitive, e.g. `vserver cifs`
- nodenames, volumes, aggregates, LIFs, etc. are case-sensitive
- to clear a value, use "" or -
- lines starting with # are comments

some examples:

# sets a comment then deletes it
cluster1::> vserver create -vserver vs0 -subtype default -rootvolume root_vs0
-aggregate aggr1 -rootvolume-security-style unix -language C.UTF-8 -is-repository
false -ipspace ipspaceA -comment "My SVM"
cluster1::> vserver modify -vserver vs0 -comment ""

# a trailing comment stating what the command does
cluster1::> security login create -vserver vs0 -user-or-group-name new-admin
-application ssh -authmethod password #This command creates a new user account

Query operators
---------------

*    match all entries

     # list all volumes with "tmp" in their name
     volume show -volume *tmp*

!    NOT operator

     # match anything except vs0
     !vs0

|    OR operator

     # vs0 or vs1
     vs0 | vs1

     # matches a, anything that starts with b, or anything containing c
     a | b* | *c*

..   range operator

     # any value from 5 to 10
     5..10

<    less than operator
>    greater than operator
<=   less than or equal to
>=   greater than or equal to

{query}   extended query
  - must be specified as the 1st argument after the command name,
    before any other parameters
  - can only be used in `modify` and `delete` commands
  - not applicable to `create` or `show` commands
  - example of a confusing extended query: p.22 of
    "ONTAP 9 System Administration Reference"

  # offlines all volumes whose names contain "tmp"
  volume modify {-volume *tmp*} -state offline

"string literal"
  you may also query any character as a literal by enclosing it in "",
  e.g. "^" or "*"

using multiple query operators
# displays all volumes whose size is greater than 1GB,
# percent used is less than 50%, and not in SVM vs1
volume show -size >1GB -percent-used <50 -vserver !vs1
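The string-field operators above can be approximated with the standard library, which is a handy way to test a query before running a disruptive `modify` with it. A rough Python sketch (my own simplification, not ONTAP's actual matcher):

```python
import fnmatch
import re

def match(value, query):
    """Approximate clustershell query matching for string fields:
    `|` = OR, leading `!` = NOT, `*`/`?` = wildcards, `a..b` = range."""
    for alt in query.split("|"):
        alt = alt.strip()
        neg = alt.startswith("!")
        if neg:
            alt = alt[1:]
        if re.fullmatch(r"\d+\.\.\d+", alt):            # numeric range
            lo, hi = map(int, alt.split(".."))
            hit = value.isdigit() and lo <= int(value) <= hi
        else:
            hit = fnmatch.fnmatchcase(value, alt)       # glob match
        if hit != neg:
            return True
    return False

print(match("tmp_vol", "*tmp*"))   # True
print(match("vs0", "!vs0"))        # False
print(match("7", "5..10"))         # True
```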

Commands
--------

nodeshell/clustershell
# querying clustershell cli options
vserver options -vserver <vserver_name> -option-name ?

# accessing the vserver man page
man vserver options

# clustershell help
help
[?|help]

# nodeshell help
help
[?|help]

# accessing the nodeshell
system node run -node <nodename|local>
  * local - the node you used to access the cluster
  * `system node run` has the alias `run`

# exits/returns to the previous shell (if there is any)
exit
CTRL+D

ssh
# connecting using local ssh account
ssh joe@cluster.ip

# connecting using AD account
ssh DOMAIN\\joe@cluster.ip
ssh "DOMAIN\joe"@cluster.ip

# executing remote command via ssh
ssh joe@cluster.ip cluster show

history/redo/reissue
# prints history
history

# redo the nth command in history
redo <n>

# redo the command executed N commands ago
redo -<n>

privilege levels
# changes privilege level
set -privilege <admin|advanced|diagnostic>

setting display preferences
# key command
set

# sets the number of rows for the current session
rows <number>

# changes the separator and units used
set -showseparator "," -units GB

displaying
# displays full details
cluster1::> volume show -instance
Vserver Name: cluster1-1
Volume Name: vol0
Aggregate Name: aggr0
...
Space Guarantee Style: volume
Space Guarantee in Effect: true
...
Press <space> to page down, <return> for next line, or 'q' to quit...
...
cluster1::>

# displays only the fields you specify
cluster1::> volume show -fields space-guarantee,space-guarantee-enabled
vserver volume space-guarantee space-guarantee-enabled
-------- ------ --------------- -----------------------
cluster1-1 vol0 volume true
cluster1-2 vol0 volume true
...
cluster1::>

# show valid fields
show -fields ?

Command Shortcuts
-----------------

- DataOntap shell is based on unix tcsh
- below are copy pasted from the pdf

Shortcut                      Action
--------                      ------
Ctrl-B / Back arrow           move the cursor back by one character
Ctrl-F / Forward arrow        move the cursor forward by one character
Esc-B                         move the cursor back by one word
Esc-F                         move the cursor forward by one word
Ctrl-A                        move the cursor to the beginning of the line
Ctrl-E                        move the cursor to the end of the line
Ctrl-U                        remove the content from the beginning of the line
                              to the cursor, and save it in the cut buffer
                              (the cut buffer acts like temporary memory,
                              similar to a clipboard in some programs)
Ctrl-K                        remove the content from the cursor to the end of
                              the line, and save it in the cut buffer
Esc-D                         remove the content from the cursor to the end of
                              the following word, and save it in the cut buffer
Ctrl-W                        remove the word before the cursor, and save it
                              in the cut buffer
Ctrl-Y                        yank the content of the cut buffer into the
                              command line at the cursor
Ctrl-H / Backspace            delete the character before the cursor
Ctrl-D                        delete the character where the cursor is
Ctrl-C                        clear the line
Ctrl-L                        clear the screen
Ctrl-P / Esc-P / Up arrow     replace the command line with the previous entry
                              on the history list (repeat to move further back)
Ctrl-N / Esc-N / Down arrow   replace the command line with the next entry on
                              the history list (repeat to move forward)
Tab / Ctrl-I                  expand a partially entered command or list valid
                              input from the current editing position
?                             display context-sensitive help
Esc-?                         escape the special mapping for the "?" character
                              (e.g. press Esc and then "?" to enter a literal
                              question mark into a command's argument)
Ctrl-Q                        start TTY output
Ctrl-S                        stop TTY output

Tutorials
---------

Enabling rsh/telnet
1. Use the `system services firewall policy clone` command to create
   a new management firewall policy based on the default "mgmt"
   firewall policy

2. Use `system services firewall policy create` command to enable
   telnet or rsh on the new firewall policy

3. Use `network interfaces modify` command to associate the new
   policy with the cluster management LIF

4. Then to access your cluster:
     telnet cluster.ip
     rsh cluster.ip -l username:password