We had to implement a two-node cluster with sles9a (11.11.5.60) and sles9b (11.11.5.61):

1. Prepare the nodes. This command has to be executed on both machines:
# preprpnode sles9a sles9b
2. Create the TSA cluster domain:
# mkrpdomain hadomain sles9a sles9b
3. Start (online) the domain:
# startrpdomain hadomain
4. Ensure the domain is online:
# lsrpdomain
You should see output similar to the following:
sles9a:~ # lsrpdomain
Name      OpState  RSCTActiveVersion  MixedVersions  TSPort  GSPort
hadomain  Online   2.4.7.3            No             12347   12348

5. Ensure all nodes in the domain are online:
# lsrpnode
You should see output similar to the following:
sles9a:~ # lsrpnode
Name    OpState  RSCTVersion
sles9a  Online   2.4.7.3
sles9b  Online   2.4.7.3

Here we struggled for quite a while. Something strange was happening: each node reported itself as Online and the other node as Offline, as in the output below.
sles9a:/ # lsrpnode
Name    OpState  RSCTVersion
sles9a  Online   2.4.7.3
sles9b  Offline  2.4.7.3

sles9b:/ # lsrpnode
Name    OpState  RSCTVersion
sles9a  Offline  2.4.7.3
sles9b  Online   2.4.7.3
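
A check like this can be scripted rather than read by eye. The following is a minimal sketch, assuming the three-column layout shown above (Name, OpState, RSCTVersion); the helper name offline_nodes is hypothetical and not part of RSCT. It reads lsrpnode output on stdin and prints every node that is not Online.

```shell
#!/bin/sh
# List nodes whose OpState is not "Online", given lsrpnode output on stdin.
# Assumes the three-column layout: Name OpState RSCTVersion.
offline_nodes() {
    awk '$1 == "Name" { next }                 # skip the header line
         NF >= 2 && $2 != "Online" { print $1 }'
}

# Sample input copied from the sles9a output above. On a real node the
# command itself would be piped in: lsrpnode | offline_nodes
sample='Name OpState RSCTVersion
sles9a Online 2.4.7.3
sles9b Offline 2.4.7.3'

missing=$(printf '%s\n' "$sample" | offline_nodes)
if [ -z "$missing" ]; then
    echo "all nodes online"
else
    echo "not online: $missing"
fi
```

With the sample above the script reports sles9b as not online. Run on both nodes, it would have shown the same symptom we saw: each node lists the other one.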

The output of the command below showed us that the broadcast address was misconfigured:

sles9a:/var/ct/hadomain/log/cthats # lssrc -ls cthats
Subsystem Group PID Status
cthats cthats 4688 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
CG1 [ 0] 2 0 D 10.10.5.60
CG1 [ 0] eth0 Broadcast address is misconfigured

HB Interval = 1.000 secs. Sensitivity = 4 missed beats
2 locally connected Clients with PIDs:
rmcd( 4930) hagsd( 4716)
Client Heartbeating Enabled. Period: 64 secs. Timeout: 300 secs.
Configuration Instance = 1208930791
Daemon employs no security
Segments pinned: Text Data Stack.
Text segment size: 793 KB. Static data segment size: 1568 KB.
Dynamic data segment size: 609. Number of outstanding malloc: 68
User time 0 sec. System time 0 sec.
Number of page faults: 0. Process swapped out 0 times.
Number of nodes up: 1. Number of nodes down: 1.
Nodes up : 1
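
The warning is easy to miss in the middle of that output. As a rough sketch, the output can be scanned for it automatically; the function name check_cthats is hypothetical, and the match simply assumes the word "misconfigured" appears in the warning line as it does above.

```shell
#!/bin/sh
# Scan `lssrc -ls cthats` output (read from stdin) for adapter warnings.
# Prints each line that mentions a misconfiguration and returns 1 if any
# was found, 0 otherwise.
check_cthats() {
    awk '/misconfigured/ { print "warning: " $0; found = 1 }
         END { exit (found + 0) }'
}

# Sample lines copied from the output above. On a real node:
#   lssrc -ls cthats | check_cthats
sample='Network Name Indx Defd Mbrs St Adapter ID Group ID
CG1 [ 0] 2 0 D 10.10.5.60
CG1 [ 0] eth0 Broadcast address is misconfigured'

if printf '%s\n' "$sample" | check_cthats; then
    echo "no adapter warnings"
else
    echo "fix the network configuration before continuing"
fi
```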

We corrected the problem by editing the network configuration file, which is located in the /etc/sysconfig/network directory.

We edited the BROADCAST entry:

sles9a:/etc/sysconfig/network # cat ifcfg-qeth-bus-ccw-0.0.0800
BOOTPROTO='static'
UNIQUE=''
STARTMODE='onboot'
IPADDR='11.11.5.60'
MTU='1500'
NETMASK='255.255.255.0'
NETWORK='11.11.5.0'
BROADCAST='11.11.5.255'
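
To double-check the BROADCAST value, it can be derived from IPADDR and NETMASK: every bit that is 0 in the netmask is set to 1 in the address. The sketch below does this per octet in plain shell arithmetic; broadcast_addr is a hypothetical helper name, and it assumes well-formed dotted IPv4 input.

```shell
#!/bin/sh
# Compute the broadcast address for an IPv4 address and a dotted netmask.
# Each broadcast octet is (ip_octet OR (255 - mask_octet)).
broadcast_addr() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both arguments on dots: $1..$4 = IP octets, $5..$8 = mask octets.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

# Values from the ifcfg file above.
broadcast_addr 11.11.5.60 255.255.255.0   # prints 11.11.5.255
```

The same netmask gives the same broadcast address for sles9b (11.11.5.61), so both nodes should end up with BROADCAST='11.11.5.255'.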

After editing the file on both machines, the machines need to be restarted.

The next part is configuring the tie-breaker disk, which is needed because this is a two-node cluster. We don't need a tie breaker for a three-node cluster.
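
The reasoning can be shown with a little arithmetic. This is only a sketch, and it assumes the usual majority rule for quorum (a side needs more than half of the defined nodes), which this post does not spell out; quorum_info is a hypothetical helper name.

```shell
#!/bin/sh
# With an even node count a network split can leave two halves of equal
# size, so neither side has a majority and a tie breaker has to decide.
quorum_info() {
    nodes=$1
    majority=$(( nodes / 2 + 1 ))
    if [ $(( nodes % 2 )) -eq 0 ]; then
        echo "$nodes nodes: majority is $majority, an even split is possible, tie breaker needed"
    else
        echo "$nodes nodes: majority is $majority, one side always has it, no tie breaker needed"
    fi
}

quorum_info 2
quorum_info 3
```

For two nodes the majority is 2, so a single surviving node can never reach it on its own. For three nodes the majority is also 2, and any split leaves one side with two nodes.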

# mkrsrc IBM.TieBreaker Name=myTieBreaker Type=ECKD DeviceInfo="ID=0152" HeartbeatPeriod=5

The tie-breaker device details will be provided by the Linux administrator.

# lsrsrc IBM.TieBreaker

sles9a:/etc/sysconfig/network # lsrsrc IBM.TieBreaker
Resource Persistent Attributes for IBM.TieBreaker
resource 1:
Name = "myTieBreaker"
Type = "ECKD"
DeviceInfo = "ID=0152"
ReprobeData = ""
ReleaseRetryPeriod = 0
HeartbeatPeriod = 5
PreReserveWaitTime = 0
PostReserveWaitTime = 0
NodeInfo = {}
ActivePeerDomain = "hadomain"
resource 2:
Name = "Fail"
Type = "Fail"
DeviceInfo = ""
ReprobeData = ""
ReleaseRetryPeriod = 0
HeartbeatPeriod = 0
PreReserveWaitTime = 0
PostReserveWaitTime = 0
NodeInfo = {}
ActivePeerDomain = "hadomain"
resource 3:
Name = "Operator"
Type = "Operator"
DeviceInfo = ""
ReprobeData = ""
ReleaseRetryPeriod = 0
HeartbeatPeriod = 0
PreReserveWaitTime = 0
PostReserveWaitTime = 0
NodeInfo = {}
ActivePeerDomain = "hadomain"

Set the OpQuorumTieBreaker attribute of the IBM.PeerNode class to one of the tie-breaker resources:
# chrsrc -c IBM.PeerNode OpQuorumTieBreaker="myTieBreaker"
# lsrsrc -c IBM.PeerNode
Resource Class Persistent Attributes for IBM.PeerNode
resource 1:
CommittedRSCTVersion = ""
ActiveVersionChanging = 0
OpQuorumOverride = 0
CritRsrcProtMethod = 1
OpQuorumTieBreaker = "myTieBreaker"
QuorumType = 0
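
To confirm the setting from a script, the value can be pulled out of that output. This is a minimal sketch that assumes the `Attribute = "value"` layout shown above with plain double quotes; active_tie_breaker is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the OpQuorumTieBreaker value from `lsrsrc -c IBM.PeerNode`
# output read on stdin, with the surrounding double quotes removed.
active_tie_breaker() {
    sed -n 's/^[[:space:]]*OpQuorumTieBreaker[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p'
}

# Sample lines copied from the output above. On a real node:
#   lsrsrc -c IBM.PeerNode | active_tie_breaker
sample='CritRsrcProtMethod = 1
OpQuorumTieBreaker = "myTieBreaker"
QuorumType = 0'

printf '%s\n' "$sample" | active_tie_breaker   # prints myTieBreaker
```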
Now the cluster is ready to host a highly available DB2 instance.

 

Some useful links:

RSCT Diagnosis Guide

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_14010/bl5dia00/bl5dia0031.html 

Diagnostic procedure RSCT

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_14010/bl5dia00/bl5dia0025.html

Error symptoms, responses, and recoveries

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_14010/bl5dia00/bl5dia0018.html

Accessing logged errors

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_14010/bl5dia00/bl5dia0018.html

RSCT for Linux Technical Reference

http://publib.boulder.ibm.com/epubs/pdf/bl5trl09.pdf

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_linux15/bl5trl0928.html
