Red Hat 5.0 used OpenAIS as the messaging API, with CMAN as the Messaging Layer and rgmanager as the CRM handling resource management.

Corosync has a better-designed messaging mechanism than heartbeat.

Red Hat 6.0 uses Corosync directly as the cluster's Messaging Layer.

 Different vendors' API implementations call different libraries, with different function types and return conventions, so a standard is needed to keep the various vendors' APIs as compatible as possible.

 For example, a mouse from another vendor still works on an ASUS motherboard.

 The Application Interface Specification (AIS) is a collection of open specifications defining application programming interfaces (APIs). Implemented as middleware, these interfaces give application services an open, highly portable programming interface; using the AIS APIs reduces application complexity and development time.

OpenAIS components: CLM, CKPT, EVT, LCK, MSG, ...

OpenAIS releases: Picacho, Whitetank, Wilson, of which Wilson is the latest.

Corosync is the open cluster engine project that was split out of OpenAIS once it reached the Wilson release.

Starting with version 0.9, OpenAIS split into Wilson and Corosync.

  Corosync itself is only a cluster engine that handles the delivery of cluster transaction messages; that is, it serves as the Messaging Layer. Corosync has no cluster resource management capability of its own, so the CRM role must be filled by Pacemaker. Pacemaker split off from heartbeat v3 as an independent project, and since becoming independent its development has focused on Corosync rather than heartbeat v3.

  Corosync can be configured entirely from the command line, but many graphical tools exist as well.

  • corosync is the bottom messaging layer of a high-availability cluster; it talks to the upper layers and carries heartbeats plus whatever transaction messages the upper layers send. To limit the damage of a split brain there is also the concept of quorum, the required number of votes. The version installed here is 1.4, which tallies cluster votes at one vote per node; from the 2.x series on there is a voting feature that lets a node hold a configurable number of votes. The final tally is handed to the CRM layer, which decides whether the cluster should keep running. (I only know the basics here, so look up the finer points yourself.)

  • pacemaker is the CRM (Cluster Resource Manager) layer of a high-availability cluster. It is a service and can be started on its own, but with the corosync 1.4 used here, corosync can be configured to start pacemaker itself.

  • pacemaker can be configured from any node by installing crmsh or pcs, or via one of several GUI tools. crmsh apparently stopped shipping with Red Hat as of RHEL 6.4; the official tool is pcs, while crmsh seems to be developed by openSUSE.
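The quorum arithmetic mentioned above (one vote per node, a majority required) can be sketched as a throwaway helper; `quorum` is my own name, not a corosync command:

```shell
# Majority quorum: one vote per node (as in corosync 1.4), and more than
# half of the expected votes are needed. A sketch of the arithmetic only,
# not the real vote tally.
quorum() {  # quorum <node_count> -> votes required
  echo $(( $1 / 2 + 1 ))
}

quorum 2   # → 2 : in a two-node cluster, losing either node loses quorum
quorum 3   # → 2 : a three-node cluster survives one failure
```

This is why two-node clusters need special handling (ignoring quorum loss, as done later in this article).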

Corosync's website 

OpenAIS's website 

Pacemaker's website 

So the possible Messaging Layer + CRM combinations for a cluster are:

1 haresources + heartbeat v1/v2 

2 crm + heartbeat v2

3 pacemaker + corosync

4 pacemaker + heartbeat v3

5 cman + rgmanager

Today we will use Pacemaker + Corosync to define and manage a cluster service.

You can install from rpm packages, build from source, or install directly with yum.

________________________________________________________________________________________________________

192.168.139.2

[root@www ~]# ntpdate cn.ntp.org.cn //sync the clock over ntp; I used a Chinese public ntp server

[root@www .ssh]# ssh-keygen -t rsa -P '' //set up passwordless ssh trust between the two nodes

[root@www .ssh]# ssh-copy-id -i ./id_rsa.pub root@192.168.139.4

[root@www html]# uname -n //this node's hostname

[root@www mysql]# yum install corosync pacemaker //install directly with yum

________________________________________________________________________________________________________

192.168.139.4

[root@www ~]# ntpdate cn.ntp.org.cn 

[root@www .ssh]# ssh-keygen -t rsa -P '' 

[root@www .ssh]# ssh-copy-id -i ./id_rsa.pub root@192.168.139.2

[root@www html]# uname -n

www.rs2.com

[root@www mysql]# yum install corosync pacemaker                          

Installed:

 corosync.x86_64 0:1.4.7-5.el6                                                                                                      pacemaker.x86_64 0:1.1.14-8.el6_8.1                                                                                                     

Dependency Installed:

  clusterlib.x86_64 0:3.0.12.1-78.el6        corosynclib.x86_64 0:1.4.7-5.el6         libibverbs.x86_64 0:1.1.8-4.el6           libqb.x86_64 0:0.17.1-2.el6           librdmacm.x86_64 0:1.0.21-0.el6           lm_sensors-libs.x86_64 0:3.1.1-17.el6     

  net-snmp-libs.x86_64 1:5.5-57.el6_8.1       pacemaker-cli.x86_64 0:1.1.14-8.el6_8.1

  pacemaker-cluster-libs.x86_64 0:1.1.14-8.el6_8.1      

  pacemaker-libs.x86_64 0:1.1.14-8.el6_8.1     pciutils.x86_64 0:3.1.10-4.el6         rdma.noarch 0:6.8_4.1-1.el6  

[root@www mysql]# rpm -ql corosync

/etc/corosync //this directory holds Corosync's configuration files

/etc/corosync/corosync.conf.example //sample Corosync configuration file

/usr/sbin/corosync-keygen //generates the authentication key

[root@www mysql]# cd /etc/corosync

[root@www corosync]# ll

total 16

-rw-r--r--. 1 root root 2663 May 11  2016 corosync.conf.example

[root@www corosync]# cp corosync.conf.example corosync.conf

[root@www corosync]# vim corosync.conf

# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {

        version: 2 //configuration file format version

        secauth: off //whether to enable message authentication; secure authentication is very CPU-intensive when aisexec is used

        threads: 0 //number of threads, chosen from the CPU and core count; meaningless when secauth is off

        interface {

                ringnumber: 0 //redundant ring number, set per node to prevent multicast loops; with a single NIC per node it can be left at the default 0

                bindnetaddr: 192.168.139.0 //the NIC's network address, not its IP address

                mcastaddr: 239.255.1.1 //multicast address that carries the heartbeat messages

                mcastport: 5405 //port used for multicast

                ttl: 1 //multicast TTL; 1 keeps heartbeats on the local subnet

        }

}

logging {

       

        fileline: off //whether to print the source file and line in log messages

        to_stderr: no //whether to send errors to standard error; best left disabled

        to_logfile: yes //whether to log to a file

        logfile: /var/log/cluster/corosync.log //location of the dedicated log file; create this directory yourself

        to_syslog: no //whether to log to syslog; enabling only one of this and to_logfile is enough

        debug: off //whether to enable debug output

        timestamp: on //whether to print timestamps; helpful for locating errors, but each entry costs a system call to fetch the time

        logger_subsys {

                subsys: AMF //whether to log AMF subsystem messages; not needed unless OpenAIS is in use

                debug: off

        }

}

amf {

     mode:  disabled //programming-related; can be left unset

}

service {

       ver: 0

       name: pacemaker //have corosync start pacemaker

}

aisexec {          //this stanza is optional

        user: root

        group: root

}
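The `bindnetaddr` above must be the network address of the interface, not its IP. A minimal sketch of how that address is derived (assuming IPv4 and a known prefix length; `netaddr` is a throwaway helper, not a corosync tool):

```shell
# Zero the host bits of an IPv4 address to get the network address.
netaddr() {  # netaddr <dotted-quad-ip> <prefix-length>
  ip=$1 prefix=$2
  oldIFS=$IFS; IFS=.; set -- $ip; IFS=$oldIFS
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  net=$(( addr & mask ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

netaddr 192.168.139.2 24   # → 192.168.139.0, the value used above
```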

___________________________________________________________________________________________

[root@www ~]# corosync-keygen //generate the communication key; it is saved to /etc/corosync/authkey

Writing corosync key to /etc/corosync/authkey             

[root@www cluster]# corosync-keygen

 Because /dev/random is used to generate the key, a freshly installed system with little activity may lack entropy and print the prompt below; keep typing randomly on the local keyboard, since keystrokes over an ssh session do not seem to count.
Gathering 1024 bits for key from /dev/random. 
Press keys on your keyboard to generate entropy. 
Press keys on your keyboard to generate entropy (bits = 240).

[root@www ~]# cd /etc/corosync/ 

[root@www cluster]#scp /etc/corosync/corosync.conf 192.168.139.2:/etc/corosync/ //copy the config to the other node

[root@www ~]# service corosync start //start corosync on this node

[root@www ~]# ssh 192.168.139.2 service corosync start //start corosync on the other node

__________________________________________________________________________________________

 //check whether any errors appeared during startup; I searched the web and could not find the cause, but the whole experiment still completed fine, so apparently it is nothing serious

[root@www cluster]# grep ERROR: /var/log/cluster/corosync.log

Nov 11 15:05:10 www corosync[3470]:   [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Nov 11 15:05:10 www corosync[3470]:   [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

__________________________________________________________________________________________

[root@www ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log //check that the corosync engine started normally

Nov 11 16:34:19 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Nov 11 16:34:19 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Nov 11 16:34:19 [1908] www.rs2.com        cib:     info: retrieveCib:Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)

Nov 11 16:34:19 [1908] www.rs2.com        cib:     info: cib_file_write_with_digest:Reading cluster configuration file /var/lib/pacemaker/cib/cib.DU5D4x (digest: /var/lib/pacemaker/cib/cib.zBJmL2)

__________________________________________________________________________________________

[root@www ~]#  grep TOTEM /var/log/cluster/corosync.log //check that the initial membership notifications are normal

Nov 11 16:34:07 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Nov 11 16:34:07 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Nov 11 16:34:08 corosync [TOTEM ] The network interface [192.168.139.4] is now up.

Nov 11 16:34:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

__________________________________________________________________________________________

[root@www ~]# grep error /var/log/cluster/corosync.log //check for startup errors; these are mainly because no STONITH device is configured and can be ignored; later the crm command property stonith-enabled=false disables STONITH

Nov 11 16:34:32 [2174] www.rs2.com    pengine:    error: unpack_resources:Resource start-up disabled since no STONITH resources have been defined

Nov 11 16:34:32 [2174] www.rs2.com    pengine:    error: unpack_resources:Either configure some or disable STONITH with the stonith-enabled option

Nov 11 16:34:32 [2174] www.rs2.com    pengine:    error: unpack_resources:NOTE: Clusters with shared data need STONITH to ensure data integrity

___________________________________________________________________________________________

[root@www ~]#  grep pcmk_startup /var/log/cluster/corosync.log //check that pacemaker started normally

Nov 11 16:34:08 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized

Nov 11 16:34:08 corosync [pcmk  ] Logging: Initialized pcmk_startup

Nov 11 16:34:08 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615

Nov 11 16:34:08 corosync [pcmk  ] info: pcmk_startup: Service: 9

Nov 11 16:34:08 corosync [pcmk  ] info: pcmk_startup: Local hostname:
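The grep checks above can be folded into one small helper; a sketch (`check_corosync_log` is my own name, and "unexpected" here just means any error line not mentioning STONITH):

```shell
# Print 1 if the corosync log contains error lines other than the
# ignorable STONITH warnings, else 0.
check_corosync_log() {  # check_corosync_log <logfile>
  if grep -i error "$1" | grep -qv -e STONITH -e stonith; then
    echo 1
  else
    echo 0
  fi
}
```

Run it as `check_corosync_log /var/log/cluster/corosync.log` after starting the service.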

___________________________________________________________________________________________

[root@www ~]# crm_mon //monitors the cluster's current state

Last updated: Fri Nov 11 16:19:10 2016          Last change: Fri Nov 11 16:10:18 2016 by hacluster via crmd on www.rs2.com

Stack: classic openais (with plugin)

Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum

2 nodes and 0 resources configured, 2 expected votes

 //two nodes, zero resources, but for some reason rs1 shows UNCLEAN (offline)

Node www.rs1.com: UNCLEAN (offline)

Online: [ www.rs2.com ]

//after stopping everything and regenerating the corosync configuration, starting again fixed it

[root@www .ssh]# crm_mon 

Last updated: Fri Oct 28 21:29:51 2016          Last change: Fri Nov 11 22:33:32 2016 by hacluster via crmd on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 0 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ] //both nodes normal

__________________________________________________________________________________________

Configure the cluster's resources with the crm command

[root@www ~]# crm

-bash: crm: command not found

[root@www ~]# rpm -qa pacemaker //pacemaker is 1.1.14

pacemaker-1.1.14-8.el6_8.1.x86_64

As of pacemaker 1.1.8, crm grew into an independent project called crmsh. In other words, installing pacemaker does not give you the crm command; to manage cluster resources you must install crmsh separately. The crmsh rpm can be downloaded from the following address:

crmsh depends on several packages such as pssh, so pssh.rpm must be downloaded from the address above as well. The same link also offers corosync and pacemaker, but I installed those directly with yum.

 

 

Or download openSUSE's HA-cluster yum repository and install directly:

[root@www tool]# wget  

It is just a single yum repo:

[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6//repodata/repomd.xml.key
enabled=1

[root@www tool]# mv network\:ha-clustering\:Stable.repo /etc/yum.repos.d/

[root@www yum.repos.d]# ll //all the yum repos on my host

total 52

-rw-r--r--. 1 root root  CentOS-Base.repo

-rw-r--r--. 1 root root  CentOS-Debuginfo.repo

-rw-r--r--. 1 root root  2015 CentOS-fasttrack.repo

-rw-r--r--. 1 root root  2015 CentOS-Media.repo

-rw-r--r--. 1 root root  2015 CentOS-Vault.repo

-rw-r--r--. 1 root root  2014 elrepo.repo

-rw-r--r--. 1 root root  2012 epel.repo

-rw-r--r--. 1 root roo   2012 epel-testing.repo

-rw-r--r--. 1 root root  network:ha-clustering:Stable.repo

-rw-r--r--. 1 root root  openSUSE-13.2-NonFree-Update.repo.back

-rw-r--r--. 1 root root  openSUSE-Leap-42.1-Update.repo.bak

-rw-r--r--. 1 root root  zxl.repo

[root@www tool]# yum install crmsh //install directly with yum

 Here is a very detailed article on crm usage that I found online

[root@www tool]# crm

crm(live)# help //get help

    cib //CIB management module

    resource //resource management module

    configure //crm configuration: resource stickiness, resource types, constraints, and so on

    node  //cluster node management subcommands

    options //user preferences

    history //crm command history

    site  //geo-cluster support

    ra   //manage resource agents

    status //show the cluster's status

    help,? //show help

    end,cd,up //go up one level

    quit,bye,exit //leave crm

crm(live)# cd resource

crm(live)resource# help

.........................

.........................

crm(live)resource# cd

crm(live)# configure //enter configuration mode

crm(live)configure# show //show the cluster's current configuration

node www.rs1.com

node www.rs2.com

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2

crm(live)configure# verify //check the configuration syntax; it errors because no STONITH device is installed

ERROR: error: unpack_resources:Resource start-up disabled since no STONITH resources have been defined

   error: unpack_resources:Either configure some or disable STONITH with the stonith-enabled option

   error: unpack_resources:NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

crm(live)configure# property stonith-enabled=false  //disable STONITH

crm(live)configure# show

node www.rs1.com

node www.rs2.com

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# verify //check again; no more errors

crm(live)configure# commit //commit so the configuration takes effect

crm(live)configure# cd 

crm(live)# ra

crm(live)ra# help

Resource Agents (RA) lists and documentation

Commands:

classes        //list RA classes and their providers

info          //show detailed information about an RA 

list          //list all RAs from a given provider under a class

providers       //show the providers of a given RA

validate        // 

     meta          //show an RA's metadata

cd            //go up one level

help           

ls          

quit          

up           //go up one level

How do you get detailed information about a command?

crm(live)ra# help list //detailed usage for the list command

List RA for a class (and provider)

List available resource agents for the given class. If the class

is ocf, supply a provider to get agents which are available

only from that provider.

Usage:

list <class> [<provider>]

Example:

list ocf pacemaker

crm(live)ra# classes //list RA classes

lsb //the lsb class

ocf /  heartbeat pacemaker //the ocf class has two providers, heartbeat and pacemaker

service

stonith //the stonith class

crm(live)ra# list ocf pacemaker //list all RAs of class ocf provided by pacemaker

ClusterMon    Dummy         HealthCPU     HealthSMART   Stateful      SysInfo SystemHealth  controld      ping          pingd         remote

crm(live)ra# list lsb //list all RAs of class lsb

auditd              blk-availability    corosync            corosync-notifyd crond               halt                heartbeat           htcacheclean

crm(live)ra# help meta //meta shows an RA's metadata

Usage:

info [<class>:[<provider>:]]<type> //which class : which provider : which resource agent (RA)

info <type> <class> [<provider>] (obsolete)

For example:

  info apache

  info ocf:pacemaker:Dummy //class ocf : provided by pacemaker : RA Dummy

  info stonith:ipmilan

  info pengine

crm(live)ra# meta ocf:heartbeat:IPaddr //view the metadata of the IPaddr RA of class ocf, provided by heartbeat

Parameters (*: required, []: default): //entries with * are required, [ ] marks defaults

ip* (string): IPv4 or IPv6 address //ip is required

    The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)

    example IPv4 "192.168.1.1".

    example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".

nic (string): Network interface

......................

........................

Operations' defaults (advisory minimum) //advisory minimum defaults for the resource's operations

    start         timeout=20s //wait at most 20 seconds when starting the resource

    stop          timeout=20s //wait at most 20 seconds when stopping the resource

    status        timeout=20s interval=10s

    monitor       timeout=20s interval=10s //check every 10 seconds; if a check gets no answer within 20 seconds, the resource is moved

How do you find out who provides a given RA?

Use the providers command in the ra submode:

crm(live)ra# providers IPaddr //the IPaddr resource is provided by heartbeat

heartbeat

___________________________________________________________________________________________

Configuring resources

crm(live)ra# cd 

crm(live)# configure

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.139.10 nic=eth0 cidr_netmask=24

primitive defines a primary resource; webip is the resource name; ocf is the class, heartbeat the provider, IPaddr the RA

params supplies parameters: ip 192.168.139.10 (required), nic=eth0 (the default anyway), cidr_netmask=24 (a /24 netmask)
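A monitor operation can also be attached when the resource is defined, so the cluster notices failures on its own; a sketch in the same crm syntax (the interval and timeout values are illustrative, not from the original setup):

```
crm(live)configure# primitive webip ocf:heartbeat:IPaddr \
        params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
        op monitor interval=10s timeout=20s
```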

crm(live)configure# show 

node www.rs1.com

node www.rs2.com

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# verify //check for errors

crm(live)configure# commit //commit once error-free

crm(live)configure# show xml //the configuration can also be viewed as XML, which is more detailed

  

<?xml version="1.0" ?>

<cib num_updates="2" dc-uuid="www.rs1.com" update-origin="www.rs2.com" crm_feature_set="3.0.10" validate-with="pacemaker-2.4" update-client="cibadmin" epoch="5" admin_epoch="0" update-user="root" cib-last-written="Fri Nov 11 22:42:08 2016" have-quorum="1">

  <configuration>

    <crm_config>

      <cluster_property_set id="cib-bootstrap-options">

        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.14-8.el6_8.1-70404b0"/>

        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>

        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>

        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>

      </cluster_property_set>

    </crm_config>

    <nodes>

      <node id="www.rs1.com" uname="www.rs1.com"/>

      <node id="www.rs2.com" uname="www.rs2.com"/>

    </nodes>

    <resources>

      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">

        <instance_attributes id="webip-instance_attributes">

          <nvpair name="ip" value="192.168.139.10" id="webip-instance_attributes-ip"/>

          <nvpair name="nic" value="eth0" id="webip-instance_attributes-nic"/>

          <nvpair name="cidr_netmask" value="24" id="webip-instance_attributes-cidr_netmask"/>

        </instance_attributes>

      </primitive>

    </resources>

    <constraints/>

  </configuration>

</cib>

crm(live)configure# cd

crm(live)# 

crm(live)# status //the resource is in fact already running; check how it is doing

Online: [ www.rs1.com www.rs2.com ]

Full list of resources:

 webip(ocf::heartbeat:IPaddr):Started  //rs1 was elected DC, and the webip resource is running on it

___________________________________________________________________________________________

192.168.139.2

[root@www corosync]# ip addr show //the VIP 192.168.139.10 can be seen on eth0

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

    inet 192.168.139.2/24 brd 192.168.139.255 scope global eth0

    inet 192.168.139.10/24 brd 192.168.139.255 scope global secondary eth0

[root@www .ssh]# crm

crm(live)# resource

crm(live)resource# stop webip //stop the webip resource

crm(live)resource# list 

webip(ocf::heartbeat:IPaddr):(target-role:Stopped) Stopped

crm(live)resource# start webip

crm(live)resource# list

 webip(ocf::heartbeat:IPaddr):Started

crm(live)resource# migrate webip //risky: the migration attempt errored, and after forcing it the webip resource would not start; only restarting corosync recovered it

ERROR: resource.move: No target node: Move requires either a target node or 'force'

 status then shows the following error:

* webip_start_0 on www.rs2.com 'not configured' (6): call=12, status=complete, exitreason='none',

    last-rc-change='Sat Oct 29 08:55:24 2016', queued=1ms, exec=250ms

  It turned out my rs2 host was a clone with no eth0, only eth1, while webip was defined on eth0 (^_^). After renaming eth1 to eth0 and rebooting, everything worked. Below is an article on renaming a NIC:
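On CentOS 6 that rename is usually done through the udev persistent-net rules plus the matching ifcfg file; a sketch (the MAC address is a placeholder for the real one):

```
# /etc/udev/rules.d/70-persistent-net.rules: change NAME="eth1" to "eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:xx:xx:xx", NAME="eth0"

# then rename /etc/sysconfig/network-scripts/ifcfg-eth1 to ifcfg-eth0,
# set DEVICE=eth0 (and the right HWADDR) inside it, and reboot
```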

Next, define an httpd resource

_____________________________________________________________

192.168.139.4

[root@www corosync]# rpm -qa httpd //no httpd on this host

[root@www corosync]# yum install httpd //install directly with yum

[root@www html]# vim index.html

<h1>

[root@www html]# service httpd stop

Stopping httpd:                                            [  OK  ]

[root@www html]# chkconfig httpd off //never let a cluster resource start at boot

___________________________________________________________________________________________

192.168.139.2

[root@www corosync]# rpm -qa httpd //no httpd on this host

[root@www corosync]# yum install httpd //install directly with yum

[root@www html]# vim index.html //edit the httpd home page so the hosts can be told apart

<h1>

[root@www html]# service httpd stop

Stopping httpd:                                            [  OK  ]

[root@www html]# chkconfig httpd off //never let a cluster resource start at boot

___________________________________________________________________________________________

192.168.139.2

[root@www ~]# crm

crm(live)# cd resource

crm(live)resource# list

 webip(ocf::heartbeat:IPaddr):Started

crm(live)resource# cd ..

crm(live)# cd ra

crm(live)ra# providers httpd //httpd has no provider

crm(live)ra# list lsb //the httpd RA belongs to the lsb class

auditd             blk-availability   corosync          corosync-notifyd   crond             halt            htcacheclean       httpd

crm(live)ra# meta lsb:httpd //and meta shows it takes no extra parameters, only some operations

start and stop Apache HTTP Server (lsb:httpd)

server implementing the current HTTP standards.

Operations' defaults (advisory minimum):

    start          timeout=15

    stop          timeout=15

    status         timeout=15

    restart        timeout=15

    force-reload     timeout=15

    monitor        timeout=15 interval=15

crm(live)ra# cd 

crm(live)# configure

crm(live)configure# primitive httpd lsb:httpd op start timeout=20 //define the httpd primitive resource

crm(live)configure# show

node www.rs1.com

node www.rs2.com

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# cd

crm(live)# status

Last updated: Sat Oct 29 10:39:04 2016Last change: Sat Oct 29 08:33:08 2016 by root via cibadmin on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //webip runs on rs1 while httpd runs on rs2 

 webip(ocf::heartbeat:IPaddr):Started www.rs1.com

 httpd(lsb:httpd):Started www.rs2.com

___________________________________________________________________________________________

192.168.139.4

[root@www ~]# netstat -tnlp |grep httpd

tcp      0      0 :::80         LISTEN      1718/httpd   

Visit 192.168.139.4 in a browser

___________________________________________________________________________________________

192.168.139.2

Define the two resources as a group so they run on the same node

crm(live)configure# help group //when unsure, use help

Define a group

Usage:

group <name> <rsc> [<rsc>...]

 //group <name> <resource1> <resource2> ...; a group can also take a description, params, and meta attributes; see the official documentation for which params a group accepts

  [description=<description>] //description

  [meta attr_list] //meta attributes

  [params attr_list] //the group's params

attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>

Example:

group internal_www disk0 fs0 internal_ip apache \

  meta target_role=stopped

group vm-and-services vm vm-sshd meta container="vm" //vm-and-services is the group name, vm and vm-sshd the two resources, and meta container="vm" a meta attribute

crm(live)configure# group webserver webip httpd //webserver is the group name; webip and httpd are the two resources in it

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show

node www.rs1.com

node www.rs2.com

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

group webserver webip httpd

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

crm(live)configure# cd 

crm(live)# status

 cibadmin on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources:

 Resource Group: webserver //once the resource group webserver is defined, the two resources run on the same node

     webip(ocf::heartbeat:IPaddr):Started www.rs1.com

     httpd(lsb:httpd):Started

Test 192.168.139.10 in a browser

crm(live)# node

crm(live)node# standby //make rs1 a standby node; the resources move to rs2

crm(live)node# cd

crm(live)# status //the resources moved from rs1 to rs2 successfully

Last updated: Sat Oct 29 11:32:08 2016Last change: Sat Oct 29 11:31:51 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

 *{why is this not WITHOUT quorum here; can a node still vote after going standby?}

2 nodes and 2 resources configured, 2 expected votes

Node www.rs1.com: standby

Online: [ www.rs2.com ]

Full list of resources: //and with rs1 in standby the resources still run normally; apparently a standby node still casts its vote                  

 Resource Group: webserver  //quorum is still met, so the resources were not stopped

     webip(ocf::heartbeat:IPaddr):Started www.rs2.com

     httpd(lsb:httpd):Started www.rs2.com

crm(live)# node

crm(live)node# online //bring it back online

crm(live)node# cd

crm(live)# status

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: 

 Resource Group: webserver //back online, the vote count is enough, and the resources run

     webip(ocf::heartbeat:IPaddr):Started www.rs2.com

     httpd(lsb:httpd):Started www.rs2.com

This time, stop rs2 outright

192.168.139.4

[root@www ~]# service corosync stop

Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]

Waiting for corosync services to unload:.                  [  OK  ]

192.168.139.2

crm(live)# status

Last updated: Sat Oct 29 11:53:25 2016Last change: Sat Oct 29 11:52:39 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum

{this time it is WITHOUT quorum: the required vote count was not reached. So only stopping the service removes a node's vote, while a standby node can still vote}

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com ]

OFFLINE: [ www.rs2.com ]

Full list of resources:

 Resource Group: webserver //below the required vote count, resources are stopped by default

     webip(ocf::heartbeat:IPaddr):Stopped

     httpd(lsb:httpd):Stopped

192.168.139.4

[root@www ~]# service corosync start

192.168.139.2

crm(live)# status

Last updated: Sat Oct 29 11:59:36 2016Last change: Sat Oct 29 11:52:39 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //after rs2 starts, the resources start again

 Resource Group: webserver

     webip(ocf::heartbeat:IPaddr):Started www.rs1.com

     httpd(lsb:httpd):Started www.rs1.com

Change the default action when quorum is lost to ignore

crm(live)# configure

crm(live)configure# property no-quorum-policy=ignore

 crm(live)configure# show

node www.rs1.com \

attributes standby=off

node www.rs2.com \

attributes standby=off

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

group webserver webip httpd

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore

crm(live)configure# verify

crm(live)configure# commit
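For reference, no-quorum-policy in this pacemaker generation accepts a few values besides ignore; a quick sketch of the choices (stop is the default):

```
crm(live)configure# property no-quorum-policy=ignore
# ignore  - keep running all resources
# stop    - stop all resources in the partition without quorum (the default)
# freeze  - keep running what is running, but start nothing new
# suicide - fence all nodes in the partition without quorum
```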

192.168.139.4

[root@www ~]# service corosync stop

192.168.139.2

crm(live)# status

Last updated: Sat Oct 29 12:03:53 2016Last change: Sat Oct 29 12:03:25 2016 by root via cibadmin on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum

{WITHOUT quorum: the required vote count was not reached}

2 nodes and 2 resources configured, 2 expected votes

Online: [ www.rs1.com ]

OFFLINE: [ www.rs2.com ]

Full list of resources: //but the service keeps running, because of ignore

 Resource Group: webserver

     webip(ocf::heartbeat:IPaddr):Started www.rs1.com

     httpd(lsb:httpd):Started www.rs1.com

192.168.139.4

[root@www ~]# service corosync start

[root@www ~]# crm

crm(live)# node

crm(live)node# standby

crm(live)node# cd 

crm(live)# status

Last updated: Sat Oct 29 10:03:51 2016Last change: Sat Oct 29 10:03:46 2016 by root via crm_attribute on www.rs1.com

Stack: classic openais (with plugin)

Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

{quorum is still met here, so it is true that a standby node can still vote}

2 nodes and 2 resources configured, 2 expected votes

Node www.rs2.com: standby

Online: [ www.rs1.com ]

Full list of resources:

 Resource Group: webserver //with ignore set, the resources run whether or not quorum is met

     webip(ocf::heartbeat:IPaddr):Started www.rs1.com

     httpd(lsb:httpd):Started www.rs1.com

crm(live)# node 

crm(live)node# online

Instead of defining a group, use constraints directly so the resources run together

crm(live)# resource 

crm(live)resource# stop webserver

crm(live)resource# cleanup webserver

crm(live)resource# cd 

crm(live)# configure

crm(live)configure# delete webserver

crm(live)configure# show

node www.rs1.com \

attributes standby=off

node www.rs2.com \

attributes standby=off

primitive httpd lsb:httpd \

op start timeout=20 interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

meta target-role=Started

property cib-bootstrap-options: \

dc-version=1.1.14-8.el6_8.1-70404b0 \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore \

last-lrm-refresh=1477714758

crm(live)configure# verify

crm(live)configure# commit

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //the two resources are again running on different nodes

 webip(ocf::heartbeat:IPaddr):Started www.rs1.com

 httpd(lsb:httpd):Started

Define a colocation constraint (whether one resource can run on the same node as another; inf means infinity)

crm(live)# configure

crm(live)configure# colocation webip_with_httpd inf: webip httpd //define a colocation constraint tying the two resources together

crm(live)configure# show

.........

colocation webip_with_httpd inf: webip httpd //it seems I defined this backwards: this places webip wherever httpd is, but it should place httpd wherever webip is; the resource listed later is the one that decides

crm(live)configure# edit //fix it directly with edit

colocation webip_with_httpd inf: webip httpd 

change it to

colocation webip_with_httpd inf: httpd webip 

crm(live)configure# show xml

 <rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>

crm(live)configure# commit

crm(live)configure# cd 

crm(live)# status

.........

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //the two resources are running on one node again

 webip(ocf::heartbeat:IPaddr):Started www.rs1.com

 httpd(lsb:httpd):Started www.rs1.com

The colocation constraint now binds the two resources. Resources also start in a certain sequence, so define an order constraint

crm(live)# configure

crm(live)configure# help order

Usage:

order <id> [{kind|<score>}:] first then [symmetrical=<bool>]

order <id> [{kind|<score>}:] resource_sets [symmetrical=<bool>]

kind :: Mandatory | Optional | Serialize //mandatory | optional | serialized

first :: <rsc>[:<action>] //a resource can also be given an action: start one resource, perform some action, then start the other; the actions are those under resource, such as start, stop, promote, ...

then ::  <rsc>[:<action>]

resource_sets :: resource_set [resource_set ...]

crm(live)configure# order webip_before_httpd mandatory: webip httpd //webip_before_httpd is the id; mandatory is the kind (a score can be used instead); webip starts first, httpd after

crm(live)configure# commit

crm(live)configure# show xml

 <rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>

      <rsc_order id="webip_before_httpd" kind="Mandatory" first="webip" then="httpd"/>

first webip, then httpd

crm(live)configure# cd 

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: //currently running on rs1

 webip(ocf::heartbeat:IPaddr):Started www.rs1.com

 httpd(lsb:httpd):Started www.rs1.com

crm(live)# node

crm(live)node# standby //put rs1 in standby

crm(live)node# cd

crm(live)# status

Node www.rs1.com: standby

Online: [ www.rs2.com ]

Full list of resources: //the switchover was too fast to see which started first (^_^), but the resources did move

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started www.rs2.com

crm(live)# node 

crm(live)node# online \\ bring rs1 back online

crm(live)node# cd 

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ but the resources did not come back

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started www.rs2.com

What if you want the resources to move back once the node comes online again?

Define a location constraint (which node a resource prefers to run on)

crm(live)# configure

crm(live)configure# help location

Usage:

location <id> <rsc> [<attributes>] {<node_pref>|<rules>}

........

node_pref :: <score>: <node>

rules :: \\ rules can be defined with expressions

  rule [id_spec] [$role=<role>] <score>: <expression>

  [rule [id_spec] [$role=<role>] <score>: <expression> ...]

location conn_1 internal_www \ \\ conn_1 is the id/name; internal_www is the resource name

  rule 50: #uname eq node1 \\ the rule gives a score of 50 when uname equals node1

crm(live)configure# location wibip_on_rs1 webip rule 100: #uname eq www.rs1.com

  \\ webip gets a score of 100 on the node whose uname equals www.rs1.com, so it prefers rs1

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show xml

   

 <rsc_location id="wibip_on_rs1" rsc="webip">

        <rule score="100" id="wibip_on_rs1-rule">

          <expression attribute="#uname" operation="eq" value="www.rs1.com" id="wibip_on_rs1-rule-expression"/>

crm(live)configure# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ the location constraint took effect, so the resources moved to rs1 automatically

 webip(ocf::heartbeat:IPaddr):Started www.rs1.com

 httpd(lsb:httpd):Started www.rs1.com

crm(live)# node 

crm(live)node# standby \\ put rs1 into standby

crm(live)node# cd 

crm(live)# status

Node www.rs1.com: standby

Online: [ www.rs2.com ]

Full list of resources: \\ the resources moved to rs2

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started www.rs2.com

crm(live)# node 

crm(live)node# online

crm(live)node# cd 

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ once rs1 came back online, the resources moved back from rs2

 webip(ocf::heartbeat:IPaddr):Started www.rs1.com

 httpd(lsb:httpd):Started www.rs1.com

Define resource stickiness (whether a resource prefers to stay on its current node)

crm(live)# configure

crm(live)configure# rsc_defaults resource-stickiness=200 \\ set the default resource stickiness to 200

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show xml

 <meta_attributes id="rsc-options">

        <nvpair name="resource-stickiness" value="200" id="rsc-options-resource-stickiness"/>

      </meta_attributes>

crm(live)configure# cd

crm(live)# node standby

crm(live)# status

Node www.rs1.com: standby 

Online: [ www.rs2.com ]

Full list of resources: \\ the resources moved to rs2

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started www.rs2.com

crm(live)# node online \\ bring rs1 back online

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ because the stickiness (200) outweighs the location preference (100), the resources will not move back to rs1

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started
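If failback is still wanted with stickiness in place, one option, sketched here with illustrative values only, is to keep the default stickiness below the location score:

```shell
# Illustrative sketch: with stickiness 50 < location score 100,
# Pacemaker's per-node score comparison favors rs1 again, so the
# resources fail back when rs1 returns to online state.
crm configure rsc_defaults resource-stickiness=50
```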

  Now add a Filesystem resource backed by the NFS server at 192.168.139.8, sharing one home page so that no matter which node runs the resources, the browser sees the same page.

_____________________________________________________________

192.168.139.8

[root@www ~]# vim /etc/exports 

/web/htdocs 192.168.139.0/24(ro) \\ note: no space before (ro), otherwise the options would apply to the world rather than the subnet

[root@www local]# cd /web/htdocs/

[root@www htdocs]# vim index.html

<h1>

[root@www ~]# service iptables stop

[root@www ~]# service nfs start

___________________________________________________________________________________________

192.168.139.4

[root@www ~]# mount 192.168.139.8:/web/htdocs /mnt

[root@www ~]# cd /mnt

[root@www mnt]# ll

total 4

-rw-r--r--. 1 nobody nobody 21 Nov 12  2016 index.html

[root@www mnt]# cd

[root@www ~]# umount /mnt/

[root@www ~]# crm 

crm(live)# ra

crm(live)ra# list ocf \\ Filesystem belongs to the ocf class

Filesystem         HealthCPU          HealthSMART        IPaddr

crm(live)ra# providers Filesystem \\ Filesystem is provided by heartbeat

heartbeat

crm(live)ra# meta ocf:heartbeat:Filesystem

device* (string): block device \\ device is required

    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.

directory* (string): mount point \\ the mount point is required

    The mount point for the filesystem.

fstype* (string): filesystem type \\ the filesystem type is required

    The type of filesystem to be mounted.

options (string): \\ extra options passed to mount via -o

    Any extra options to be given as -o options to mount.

    

    For bind mounts, add "bind" here and set fstype to "none".

    We will do the right thing for options such as "bind,ro".
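As a hypothetical illustration of the options parameter (not used in this setup; the resource name nfs-ro and the mount options are invented for the example):

```shell
# Hypothetical sketch: mounting the NFS export read-only via "options",
# which Filesystem passes to mount as -o.
crm configure primitive nfs-ro ocf:heartbeat:Filesystem \
    params device="192.168.139.8:/web/htdocs" directory="/var/www/html" \
           fstype=nfs options="ro,noatime" \
    op monitor interval=20s timeout=60s
```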

crm(live)ra# cd

crm(live)# configure

crm(live)configure# primitive nfs ocf:heartbeat:Filesystem params  device=192.168.139.8:/web/htdocs/ directory=/var/www/html/ fstype=nfs op monitor timeout=60s

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show

primitive nfs Filesystem \

params device="192.168.139.8:/web/htdocs/" directory="/var/www/html/" fstype=nfs \

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

order webip_before_httpd Mandatory: webip httpd

colocation webip_with_httpd inf: httpd webip

location wibip_on_rs1 webip \

rule 100: #uname eq www.rs1.com \

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore \

last-lrm-refresh=1477714758

rsc_defaults rsc-options: \

resource-stickiness=200

crm(live)configure# cd

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ all three resources are started; webip and httpd run together on rs2, while nfs runs on rs1

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started www.rs2.com

 nfs(ocf::heartbeat:Filesystem):Started

___________________________________________________________________________________________

192.168.139.2

[root@www ~]# cd /var/www/html/

[root@www html]# ll

total 4

-rw-r--r--. 1 nobody nobody 21 Nov 12  2016 index.html

[root@www html]# vim index.html 

<h1> \\ the page shared over NFS is already mounted here

How do we get all three resources running on one node?

Define colocation and order constraints for the Filesystem resource

crm(live)configure# colocation nfs_with_webip  inf: nfs webip \\ nfs follows webip: wherever webip is, nfs is

crm(live)configure# order webip_before_nfs mandatory: webip nfs \\ start webip first, then nfs

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# show

primitive nfs Filesystem \

params device="192.168.139.8:/web/htdocs/" directory="/var/www/html/" fstype=nfs \

op monitor timeout=60s interval=0

primitive webip IPaddr \

params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \

colocation nfs_with_webip inf: nfs webip

order webip_before_httpd Mandatory: webip httpd

colocation webip_with_httpd inf: httpd webip

location wibip_on_rs1 webip \

rule 100: #uname eq www.rs1.com

expected-quorum-votes=2 \

stonith-enabled=false \

no-quorum-policy=ignore \

resource-stickiness=200

crm(live)configure# show xml

<rsc_order id="webip_before_httpd" kind="Mandatory" first="webip" then="httpd"/>

      <rsc_order id="webip_before_nfs" kind="Mandatory" first="webip" then="nfs"/>

 <rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>

      <rsc_colocation id="nfs_with_webip" score="INFINITY" rsc="nfs" with-rsc="webip"/>
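As an aside, the accumulating colocation and order constraints could also be expressed more compactly as a resource group. This is only a sketch, not applied in this walkthrough, and the group name webservice is made up:

```shell
# Sketch only (group name "webservice" is hypothetical).
# A group implies both kinds of constraint: its members are
# colocated on one node and started left to right
# (webip, then nfs, then httpd), stopped in reverse order.
crm configure group webservice webip nfs httpd
```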

crm(live)# status

2 nodes and 3 resources configured, 2 expected votes \\ three resources on two nodes; expected votes: 2

Online: [ www.rs1.com www.rs2.com ]

Full list of resources:    \\ all resources are now on rs2: stickiness is 200 while webip's location preference for rs1 is only 100, and webip and httpd were already running on rs2 before the Filesystem was configured, so all three resources stay on rs2

 webip(ocf::heartbeat:IPaddr):Started www.rs2.com

 httpd(lsb:httpd):Started www.rs2.com

 nfs(ocf::heartbeat:Filesystem):Started

crm(live)# q

bye

[root@www html]# mount \\ on rs2 the NFS export is mounted

192.168.139.8:/web/htdocs/ on /var/www/html type nfs (rw,vers=4,addr=192.168.139.8,clientaddr=192.168.139.4)

[root@www html]# cd /var/www/html/

[root@www html]# ll

total 4

-rw-r--r--. 1 nobody nobody 21 Nov 12  2016 index.html

[root@www html]# vim index.html \\ the page shared by the NFS-Server is visible

<h1>

Browser test

[root@www html]# crm

crm(live)# node

crm(live)node# standby \\ put rs2 into standby

crm(live)# status

Online: [ www.rs1.com www.rs2.com ]

Full list of resources: \\ all resources migrated to rs1

 webip (ocf::heartbeat:IPaddr): Started www.rs1.com

 httpd (lsb:httpd): Started www.rs1.com

 nfs (ocf::heartbeat:Filesystem): Started

Browser access still shows the same web page, no matter which node is serving.