Postgresqlレプリケーション Slony-I構築メモ

ここではPostgreSQLのレプリケーションツール、Slony-Iについての構築と運用方法をメモしておきます。

Slony-Iについて

特徴
- マスタースレイブ型のレプリケーションシステム
- 同期対象のテーブルについては、マスタのみ更新可能。スレーブは参照のみ。
- 同期対象外のテーブルはどのノードでもフルアクセス可能。
- カスケード構成可能
- テーブル単位でレプリケーションの指定が可能
- プライマリキーの無いテーブルのレプリケーションも可能(注意点が多いので、全力で回避するほうがいいです)
- 同期されるのは、insert,update,deleteのみ
- DDLは同期されない(同期対象のテーブルに流すと、DDLによっては同期が壊れる)
- レプリケーション単位をset、setに含まれるDBをnodeと呼ぶ
- レプリケーション構成は、set単位で管理される

メリット
- 異なるバージョンのPostgreSQLのDBの同期が可能(8.3系 <-> 9.0系でもOK!)
- スレーブノードを増やしてもマスタノードにほとんど負荷はかからない
- 運用中にレプリケーションを構築できる。しかも同期が速い、マスタノードの負荷が低い
- 運用中にレプリケーションの構成を変更することができる
- 運用中にマスタノードを変更することができる(switch over)
- 手軽にレプリケーションデータの再同期を行える
- ノードごとにインデックスやプラグインを設定することができる
- 複数の異なるDBのテーブルを混ぜあわせたDBを構築することができる

デメリット
- 構築がめんどくさい
- 運用するには、slonyコマンドの理解が必要
- カラム、テーブルの追加、削除は専用のコマンドや手順が必要。
- Slony-Iのバージョンが異なると、同期が行えない。(マイナーバージョン違いでもだめな事が多かった･･･)

個人的な感想ですが、PostgreSQLのメジャーアップデート時のデータ移行や、slaveDBに(全文)検索機能を構築するのに便利です。
ただ、テーブルの追加や、vacuum fullなどの運用がめんどくさい。。。
さらに、プライマリキーの無いテーブルの同期は注意が必要。
ノード毎にDBの構成を変える必要がなければ、PostgreSQL9.0.2以降のストリーミングレプリケーションを使うほうがいいです。

構築

インストール

Slony-I

Slony-I 1.2系の最新版のslony1-1.2.21をインストールします。
PostgreSQLは既にインストール済みの前提です。

cd /usr/local/src
wget http://main.slony.info/downloads/1.2/source/slony1-1.2.21.tar.bz2
tar xvfj slony1-1.2.21.tar.bz2

cd slony1-1.2.21
./configure \
--prefix=/usr/local/slony \
--with-pgconfigdir=/usr/local/pgsql/bin \
--with-perltools
make

su
make install

exit

configureの指定

--prefix	インストールディレクトリ指定
--with-pgconfigdir	PostgreSQLのpg_configコマンドのディレクトリ
--with-perltools	追加で、slony管理スクリプトのインストールする

また、Slony-IのログのローテートでApacheのrotatelogsを使うため、必要ならApacheもあわせてインストールしてください。(インストールオプションは何でも大丈夫です)

レプリケーション構成

次のような構成で構築します。

レプリケーション構築

大まかな手順

(1)設定ファイル作成
(2)レプリケーションスキーマ作成
(3)マスタDBにレプリケーション構成作成
(4)slonyプロセス起動
(5)スレーブDBにレプリケーション構成作成(購読設定)
(6)同期確認

(1)設定ファイルの準備
Slony設定ファイル/usr/local/slony/etc/slon_tools_sample.confを編集します。
/usr/local/slony/etc/slon_tools.conf-sampleをコピーして作成します。

/usr/local/slony/etc/slon_tools_sample.conf

# $Id: slon_tools.conf-sample,v 1.7.2.2 2009-08-17 22:21:43 devrim Exp $
# Author: Christopher Browne
# Copyright 2004-2009 Afilias Canada
# Revised extensively by Steve Simms

# Keeping the following three lines for backwards compatibility in
# case this gets incorporated into a 1.0.6 release.
#
# TODO: The scripts should check for an environment variable
# containing the location of a configuration file.  That would
# simplify this configuration file and allow Slony-I tools to still work
# in situations where it doesn't exist.
#
if ($ENV{"SLONYNODES"}) {
    require $ENV{"SLONYNODES"};
} else {

    # The name of the replication cluster.  This will be used to
    # create a schema named _$CLUSTER_NAME in the database which will
    # contain Slony-related data.
    $CLUSTER_NAME = 'replication_sample';

    # The directory where Slony should record log messages.  This
    # directory will need to be writable by the user that invokes
    # Slony.
    $LOGDIR = '/var/log/slony1';

    # (Optional) If you would like to use Apache's rotatelogs tool to
    # manage log output, uncomment the following line and ensure that
    # it points to the executable.
    #
    # $APACHE_ROTATOR = '/usr/local/apache/bin/rotatelogs';

    # SYNC check interval (slon -s option)
    # $SYNC_CHECK_INTERVAL = 1000;

    # Which node is the default master for all sets?
    $MASTERNODE = 1;

    # Include add_node lines for each node in the cluster.  Be sure to
    # use host names that will resolve properly on all nodes
    # (i.e. only use 'localhost' if all nodes are on the same host).
    # Also, note that the user must be a superuser account.

    add_node(node     => 1,
             host     => '192.168.0.100',
             dbname   => 'testdb',
             port     => 5432,
             user     => 'postgres',
             password => '');

    add_node(node     => 2,
             host     => '192.168.0.101',
             dbname   => 'testdb',
             port     => 5432,
             user     => 'postgres',
             password => '');

    add_node(node     => 3,
             host     => '192.168.0.102',
             dbname   => 'testdb',
             port     => 5432,
             user     => 'postgres',
             password => '');

    # If the node should only receive event notifications from a
    # single node (e.g. if it can't access the other nodes), you can
    # specify a single parent.  The downside to this approach is that
}

# The $SLONY_SETS variable contains information about all of the sets
# in your cluster.

$SLONY_SETS = {

    # A unique name for the set
    "set_sample" => {

        # The set_id, also unique
        "set_id" => 1,

        # Uncomment the following line to change the origin
        # (a.k.a. master) for the set.  The default is $MASTERNODE.
        #
        "origin" => 1,

        # If this is set to 1, table and sequence names will be folded to lower-case
        # to match the way that PostgreSQL handles unquoted names.
        # For example, CREATE TABLE ACCOUNT(...) actually turns into CREATE TABLE account(...);
        # unless you put quotes around the table name
        # Slony always quotes object names, so you may get a mis-match between the table-name
        # as PostgreSQL understands it, and as Slony represents it.
        # default value is 0
        #
        # foldCase => 0,

        # The first ID to use for tables and sequences that are added
        # to the replication cluster.  This must be unique across the
        # cluster.
        #
        # TODO: This should be determined automatically, which can be
        # done fairly easily in most cases using psql.  create_set
        # should derive it, and give an option to override it with a
        # specific value.
        "table_id"    => 1000,
        "sequence_id" => 1000,

        # This array contains a list of tables that already have
        # primary keys.
        "pkeyedtables" => [
                           'table1',
                           'table2',
                           ],

        # For tables that have unique not null keys, but no primary
        # key, enter their names and indexes here.
        "keyedtables" => {},

        # If a table does not have a suitable key or set of keys that
        # can act as a primary key, Slony can add one.
        #
        # Note: The Slony development team does not recomment this
        # approach -- you should create your own primary keys instead.
        "serialtables" => [],

        # Sequences that need to be replicated should be entered here.
        "sequences" => ['table1_seq',
                        'table2_seq',
                        ],
    },

};

# Keeping the following three lines for backwards compatibility in
# case this gets incorporated into a 1.0.6 release.
#
# TODO: The scripts should check for an environment variable
# containing the location of a configuration file.  That would
# simplify this configuration file and allow Slony tools to still work
# in situations where it doesn't exist.
#
if ($ENV{"SLONYSET"}) {
    require $ENV{"SLONYSET"};
}

# Please do not add or change anything below this point.
1;

CLUSTER_NAME	slonyのレプリケーションクラスタ名。slony操作は、クラスタに対してsetやnodeを指定して実行します。また、管理スキーマはここで指定した_replication_sampleといった名前になります。
LOGDIR	ログ格納ディレクトリ
APACHE_ROTATOR	ログローテート
MASTERNODE	マスタノード。node番号を指定します。
add_node	ノード設定。node(ノード番号)毎にDBを割り当てます。
SLONY_SETS	set設定。
set_id	セット番号
origin	そのセットのマスタnode番号。MASTERNODEと同じなら必要ない
table_id	set内で管理するテーブルに割り当てるIDの最初の番号。以降、連番になる。レプリケーション単位でユニークにする必要がある。
sequence_id	set内で管理するシーケンスに割り当てるIDの最初の番号。以降、連番になる。レプリケーション単位でユニークにする必要がある。
pkeyedtables	同期対象とするテーブルのうちプライマリキーを持つもののリストを設定する
keyedtables	同期対象とするテーブルのうち、プライマリキーを持たず、レコードを特定するユニークなキーを持つもののリストを設定する
serialtables	同期対象とするテーブルのうちレコードを特定するユニークなキーを持たないもののリストを設定する
sequences	pkeyedtables、keyedtablesのキーがシーケンスのとき、そのシーケンスを指定する

(2)レプリケーションスキーマ作成
初期設定。Slonyのレプリケーション管理スキーマがslon_tools_sample.confで
指定した全てのノードのDBで作成されます
これは、マスタノードで実行します。

at 192.168.0.100

cd /usr/local/slony/bin/
slonik_init_cluster --config=../etc/slon_tools_sample.conf | slonik

生成されるスクリプト

# INIT CLUSTER
cluster name = replication_sample;
 node 1 admin conninfo='host=192.168.0.100 dbname=testdb user=postgres port=5432';
 node 2 admin conninfo='host=192.168.0.101 dbname=testdb user=postgres port=5432';
 node 3 admin conninfo='host=192.168.0.102 dbname=testdb user=postgres port=5432';
  init cluster (id = 1, comment = 'Node 1 - testdb@192.168.0.100');

# STORE NODE
  store node (id = 2, event node = 1, comment = 'Node 2 - testdb@192.168.0.101');
  store node (id = 3, event node = 1, comment = 'Node 3 - testdb@192.168.0.102');
  echo 'Set up replication nodes';

# STORE PATH
  echo 'Next: configure paths for each node/origin';
  store path (server = 1, client = 2, conninfo = 'host=192.168.0.100 dbname=testdb user=postgres port=5432');
  store path (server = 1, client = 3, conninfo = 'host=192.168.0.100 dbname=testdb user=postgres port=5432');
  store path (server = 2, client = 1, conninfo = 'host=192.168.0.101 dbname=testdb user=postgres port=5432');
  store path (server = 2, client = 3, conninfo = 'host=192.168.0.101 dbname=testdb user=postgres port=5432');
  store path (server = 3, client = 1, conninfo = 'host=192.168.0.102 dbname=testdb user=postgres port=5432');
  store path (server = 3, client = 2, conninfo = 'host=192.168.0.102 dbname=testdb user=postgres port=5432');
  echo 'Replication nodes prepared';
  echo 'Please start a slon replication daemon for each node';

(3)マスタDBにレプリケーション構成作成
レプリケーションを行うテーブルのセット情報の設定。
セット番号はslon_tools_sample.confで作成したものを使用します。
これは、マスタノードで実行します。

at 192.168.0.100

slonik_create_set --config=../etc/slon_tools_sample.conf set1 | slonik

生成されるスクリプト

cluster name = replication_sample;
 node 1 admin conninfo='host=192.168.0.100 dbname=testdb user=postgres port=5432';
 node 2 admin conninfo='host=192.168.0.101 dbname=testdb user=postgres port=5432';
 node 3 admin conninfo='host=192.168.0.102 dbname=testdb user=postgres port=5432';

# TABLE ADD KEY

# CREATE SET
  try {
    create set (id = 1, origin = 1, comment = 'Set 1 for replication_sample');
  } on error {
    echo 'Could not create subscription set 1 for replication_sample!';
    exit -1;
  }

# SET ADD TABLE
  echo 'Subscription set 1 created';
  echo 'Adding tables to the subscription set';
  set add table (set id = 1, origin = 1, id = 1000,
                 full qualified name = 'public.table1',
                 comment = 'Table public.table1 with primary key');
  echo 'Add primary keyed table public.table1';
  set add table (set id = 1, origin = 1, id = 1001,
                 full qualified name = 'public.table2',
                 comment = 'Table public.table2 with primary key');
  echo 'Add primary keyed table public.table2';

# SET ADD SEQUENCE
  echo 'Adding sequences to the subscription set';
  set add sequence (set id = 1, origin = 1, id = 1000,
                    full qualified name = 'public.table1_seq',
                    comment = 'Sequence public.table1_seq');
  echo 'Add sequence public.table1_seq';
  set add sequence (set id = 1, origin = 1, id = 1001,
                    full qualified name = 'public.table2_seq',
                    comment = 'Sequence public.table2_seq');
  echo 'Add sequence public.table2_seq';
  echo 'All tables added';

(4)slonyプロセス起動
slony起動。slon_tools_sample.confでしていたノード番号を指定します。
格サーバで実行します。

at 192.168.0.100

slon_start --config=../etc/slon_tools_sample.conf node1

at 192.168.0.101

slon_start --config=../etc/slon_tools_sample.conf node2

at 192.168.0.102

slon_start --config=../etc/slon_tools_sample.conf node3

(5)スレーブDBにレプリケーション構成作成(購読設定)
slaveDBの購読設定。set番号とnode番号を指定します。
これで、レプリケーションが開始されます。

at 192.168.0.101

slonik_subscribe_set --config=../etc/slon_tools_sample.conf set1 node2 | slonik

at 192.168.0.102

slonik_subscribe_set --config=../etc/slon_tools_sample.conf set1 node3 | slonik

生成されるスクリプト(set1 node2)

cluster name = replication_sample;
 node 1 admin conninfo='host=192.168.0.100 dbname=testdb user=postgres port=5432';
 node 2 admin conninfo='host=192.168.0.101 dbname=testdb user=postgres port=5432';
 node 3 admin conninfo='host=192.168.0.102 dbname=testdb user=postgres port=5432';
  try {
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
  }
  on error {
    exit 1;
  }
  echo 'Subscribed nodes to set 1';

(6)同期確認

同期状態確認

_replication_sample.sl_statusビューのsl_lag_time(遅延時間)が、レプリケーションの遅延時間の目安になります。
この時間が十数秒以下になれば、同期完了です。

psql sample;
sample=# select * from _replication_sample.sl_status;

 st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |   st_lag_time
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------
         1 |           3 |      31422929 | 2011-04-14 14:47:36.318008 |         31422921 | 2011-04-14 14:47:28.309014 | 2011-04-14 14:47:28.305486 |                 8 | 00:00:08.317434
         1 |           2 |      31422929 | 2011-04-14 14:47:36.318008 |         31422921 | 2011-04-14 14:47:28.835874 | 2011-04-14 14:47:28.305486 |                 8 | 00:00:08.317434
(2 rows)

上記のテーブルも、pgadminツールを使うともっと簡単に確認できます。

ログ確認

ログを監視。
ERRORが出たら失敗。レプリケーションスキーマを削除し、構築のやり直しを行ってください。
pg_dump(データをINSERTコマンドとしてダンプ)で半日くらいかかるデータでも、30分くらいでデータコピーまで完了します。

cd /var/log/slony/slony(set_id)/node(node_number)/
tail -f sample-YYYY-mm-DD_HH:SS:MM.log

運用メモ

DBのメンテナンス

vacuum fullなどのメンテナンスは、ノード単位で実行する必要があります。(Postgresql9.0のS/Rは、マスタノードにvacuum fullをかけると、スレーブノードにまで適用されます。。。)

vacuum full

vacuum fullを行うと、テーブル単位でロックしてしまうため、安全に行うには、slonプロセスをkillして行う必要があります。運用中にvacuum fullと同等の処理を行いたい場合は、pg_reorgを使用するとよいです。

pg_reorg

http://reorg.projects.postgresql.org/index-ja.html
サービス運用中に実行可能なvacuum full代替。かなり便利です。
○できること

- オンライン CLUSTER (cluster index順に行を並び替える)
- ユーザの指定した順に行を並び替える
- オンライン VACUUM FULL (行の詰め合わせを行う)

○注意点

- スーパーユーザのみが実行することができます
- 対象のテーブルはPRIMARY KEYを持っている必要があります

動作内容は
(1)DBのコピーを作成
(2)コピーに対してレコードを再編成(並び替え)
(3)元のDBにリネーム
(4)差分を適用
といった感じです。

- インストール

パスの設定などがややこしいので、postgresqlのコンパイルディレクトリを利用してインストールします。
（postgresqlのコンパイルディレクトリのcontrib配下に展開すると、楽にインストールを行えます。）

cd /usr/local/src
wget http://pgfoundry.org/frs/download.php/2845/pg_reorg-1.1.5.tar.gz
tar xvfz pg_reorg-1.1.5.tar.gz -C /usr/local/src/postgresql-8.4.4/contrib/
cd /usr/local/src/postgresql-8.4.4/contrib/pg_reorg-1.1.5
make
su -

cd /usr/local/src/postgresql-8.4.4/contrib/pg_reorg-1.1.5
make install

pg_reorgの関数を登録

psql -f $PGSHARE/contrib/pg_reorg.sql DB名

実行例

# データベースhogeにオンライン CLUSTERを実行する
pg_reorg -U user -d hoge
# データベースfugaにオンライン VACUUM FULLを実行する
pg_reorg -U user -d fuga -n

APPENDIX

slonik : slonyの設定情報を読み込ませる。

slonik_init_cluster : Clusterを作成する。Cluster単位でレプリケーションが行われる。構成情報を標準出力にはき出す。

slonik_create_set : どのテーブルをレプリケーションするかの設定。1 Clusterに複数set持てる。構成情報を標準出力にはき出す。

slonik_subscribe_set : Slave側で、Masterをsubscribeする。Slaveの数だけこのコマンドを実行する。構成情報を標準出力にはき出す。

slon_start (ノード番号) : slony start

slon_kill (ノード番号) : slony kill

slony_show_configuration : 設定情報表示