ClustrixDB Frequently Asked Questions

What is ClustrixDB?

ClustrixDB is a shared-nothing, clustered, scalable database based on commodity hardware and parallel software. Parallelism throughout the system integrates the various nodes of the cluster into one very large database, from both programming and management perspectives. There are no bottlenecks and no single points of failure. All processors are enlisted in support of query processing. Queries are parallelized and distributed across the cluster to the relevant data. New nodes are automatically recognized and incorporated into the cluster. Workload and data are automatically balanced across all nodes of the cluster. Cluster-wide SQL relational calculus and ACID properties eliminate multi-node complexity from the development and management of multi-tiered applications. The complexity that existing database models typically require in order to scale to large volumes of data is eliminated. As your database grows, just add nodes.

What are the main features of ClustrixDB?

ACID transactions and relational calculus across all nodes of the cluster, as opposed to doing without, or developing this capability in applications.
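JOINs across all nodes of the cluster, as opposed to writing join logic in applications.
Automatic balancing of data and queries across all nodes of the cluster, applied as needed, as opposed to balancing data manually.
Cluster-oriented management, as opposed to managing individual nodes.
Online schema changes across all nodes while read and write transactions continue, as opposed to taking some nodes offline, changing the schema, and then bringing them back online.
Horizontal federation and master-slave replication are not required for scale. The MySQL replication protocol/system is supported for testing and database loading purposes.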
Fault tolerant - able to continue operating through all single, and some multiple, node or disk failures.
Add nodes simply by plugging them in; no programming or administration is required.

Does ClustrixDB use MySQL code?

No MySQL code is used. ClustrixDB is entirely original, based on decades of experience in the development of scalable parallel file systems, very large time-series databases, and some of the world's fastest supercomputers.

Is Clustrix available as open source?

No. ClustrixDB is available as downloadable software.

On which platforms is ClustrixDB supported?

ClustrixDB is currently supported on RHEL 6.4, CentOS 6.4, and CentOS 6.5.

How does ClustrixDB scale?

There are several things that affect scalability and performance:

A shared-nothing architecture, which eliminates potential bottlenecks. Contrast this with shared-disk / shared-cache architectures that bottleneck, don't scale, and are difficult to manage.
Query parallelization, which distributes queries to the nodes that hold the relevant data. Results are created as close to the data as possible, then routed back to the requesting node for consolidation and returned to the client.

This differs from other systems, which routinely move large amounts of data to the node that is processing a query and then eliminate all the data that doesn't match the query (typically most of it). By only moving qualified data across the network to the requesting node, ClustrixDB significantly reduces the network traffic bottleneck. In addition, more processors participate in the data selection process. By selecting data on multiple nodes in parallel, the system produces results more quickly than if all data were selected by a single node, which would first have to collect all the required data from the other nodes in the system.

Since each node focuses on a particular partition and sends work items to other nodes rather than requesting raw data from other nodes, each node's cache contains more of that node's data, and less redundant data from other nodes. This produces a very high cache hit rate and greatly reduces slow disk accesses.
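The idea of selecting data where it lives and moving only qualified rows can be sketched in a few lines of Python. This is an illustrative toy, not ClustrixDB's implementation: the NODES dictionary, the row layout, and the scan_local and run_query names are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "nodes"; each one holds its own partition of rows.
NODES = {
    "node1": [{"id": 1, "total": 40}, {"id": 2, "total": 250}],
    "node2": [{"id": 3, "total": 900}, {"id": 4, "total": 15}],
    "node3": [{"id": 5, "total": 310}, {"id": 6, "total": 75}],
}

def scan_local(node, predicate):
    """Runs on the node that owns the data and returns only qualifying rows."""
    return [row for row in NODES[node] if predicate(row)]

def run_query(predicate):
    """Requesting node: fan the work out, then consolidate the partial results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partials = list(pool.map(lambda n: scan_local(n, predicate), NODES))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["id"])

result = run_query(lambda row: row["total"] > 200)
rows_moved = len(result)                                   # rows sent over the "network"
rows_stored = sum(len(rows) for rows in NODES.values())    # rows a pull-everything design would move
print(f"moved {rows_moved} of {rows_stored} rows")
```

In this toy data set only half of the stored rows cross to the requesting node; a design that pulls raw data first would move all of them before filtering.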

How do I know which node of the cluster to connect to?

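It doesn't matter. Clients can connect to any node of the cluster. The ClustrixDB parallel database software will direct queries to the appropriate nodes, that is, the nodes that hold the relevant data. Clustrix recommends using an external load balancer.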

How does ClustrixDB compare with the master-slave replication approach to scalability?

Replication scales reads only. In a master-slave configuration, all writes go to the master and are then replicated to the various slaves. This causes two problems:

It takes time to replicate the writes to the slaves. If a slave is read before the write is replicated, then the data that's read will be obsolete.
Eventually, the system spends all of its time replicating writes, and no cycles are left for reads.
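The first problem, reading a slave before a write has been replicated, can be reproduced with a small simulation. The Master and Slave classes below are hypothetical stand-ins written for this example; they do not model the MySQL replication protocol, only the ordering of events.

```python
from collections import deque

class Master:
    """Accepts all writes and queues them for later replication."""
    def __init__(self):
        self.data = {}
        self.log = deque()  # writes not yet shipped to the slave

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Slave:
    """Serves reads from its own copy, which may lag behind the master."""
    def __init__(self):
        self.data = {}

    def apply_pending(self, master):
        while master.log:
            key, value = master.log.popleft()
            self.data[key] = value

master, slave = Master(), Slave()
master.write("balance", 100)
slave.apply_pending(master)        # slave is caught up
master.write("balance", 250)       # new write reaches the master only
stale = slave.data["balance"]      # read before replication: obsolete value
slave.apply_pending(master)
fresh = slave.data["balance"]      # read after replication: current value
print(stale, fresh)
```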

How does ClustrixDB compare to application-level horizontal federation (a.k.a. sharding)?

In essence, ClustrixDB does horizontal federation. The key is making the federation invisible to applications and to administrators. In addition, ClustrixDB provides the following:

Full ACID (Atomicity, Consistency, Isolation & Durability) properties across partitions.
Full relational calculus (i.e. left, inner & outer joins, etc.) across partitions.
Automatic management of the cluster - little administrator intervention is required, other than specifying the number of data replicas, and the priorities for various system functions, such as data replication.

By making the federation invisible to applications, ClustrixDB eliminates the need for custom programming and administration for partitioning. This makes it easier for customers to run queries and update transactions across partitions, ultimately leading to greater functionality at lower cost.

What are data replicas?

All data in ClustrixDB is replicated on a per-table or per-index basis. Customers may prefer to maintain more replicas of base representations (data tables), and fewer replicas of indexes (since indexes can easily be recreated from base representations).

How does ClustrixDB optimize joins?

The query planner is cluster-aware, and it knows which nodes of the cluster contain which indexed rows. Here's how it works:

Indexes are pointers to rows. A hashing function is used to store indexes, so indexes created on multiple tables will hash to the same node when the values that are indexed (or hashed) are the same. So the index for certain rows of table A will be stored on the same node as the index for rows of table B which have the same index values.

If an index is a primary key, then the rows are stored with the index. If the index is not a primary key, then the rows may be on a different node than the index.

This means that the rows for a table with a primary key are located on the same node as the index for rows in another table with its own index. The second index has pointers to the actual rows.

The effect of this is to reduce or eliminate cross-node traffic. For example, say an application wants to join table A's primary key Ap with table B's index Bi. The hashing function has already placed Ap rows and Bi indexes on the same node. The join operation will be dispatched to that node, and the only rows that need to be moved between nodes are the rows in B that meet the join criteria, as indicated by Bi. There's little of the cross-node data movement that's required for joins on other systems.

If the join is on two primary keys (Ap and Bp), then A's rows and B's rows will already have been hashed to the same nodes, completely eliminating the need for movement of raw data between nodes.
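The co-location effect described above can be illustrated with a toy hash distribution. This is a sketch under stated assumptions: CRC32 modulo the node count stands in for whatever hashing function ClustrixDB actually uses, and NUM_NODES, node_for, and the table contents are invented for the example.

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(value):
    """Map an indexed value to a node with a deterministic hash (assumed: CRC32 mod N)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_NODES

# Table A is stored with its primary key Ap; table B has an index Bi whose
# entries are pointers to B's rows (the rows themselves may live elsewhere).
table_a = {ap: f"a-row-{ap}" for ap in range(1, 9)}
table_b_index = [(bi, f"ptr-to-b-row-{n}") for n, bi in enumerate([1, 1, 3, 5, 5, 8])]

nodes = {n: {"a_rows": {}, "b_index": []} for n in range(NUM_NODES)}
for ap, row in table_a.items():
    nodes[node_for(ap)]["a_rows"][ap] = row
for bi, pointer in table_b_index:
    nodes[node_for(bi)]["b_index"].append((bi, pointer))

# Each node joins Ap with Bi using only what it already holds.
joined = []
for node in nodes.values():
    for bi, pointer in node["b_index"]:
        joined.append((node["a_rows"][bi], pointer))  # the matching A row is always local
joined.sort()
print(len(joined), "matches found without moving A rows or index entries")
```

Because equal values hash to the same node, every lookup of an A row succeeds locally; only the B rows behind the matching pointers would still need to be fetched.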

Note: if the join is on columns that have no indexes, then table scans are required, but the scans can be done in parallel on multiple nodes, so the operation, while not optimal, is still accelerated.

What steps are required to start a ClustrixDB database?

See the ClustrixDB Software Installation Guide.

What steps are required to add more nodes to an existing ClustrixDB database?

The short answer is: just add nodes. Refer to these instructions for guidance in Expanding Your Cluster's Capacity.

What happens to the system if a component fails?

The system is designed to continue operating through inevitable component failures, as follows:

If a disk fails, then new replicas will be created in accordance with priorities defined by the administrator; future transactions will run against the other replicas.
If a node fails, then new replicas of all data on all disks in that node will be created in accordance with priorities defined by the administrator; future transactions will run against the other replicas. Commonly accepted best practices for transaction retry are recommended.
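One common shape for transaction retry is a bounded loop with exponential backoff. The sketch below is generic Python and makes no claim about ClustrixDB's client behavior or error codes: TransientError, run_with_retry, and flaky_transaction are hypothetical names, and a real application would catch whatever exception its database driver raises for an interrupted transaction.

```python
import time

class TransientError(Exception):
    """Hypothetical stand-in for the error raised when a transaction is interrupted."""

def run_with_retry(transaction, attempts=5, base_delay=0.01):
    """Call transaction(); on a transient failure, wait and try again."""
    for attempt in range(attempts):
        try:
            return transaction()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)  # back off: 0.01s, 0.02s, ...

calls = {"count": 0}

def flaky_transaction():
    """Fails twice, as if a node dropped out, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("node lost mid-transaction")
    return "committed"

outcome = run_with_retry(flaky_transaction)
print(outcome, "after", calls["count"], "attempts")
```

The transaction body should be safe to run more than once, since a retry re-executes it from the start.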

What levels of redundancy are provided?

The node is the fundamental redundant unit. Any node can fail without a system outage. In addition, all data paths and all data are redundant. Administrators can specify the desired level of redundancy (number of data replicas) and can specify priorities for the re-creation of additional replicas when storage or nodes fail.
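To see why the number of replicas determines which failures are survivable, consider a toy placement model. Everything here is an assumption for illustration: the round-robin place function, the slice count, and the node names are invented and do not describe how ClustrixDB actually places replicas.

```python
NODES = ["node1", "node2", "node3", "node4"]
REPLICAS = 2  # hypothetical administrator-chosen number of data replicas

def place(slice_id):
    """Put each replica of a slice on a different node (assumed round-robin layout)."""
    return [NODES[(slice_id + r) % len(NODES)] for r in range(REPLICAS)]

placement = {slice_id: place(slice_id) for slice_id in range(8)}

def survives(failed):
    """True if every slice still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replica_nodes)
        for replica_nodes in placement.values()
    )

single_ok = all(survives({node}) for node in NODES)   # any one node may fail
pair_ok = survives({"node1", "node3"})                # no slice has both replicas here
pair_lost = survives({"node1", "node2"})              # slice 0 has both replicas here
print(single_ok, pair_ok, pair_lost)
```

With two replicas in this model, every single-node failure is survivable and some, but not all, two-node failures are; raising REPLICAS widens the set of survivable failures at the cost of more storage.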

Is ClustrixDB a new storage engine for MySQL?

No, it's a complete database, built from the ground up for high-performance, clustered OLTP. It is wire-compatible with MySQL, but is implemented without any MySQL code.

Does the product support online backup operations?

ClustrixDB supports a fast parallel backup that can then be used for restore operations. ClustrixDB also supports MySQL operations such as mysqldump.