This is a high-level glossary of terms. For more detail on a particular term, click one of the related links. You can also search this site to find the information you are looking for. If a term you would like to see defined is missing, please email firstname.lastname@example.org.
refers to the ability to set replicas = allnodes.
refers to the characteristic of a transaction that is all or nothing.
is a standard computer science data structure used for fast access. See B-Tree at Wikipedia.
aka baserep. The relation representation that contains all columns and is indexed by the primary key. If no primary key is defined, ClustrixDB assigns a unique rowid key.
is ClustrixDB's garbage collection process that cleans up the undo logs needed to roll back running transactions. ClustrixDB uses multi-version concurrency control (MVCC), and as part of this it must keep the system state of a transaction while that transaction is open, so it cannot clean up past the oldest running transaction. Once the oldest running transaction is committed, "bigc" cleans up the various undo logs. See also "pinning BigC".
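The cleanup rule described above can be pictured with a minimal sketch. It is not ClustrixDB's implementation: the undo log layout, the numeric identifiers, and the function name `collectable_undo_entries` are all assumptions made for illustration. It shows only that undo entries older than the oldest running transaction can be collected, and that one long-running transaction holds back ("pins") everything after it.

```python
def collectable_undo_entries(undo_log, open_transaction_ids):
    """Return the undo entries that are safe to clean up.

    undo_log: list of (identifier, description) tuples (hypothetical layout).
    open_transaction_ids: start identifiers of transactions still running.
    """
    if not open_transaction_ids:
        # No transaction is running, so every entry can be cleaned up.
        return list(undo_log)
    oldest_open = min(open_transaction_ids)
    # Entries at or after the oldest open transaction may still be needed
    # to roll back or to serve a consistent snapshot, so they are kept.
    return [entry for entry in undo_log if entry[0] < oldest_open]


undo_log = [(10, "update a"), (20, "delete b"), (30, "update c")]
print(collectable_undo_entries(undo_log, [25, 40]))  # [(10, 'update a'), (20, 'delete b')]
print(collectable_undo_entries(undo_log, [5]))       # [] because transaction 5 pins the log
```

In the second call, a single old transaction prevents any cleanup, which is the situation the "pinning BigC" entry refers to.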
Broadcasting refers to a method of transferring a message to all recipients simultaneously. ClustrixDB leverages distributed computing to avoid broadcasts. See how other databases do joins with broadcast.
Commit Identifier that marks when transactional changes become visible to other transactions.
A group of ClustrixDB nodes connected to provide a redundant, scalable RDBMS.
A consistent transaction does not violate any referential integrity during its execution.
ClustrixDB uses a cost-based model for the query optimizer (Sierra) that uses a cost factor based on I/O, CPU usage, and latency.
ClustrixDB leverages fine-grained data distribution and a shared-nothing architecture to provide scalability.
A collection of tables, or relations. This is also sometimes referred to as a schema. (The ClustrixDB term for tables is "relations.")
Refers to the ability of ClustrixDB to perform aggregate queries (e.g., OLAP) in a distributed manner.
The distribution key is some prefix of a representation's key columns and is used to distribute data across the cluster.
A database is durable if it provides a guarantee that transactions that have committed will survive permanently, including in the event of unexpected power loss or other hardware failure.
This describes how a query is evaluated in ClustrixDB.
The GTM and other subsystems use flow control to prevent message senders from outpacing receivers and to prevent receiving nodes' memory from filling up with unprocessed messages.
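One common way to achieve this is a bounded receive buffer: the receiver accepts a message only while it has spare capacity, and the sender must wait otherwise. The sketch below illustrates that general idea only. The glossary does not say which mechanism ClustrixDB uses, and the `Receiver` class and its methods are hypothetical.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a fixed-size buffer of unprocessed messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()

    def credits(self):
        # Number of additional messages the receiver can currently hold.
        return self.capacity - len(self.pending)

    def accept(self, message):
        # Refuse the message when the buffer is full so memory stays bounded.
        if self.credits() <= 0:
            return False
        self.pending.append(message)
        return True

    def process_one(self):
        # Processing a message frees one slot for the sender.
        return self.pending.popleft() if self.pending else None


receiver = Receiver(capacity=2)
print(receiver.accept("a"), receiver.accept("b"), receiver.accept("c"))  # True True False
receiver.process_one()
print(receiver.accept("c"))  # True
```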
Sending a row or rows to another node for further processing.
A pre-compiled part of a query, usually sent to another node for processing.
Global Transaction Manager (GTM)
is a subsystem that manages the atomic commitment of transactions across the cluster and ensures that all nodes involved come to the same decision every time.
A Group Change is the event that occurs when a cluster forms a new group. This occurs when a node joins or leaves the cluster group. This can also manifest in Clustrix Insight as 503 Errors while the Group Change occurs.
Invocation Id that marks the beginning of a statement within a transaction.
ClustrixDB uses an independent index distribution.
is a property that defines how and when changes made by one transaction are visible to other concurrent transactions.
Generally refers to a poor data distribution.
refers to the ability to leverage a large number of processors to perform a set of coordinated computations in parallel.
is a method used to implement concurrency and consistency in a distributed database environment. One of the original papers on this topic is Concurrency Control in Distributed Database Systems. ClustrixDB implements a modified version of this algorithm that provides optimizations for modern database workloads.
A single server running the ClustrixDB software. Multiple nodes connect to form a cluster.
is the internal Object Identifier used by ClustrixDB to describe a "thing": types, relations, and rows are all examples of things that have OIDs.
On-Line Analytical Processing
On-Line Transaction Processing
PD (Probability Distribution)
Probability Distributions are tracked for values in each relation to predict the size of the result set for a given query.
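As a rough illustration of how a value distribution can predict result set size, the sketch below builds a frequency table from sample values and scales it to the relation's row count. The function names and the exact-frequency representation are assumptions for the example, not a description of how ClustrixDB stores its PDs.

```python
from collections import Counter


def build_distribution(values):
    """Map each distinct value to the fraction of rows that hold it."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def estimate_rows(distribution, total_rows, value):
    """Estimate how many rows an equality predicate on `value` would return."""
    # Values never seen in the distribution are estimated at zero rows.
    return distribution.get(value, 0.0) * total_rows


sample = ["a", "a", "b", "c"]
distribution = build_distribution(sample)
print(estimate_rows(distribution, 1000, "a"))  # 500.0
print(estimate_rows(distribution, 1000, "z"))  # 0.0
```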
refers to the status of the cluster when at least two replicas of every slice are available. See also Reprotect.
The job of the optimizer is to determine which execution plan uses the least amount of resources. Typically this is done by assigning costs to plans and choosing the lowest-cost plan.
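A minimal sketch of cost-based plan selection follows. It combines the three factors this glossary names for the cost model (I/O, CPU usage, and latency) into one number and picks the cheapest plan. The `Plan` type, the weights, and the linear formula are illustrative assumptions and do not reflect Sierra's actual cost function.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical candidate plan with estimated resource usage."""
    name: str
    io: float
    cpu: float
    latency: float


def plan_cost(plan, io_weight=1.0, cpu_weight=1.0, latency_weight=1.0):
    # Weighted sum of the estimated resource usage (assumed formula).
    return io_weight * plan.io + cpu_weight * plan.cpu + latency_weight * plan.latency


def choose_plan(plans):
    # The optimizer's choice is the plan with the lowest total cost.
    return min(plans, key=plan_cost)


candidates = [
    Plan("full_scan", io=100.0, cpu=20.0, latency=2.0),
    Plan("index_scan", io=10.0, cpu=5.0, latency=2.0),
]
print(choose_plan(candidates).name)  # index_scan
```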
The ClustrixDB Rebalancer automatically moves, copies, redistributes, and re-ranks data across the cluster. For more on this topic, see ClustrixDB Rebalancer.
is a table.
ClustrixDB maintains multiple copies of data for fault tolerance and availability. By rule, copies are stored on different nodes.
When a slice has fewer replicas than desired, the Rebalancer will create a new copy of the slice on a different node.
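The rule above can be sketched as a small planning function. It finds slices with too few replicas and proposes nodes that do not already hold a copy. The data layout, the `reprotect_targets` name, and the "first eligible node" choice are assumptions for illustration; the glossary does not describe how the Rebalancer actually picks a target node.

```python
def reprotect_targets(slice_replicas, all_nodes, desired_replicas=2):
    """Propose target nodes for slices that have too few replicas.

    slice_replicas: dict mapping slice id -> set of nodes holding a replica.
    all_nodes: ordered list of node names in the cluster.
    """
    plan = {}
    for slice_id, holders in slice_replicas.items():
        missing = desired_replicas - len(holders)
        if missing <= 0:
            continue  # This slice already has enough replicas.
        # A new copy must land on a node that does not already hold one.
        candidates = [node for node in all_nodes if node not in holders]
        plan[slice_id] = candidates[:missing]
    return plan


replicas = {"s1": {"n1", "n2"}, "s2": {"n1"}}
print(reprotect_targets(replicas, ["n1", "n2", "n3"]))  # {'s2': ['n2']}
```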
As the data set grows, ClustrixDB will automatically and incrementally redistribute the dataset one or more slices at a time.
is the Relational Intermediate Language, a version of SQL used in the ClustrixDB internals.
Sierra is the name given to the ClustrixDB Query Optimizer.
ClustrixDB breaks up each representation into a collection of logical slices. Rows are assigned to slices according to the results of a hashing function. See also Re-slicing.
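To make the idea concrete, here is a minimal sketch of hash-based slice assignment. It takes the distribution key to be a prefix of the representation's key columns, as defined elsewhere in this glossary, hashes it, and maps the hash to a slice. The use of SHA-256, the modulo mapping, and the function name `slice_for_row` are assumptions; the actual ClustrixDB hashing function is not described here.

```python
import hashlib


def slice_for_row(row, key_columns, distribution_prefix_len, slice_count):
    """Return the slice index for a row.

    row: dict of column name -> value.
    key_columns: ordered key columns of the representation.
    distribution_prefix_len: how many leading key columns form the distribution key.
    """
    distribution_key = tuple(row[col] for col in key_columns[:distribution_prefix_len])
    # A stable hash so the same key always maps to the same slice.
    digest = hashlib.sha256(repr(distribution_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slice_count


row_a = {"customer_id": 42, "order_id": 1}
row_b = {"customer_id": 42, "order_id": 2}
keys = ["customer_id", "order_id"]

# With a one-column distribution key, both rows land on the same slice.
print(slice_for_row(row_a, keys, 1, 4) == slice_for_row(row_b, keys, 1, 4))  # True
```

Choosing a shorter prefix keeps related rows together on one slice, while a longer prefix spreads them more evenly.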
is an operation that removes a node from a cluster.
Refers to the state of the cluster when it does not have at least two copies (replicas) of each slice.
Virtual relation, often used to represent system information.
The Write Ahead Log is used to log every command that the user executes.
is an identifier used by ClustrixDB internals to denote the logical start of a Transaction.