
Clustrix Insight is a visual tool for easily monitoring the current state of the cluster, analyzing any irregularities, and making any configuration changes.

Current System Status Ribbon

This "quick status" ribbon at the top of the page provides an immediate snapshot of the state of the cluster, as noted by a green, yellow, or red status indicator for each system component. More detail on system components and current system status can be found in the dashboard.

Dashboard

The dashboard contains a summary of current system resources and processes, as well as historical information. Current system status is indicated by a green, yellow, or red status indicator for each system component. You can refresh any widget on the dashboard by clicking the refresh button (visible on mouseover) at the upper right of the widget.

To invoke the Dashboard:

The Dashboard appears:

TPS

Current transactions per second (TPS) is noted at the upper left of the page, updating every 15 seconds. TPS for the past 48 hours is graphed at the upper right of the page, with data points added at five-minute intervals. You can mouse over this graph to view point-data, or click and drag to zoom in on a time period.

Rebalancer Widget

The rebalancer widget provides information about how data is distributed across the cluster. In the default cluster configuration, all data is stored in two copies (replicas) on different nodes, and data slices are distributed evenly among the nodes. Should any data slices become unavailable, the rebalancer waits ten minutes, then begins the process of replicating the surviving data copies on different nodes and rebalancing the data evenly among the nodes. This widget provides information on the percentage of data that is fully protected (duplicated in the cluster), how data is balanced among the nodes, and rebalancer actions over the past 24 hours.

A yellow status indicator for this widget indicates that the system is rebalancing, while a red status indicator indicates that data protection is less than 100%.

Data distribution among the nodes is shown in a bar chart. Numbers under the bar indicate the percentage of data capacity used for each node. A blue bar is an operating node, a red bar is an offline node, and a blue bar with a red box is an operating node that is the target of a rebalancing operation.

If reprotection is required, this section indicates that reprotecting is occurring and gives an ETA for completion. The number of unprotected data slices and the quantity of unprotected data are also provided.

Possible rebalancer actions are:

  • Reprotect: A slice is being duplicated on another node as part of the reprotection process that follows a hardware failure
  • Redistribute: The distribution of data between nodes is being smoothed out to better distribute indexes that contain a single value
  • Rebalance: Slices are being moved around to spread data evenly between nodes
  • Split: A slice has grown too large and is being split into smaller slices

See Monitoring Data Rebalancing Activity under Administering the Cluster for more information on rebalancer operations.

Resource Utilization Widget

This widget gives current CPU utilization, buffer cache utilization, and disk read/write activity, along with sparklines of this information for the past 2 hours. More detailed information can be found in the performance report.

CPU and buffer cache utilization will be highlighted in yellow if either one exceeds 70%, or in red if either exceeds 85%. The status indicator for this widget will also change to yellow or red based on CPU utilization.
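The highlighting rule above can be sketched as a small function. This is an illustrative sketch only, not Insight's actual implementation: the function names are hypothetical, "exceeds" is read as strictly greater than, and "green" is assumed to be the normal state.

```python
def utilization_level(percent: float) -> str:
    """Map a utilization percentage to a highlight color.

    Thresholds come from the text (70% and 85%); treating "exceeds"
    as a strict comparison is an assumption.
    """
    if percent > 85:
        return "red"
    if percent > 70:
        return "yellow"
    return "green"  # assumed label for the normal state


def resource_widget_status(cpu_percent: float, buffer_cache_percent: float) -> dict:
    """Return per-metric highlights plus the widget-level indicator."""
    return {
        "cpu": utilization_level(cpu_percent),
        "buffer_cache": utilization_level(buffer_cache_percent),
        # Per the text, the widget indicator is based on CPU utilization.
        "widget": utilization_level(cpu_percent),
    }
```

For example, `resource_widget_status(50, 90)` highlights buffer cache in red while the widget indicator stays green, because the indicator follows CPU utilization.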

Disk Capacity Widget

Total disk usage on the cluster is given as a percentage and a graph. Disk usage is further broken down into user data, binlog, and undo log. Undo log use is graphed in a 2-hour sparkline. More detailed information can be found in the disk capacity report.

Should a disk go offline, a warning will appear indicating which disk is offline, along with a diagram indicating the disk location in the node. If a disk exceeds the capacity threshold, a warning will appear indicating which disk is over capacity, along with a breakdown of permanent, temporary, and undo log data stored on the disk, and sparklines of disk usage for that disk over the past 2 hours.

The status indicator for this widget will turn yellow if a disk goes over capacity or if total disk usage exceeds 70%. It will turn red if a disk goes offline or total disk usage exceeds 85%.
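As a rough illustration of the rule above, the following sketch combines the disk conditions into a single indicator. The function and parameter names are hypothetical, and giving red precedence over yellow when both conditions apply is an assumption.

```python
def disk_widget_status(total_usage_percent: float,
                       any_disk_offline: bool,
                       any_disk_over_capacity: bool) -> str:
    """Combine the disk conditions described above into one indicator color."""
    # Red conditions are checked first (assumed precedence).
    if any_disk_offline or total_usage_percent > 85:
        return "red"
    if any_disk_over_capacity or total_usage_percent > 70:
        return "yellow"
    return "green"  # assumed label for the normal state
```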

Slave Processes Widget

This widget shows any connected replication slaves and their replication status. The section lists the names of each slave, the binlog for each slave, current position in the binlog, and how many seconds the slave is behind the master. Seconds behind is also shown in a 2-hour sparkline. Any error messages for stopped slaves will be shown in this widget. If there are multiple slaves, this information will be shown separately for each one. More detailed information can be found in the replication report.

The status indicator for this widget will turn yellow if a slave is behind. It will turn red if a slave status is errored, or if seconds behind is increasing.
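One way to read this rule is sketched below. The text does not say how Insight decides that seconds behind is "increasing"; comparing the two most recent samples is an assumption made for illustration, as are the function and parameter names.

```python
def slave_status(seconds_behind_samples: list, errored: bool) -> str:
    """Classify one slave from its recent seconds-behind samples (oldest first)."""
    if errored:
        return "red"
    # Assumption: "increasing" means the latest sample exceeds the previous one.
    if len(seconds_behind_samples) >= 2 and seconds_behind_samples[-1] > seconds_behind_samples[-2]:
        return "red"
    # Behind, but not falling further behind.
    if seconds_behind_samples and seconds_behind_samples[-1] > 0:
        return "yellow"
    return "green"  # assumed label for the normal state
```

With this reading, a slave that is 30 seconds behind and catching up (`[60, 30]`) is yellow, while one that is falling further behind (`[30, 60]`) is red.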

Workload Analysis Tools

To access the Workload Analysis tools:

The following sections describe the Workload Analysis tools.

Current Workload Analysis

This analysis tool allows a user to quickly identify specific queries that are inefficient, run slowly, or place a heavy load on system resources. The page does not auto-refresh; to refresh the entire page, click the refresh button in the top right corner.

The top of the page shows current CPU utilization (also found in the dashboard and performance report), read/write latency, and total number of rows read and written in the current 30-second period. A bar chart groups query statements into bins based on average execution time, from less than 1 millisecond to 1 second or greater. The vertical axis indicates the percentage of the total processor load accounted for by the queries in each bin.
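The bar chart's grouping can be sketched as follows. The text gives only the first and last bins (less than 1 millisecond, and 1 second or greater); the intermediate bin edges below are assumed for illustration and may not match the edges Insight uses. The function name and input shape are also hypothetical.

```python
from bisect import bisect_right

# Assumed bin edges in milliseconds; only "<1 ms" and ">=1 s" come from the text.
BIN_EDGES_MS = [1, 10, 100, 1000]
BIN_LABELS = ["<1 ms", "1-10 ms", "10-100 ms", "100 ms-1 s", ">=1 s"]


def load_by_exec_time(queries):
    """Sum load percentage per execution-time bin.

    `queries` is an iterable of (avg_exec_ms, load_percent) pairs.
    """
    totals = {label: 0.0 for label in BIN_LABELS}
    for avg_ms, load_percent in queries:
        # bisect_right places a value equal to an edge into the higher bin.
        label = BIN_LABELS[bisect_right(BIN_EDGES_MS, avg_ms)]
        totals[label] += load_percent
    return totals
```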

The body of the page lists recently executed queries. Long query statements are abbreviated, but full statements can be viewed by clicking "more" or "expand all". Sortable columns show the percentage of total load, execution count, average execution time, and rows read, written, and outputted for each query statement. Rows transacted per execution are shown in parentheses.

Sorting by the load column reveals queries that place a significant processing load on the cluster. From here, a user can determine the reason for the heavy resource use. Resource-expensive queries may be slow and complex (indicated by a long execution time), may have a high execution count, or may need indexing (possibly indicated by a high count of rows read per execution). Green, yellow, and red pie chart markers next to each query statement flag queries with a high read-per-execution count, and numbers in the rows read column are similarly colored (yellow for >1,000 rows read per execution, red for >20,000 rows read per execution).
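The color thresholds for rows read per execution can be expressed as a short function. This is a sketch under the thresholds stated above; the function name is hypothetical.

```python
def rows_read_flag(rows_read: int, executions: int) -> str:
    """Return the color flag for a query based on rows read per execution."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    per_execution = rows_read / executions
    if per_execution > 20_000:
        return "red"
    if per_execution > 1_000:
        return "yellow"
    return "green"
```

A query that reads 50,000 rows over 10 executions averages 5,000 rows per execution and is flagged yellow, which may suggest a missing index.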

To filter query statements to a specific database, enter the database name in the "Show only" box and click "Apply". To filter to multiple databases, click "Show/hide multiple databases" and select which databases to display.

Historical Comparison Workload Analysis



The purpose of this tool is to determine what changed in the database between two points in time. The graph at the top of the page shows historical CPU utilization and rows read per second. The body of the page gives a breakdown of the workload at two points in time. The left column represents workload at the left edge of the graph, while the right column represents workload at the right edge of the graph (initially, the current workload).

Use the graph at the top of the page to select a time period. The slider above the graph zooms out up to seven days. Alternatively, click and drag the graph to zoom in, either horizontally or vertically (double click to zoom out). Note that after you click and drag, the slider above the graph is disabled; double click to zoom out and re-enable the slider.

The body of the page shows workload breakdowns at two points in time. Query statements are sorted by the load they place on the CPU. Load is given as a percentage of the total load, not percentage of CPU capacity. Yellow and red pie chart markers indicate queries with a high number of rows read per execution. Mouse over each statement to see how many times the query has executed, average execution time, average rows read, and rows read, written, and outputted per execution. Long query statements are abbreviated; click "more" or "expand all" to show full query statements.

This tool can be useful for diagnosing problems. Suppose the database begins running slowly while CPU utilization spikes. What changed? To answer this question, locate the point on the graph where CPU utilization increased. Select a period that ranges from just before the spike to the peak of the spike. This should change the workload breakdowns in the body of the page. The left column now shows the workload from just before the spike, while the right column shows the workload from the peak of the spike. Look for query statements on the right that aren't in the left column, or that account for a larger portion of the workload on the right than on the left. These statements are the likely culprits for the sluggish database, and should be subject to further analysis.
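The comparison described above can also be done by hand on two workload breakdowns. The sketch below assumes each breakdown has been written down as a mapping from query statement to its percentage of total load; the function name, the input format, and the `min_increase` cutoff are hypothetical and not part of Insight.

```python
def likely_culprits(before: dict, after: dict, min_increase: float = 5.0) -> list:
    """Find statements that are new, or whose share of load grew, between two breakdowns.

    `before` and `after` map a query statement to its percentage of total load.
    Returns (statement, reason, value) tuples, largest value first, where value
    is the load share for new statements or the increase for existing ones.
    """
    suspects = []
    for statement, load in after.items():
        previous = before.get(statement)
        if previous is None:
            suspects.append((statement, "new", load))
        elif load - previous >= min_increase:
            suspects.append((statement, "increased", load - previous))
    suspects.sort(key=lambda item: item[2], reverse=True)
    return suspects
```

Statements returned by this function correspond to the ones the text suggests examining first: those absent from the left column, and those with a larger share on the right.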

To filter query statements to a specific database, enter the database name in the "Show only" box and click "Apply". To filter to multiple databases, click "Show/hide multiple databases" and select which databases to display.

Workload Distribution Analysis



This analysis tool provides data on how the workload is distributed between the nodes. Ideally, workload should be evenly distributed among all nodes.

The graph at the top of the page shows the spread between read distributions over time. A wide band means that data is being read unevenly between nodes, and could indicate that data is not properly balanced between nodes. A narrow band indicates that all nodes are performing about the same number of read operations. Click the graph to show workload distribution at a certain point in time, or click "reset to current" to show current workload distribution.
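To make the idea of a "wide" or "narrow" band concrete, the sketch below computes one possible spread measure: the gap between the largest and smallest per-node share of reads. The text does not define how Insight computes the band, so this formula and the function name are assumptions for illustration.

```python
def read_spread(reads_per_node: dict) -> float:
    """Return the gap, in percentage points, between the busiest and
    least busy node's share of total reads (assumed spread measure)."""
    total = sum(reads_per_node.values())
    if total == 0:
        return 0.0
    shares = [100.0 * reads / total for reads in reads_per_node.values()]
    return max(shares) - min(shares)
```

Under this measure, three nodes with equal read counts give a spread of zero (a narrow band), while one node serving most of the reads gives a large spread (a wide band).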

The overall workload distribution chart shows the percentage of reads and writes distributed to each node, as well as the total CPU utilization for each node. A red or yellow bar highlights a workload that is out of balance. Further down the page, balance information is given for individual representations (indexes and base tables) that are frequently read/written.

To filter read/writes to a specific database, enter the database name in the "Show only" box and click "Apply". To filter to multiple databases, click "Show/hide multiple databases" and select which databases to display.

Workload imbalances should eventually be resolved by the rebalancer. If the workload has been imbalanced for some time and the rebalancer is not running or does not seem to be resolving the imbalance, manual rebalancing may be required; please contact Clustrix support.

Reports


To access reports:

Reports provide historical, detailed information about cluster status. Each report page consists of a series of graphs. These graphs are initially compact, but can be expanded by clicking "expand" or "expand all". Click the pop-out icon for any graph to pop it out in a new window which can be re-sized as desired. Click the refresh icon on any graph to refresh the entire page.

Use the slider at the top of the page to zoom out up to seven days. Click and drag a graph to zoom in, either horizontally or vertically (double click to zoom out). Note that after clicking and dragging to zoom in, the slider at the top of the page will be disabled. Double click any graph to zoom back out and re-enable the slider.

It is important to note that all the graphs on each report page are locked together. Zooming in or out on one graph will also adjust the other graphs. This feature makes it easier to compare different metrics over the same time period.

  • Performance Report: The performance report provides a historical view of the data found in the resource utilization section of the dashboard. Transactions per second, CPU utilization, read/write query latencies, and buffer cache utilization are graphed over the past seven days.
  • Disk Capacity Report: Like the performance report, this report provides a historical view, in this case of disk capacity. Disk utilization, disk utilization by type, binlog size, and undo log utilization are graphed over the past seven days.
  • Replication Report: This report gives a historical view of replication processes. The seconds behind master graph shows how far behind each slave was at any point in the last seven days. Relay log size shows the size of the relay log for each slave at any point in the last seven days.

Configuration with Insight

You can use Clustrix Insight to perform the following types of configuration:

HTTPS Connections

To enable encrypted connections to Insight, open a command line and issue the following command: clx cmd 'webui_https_mode http_disable'. In a browser, browse to any node in the cluster and accept the certificate. Insight will now be https-only. Note that you will have to accept the certificate when first connecting to Insight on each node. It is not currently possible to use customer certificates.
