
RAC Performance Tuning

Oracle Cache Fusion, private interconnects and practical performance management considerations in Oracle RAC

This article looks at Cache Fusion from the perspective of practical performance management for Oracle RAC. Cache Fusion is built on, and heavily depends on, the private interconnect. The article emphasizes the impact of the interconnect, often overlooked and underestimated, on RAC performance. Tuning RAC is similar to tuning a regular single instance database, but in addition it must take into account the overhead of the interconnect present in Oracle RAC (10gR1/10gR2/11gR1/11gR2). Some well-known cases for improving RAC performance are addressed.

In this article you will review the RAC fundamentals and infrastructure architecture and look at the Cache Fusion impact on RAC performance. The guidelines outlined in the article have two objectives:

  • Maximize the utilization of the software and hardware of the private interconnect. Make sure that you fully utilize your capacity: increase bandwidth and throughput, and decrease latency. You will look at the following considerations for the private interconnect:
    • Network architecture
    • Network configuration
    • Network settings
    • TCP/UDP configuration and settings
    • OS settings
  • Minimize Cache Fusion traffic across the private interconnect. Look at how to diagnose, determine and resolve problems related to Cache Fusion, addressing some common problems, derived from the Cache Fusion wait events and statistics, known to exist in Oracle RAC.

 

 

RAC Fundamentals and Architecture

Oracle RAC is a parallel database architecture aiming at high availability and parallelism in a grid deployment. The concept of a grid is based on providing and distributing computing resources on demand. Oracle RAC is a shared disk architecture; Oracle Rdb is another example of a shared disk architecture. The alternative to the shared disk architecture is the shared nothing architecture, a parallel database architecture (Teradata, IBM DB2, Informix) based on MPP (Massively Parallel Processing).

The table below represents the RAC architecture; all layered components (RDBMS, clusterware resources, Oracle Clusterware, etc.) depend on the private interconnect.

Node 1 … Node N (the same stack runs on every node):

  • Oracle RDBMS
  • Oracle Clusterware resources (VIP, SCAN, Listeners, ASM, etc.)
  • Oracle Clusterware/GI
  • Private network (same subnet, same interface names across all nodes)
  • Public network (same subnet, same interface names across all nodes)
  • Same OS (Oracle supported)
  • Shared or cluster file system for the shared storage (Oracle supported) – GPFS, Veritas, ASM, etc.
  • Shared disk hardware storage (Oracle supported)

Functions and processes for GCS, GES and GRD

The SGA for a RAC database includes some new structures, described below.

A RAC database system has two important services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These are essentially collections of background processes and memory structures. Together, GCS and GES manage the overall Cache Fusion process, resource transfers, and resource acquisition among the instances. The major processes are:

  • LMON: Global Enqueue Service Monitor: The LMON process monitors global enqueues across the cluster and performs global enqueue recovery operations. Recovers GRD.
  • LMD: Global Enqueue Service Daemon: The LMD process manages incoming remote resource requests within each instance.
  • LMS: Global Cache Service Process: The LMS process maintains records of data file statuses and of each cached block by recording information in the Global Resource Directory (GRD). The LMS process also controls the flow of messages to remote instances, manages global data block access, and transmits block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.
  • LCK0: Instance Enqueue Process: The LCK0 process manages non-Cache Fusion resource requests such as library and row cache requests.

The Global Resource Directory (GRD) records information about resources and enqueues. The GRD is maintained by GES and GCS. It remains in memory and is distributed across all of the running instances; each instance manages a portion of the directory. This distributed nature is a key point for the fault tolerance of RAC. The GRD is the internal in-memory database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance’s cache, the GRD is updated. The following resource information is available in the GRD:

  • Data Block information such as file # and block #
  • Location of most current version
  • Modes of the data blocks: (N)Null, (S)Shared, (X)Exclusive

GCS is primarily responsible for maintaining the coherency of the buffer caches of all running RAC instances. GCS makes sure that instances acquire a resource cluster-wide before modifying or reading a database block. The GCS is used to synchronize global cache access, allowing only one instance to modify a block at any single point in time. The GCS, through the RAC-wide Global Resource Directory, ensures that the status of data blocks cached in the cluster is globally visible and maintained.

GES is primarily responsible for maintaining coherency in the dictionary and library caches. Each node keeps data dictionary information in the dictionary cache of its SGA for quicker lookup and access. GES synchronizes dictionary and library caches across all running instances in the RAC environment. The LMON, LCK and LMD processes work in tandem to make GES operate smoothly.

Oracle RAC maintains a multi-versioning architecture for blocks, distinguishing between current data blocks and one or more consistent read (CR) versions of a block. A current block contains the changes of all committed and uncommitted transactions. A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. In Oracle RAC, applying rollback segment information to current blocks produces consistent read versions of a block. Both the current and consistent read blocks are managed by the GCS. To transfer data blocks among database caches, buffers are shipped by means of high speed IPC over the private interconnect; disk writes are only required for cache replacement. A past image (PI) of a block is kept in memory before the block is sent, if it is a dirty block. In the event of a failure, Oracle reconstructs the current version of the block by reading the PI blocks.

Global Logical Buffer Cache

Using Cache Fusion, Oracle RAC logically combines each instance’s buffer cache to enable the users and applications to process data as if the data resided on a logically combined single cache. The SGA size requirements for Oracle RAC are greater than the SGA requirements for single-instance Oracle databases due to Cache Fusion. To ensure that each Oracle RAC database instance obtains the block that it requires to satisfy a query or transaction, Oracle RAC instances use two components:

  • Global Cache Service (GCS)
  • Global Enqueue Service (GES)

The GCS and GES maintain records of the statuses of each data file and each cached block using the Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances, which effectively increases the size of the SGA for an Oracle RAC instance. After one instance caches data, any other instance within the same cluster database can acquire a block image from that instance faster than by reading the block from disk. Therefore, Cache Fusion moves current blocks between instances rather than re-reading the blocks from disk. When a consistent block is needed or a changed block is required on another instance, Cache Fusion transfers the block image directly between the affected instances. Oracle RAC uses the private interconnect for inter-instance communication and block transfers. The GCS and GES processes and the GRD collaborate to enable Cache Fusion.

Cost of Cache Fusion

Compared to OPS, RAC has a new feature: Cache Fusion. Reads from disk are involved only if a block is not available in the buffer cache of any other instance. The cost of a block access and of cache coherency can be determined from GCS statistics and GCS events, which gauge what is going on and are a useful source of diagnostic information.

The response time for Cache Fusion transfers is determined by:

  • Physical private interconnects – More than one interconnect is recommended; the more interconnects, the more redundancy and the higher the bandwidth available for messages and Cache Fusion block transfers. Achieving low latency is the objective, which is why private interconnects are required: a public corporate LAN might have high bandwidth but typically has high latency due to retransmissions on encountered collisions. Interconnect performance depends on the configured speed and redundancy.
  • IPC protocol – Oracle RAC tries to use user-mode IPC (inter-process communication) for sending data from one node to another, as it does not require a context switch into kernel mode and runs in user application mode. The IPC protocol depends on the hardware vendor.
  • GCS protocol – The GCS protocol depends on the IPC protocol and the private interconnect, and is not directly affected by disk I/O, except for the log write I/O performed whenever a dirty block in the buffer cache is sent over the interconnect to another instance for Cache Fusion reasons, in either a write-read or write-write situation.

The response time is not generally affected by disk I/O factors, except for the occasional log write done when sending a dirty buffer to another instance in a write-read or write-write situation. For example, if a block is updated by a transaction in the buffer cache of instance A and the very same block image is transferred via the Cache Fusion mechanism to the buffer cache of instance C, Oracle has to guarantee that the redo is written to the redo log files first. Other than that, there is not much disk I/O performed: update-heavy transactions will incur more disk I/O for the redo than read-mostly workloads.

Typical Latencies

Basically CR block request time and current block request time are what you are looking at.

  • CR block request time = build time + flush time + send time
  • Current block request time = pin time + flush time + send time

CR block request time is the time it takes to build the CR block in the instance that owns the appropriate image, plus the time to flush it (the pending redo must be written to disk), plus the time it takes to send it across the interconnect.

Current block request time is how long it takes to pin the image in the instance that owns the block, plus the time it takes to flush it and send it across. You cannot send a block while someone is changing it at the same time; that is why the block must be pinned in exclusive mode, then flushed and sent over the interconnect.

These statistics come from v$sysstat (or gv$sysstat for a cluster-wide view).

Other latencies come from the v$ges_statistics or gv$ges_statistics views.
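As a sketch (statistic names as found in 10g/11g v$sysstat; the times are recorded in centiseconds, hence the factor of 10 to convert to milliseconds), the average CR block receive time can be derived like this:

SELECT b2.value * 10 / b1.value "avg gc cr block receive time (ms)"
FROM   v$sysstat b1, v$sysstat b2
WHERE  b1.name = 'gc cr blocks received'
AND    b2.name = 'gc cr block receive time'
AND    b1.value > 0;

The same calculation with 'gc current blocks received' and 'gc current block receive time' gives the average current block receive time.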

Latencies are dependent upon:

  • Utilization of the IPC protocol
  • Scheduling delays when the system is under high CPU or memory utilization
  • Log flushes for current blocks served

What you are primarily concerned with are the average time to process a CR block and the average time to process a current block. If over time those times start to grow, you need to explore why requests are taking longer: look at the wait events and the possible causes for those latencies to be growing, and determine why things are changing and getting worse over time.

Waits in RAC environment

Wait events for RAC, like any other wait events, show you the various things a session can wait on and help you identify what the problem may be. RAC introduces wait events that do not exist in a single instance environment.

Let’s review the v$session_wait view. Oracle includes some common columns in the v$session and v$session_wait views; the interesting ones are wait_time and event, containing the name of the event. If a session is waiting for something, then when you query v$session_wait the event column contains the name of the event the session is waiting on, for example db file sequential read, or log file parallel write (which occurs in the log writer (LGWR) as part of the normal activity of copying records from the redo log buffer to the current online log), or log file sync when you commit, also referred to as commit latency. If wait_time is 0, event shows what the session is currently waiting on. If wait_time is greater than 0, event shows the last event waited for and wait_time its duration. If wait_time is -2, the init parameter timed_statistics is not set. If wait_time is -1, the wait was shorter than a hundredth of a second and was not captured.
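For example, a quick sketch against v$session_wait showing what active sessions are currently waiting on in the global cache:

SELECT sid, event, state, wait_time, seconds_in_wait
FROM   v$session_wait
WHERE  event LIKE 'gc%'
ORDER  BY seconds_in_wait DESC;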

For a single instance the situation is simple: a row in the v$session_wait view represents either a current wait (wait_time 0) or the last completed wait. RAC introduces complexity. When Cache Fusion is involved, a server process cannot simply do I/O as it wants. A single instance server process does I/O directly: if a buffer is not in the buffer cache it issues a request and waits, for example on db file sequential read, and when the I/O completes it continues. In RAC, the server process makes a request to the LMS background process handling Cache Fusion, and when LMS gets involved there are several possibilities:

  • One is that the requesting instance has a valid copy of the block image in its own buffer cache and holds the Global Resource Directory (GRD) metadata for it, so everything can be done locally without a block transfer.
  • Another scenario is when the requesting instance A does not have the GRD metadata and another instance B holds it (for example, for block m in file n). Obtaining the global resource metadata then requires a hop to instance B in order to identify the instance that has a valid copy of the block; if the block image is in either instance A or B, there are two hops, as two nodes are already involved.
  • The worst possible scenario, assuming there are more than two instances, is when the instance that makes the request has neither the image copy of the block nor the GRD metadata for it. In this case its LMS talks to the LMS holding the metadata, which talks to the LMS on a third instance that has the block image, and the third instance, using user-mode IPC, sends the block image to the requesting instance A. This is a three-hop situation, the worst possible regardless of the number of nodes.

To summarize: there is the requesting instance, where the initial request for a block image is made by the server process; the instance that serves the image, called the owning or serving instance; and the instance that owns the GRD metadata for the particular block number and file number, referred to as the mastering instance. The worst situation is when the owning, mastering and requesting instances are all separate; the best case is when they coincide. All wait events related to the global cache are collected in the Cluster wait class in the V$ views and in OEM DC/GC. Wait events for RAC help you analyze what sessions are waiting for; wait times are attributed to events that reflect the outcome of a request. These events are used in ADDM and the (G)V$ views to enable Cache Fusion diagnostics.

  • Wait times in RAC are attributed to events that reflect the outcome of a request in (g)v$session_wait:
    • Placeholders while waiting – wait_time = 0
    • Actual events after waiting – wait_time != 0

While a request is outstanding there is a placeholder event and wait_time is 0. When the block is received, the placeholder event is replaced by the actual event and wait_time reflects the time waited.

Let’s look at the wait event views as a refresher for people that have not done it for a while.

  • V$SYSTEM_EVENT – total waits for an event
  • V$SESSION_WAIT_CLASS – waits for a wait event class by a session
  • V$SESSION_EVENT – waits for an event by a session
  • V$ACTIVE_SESSION_HISTORY – activity of recent active sessions
  • V$SESSION_WAIT_HISTORY – last 10 wait events for each active session.
  • V$SESSION_WAIT – events for which active sessions are waiting
  • V$SQLSTATS – identify SQL statements impacted by interconnect latencies (see the sketch after this list)
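For instance, a sketch using the CLUSTER_WAIT_TIME column of V$SQLSTATS to find the SQL statements most affected by interconnect latencies:

SELECT *
FROM  (SELECT sql_id, cluster_wait_time, elapsed_time, executions
       FROM   v$sqlstats
       ORDER  BY cluster_wait_time DESC)
WHERE  rownum <= 10;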

Global Cache wait events – Overview

There are four possible placeholder wait events, used while wait_time is 0. When a server process makes a request for a block image, it has no way to know whether the request will be satisfied by 1, 2 or 3 nodes. All it does is contact the local LMS process and request a particular block image. This can be a current or CR request, and either an ordinary or a multiblock request, so the possibilities are as follows:

  • gc current request
  • gc current multiblock request
  • gc cr request
  • gc cr multiblock request

The placeholder holds one of the events above while the process is waiting to get the block image. Once the wait is over you will know what has happened: wait_time will be a nonzero value, and the event column of v$session_wait will show the actual event the process has been waiting on instead of the placeholder, with the time waited in the wait_time column. Let’s look at some of the relevant events.

gc [current/cr] [2/3]-way – In a 2-node cluster you cannot get a 3-way wait, as only two RAC instances are available and there can be at most two hops; with three or more RAC instances both 2-way and 3-way are possible. Blocks are received immediately after 2 or 3 network hops. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing its latency. Investigate if the average wait exceeds 1 ms or approaches disk I/O latency, and look at reducing latency.

gc [current/cr] grant 2-way – Event when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a block image, current or CR, and no instance has the image in its local buffer cache, so the requesting instance must do an I/O against the data file to get the block. The grant is simply the permission from LMS for this to happen, that is, for the process to read the block from the data file. A grant can be either CR or current: gc current grant means go read the block from the database files, while gc cr grant means read the block from disk and build a read-consistent version once it is read.

gc [current/cr] [block/grant] congested – Means that the block or grant was received eventually, but with a delay, because of intensive CPU consumption, memory shortage, LMS overload due to queued work, paging or swapping. This is worth investigating, as it provides room for improvement: it indicates that LMS could not dequeue messages fast enough.

gc [current/cr] block busy – Received, but not sent immediately, due to high concurrency or contention. The block is busy, for example because somebody issued a block recover command from RMAN. The variety of reasons for being busy all mean the block cannot be sent immediately, not for memory, LMS or system-oriented reasons, but for Oracle-oriented reasons. It is also worth investigating.

gc current grant busy – The grant is received, but with a delay, due to many shared block images or load; for example, the high water mark is being extended and block images are being formatted with block headers.

gc [current/cr] [failure/retry] – Not received because of a failure, usually a checksum error in the private interconnect protocol due to network errors or hardware problems. This is worth investigating. Failure means the block image cannot be received at all, while retry means the problem recovers and the block image is ultimately received after retrying.

gc buffer busy – The time between block accesses is less than the buffer pin time. Buffers can be pinned in exclusive or shared mode, depending on whether they are to be modified or only read. Obviously, if there is a lot of contention for the same block by different processes, this event manifests itself with greater magnitude. Buffer busy waits are global cache events when the request is made from one instance, the block is available in another instance, and the block is busy due to contention.

The key to understand is that there are separate wait events for the placeholder and for the outcome: when the wait is over, the placeholder in v$session_wait is replaced with a different event, depending on how many hops there were, what kind of request it was, and whether there was congestion, busy, failure or retry. Looking at the (g)v$ views or AWR reports, check whether you observe congestion, busy, failure or retry, and investigate further.

Examples:

  • gc current block 2-way

Let’s look at the case whereby the placeholder event is gc current block request and the actual event is gc current block 2-way. A process of the requesting instance asks for a block to be accessed in current mode. The instance mastering the corresponding resource receives the request. The master has the current version of the block. It makes a current copy using the current block and the undo data it has, and sends the current copy of the block to the requestor via the private interconnect. The inter-instance communication is done via LMS messages, as explained above. If the buffer is dirty in the owning instance, the pending redo is flushed to disk by the log writer before the block is sent. The placeholder event changes from gc current block request to gc current block 2-way. This event might indicate write/read contention if present in the top 5 events.

  • gc current block 3-way

Let’s look at the case whereby the placeholder event is gc current block request and the actual event is gc current block 3-way. A process of the requesting instance asks for a block to be accessed in current mode. The instance mastering the corresponding resource receives the request. The master forwards the request to the current holder of the block. The holder makes a current copy using the current block and the undo data it has, and sends the current copy of the block to the requestor via the Cache Fusion mechanism. The inter-instance communication is done via LMS messages, as explained above. If the buffer is dirty in the owning instance, the pending redo is flushed to disk by the log writer before the block is sent. The placeholder event changes from gc current block request to gc current block 3-way. This event might indicate write/read contention if present in the top 5 events.

  • gc current grant 2-way

Let’s look at the case whereby the placeholder event is gc current block request and the actual event is gc current grant 2-way. When an instance requests access to a block in current mode, it sends the request to the master instance. The master instance finds that currently no instance, including itself, has any lock on the requested block. It sends a message back to the requesting instance granting it the shared lock on the block, resulting in permission to read the block from the disk data file. The placeholder event changes from gc current block request to gc current grant 2-way. This event might indicate that the instance is spending a significant amount of time obtaining grants, if present in the top 5 list of events.
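To see which of these actual events dominate system-wide, a sketch against v$system_event (time_waited is in centiseconds in this view, hence the factor of 10):

SELECT event, total_waits, time_waited,
       round(time_waited * 10 / total_waits, 2) avg_ms
FROM   v$system_event
WHERE  event LIKE 'gc %'
AND    total_waits > 0
ORDER  BY time_waited DESC;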

Global Enqueues Overview

In a RAC database, the GES is responsible for inter-instance resource coordination and coherency of the dictionary and library caches. GES tracks the status of all Oracle enqueue mechanisms for resources that are accessed by more than one instance. Oracle uses GES to manage concurrency for resources on transactions, tables and other structures within a RAC environment. GES is an integrated RAC component that coordinates global locks between the instances in the cluster.

An enqueue is a memory structure that serializes access to a particular resource. Enqueues in RAC are global, in order to maintain enqueue coherency across all the instances, and they are synchronous.

The most common enqueues are listed below:

  • TX – transaction enqueue, representing either row lock waits or ITL waits
  • TM – table manipulation enqueue, taken on any table with DML; for example, updating a block in one instance while truncating, dropping or collecting statistics on the same table in another instance
  • HW – high water mark enqueue, taken when the high water mark of a segment is extended
  • TA – transaction recovery enqueue
  • SQ – sequence generation enqueue
  • US – undo segment enqueue, for managing undo segment extension

These waits can constitute serious serialization points: enqueues are a source of serialization even in a single instance, and in RAC that serialization is exacerbated by the private interconnect latency and overhead. If something does not scale well in a single instance, it will not scale well in RAC, due to the same contention problems plus the additional overhead contributed by the private interconnect latencies.

Session and System Statistics

There are various dynamic views to look at in order to diagnose performance problems with RAC environments.

  • V$SYSSTAT to characterize the workload.
  • V$SESSSTAT to monitor important sessions.
  • V$SEGMENT_STATISTICS includes RAC statistics on a per-segment basis, identifying which segments are hot with regard to GES and GCS activity. It helps to identify segments that are contended over the interconnect (see the sketch after this list).
  • V$ENQUEUE_STATISTICS determines the enqueues with the highest impact.
  • V$INSTANCE_CACHE_TRANSFER breaks down GCS statistics into block classes, showing what kind of blocks (data blocks, index blocks, etc.) are causing the most contention.
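As a sketch of the per-segment drill-down, the following finds the segments with the most global cache activity (statistic names as in 10g/11g):

SELECT *
FROM  (SELECT owner, object_name, statistic_name, value
       FROM   v$segment_statistics
       WHERE  statistic_name IN ('gc buffer busy',
                                 'gc cr blocks received',
                                 'gc current blocks received')
       ORDER  BY value DESC)
WHERE  rownum <= 10;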

The RAC-relevant statistics groups are:

  • Global Cache Service statistics
  • Global Enqueue Service statistics
  • Statistics for messages sent

IPC configuration

The GCS protocol depends on the IPC protocol and the private interconnect. You can look at configuring the following for IPC:

  • Socket buffer size: net.core.rmem_max, net.core.wmem_max.
  • Interface queue size: to increase the size of the interface queue, i.e. the transmit queue of the network interface, issue the following command using ifconfig:
    • # ifconfig eth0 txqueuelen 10000
  • Input queue size: netdev_max_backlog specifies the maximum length of the input queues for the processors. The default value is 300 packets. Linux may have to wait up to the scheduling time to flush buffers, so this value can limit the network bandwidth when receiving packets. If you want higher throughput, consider increasing netdev_max_backlog.

GCS processing

Latency of the GCS can be affected by:

  • CPU starvation
  • Memory starvation
  • IPC latencies
  • LMS configuration

In case of LMS congestion, that is, when LMS cannot dequeue messages fast enough, look at the GCS_SERVER_PROCESSES parameter.

In 10g, GCS_SERVER_PROCESSES defaults to 2. In 11g, for one CPU there is one GCS server process; for 2–8 CPUs there are 2 GCS server processes; for more than 8 CPUs, the number of GCS server processes equals the number of CPUs divided by 4, with any fraction ignored. For example, with 10 CPUs, 10/4 yields 2 GCS processes.
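The current value can be checked and, if sustained LMS congestion justifies it, overridden; a sketch (the parameter is static, so the instances must be restarted for the change to take effect):

SQL> show parameter gcs_server_processes
SQL> alter system set gcs_server_processes=4 scope=spfile sid='*';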

Let’s put it all together.

Private interconnect considerations

The network between the nodes of a RAC cluster must be private. The interfaces must have the same name and be in the same subnet across all nodes of the cluster. Configure the interfaces on the fastest PCI bus.

Supported links are Gigabit Ethernet (GbE) and InfiniBand (IB).

Supported protocols are UDP, RDS for Linux/Unix and TCP for Windows.

Use multiple or dual-ported NICs for redundancy (HA), load balancing and load spreading, and increase bandwidth with NIC bonding/aggregation.

Large (Jumbo) Frames for GbE are recommended if the global cache workload requires it.

Important settings

  • Full bit rate: Ensure that the network interface card (NIC) is configured at its maximum bandwidth, i.e. 100 Mbps, 1 Gbps or 10 Gbps.
  • Full duplex mode: In full-duplex operation, a connection between two devices can send data in both directions simultaneously. In contrast, technologies that employ half-duplex operation can send information in both directions between two nodes, but only in one direction at a time.
  • Flow control: Flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from outrunning a slow receiver. It provides a mechanism for the receiver to control the transmission speed, so that the receiving node is not overwhelmed with data from sending nodes. Flow control should be distinguished from congestion control. Oracle recommends rx=on, tx=off for flow control:
    • /sbin/ethtool -A eth0 autoneg on tx off rx on
  • NIC ring buffers: Ring buffer sizing is driver dependent and the defaults are generally acceptable, but look at and investigate ring buffer events.
  • CPU(s) receiving network interrupts: on busy systems, consider how NIC interrupts are distributed across CPUs, since a saturated CPU servicing interrupts delays packet processing.

Buffer overflows, congested links and flow control can have severe consequences for performance. Use netstat to obtain a report related to fragmentation and/or reassembly errors. Use ifconfig for reporting dropped packets or overflow.

Verify your setup

  • Use the Cluster Verification Utility
  • Load testing eliminates potential problems
  • AWR and ADDM provide an estimate of link utilization

Block access latencies increase when CPU(s) are busy and run queues are long.

  • Immediate LMS scheduling is critical for predictable block access latencies when CPU is > 80% busy
  • LMS runs at elevated priority by default

Fewer and busier LMS processes may be more efficient.

  • Monitor their CPU utilization
  • Caveat: 1 LMS can be good for runtime performance but may impact cluster reconfiguration and instance recovery time
  • The default is good for most requirements; the gcs_server_processes init parameter overrides the default

The table below shows some characteristics of available cluster interconnects.

Measurement                SMP Bus  Memory channel  Myrinet  SUN SCI  Gigabit Ethernet (GbE)  InfiniBand (IB)
Latency (microseconds)     0.5      3               7-9      10       100                     <10
CPU overhead (microsec.)   <1       <1              <1                ~100
Messages/sec (million)     >10      >2                                <0.1
Bandwidth (MB/sec)         >500     >100            ~250     ~70      ~50                     3 Gbps with ability to aggregate

Many cluster vendors have designed very competitive technology. Technical solutions such as Memory Channel, SCI and Myrinet support a virtual shared memory space by implementing inter-nodal memory address space mapping. Connections between nodes are established by mapping part of a node’s virtual address space to the interconnect interface. Because of the memory-mapped nature of the interface, the transmit and receive overhead is similar to an access to local main memory. This mechanism is the fundamental reason for the low latency of Memory Channel, SCI and Myrinet.

Aggregation

Aggregation is the ability to have separate physical NICs operate together as one logical interface. The aggregated logical interface can operate in one of two modes:

  • Active – Standby: providing high availability and fault tolerance
  • Active – Active: providing high availability, fault tolerance and increased bandwidth equal to the sum of the bandwidths of the aggregated interfaces

Some of the Aggregation solutions are:

  • Cisco EtherChannel based on 802.3ad
  • AIX EtherChannel
  • HP-UX Auto Port Aggregation
  • SUN Trunking, IPMP, GLD
  • Linux Bonding (only certain modes)
  • Windows NIC teaming

Wait events worth investigation

Let’s look at some of the wait events that are worth further investigation, as they represent a potential performance problem if the wait time is excessive or the event is among the top 5 in an AWR report.

  1. gc cr/current blocks lost: This event shows block losses during transfers, and it almost always represents a severe performance problem. It can reveal network congestion involving discarded packets and fragments, packet reassembly failures or timeouts, buffer overflows, and flow control. Checksum errors or corrupted headers are also often the reason for this wait event. It is worth investigating the IPC configuration and possible downstream network problems (NIC, switch, etc.). Operating system data needs to be gathered with ifconfig, netstat and sar, to name a few. The ‘cr request retry’ event is likely to be seen when ‘gc cr blocks lost’ shows up. High values indicate network problems; the use of an unreliable IPC protocol such as UDP may result in a non-zero value for global cache blocks lost (see the query sketch after this list). Common problems and symptoms are:
  • Interconnect or Switch Problems
  • System load and scheduling
  • Contention
  • Unexpectedly high global cache latencies

Check for misconfigured or faulty interconnects that can cause:

  • Dropped packets/fragments
  • Buffer overflows
  • Packet reassembly failures or timeouts
  • Ethernet flow control kicking in
  • TX/RX errors

Solution

  • Fix interconnect NICs and switches
  • Tune IPC buffer sizes
  2. gc block corrupt: This event shows whether any blocks were corrupted during transfers. If high values are observed, there is probably an IPC, network or hardware problem.
  3. global cache null to s and global cache null to x: These events are generated by inter-instance block ping across the network. Inter-instance block ping is when two instances exchange the same block back and forth. Processes waiting for global cache null to s events are waiting for a block to be transferred from the instance that last changed it. When one instance repeatedly requests cached data blocks from the other RAC instances, these events consume a greater proportion of the total wait time. The main method for reducing these events is to reduce the number of rows per block, to eliminate the need for block swapping between two instances in the RAC cluster. You can also look at implementing logical partitioning, where the activity is confined to a selected instance. Frequent commits can also reduce some inter-instance transfers.
  4. global cache cr request: This event is generated when an instance has requested a consistent read data block and the block has not yet arrived at the requesting instance. Other than examining the cluster interconnects for possible problems, there is nothing that can be done about this event.
  5. gc buffer busy: This event can be associated with buffer cache contention. However, note that there can be multiple symptoms for the same cause; it can be seen together with the ‘db file scattered read’ event. Global cache access and serialization contribute to this event.
  6. congested: Indicates that LMS cannot dequeue messages fast enough. The events that contain ‘congested’ suggest CPU saturation (runaway or spinning processes), long run queues, or network configuration issues, and indicate performance problems. While investigating, maintain a global view and remember that symptom and cause can be on different instances. The gcs_server_processes init parameter controls the number of LMS processes, although in most cases the default value is sufficient. Excessive memory consumption leading to swapping can be another reason.
  7. busy: The events that contain ‘busy’ indicate contention. They need investigation, by drilling down into either the SQL with the highest cluster wait time or the segment statistics with the highest block transfers. Also look at the objects with the highest number of block transfers and global serialization.
  8. gc [current/cr] [2/3]-way – In a 2-node cluster you cannot get 3-way, as only two RAC instances are available, so only 2-way is possible with at most two hops; with three or more RAC instances both 2-way and 3-way are possible. Blocks are received immediately after 2 or 3 network hops. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing its latency. The objective, if seen in the top 5 events, is to lower the latency.
    High latencies are addressed by:
  • Check the network configuration
    • Make sure it is private (do not share traffic)
    • Make sure it is running at the expected bit rate
  • Find the cause of high CPU or memory consumption
    • Runaway or spinning processes
  9. gc [current/cr] grant 2-way – Event when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a block image, current or CR, and no instance has the image in its local buffer cache, so the requesting instance must do an I/O against the data file to get the block. The grant is simply the permission from LMS for this to happen, that is, for the process to read the block from the data file. A grant can be either CR or current: gc current grant means go read the block from the database files, while gc cr grant means read the block from disk and build a read-consistent version once it is read. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing its latency.
  10. gc [current/cr] [block/grant] congested – Means that the block or grant was received eventually, but with a delay, because of intensive CPU consumption, memory shortage, LMS overload due to queued work, paging or swapping. This is worth investigating as it provides room for improvement; see ‘congested’ above.
  11. gc [current/cr] block busy – Received, but not sent immediately, due to high concurrency or contention. The block is busy, for example because somebody issued a block recover command from RMAN. The variety of reasons for being busy all mean the block cannot be sent immediately, for Oracle-oriented rather than memory, LMS or system-oriented reasons. Look at reducing concurrency and contention.
  12. gc current grant busy – The grant is received, but with a delay, due to many shared block images or load; for example, the high water mark is being extended and block images are being formatted with block headers.
  13. gc [current/cr] [failure/retry] – Not received because of a failure, usually a checksum error in the private interconnect protocol due to network errors or hardware problems. This is worth investigating. Failure means the block image cannot be received at all, while retry means the problem recovers and the block image is ultimately received after retrying.
  14. gc buffer busy – The time between block accesses is less than the buffer pin time. Buffers can be pinned in exclusive or shared mode, depending on whether they are to be modified or only read. Obviously, if there is a lot of contention for the same block by different processes, this event manifests itself with greater magnitude. Buffer busy waits are global cache events when the request is made from one instance, the block is available in another instance, and the block is busy due to contention. Identify the ‘hot’ blocks in the application and reduce the concurrency on them.
  15. gc current grant busy – Can be seen in applications with high insert rates, where there is serialization on the HW (high water mark) enqueue.
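A quick sketch for checking the lost and corrupt block counters on every instance:

SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc blocks lost', 'gc blocks corrupt')
ORDER  BY inst_id, name;

Non-zero and growing values are the signal to start the network investigation described in item 1.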

A top-down approach to performance analysis can be helpful: start with the ADDM analysis, then continue with the AWR detail statistics and historical data, and last but not least, ASH will provide finer-grained, session-specific data.

Practical performance tuning considerations for RAC

Application tuning and modifying the schema design is often the most beneficial performance solution when migrating to RAC. Once on RAC, following the single instance tuning methodology and avoiding serialization reduces the private interconnect overhead.

Resize and tune the buffer cache when migrating from a single instance database to RAC, to accommodate the GCS, GES and GRD overhead.

Reduce full table scans in OLTP systems while doing updates, and separate batch from OLTP at the same time. In RAC, when you do a full table scan, the instance doing it needs to build a read-consistent version of the blocks, and if the modifications were made across many instances, then the undo needs to be shipped from all those instances; that is a lot of unnecessary traffic. It is recommended, if operationally possible, that OLTP and DSS workloads be scheduled so that they do not overlap, in order to improve performance.

Use Automatic Segment Space Management. For insert-intensive RAC applications, ASSM avoids the issues previously encountered with free lists and free list groups, the older way Oracle managed space in segments. ASSM also provides an instance affinity to table blocks.
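A minimal sketch of creating an ASSM tablespace (tablespace name and datafile specification are illustrative):

CREATE TABLESPACE app_data
  DATAFILE '+DATA' SIZE 1G
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO;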

Use sequence caching in RAC. Caching Oracle sequences can reduce index contention. If your business logic permits the numbers generated by a sequence to be unordered across instances, then you can use a sequence defined with the CACHE n NOORDER clause. When you cache a sequence with CACHE n NOORDER, every instance caches its own range of values and you achieve a logical partitioning, whereby the index keys inserted on each instance are non-overlapping and each instance accesses a separate portion of the index tree. Successive index block splits can then create instance affinity for the index blocks. Thus you can avoid interconnect overhead on the indexes, by avoiding hot blocks being shipped back and forth across the instances.
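A sketch (sequence name and cache size are illustrative; each instance then caches its own non-overlapping range of values):

CREATE SEQUENCE order_id_seq CACHE 1000 NOORDER;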

Undo considerations in RAC. Undo is very expensive in RAC. Maintain a global view, understand that cause and symptoms can be on different instances, and remember that undo is generated and handled on each instance. Avoid frequent reads of index blocks containing active transactions from many instances, as they result in excessive undo shipment across instances and undo contention. Undo block shipment may become a bottleneck if either:

  • An excessive number of index blocks is changed by multiple instances
  • Transaction durations are long

This can often be seen in applications that frequently read recently modified or inserted data but commit infrequently. You can avoid it if you use:

  • Short transactions, to reduce the probability that an index block in the cache has uncommitted data, thereby reducing the need to access undo information for consistent reads
  • Increased Oracle sequence cache sizes, to reduce inter-instance concurrent access to index blocks; CR versions of index blocks modified by only one instance can be fabricated without remote undo block shipping

In RAC, shipment of undo via the interconnect introduces extra overhead, as undo is obtained from the instance where the DML took place.

High water mark considerations for high-volume insert applications. For applications with a high rate of inserts, new blocks need to be allocated to the segment, and new blocks are made available while serializing on the HW enqueue. The solution is to use large uniform extent sizes for the locally managed, automatic space managed segments that are subject to high-volume inserts.
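A sketch of such a tablespace, combining locally managed uniform extents with ASSM (name and sizes illustrative):

CREATE TABLESPACE hot_inserts
  DATAFILE '+DATA' SIZE 2G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M
  SEGMENT SPACE MANAGEMENT AUTO;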

Concurrent cross-instance call considerations in RAC. Every time a table is truncated there is a checkpoint. In RAC there is a cross-instance call to flush dirty blocks of the table that may be spread across all RAC instances. In some applications it is not uncommon to see a lot of truncate operations on tables containing temporary data. Truncating tables concurrently from different instances does not scale well in RAC, because it constitutes a point of serialization: while the first truncate command is running, the second has to wait until the first one completes.

Use partitioning to reduce inter-instance traffic. Partitioning combined with services allows us to avoid a lot of Cache Fusion overhead, by having some partitions accessed by users connected to one set of instances and other partitions accessed by users connected to others. It cannot be guaranteed, but it helps under certain circumstances. Table and index partitioning schemes can also reduce the interconnect traffic by reducing the data sets.

Avoid unnecessary parsing. In a single instance, excessive parsing is already a cause for concern; in RAC, GES is additionally kept busy synchronizing the library and row caches. You do not want this overhead.

Minimize locking usage. Contention is exacerbated in RAC, because on top of the contention present in a single instance database there is the interconnect overhead, so minimizing locking is very important in RAC environments. Excessive locking typically occurs with third-party applications that implement their own locking so that the same code can work on many database platforms; this is worth investigating further. Using tables instead of Oracle sequences to generate unique numbers can be a serious problem, as the blocks will be cached across all instances and transmitted back and forth via the private interconnect, along with the undo from all instances.

Remove unselective indexes in OLTP. They are not used by the CBO as access methods, yet they still need to be maintained during DML statements. That is an overhead in general, but in RAC those index blocks may additionally end up being transferred by Cache Fusion, so removing unselective indexes will improve performance. Index splits are another important activity that you must be aware of and look at.

Even in a single instance database, applications with heavy inserts can suffer bad performance due to factors such as:

  • monotonically increasing values in the index (right edge of the tree)
  • frequent index block splits

In RAC, a globally hot index block and index block splits become an even bigger problem once you account for the cluster interconnect overhead. The primary objective should be an approach aiming at an equal and gradual distribution of the concurrency in the index tree. Such a distribution can be achieved if you (see the sketch after this list):

  • Use global hash partitioned indexes
  • Use reverse key indexes
  • Use cached sequences for logical partitioning
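A sketch of the first two options, as alternatives for the same index (table and index names are hypothetical):

-- reverse key index: spreads monotonically increasing keys across leaf blocks
CREATE INDEX orders_id_rev ON orders (order_id) REVERSE;

-- or, alternatively, a global hash partitioned index:
-- spreads the hot right edge of the tree across partitions
CREATE INDEX orders_id_gh ON orders (order_id)
  GLOBAL PARTITION BY HASH (order_id) PARTITIONS 16;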

Configure the interconnect properly, with high bandwidth and low latency. Look at the private interconnect considerations section.

Sources:

Download: Oracle Cache Fusion, private interconnects and practical performance management considerations in Oracle RAC


August 2, 2011 – Posted by | oracle



Saturday, May 5, 2012

How To Read An AWR Report ?

Prerequisites for an AWR report:

The statistics_level parameter should be set to TYPICAL.
Timed statistics should also be enabled.

SQL> show parameter statistics_level 

NAME                                 TYPE     VALUE
------------------------------------ -------- ------------------------------
statistics_level                     string   TYPICAL

SQL> select STATISTICS_NAME,ACTIVATION_LEVEL, SYSTEM_STATUS from v$statistics_level;

STATISTICS_NAME                                                  ACTIVAT SYSTEM_S
---------------------------------------------------------------- ------- --------
Buffer Cache Advice                                              TYPICAL ENABLED
MTTR Advice                                                      TYPICAL ENABLED
Timed Statistics                                                 TYPICAL ENABLED
Timed OS Statistics                                              ALL     DISABLED
Segment Level Statistics                                         TYPICAL ENABLED
PGA Advice                                                       TYPICAL ENABLED
Plan Execution Statistics                                        ALL     DISABLED
Shared Pool Advice                                               TYPICAL ENABLED
Modification Monitoring                                          TYPICAL ENABLED
Longops Statistics                                               TYPICAL ENABLED
Bind Data Capture                                                TYPICAL ENABLED
Ultrafast Latch Statistics                                       TYPICAL ENABLED
Threshold-based Alerts                                           TYPICAL ENABLED
Global Cache Statistics                                          TYPICAL ENABLED
Active Session History                                           TYPICAL ENABLED
Undo Advisor, Alerts and Fast Ramp up                            TYPICAL ENABLED
Streams Pool Advice                                              TYPICAL ENABLED
Time Model Events                                                TYPICAL ENABLED
Plan Execution Sampling                                          TYPICAL ENABLED
Automated Maintenance Tasks                                      TYPICAL ENABLED
SQL Monitoring                                                   TYPICAL ENABLED
Adaptive Thresholds Enabled                                      TYPICAL ENABLED
V$IOSTAT_* statistics                                            TYPICAL ENABLED
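With the prerequisites in place, the report itself is generated with the awrrpt.sql script, which prompts for the report format (html or text), the number of days of snapshots to list, and the begin and end snapshot ids:

SQL> @?/rdbms/admin/awrrpt.sql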

The Header

The header gives you a bird’s-eye view of the database.

WORKLOAD REPOSITORY report for

DB Name      DB Id       Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
DDOG         1373267642  DDOG2               2 10.2.0.2.0  YES lefty

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      2009 09-Jan-07 09:00:12        40       3.7
  End Snap:      2010 09-Jan-07 10:00:19        39       2.8
   Elapsed:               60.12 (mins)
   DB Time:                0.09 (mins)

Cache Sizes
~~~~~~~~~~~
                       Begin        End
                  ---------- ----------
    Buffer Cache:       528M       528M  Std Block Size:        8K
Shared Pool Size:       480M       480M  Log Buffer:       15,152K

Elapsed Time: represents the snapshot window, i.e. the time between the two snapshots.
DB Time: represents the activity on the database.

If DB Time is greater than Elapsed Time, the database has a high workload.

Load Profile

Meaning of various terms.

  • Redo size – Indicates the amount of DML activity happening in the database.
  • Logical and physical reads – Represents the number of I/Os (physical and logical) that the database is performing.
  • User calls – Indicates how many user calls have occurred during the snapshot period. This value can give you some indication of whether usage has increased.
  • Parses and hard parses – Provide an indication of the efficiency of SQL re-use.
  • Sorts – Number of sorts occurring in the database.
  • Logons – Number of logons which occurred during the snapshot period.
  • Executes – Number of SQL statements which were executed.
  • Transactions – Indicates how many transactions occurred during the snapshot period.

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:                209.78              1,014.29
              Logical reads:                 20.14                 97.38
              Block changes:                  0.50                  2.42
             Physical reads:                  0.00                  0.01
            Physical writes:                  0.10                  0.49
                 User calls:                  1.05                  5.08
                     Parses:                  0.39                  1.86
                Hard parses:                  0.00                  0.01
                      Sorts:                  0.52                  2.53
                     Logons:                  0.02                  0.12
                   Executes:                  0.95                  4.61
               Transactions:                  0.21

  % Blocks changed per Read:    2.49    Recursive Call %:    84.53
 Rollback per transaction %:   97.45       Rows per Sort:    41.43

The percentage of soft parses should always be much higher than that of hard parses. Possible reasons for excessive hard parsing are a shared pool that is too small, or bind variables not being used.
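A quick sketch to compare hard and total parse activity against the execute count from v$sysstat:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)', 'execute count');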

Instance Efficiency

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.98       Redo NoWait %:  100.00
            Buffer  Hit   %:  100.00    In-memory Sort %:  100.00
            Library Hit   %:   99.64        Soft Parse %:   99.64
         Execute to Parse %:   59.60         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:  130.00     % Non-Parse CPU:   96.30

 Shared Pool Statistics        Begin    End
                               ------  ------
             Memory Usage %:   87.98   87.97
    % SQL with executions>1:   97.83   97.86
  % Memory for SQL w/exec>1:   85.54   85.52

The following ratios should be above 90% in a database.

Buffer Nowait
Buffer  Hit   
Library Hit
Redo NoWait 
In-memory Sort 
Soft Parse 
Latch Hit 
Non-Parse CPU

The execute to parse ratio should be very high in an ideal database. It is basically a measure of the number of times a SQL statement is executed versus the number of times it is parsed. The ratio moves higher as the number of executes goes up while the number of parses goes down or remains the same, and it will be close to zero if the number of executes and parses are almost equal. The ratio will be negative when there are more parses than executes.
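As a sketch, the ratio can be computed directly from v$sysstat, mirroring the AWR formula 100 * (1 - parses/executions):

SELECT round(100 * (1 - p.value / e.value), 2) "Execute to Parse %"
FROM   v$sysstat p, v$sysstat e
WHERE  p.name = 'parse count (total)'
AND    e.name = 'execute count';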

The '% SQL with executions>1' value should also be high. A low value could mean that the database is not re-using shared SQL statements, which in turn could mean that bind variables are not being used.

Top 5 Timed Events

Top 5 Timed Events                                          Avg %Total
~~~~~~~~~~~~~~~~~~                                         wait   Call
Event                                 Waits    Time (s)    (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
control file parallel write           1,220          18     15  331.7 System I/O
control file sequential read          6,508           6      1  110.9 System I/O
CPU time                                              4          64.4
CGS wait for IPC msg                422,253           1      0   20.9 Other
change tracking file synchrono           60           1     13   14.4 Other
          -------------------------------------------------------------

In an ideal database, CPU and I/O should be the top wait events. If there are events like 'enq: TX – row lock contention' or 'latch free', there is contention in your database. If there is a high 'log file sync' event, check why it is happening; a possible solution is to increase the size of the redo log buffer and to move your logfiles to a storage system where disk I/O capacity is high. Also, 'db file sequential read' (which means index reads) should be higher compared to 'db file scattered read' (which means full table scans).

RAC Statistics

If you are running on a RAC cluster, then the awrrpt.sql report will provide various RAC statistics, including the number of RAC instances as well as global cache and enqueue related performance statistics. Here is an example of the RAC statistics part of the report:

RAC Statistics                          DB/Inst: A109/a1092  Snaps: 2009-2010

                                Begin   End
                                -----   -----
           Number of Instances:     2       2

Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second       Per Transaction
   Global Cache blocks received:                 0.11                  0.52
     Global Cache blocks served:                 0.14                  0.68
      GCS/GES messages received:                 0.88                  4.23
          GCS/GES messages sent:                 0.85                  4.12
             DBWR Fusion writes:                 0.01                  0.04
 Estd Interconnect traffic (KB):                 2.31

Global Cache Efficiency Percentages (Target local+remote 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer access -  local cache %:   99.47
Buffer access - remote cache %:    0.53
Buffer access -         disk %:    0.00

Global Cache and Enqueue Services - Workload Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     Avg global enqueue get time (ms):     0.0
         Avg global cache cr block receive time (ms):     0.2
    Avg global cache current block receive time (ms):     0.3
           Avg global cache cr block build time (ms):     0.0
            Avg global cache cr block send time (ms):     0.0
     Global cache log flushes for cr blocks served %:     1.8
           Avg global cache cr block flush time (ms):     4.0
        Avg global cache current block pin time (ms):     0.0
       Avg global cache current block send time (ms):     0.1
Global cache log flushes for current blocks served %:     0.4
      Avg global cache current block flush time (ms):     0.0

Global Cache and Enqueue Services - Messaging Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             Avg message sent queue time (ms):  ########
     Avg message sent queue time on ksxp (ms):       0.1
         Avg message received queue time (ms):       4.6
            Avg GCS message process time (ms):       0.0
            Avg GES message process time (ms):       0.0
                   % of direct sent messages:      45.26
                 % of indirect sent messages:      31.59
               % of flow controlled messages:      23.15
          -------------------------------------------------------------

From a performance tuning perspective, the amount of block transfer
across the interconnect should be low in comparison to localized access.
This can be seen in the "Global Cache Efficiency Percentages" section.

In the "Global Cache and Enqueue Services" section, the upper limit for
Avg global cache cr block receive time should be roughly 4 ms and for
Avg global cache current block receive time roughly 8 ms.
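
The same averages can be computed on a live system from GV$SYSSTAT. Here is a minimal sketch for the cr block receive time (the current block version uses the analogous "gc current ..." statistics); the time statistics are kept in centiseconds, hence the factor of 10 to get milliseconds:

-- Average gc cr block receive time in ms, per instance
SELECT b.inst_id,
       ROUND(10 * b.value / NULLIF(a.value, 0), 2) AS avg_cr_block_receive_ms
  FROM gv$sysstat a, gv$sysstat b
 WHERE a.name = 'gc cr blocks received'
   AND b.name = 'gc cr block receive time'
   AND a.inst_id = b.inst_id;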

Time Model Statistics

Time model statistics present the various operations that consume most of the database time.

Time Model Statistics              DB/Inst: A109/a1092  Snaps: 2009-2010
-> Total time in database user-calls (DB Time): 5.5s
-> Statistics including the word "background" measure background process
   time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time                                  4.5         82.8
DB CPU                                                    3.5         64.4
connection management call elapsed time                   0.1          1.6
parse time elapsed                                        0.1          1.3
PL/SQL execution elapsed time                             0.0           .9
hard parse elapsed time                                   0.0           .3
sequence load elapsed time                                0.0           .1
repeated bind elapsed time                                0.0           .0
DB time                                                   5.5          N/A
background elapsed time                                  33.0          N/A
background cpu time                                       9.7          N/A
-------------------------------------------------------------

If hard parses or parse time are very high, investigate further to
resolve the problem (for example, check whether the application uses bind variables).
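
The data in this section comes from the time model views, so a quick check between AWR runs can be done directly; a minimal sketch against V$SYS_TIME_MODEL:

-- Time model statistics are cumulative and reported in microseconds
SELECT stat_name, ROUND(value / 1e6, 1) AS seconds
  FROM v$sys_time_model
 WHERE stat_name IN ('DB time', 'DB CPU',
                     'parse time elapsed', 'hard parse elapsed time');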

Wait class and Wait Event Statistics

Closely associated with the time model section of the report are the wait class and wait
event statistics sections. Within Oracle, the duration of a large number of operations 
(e.g. writing to disk or to the control file) is metered. These are known as wait events,
because each of these operations requires the system to wait for the event to complete. 
Thus, the execution of some database operation (e.g. a SQL query) will have a number of
wait events associated with it. We can try to determine which wait events are causing us
problems by looking at the wait classes and the wait event reports generated from AWR.
Wait classes define "buckets" that allow for summation of various wait times. Each wait 
event is assigned to one of these buckets (for example System I/O or User I/O). These buckets
allow one to quickly determine which subsystem is likely suspect in performance problems 
(e.g. the network, or the cluster). Here is an example of the wait class report section:

Wait Class                          DB/Inst: A109/a1092  Snaps: 2009-2010
-> s  - second
-> cs - centisecond -     100th of a second
-> ms - millisecond -    1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc

                                                                  Avg
                                       %Time       Total Wait   wait     Waits
Wait Class                      Waits  -outs         Time (s)   (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
System I/O                      8,142     .0               25       3      10.9
Other                         439,596   99.6                3       0     589.3
User I/O                          112     .0                0       3       0.2
Cluster                           443     .0                0       0       0.6
Concurrency                       216     .0                0       0       0.3
Commit                             16     .0                0       2       0.0
Network                         3,526     .0                0       0       4.7
Application                        13     .0                0       0       0.0
-------------------------------------------------------------

In this report the System I/O wait class has the largest total wait time (25 seconds),
with an average wait of 3 milliseconds per wait.
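
The same per-class totals are visible live in V$SYSTEM_WAIT_CLASS; a minimal sketch (TIME_WAITED is in centiseconds):

-- Cumulative non-idle wait time grouped by wait class
SELECT wait_class, total_waits,
       ROUND(time_waited / 100, 1) AS time_waited_s
  FROM v$system_wait_class
 WHERE wait_class <> 'Idle'
 ORDER BY time_waited DESC;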

Wait events are normal occurrences, but if a particular sub-system is having a problem 
performing (e.g. the disk sub-system) this fact will appear in the form of one or more 
wait events with an excessive duration. The wait event report then provides some insight 
into the detailed wait events. Here is an example of the wait event report (we have
eliminated some of the bulk of this report, because it can get quite long). Note that
this section is sorted by total wait time, in descending order.

                                                                   Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
control file parallel write           1,220     .0          18      15       1.6
control file sequential read          6,508     .0           6       1       8.7
CGS wait for IPC msg                422,253  100.0           1       0     566.0
change tracking file synchro             60     .0           1      13       0.1
db file parallel write                  291     .0           0       1       0.4
db file sequential read                  90     .0           0       4       0.1
reliable message                        136     .0           0       1       0.2
log file parallel write                 106     .0           0       2       0.1
lms flush message acks                    1     .0           0      60       0.0
gc current block 2-way                  200     .0           0       0       0.3
change tracking file synchro             59     .0           0       1       0.1

In this example our control file parallel write waits (which occur during writes to the
control file) take up 18 seconds in total, with an average wait of 15 milliseconds per
wait. Additionally, we see 1.6 such waits per transaction (15 ms * 1.6 waits = 24 ms per transaction).

Operating System Statistics
This part of the report provides some basic insight into OS performance and OS
configuration. The report may vary depending on the OS platform your database is running on.
Here is an example from a Linux system:

Statistic                                       Total
-------------------------------- --------------------
BUSY_TIME                                     128,749
IDLE_TIME                                   1,314,287
IOWAIT_TIME                                    18,394
NICE_TIME                                          54
SYS_TIME                                       31,633
USER_TIME                                      96,586
LOAD                                                0
RSRC_MGR_CPU_WAIT_TIME                              0
PHYSICAL_MEMORY_BYTES                       3,349,528
NUM_CPUS                                            4

In this example output we can see, for example, that we have 4 CPUs on the box.
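
These OS statistics are sampled from V$OSSTAT, so they can also be queried directly; a minimal sketch:

-- BUSY_TIME, IDLE_TIME etc. are in centiseconds;
-- NUM_CPUS and PHYSICAL_MEMORY_BYTES are absolute values
SELECT stat_name, value
  FROM v$osstat
 ORDER BY stat_name;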

SQL In Need of Tuning
Next in the report we find several different reports that present SQL statements that
might be improved by tuning. These reports sort the offending SQL statements by the
following criteria:

•Elapsed time

•CPU time

•Buffer gets

•Physical reads

•Executions

•Parse calls

•Sharable memory

•Version count

•Cluster wait time

While these reports might not help tune specific application problems, they can help you 
find more systemic SQL problems that you might not find when tuning a specific application 
module. Here is an example of the Buffer gets report:

                                                     CPU      Elapsed
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s)    SQL Id
-------------- ------------ -------------- ------ -------- --------- -------------
         2,163            7          309.0    3.0     0.03      0.04 c7sn076yz7030
select smontabv.cnt, smontab.time_mp, smontab.scn, smontab.num_mappings,
smontab.tim_scn_map, smontab.orig_thread from smon_scn_time smontab,
(select max(scn) scnmax, count(*)+sum(NVL2(TIM_SCN_MAP,NUM_MAPPINGS, 0)) cnt
from smon_scn_time where thread=0) smontabv where smon

         1,442          721            2.0    2.0     0.05      0.05 6ssrk2dqj7jbx
select job, nvl2(last_date, 1, 0) from sys.job$ where (((:1 <= next_date) and
(next_date <= :2)) or ((last_date is null) and (next_date < :3))) and (field1
= :4 or (field1 = 0 and 'Y' = :5)) and (this_date is null) order by next_date, job

         1,348            1        1,348.0    1.9     0.04      0.04 bv1djzzmk9bv6
Module: TOAD 9.0.0.160
Select table_name from DBA_TABLES where owner = 'CDOL2_01' order by 1

         1,227            1        1,227.0    1.7     0.07      0.08 d92h3rjp0y217
begin prvt_hdm.auto_execute( :db_id, :inst_id, :end_snap ); end;

           896            4          224.0    1.2     0.03      0.03 6hszmvz1wjhbt
Module: TOAD 9.0.0.160
Select distinct Cons.constraint_name, cons.status, cons.table_name,
cons.constraint_type, cons.last_change from sys.user_constraints cons
where 1=1 and cons.status='DISABLED'

In this report we find a SQL statement that is churning through 309 buffers per execution.
While the execution times are not terrible, we might want to look closer at the SQL statement
and try to tune it (in fact, this is Oracle-issued SQL that we would not tune anyway).
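
An ad-hoc equivalent of the "SQL ordered by Gets" report can be run against V$SQLAREA; a minimal sketch:

-- Top 5 statements currently in the shared pool by buffer gets
SELECT *
  FROM (SELECT sql_id, executions, buffer_gets,
               ROUND(buffer_gets / NULLIF(executions, 0), 1) AS gets_per_exec
          FROM v$sqlarea
         ORDER BY buffer_gets DESC)
 WHERE ROWNUM <= 5;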

Instance Activity Stats

This section provides a number of statistics (such as how many DBWR checkpoints
occurred, or how many consistent gets occurred during the snapshot). Here is a partial example of the report:

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
consistent changes                                9            0.0           0.0
consistent gets                              70,445           19.5          94.4
consistent gets - examination                 8,728            2.4          11.7
consistent gets direct                            0            0.0           0.0
consistent gets from cache                   70,445           19.5          94.4
cursor authentications                            2            0.0           0.0
data blocks consistent reads - u                  5            0.0           0.0
db block changes                              1,809            0.5           2.4
db block gets                                 2,197            0.6           3.0
db block gets direct                              0            0.0           0.0
db block gets from cache                      2,033            0.6           2.7

Tablespace and Data File IO Stats

The tablespace and data file IO stats report provides information on tablespace IO performance. 
From this report you can determine if the tablespace datafiles are suffering from sub-standard 
performance in terms of IO response from the disk sub-system. Here is a partial example of the 
tablespace report:

Tablespace
------------------------------
                 Av      Av     Av                       Buffer  Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s   Waits  Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
SYSAUX
             1       0    0.0     1.0          159        0         13    0.8
UNDOTBS2
             1       0   10.0     1.0           98        0          0    0.0
SYSTEM
             1       0   10.0     1.0           46        0          0    0.0
AUD
             1       0    0.0     1.0            1        0          0    0.0
CDOL2_INDEX
             1       0   10.0     1.0            1        0          0    0.0
CDOL_DATA
             1       0   10.0     1.0            1        0          0    0.0
DBA_DEF
             1       0   10.0     1.0            1        0          0    0.0
UNDOTBS1
             1       0   10.0     1.0            1        0          0    0.0
USERS
             1       0   10.0     1.0            1        0          0    0.0
USER_DEF
             1       0   10.0     1.0            1        0          0    0.0

If the tablespace IO report indicates that a tablespace has IO problems, we can then use the
file IO stat report to drill into the datafiles of the tablespace in question and
determine what the problem might be. Here is an example of the file IO stat report:

Tablespace               Filename
------------------------ ----------------------------------------------------
                 Av      Av     Av                       Buffer  Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s   Waits  Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
AUD                      +ASM01/a109/datafile/aud.296.604081931
             1       0    0.0     1.0            1        0          0    0.0
CDOL2_INDEX              +ASM01/a109/datafile/cdol2_index_001.dbf
             1       0   10.0     1.0            1        0          0    0.0
CDOL_DATA                +ASM01/a109/datafile/cdol_data_001.dbf
             1       0   10.0     1.0            1        0          0    0.0
DBA_DEF                  +ASM01/a109/datafile/dba_def.294.604081931
             1       0   10.0     1.0            1        0          0    0.0
SYSAUX                   +ASM01/a109/datafile/sysaux.299.604081927
             1       0    0.0     1.0          159        0         13    0.8
SYSTEM                   +ASM01/a109/datafile/system.301.604081919
             1       0   10.0     1.0           46        0          0    0.0
UNDOTBS1                 +ASM01/a109/datafile/undotbs1.300.604081925
             1       0   10.0     1.0            1        0          0    0.0
UNDOTBS2                 +ASM01/a109/datafile/undotbs2.292.604081931
             1       0   10.0     1.0           98        0          0    0.0
USERS                    +ASM01/a109/datafile/users.303.604081933
             1       0   10.0     1.0            1        0          0    0.0
USER_DEF                 +ASM01/a109/datafile/user_def.291.604081933
             1       0   10.0     1.0            1        0          0    0.0
-------------------------------------------------------------

Buffer Pool Statistics

The buffer pool statistics report follows. It provides a summary of the buffer pool 
configuration and usage statistics as seen in this example:

                                                      Free Writ     Buffer
     Number of Pool         Buffer     Physical    Physical Buff Comp   Busy
P      Buffers  Hit%          Gets        Reads      Writes Wait Wait  Waits
--- ---------- ---- -------------- ------------ ----------- ---- ---- ------
D       64,548  100         72,465            0         355    0    0     13
-------------------------------------------------------------

In this case, we have a database where all buffer gets were satisfied from the buffer
cache and no physical reads were required. We also see a few (probably insignificant in
our case) buffer busy waits.

Instance Recovery Stats

The instance recovery stats report provides information related to instance recovery. By analyzing 
this report, you can determine roughly how long your database would have required to perform 
crash recovery during the reporting period. Here is an example of this report:

-> B: Begin snapshot,  E: End snapshot

  Targt Estd                                    Log File  Log Ckpt  Log Ckpt
  MTTR  MTTR   Recovery    Actual    Target       Size    Timeout   Interval
   (s)   (s)   Estd IOs  Redo Blks Redo Blks  Redo Blks  Redo Blks  Redo Blks
- ----- ----- ---------- --------- --------- ---------- --------- ------------
B     0    19        196       575       183      92160       183          N/A
E     0    19        186       258        96      92160        96          N/A
-------------------------------------------------------------
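
The live counterparts of these columns are in V$INSTANCE_RECOVERY; a minimal sketch:

-- Current MTTR targets and estimated crash recovery work
SELECT target_mttr, estimated_mttr, recovery_estimated_ios,
       actual_redo_blks, target_redo_blks
  FROM v$instance_recovery;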

Buffer Pool Advisory

The buffer pool advisory report answers the question of how big you should make your database
buffer cache. It provides an extrapolation of the benefit or detriment that would result if
you added memory to, or removed memory from, the database buffer cache. These estimates are based on
the current size of the buffer cache and the number of logical and physical IOs encountered
during the reporting period. This report can be very helpful in "rightsizing" your buffer cache.
Here is an example of the output of this report:

                                        Est Phys
    Size for  Size      Buffers for         Read      Estimated
P    Est (M) Factor        Estimate       Factor Physical Reads
--- -------- ------ ---------------- ------ ------------------
D         48     .1            5,868    4.9            803,496
D         96     .2           11,736    4.0            669,078
D        144     .3           17,604    3.3            550,831
D        192     .4           23,472    2.8            462,645
D        240     .5           29,340    2.3            379,106
D        288     .5           35,208    1.8            305,342
D        336     .6           41,076    1.4            238,729
D        384     .7           46,944    1.2            200,012
D        432     .8           52,812    1.1            183,694
D        480     .9           58,680    1.0            172,961
D        528    1.0           64,548    1.0            165,649
D        576    1.1           70,416    1.0            161,771
D        624    1.2           76,284    1.0            159,728
D        672    1.3           82,152    1.0            158,502
D        720    1.4           88,020    1.0            157,723
D        768    1.5           93,888    0.9            157,124
D        816    1.5           99,756    0.9            156,874
D        864    1.6          105,624    0.9            156,525
D        912    1.7          111,492    0.9            156,393
D        960    1.8          117,360    0.9            155,388
-------------------------------------------------------------

In this example we currently have 528MB allocated to the buffer cache (represented by the size
factor column with a value of 1.0). It appears that if we were to reduce that memory
to about half of its current size (freeing the memory to the OS for other processes),
we would incur an increase of roughly 1.8 to 2.3 times in the number of physical reads.
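
The advisory data is also exposed live through V$DB_CACHE_ADVICE (populated when the db_cache_advice parameter is ON, its default). A minimal sketch, assuming the DEFAULT pool and an 8 KB block size (adjust both to your database):

SELECT size_for_estimate, size_factor,
       estd_physical_read_factor, estd_physical_reads
  FROM v$db_cache_advice
 WHERE name = 'DEFAULT'
   AND block_size = 8192   -- assumption: 8 KB default block size
 ORDER BY size_for_estimate;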

PGA Reports

The PGA reports provide some insight into the health of the PGA. The PGA Aggr Target Stats report
provides information on the configuration of the PGA Aggregate Target parameter during the 
reporting period. 

The PGA Aggregate Target Histogram report provides information on the size of various operations 
(e.g. sorts). It will indicate if PGA sort operations occurred completely in memory, or if some 
of those operations were written out to disk. 

Finally, the PGA Memory Advisor, much like the buffer pool advisory report, provides some insight
into how to properly size your PGA via the PGA_AGGREGATE_TARGET database parameter. The PGA Memory
Advisor report is shown here:

                                       Estd Extra    Estd PGA   Estd PGA
PGA Target    Size           W/A MB   W/A MB Read/      Cache  Overalloc
  Est (MB)   Factr        Processed Written to Disk     Hit %      Count
---------- ------- ---------------- ---------------- -------- ----------
        44     0.1        289,899.2          7,844.9     97.0      1,124
        88     0.3        289,899.2          7,576.9     97.0      1,073
       176     0.5        289,899.2              3.3    100.0          0
       263     0.8        289,899.2              3.3    100.0          0
       351     1.0        289,899.2              3.3    100.0          0
       421     1.2        289,899.2              0.0    100.0          0
       491     1.4        289,899.2              0.0    100.0          0
       562     1.6        289,899.2              0.0    100.0          0
       632     1.8        289,899.2              0.0    100.0          0
       702     2.0        289,899.2              0.0    100.0          0
     1,053     3.0        289,899.2              0.0    100.0          0
     1,404     4.0        289,899.2              0.0    100.0          0
     2,106     6.0        289,899.2              0.0    100.0          0
     2,808     8.0        289,899.2              0.0    100.0          0
-------------------------------------------------------------
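
The live version of this advisory is V$PGA_TARGET_ADVICE; a minimal sketch:

-- PGA advisory sizes are in bytes; convert to MB for readability
SELECT ROUND(pga_target_for_estimate / 1024 / 1024) AS target_mb,
       pga_target_factor,
       estd_pga_cache_hit_percentage,
       estd_overalloc_count
  FROM v$pga_target_advice
 ORDER BY pga_target_for_estimate;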

Shared Pool Advisory

The shared pool advisory report provides assistance in right-sizing the Oracle shared pool.
Much like the PGA Memory Advisor or the Buffer Pool advisory report, it provides some insight
into what would happen should you add memory to, or remove memory from, the shared pool. This can help
you reclaim much-needed memory if you have over-allocated the shared pool, and can significantly
improve performance if you have not allocated enough memory to the shared pool. Here is an
example of the shared pool advisory report:

                                        Est LC Est LC  Est LC Est LC
    Shared    SP   Est LC                 Time   Time    Load   Load      Est LC
      Pool  Size     Size       Est LC   Saved  Saved    Time   Time         Mem
   Size(M) Factr      (M)      Mem Obj     (s)  Factr     (s)  Factr    Obj Hits
---------- ----- -------- ------------ ------- ------ ------- ------ -----------
       192    .4       54        3,044 #######     .8 #######  382.1  22,444,274
       240    .5       92        5,495 #######     .9 #######  223.7  22,502,102
       288    .6      139        8,122 #######     .9  53,711  102.5  22,541,782
       336    .7      186       12,988 #######    1.0  17,597   33.6  22,562,084
       384    .8      233       17,422 #######    1.0   7,368   14.1  22,569,402
       432    .9      280       23,906 #######    1.0   3,553    6.8  22,571,902
       480   1.0      327       28,605 #######    1.0     524    1.0  22,573,396
       528   1.1      374       35,282 #######    1.0       1     .0  22,574,164
       576   1.2      421       40,835 #######    1.0       1     .0  22,574,675
       624   1.3      468       46,682 #######    1.0       1     .0  22,575,055
       672   1.4      515       52,252 #######    1.0       1     .0  22,575,256
       720   1.5      562       58,181 #######    1.0       1     .0  22,575,422
       768   1.6      609       64,380 #######    1.0       1     .0  22,575,545
       816   1.7      656       69,832 #######    1.0       1     .0  22,575,620
       864   1.8      703       75,168 #######    1.0       1     .0  22,575,668
       912   1.9      750       78,993 #######    1.0       1     .0  22,575,695
       960   2.0      797       82,209 #######    1.0       1     .0  22,575,719
-------------------------------------------------------------

SGA Target Advisory

The SGA target advisory report is somewhat of a summation of all the advisory reports previously
presented in the AWR report. It helps you determine the impact of changing the SGA target
size on overall database performance. The report uses a value called DB Time
as a measure of the increase or decrease in performance relative to the memory change made. The
report also summarizes an estimate of the physical reads associated with each listed SGA
size. Here is an example of the SGA target advisory report:

SGA Target   SGA Size       Est DB     Est Physical
  Size (M)     Factor     Time (s)            Reads
---------- ---------- ------------ ----------------
       528        0.5       25,595          769,539
       792        0.8       20,053          443,095
     1,056        1.0       18,443          165,649
     1,320        1.3       18,354          150,476
     1,584        1.5       18,345          148,819
     1,848        1.8       18,345          148,819
     2,112        2.0       18,345          148,819

In this example, our SGA Target size is currently set at 1056MB. We can see from this report that 
if we increased the SGA target size to 2112MB, we would see almost no performance improvement 
(about a 98 second improvement overall). In this case, we may determine that adding so much 
memory to the database is not cost effective, and that the memory can be better used elsewhere.
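
The live counterpart of this report is V$SGA_TARGET_ADVICE (available when SGA_TARGET is set); a minimal sketch:

SELECT sga_size, sga_size_factor, estd_db_time, estd_physical_reads
  FROM v$sga_target_advice
 ORDER BY sga_size;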

Memory Advisory

Memory advisory reports for the streams pool and the java pool also appear in the report
(assuming those pools are in use). These reports take on the same general format as
the other memory advisor reports.

Buffer Wait Statistics

The buffer wait statistics report helps you drill down on specific buffer wait events, and where 
the waits are occurring. In the following report we find that the 13 buffer busy waits we saw in 
the buffer pool statistics report earlier are attributed to data block waits. We might then want 
to pursue tuning remedies to these waits if the waits are significant enough. Here is an example 
of the buffer wait statistics report:

Class                    Waits Total Wait Time (s)  Avg Time (ms)
------------------ ----------- ------------------- --------------
data block                  13                   0              1
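
The live source of this breakdown is V$WAITSTAT; a minimal sketch:

-- Buffer busy waits broken down by block class
-- (COUNT is quoted because it is also a SQL function name)
SELECT class, "COUNT" AS waits, time
  FROM v$waitstat
 WHERE "COUNT" > 0
 ORDER BY time DESC;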

Enqueue Activity

The Enqueue activity report provides information on enqueues (Oracle's higher-level locking mechanism).
As with other reports, if you see high levels of wait time in this report, you might dig further
into the nature of the enqueue to determine the cause of the delays. Here is an example of this report section:
Enqueue Type (Request Reason)
------------------------------------------------------------------------------
    Requests    Succ Gets Failed Gets       Waits  Wt Time (s) Av Wt Time(ms)
------------ ------------ ----------- ----------- ------------ --------------
PS-PX Process Reservation
         386          358          28         116            0            .43
US-Undo Segment
         276          276           0         228            0            .18
TT-Tablespace
          90           90           0          42            0            .71
WF-AWR Flush
          12           12           0           7            0           1.43
MW-MWIN Schedule
           2            2           0           2            0           5.00
TA-Instance Undo
          12           12           0          12            0            .00
UL-User-defined
           7            7           0           7            0            .00
CF-Controlfile Transaction
       5,737        5,737           0           5            0            .00
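
Live enqueue statistics come from V$ENQUEUE_STAT; a minimal sketch:

-- Enqueue types ordered by cumulative wait time
SELECT eq_type, total_req#, total_wait#, failed_req#, cum_wait_time
  FROM v$enqueue_stat
 ORDER BY cum_wait_time DESC;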

Undo Segment Summary

The undo segment summary report provides basic information on the performance of undo tablespaces.

Latch Activity

The latch activity report provides information on Oracle's low-level locking mechanism, the
latch. From this report you can determine if Oracle is suffering from latching problems, and if
so, which latches are causing the greatest amount of contention on the system. Here is a partial
example of the latch activity report (it is quite long):

                                           Pct    Avg   Wait                 Pct
                              Get          Get   Slps   Time       NoWait NoWait
Latch Name               Requests         Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
ASM allocation                      122    0.0    N/A      0            0    N/A
ASM map headers                      60    0.0    N/A      0            0    N/A
ASM map load waiting lis             11    0.0    N/A      0            0    N/A
ASM map operation freeli             30    0.0    N/A      0            0    N/A
ASM map operation hash t         45,056    0.0    N/A      0            0    N/A
ASM network background l          1,653    0.0    N/A      0            0    N/A
AWR Alerted Metric Eleme         14,330    0.0    N/A      0            0    N/A
Consistent RBA                      107    0.0    N/A      0            0    N/A
FAL request queue                    75    0.0    N/A      0            0    N/A
FAL subheap alocation                75    0.0    N/A      0            0    N/A
FIB s.o chain latch                  14    0.0    N/A      0            0    N/A
FOB s.o list latch                   93    0.0    N/A      0            0    N/A
JS broadcast add buf lat            826    0.0    N/A      0            0    N/A
JS broadcast drop buf la            826    0.0    N/A      0            0    N/A

In this example our database does not seem to be experiencing any major latch problems: the
wait times on the latches are 0, and the get miss percentage (Pct Get Miss) is also 0.
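
Latch contention can also be checked live in V$LATCH; a minimal sketch:

-- Latches that have had to sleep, worst first
SELECT name, gets, misses, sleeps
  FROM v$latch
 WHERE sleeps > 0
 ORDER BY sleeps DESC;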

There is also a latch sleep breakdown report which provides some additional detail if a latch is 
being constantly moved into the sleep cycle, which can cause additional performance issues.

The latch miss sources report provides a list of latches that encountered sleep conditions. This 
report can be of further assistance when trying to analyze which latches are causing problems 
with your database.

Segments by Logical Reads and Segments by Physical Reads

The segments by logical reads and segments by physical reads reports provide information on the 
database segments (tables, indexes) that are receiving the largest number of logical or physical 
reads. These reports can help you find objects that are "hot" objects in the database. You may 
want to review the objects and determine why they are hot, and if there are any tuning 
opportunities available on those objects (e.g. partitioning), or on SQL accessing those objects. 
For example, if an object is showing up on the physical reads report, it may be that an index 
is needed on that object. Here is an example of the segments by logical reads report:

Segments by Logical Reads          DB/Inst: A109/a1092  Snaps: 2009-2010
-> Total Logical Reads:          72,642
-> Captured Segments account for   96.1% of Total

           Tablespace                      Subobject  Obj.       Logical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SYS        SYSAUX     SYS_IOT_TOP_8813                INDEX       52,192   71.85
SYS        SYSTEM     SMON_SCN_TIME                   TABLE        4,704    6.48
SYS        SYSTEM     I_JOB_NEXT                      INDEX        2,432    3.35
SYS        SYSTEM     OBJ$                            TABLE        1,344    1.85
SYS        SYSTEM     TAB$                            TABLE        1,008    1.39
-------------------------------------------------------------
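
The live equivalent of these segment-level reports is V$SEGMENT_STATISTICS; a minimal sketch:

-- Top 5 segments by logical reads
SELECT *
  FROM (SELECT owner, object_name, object_type, value AS logical_reads
          FROM v$segment_statistics
         WHERE statistic_name = 'logical reads'
         ORDER BY value DESC)
 WHERE ROWNUM <= 5;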

Additional Reports

Several segment-related reports appear, providing information on:
•Segments with ITL waits

•Segments with Row lock waits

•Segments with buffer busy waits

•Segments with global cache buffer waits

•Segments with CR Blocks received

•Segments with current blocks received

These reports help provide more detailed information on specific segments that might be 
experiencing performance problems.

The dictionary cache and library cache statistics reports provide performance information 
on the various areas in the data dictionary cache and the library cache.

The process memory summary, SGA memory summary, and the SGA breakdown difference reports
provide summary information on how the memory allocated to the database is distributed amongst
its various components. Other memory summary reports may appear if you have certain optional
components installed (such as Streams).

The database parameter summary report provides a summary of the settings of all database
parameters during the snapshot period. If any parameters changed during the period
of the report, then both the old and new values will appear on the report.
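
Finally, all of the sections discussed above come from a single AWR report. Besides running @?/rdbms/admin/awrrpt.sql interactively, the report can be generated programmatically with DBMS_WORKLOAD_REPOSITORY. A minimal sketch, using the snapshot IDs from this article's example (2009-2010); the DBID and instance number below are placeholders you would replace with the values from V$DATABASE and V$INSTANCE:

SELECT output
  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
         l_dbid     => 1234567890,  -- placeholder: your DBID from V$DATABASE
         l_inst_num => 2,           -- placeholder: instance number from V$INSTANCE
         l_bid      => 2009,        -- begin snapshot id
         l_eid      => 2010));      -- end snapshot id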

Related:

1) How to Tune Shared Pool Cache?

Reference: ToadWorld

