RAC Performance Tuning

Oracle Cache Fusion, private interconnects and practical performance management considerations in Oracle RAC


In this article you will look at Cache Fusion from the perspective of practical performance management for Oracle RAC. Cache Fusion is built on top of, and heavily depends on, the private interconnect. The article emphasizes the impact of the interconnect, often overlooked and underestimated, on RAC performance. Tuning RAC is similar to tuning a regular single-instance database, but in addition it must take into account the overhead of the interconnect present in Oracle RAC (10gR1/10gR2/11gR1/11gR2). Some well-known cases for improving RAC performance will be addressed.

In this article you will review the Oracle RAC fundamentals and infrastructure architecture, and you will look at the Cache Fusion impact on RAC performance. The guidelines outlined in the article have two objectives:

  • Maximize the utilization of the software and hardware of the private interconnect. Make sure that you fully utilize your capacity: increase bandwidth and throughput, and decrease latency. You will look at the following considerations for the private interconnect:
    • Network architecture
    • Network configuration
    • Network settings
    • TCP/UDP configuration and settings
    • OS settings
  • Minimize Cache Fusion traffic across the private interconnect. Look at how to diagnose, determine and resolve problems related to Cache Fusion, addressing some common problems, derived from the Cache Fusion wait events and statistics, known to exist in Oracle RAC.

 

 

RAC Fundamentals and architecture

Oracle RAC is a parallel database architecture aimed at high availability and parallelism in a grid deployment. The concept of a grid is based on providing and distributing computing resources on demand. Oracle RAC is a shared disk architecture; Oracle Rdb is another example of a shared disk architecture. The alternative to the shared disk architecture is the shared nothing architecture, a parallel database architecture (Teradata, IBM DB2, Informix) based on MPP (Massively Parallel Processing).

The table below represents the RAC architecture; all layered components (RDBMS, clusterware resources, Oracle Clusterware, etc.) depend on the private interconnect.

Each node (Node 1 through Node N) runs an identical stack:

  • Oracle RDBMS
  • Oracle Clusterware resources (VIP, SCAN, listeners, ASM, etc.)
  • Oracle Clusterware/GI

Common to all nodes:

  • Private network (same subnet, same interface names across all nodes)
  • Public network (same subnet, same interface names across all nodes)
  • Same OS (Oracle supported)
  • Shared or cluster file system for the shared storage (Oracle supported): GPFS, Veritas, ASM, etc.
  • Shared disk hardware storage (Oracle supported)

Functions and processes for GCS, GES and GRD

The SGA for a RAC database includes some new structures compared to a single-instance database.

A RAC database system has two important services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These are essentially collections of background processes and memory structures. Together, GCS and GES manage the entire Cache Fusion process, resource transfers and resource acquisition among the instances. The major processes are:

  • LMON, Global Enqueue Service Monitor: monitors global enqueues across the cluster and performs global enqueue recovery operations. Recovers the GRD.
  • LMD, Global Enqueue Service Daemon: manages incoming remote resource requests within each instance.
  • LMS, Global Cache Service Process: maintains records of data file statuses and of each cached block by recording information in the Global Resource Directory (GRD). LMS also controls the flow of messages to remote instances, manages global data block access and transmits block images between the buffer caches of different instances. This processing is part of the Cache Fusion feature.
  • LCK0, Instance Enqueue Process: manages non-Cache Fusion resource requests such as library and row cache requests.
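As a quick orientation, the following query (a sketch added here, not from the original article; process names and counts vary by version) lists these lock- and cache-related background processes on a running instance:

-- RAC lock/cache background processes currently running on this instance
-- (paddr <> '00' is a common idiom to filter to live processes)
SELECT name, description
FROM v$bgprocess
WHERE paddr <> HEXTORAW('00')
AND (name LIKE 'LM%' OR name LIKE 'LCK%');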

The Global Resource Directory (GRD) records information about resources and enqueues. The GRD is maintained by GES and GCS. It is held in memory, distributed across all running instances: each instance manages a portion of the directory. This distributed nature is a key point for the fault tolerance of RAC. The GRD is the internal in-memory database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance's cache, the GRD is updated. The following resource information is available in the GRD:

  • Data block identifiers such as file # and block #
  • Location of the most current version
  • Modes of the data blocks: (N)Null, (S)Shared, (X)Exclusive

GCS is primarily responsible for maintaining the coherency of the buffer caches of all running RAC instances. GCS makes sure that instances acquire a resource cluster-wide before modifying or reading a database block. GCS synchronizes global cache access, allowing only one instance at a time to modify a block. Through the RAC-wide Global Resource Directory, GCS ensures that the status of data blocks cached anywhere in the cluster is globally visible and maintained.

GES is primarily responsible for maintaining coherency in the dictionary and library caches. The dictionary cache keeps data dictionary information in each node's SGA for quick lookup and access. GES synchronizes the dictionary and library caches across all running instances in the RAC environment. The LMON, LCK and LMD processes work in tandem to make GES operate smoothly.

Oracle RAC maintains a multi-versioning architecture for blocks, distinguishing between current data blocks and one or more consistent read (CR) versions of a block. A current block contains changes for all committed and uncommitted transactions. A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. In Oracle RAC, applying rollback segment information to current blocks produces consistent read versions of a block. Both current and consistent read blocks are managed by the GCS. To transfer data blocks among database caches, buffers are shipped by means of high-speed IPC over the private interconnect. Disk writes are only required for cache replacement. A past image (PI) of a block is kept in memory before the block is sent if it is a dirty block. In the event of failure, Oracle reconstructs the current version of the block by reading PI blocks.

Global Logical Buffer Cache

Using Cache Fusion, Oracle RAC logically combines each instance's buffer cache to enable users and applications to process data as if it resided in a single, logically combined cache. The SGA size requirements for Oracle RAC are greater than for single-instance Oracle databases due to Cache Fusion. To ensure that each Oracle RAC database instance obtains the block that it requires to satisfy a query or transaction, Oracle RAC instances use two components:

  • Global Cache Service (GCS)
  • Global Enqueue Service (GES)

The GCS and GES maintain records of the status of each data file and each cached block using the Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances, which effectively increases the size of the SGA for an Oracle RAC instance. After one instance caches data, any other instance within the same cluster database can acquire a block image from that instance faster than by reading the block from disk. Therefore, Cache Fusion moves current blocks between instances rather than re-reading the blocks from disk. When a consistent block is needed or a changed block is required on another instance, Cache Fusion transfers the block image directly between the affected instances. Oracle RAC uses the private interconnect for inter-instance communication and block transfers. The GCS and GES processes and the GRD collaborate to enable Cache Fusion.

Cost of Cache Fusion

Compared to OPS, RAC introduced a new feature: Cache Fusion. Reads from disk are needed only if a block is not available in the buffer cache of any instance. The cost of block access and cache coherency can be determined from GCS statistics and GCS events, which gauge what is going on and are a useful source of diagnostic information.

The response time for Cache Fusion transfers is determined by:

  • Physical private interconnects – more than one interconnect is recommended; the more interconnects, the more redundancy and the higher the bandwidth available for messages and Cache Fusion block transfers. Achieving low latency is the objective. Dedicated private interconnects are required: a public corporate LAN might have high bandwidth, but will typically have high latency due to retransmissions after collisions. Interconnect performance depends on the configured speed and the redundancy.
  • IPC protocol – Oracle RAC tries to use user-mode IPC for sending data from one node to another, as it does not require a context switch into kernel mode and runs in user (application) mode. The IPC protocol depends on the hardware vendor.
  • GCS protocol – the GCS protocol depends on the IPC protocol and the private interconnect, and is not directly affected by disk I/O, except for the log write I/O performed whenever a dirty block in the buffer cache is sent over the interconnect to another instance for Cache Fusion reasons, in either a write-read or write-write situation.

The response time is not generally affected by disk I/O, except for the occasional log write performed when sending a dirty buffer to another instance in a write-read or write-write situation. For example, if a block is updated by a transaction in the buffer cache of instance A and the same block image is transferred via the Cache Fusion mechanism to the buffer cache of instance C, Oracle has to guarantee that the redo is written to the redo log files first. Other than that, there is not much disk I/O performed. Transactions with heavy updates will incur more disk I/O for the redo than read-mostly workloads.

Typical Latencies

Basically, the CR block request time and the current block request time are what you are looking at:

  • CR block request time = build time + flush time + send time
  • Current block request time = pin time + flush time + send time

CR block request time is the time it takes to build the CR block in the instance that owns the appropriate image, plus the time to flush it (the redo must be written to disk first) and the time it takes to send it across the interconnect.

Current block request time is how long it takes to pin the image in the instance that owns it, plus the time it takes to flush it and send it across. You cannot send a block while someone is changing it at the same time; that is why you need to pin the block in exclusive mode, then flush it and send it over the interconnect.

The statistics come from V$SYSSTAT (or GV$SYSSTAT for all instances).

Other latencies come from the V$GES_STATISTICS or GV$GES_STATISTICS view.
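As a rough sketch (my own, assuming the 10g/11g statistic names; V$SYSSTAT times are recorded in centiseconds, hence the multiplication by 10 to get milliseconds), the average receive times can be derived per instance as follows:

-- Average CR and current block receive time (ms) per instance
SELECT b1.inst_id,
       b1.value AS cr_blocks_received,
       ROUND(t1.value * 10 / NULLIF(b1.value, 0), 2) AS avg_cr_receive_ms,
       b2.value AS current_blocks_received,
       ROUND(t2.value * 10 / NULLIF(b2.value, 0), 2) AS avg_current_receive_ms
FROM gv$sysstat b1, gv$sysstat t1, gv$sysstat b2, gv$sysstat t2
WHERE b1.name = 'gc cr blocks received'
  AND t1.name = 'gc cr block receive time'
  AND b2.name = 'gc current blocks received'
  AND t2.name = 'gc current block receive time'
  AND t1.inst_id = b1.inst_id
  AND b2.inst_id = b1.inst_id
  AND t2.inst_id = b1.inst_id;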

Latencies depend on:

  • Utilization of the IPC protocol
  • Scheduling delays when the system is under high CPU or memory utilization
  • Log flushes for current blocks served

What you are primarily concerned with are the average time to process a CR block and the average time to process a current block. If over time those values start to grow, you need to explore why requests are taking longer: look at the wait events and the possible causes for the growing latencies, and determine why things are changing and getting worse over time.

Waits in RAC environment

Wait events in RAC, like any other wait events, show you the various things a session can wait on, helping you identify what the problem might be. RAC introduces wait events that do not exist in a single-instance environment.

Let’s review the V$SESSION_WAIT view. Oracle includes some common columns in the V$SESSION and V$SESSION_WAIT views; the interesting ones are WAIT_TIME and EVENT, which contains the name of the event in both views. If a session is waiting for something, then when you query V$SESSION_WAIT the EVENT column contains the name of the event the session is waiting on, for example db file sequential read, or log file parallel write, which occurs in the log writer (LGWR) as part of the normal activity of copying records from the redo log buffer to the current online log, or log file sync when you commit (also referred to as commit latency). If WAIT_TIME is 0, EVENT shows what the session is currently waiting on. If WAIT_TIME is greater than 0, EVENT shows the last event waited on and WAIT_TIME shows how long it waited. If WAIT_TIME is -2, the init parameter TIMED_STATISTICS is not set. If WAIT_TIME is -1, the wait time was less than a hundredth of a second and was not captured.
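For example, a quick look (a sketch, not from the original article) at sessions currently in a global cache wait:

-- Sessions currently waiting on global cache (gc) events;
-- WAIT_TIME = 0 means the session is in the wait right now
SELECT sid, event, state, wait_time, seconds_in_wait
FROM v$session_wait
WHERE event LIKE 'gc%'
ORDER BY seconds_in_wait DESC;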

For a single instance the situation is simple: a row in V$SESSION_WAIT represents either a current wait (WAIT_TIME = 0) or the last completed wait. RAC introduces complexity. When Cache Fusion is being used, a server process cannot simply do I/O as it pleases. In a single instance, a server process does its own I/O: if a buffer is not in the buffer cache, it issues a request and waits (for example, db file sequential read), and when the I/O completes it continues. In RAC, the server process makes a request to the LMS background process handling Cache Fusion, and when LMS gets involved there are several possibilities:

  • One is that the instance requesting the block has a valid copy of the block image in its own buffer cache and holds the GRD metadata for it, so everything can be done locally without a block transfer.
  • Another scenario is when the requesting instance A does not have the metadata and another instance B holds the GRD metadata, for example for block m in file n. Getting the global resource metadata requires a hop to instance B, to identify the instance that has a valid copy of the block; if the block is in either instance A or B, there are two hops, as two nodes are involved.
  • The worst possible scenario, regardless of how many instances there are (assuming more than two), is when the instance that makes the request has neither the block image nor the GRD metadata for the block. In this case, the LMS on the requesting instance talks to the LMS holding the metadata, which talks to the LMS on a third instance that has the block image, and the third instance sends the block image to the requesting instance A using user-mode IPC. This is a three-hop situation, the worst possible regardless of the number of nodes.

To summarize: there is a requesting instance, where the initial request for a block image is made by the server process; there is the instance that serves the image, called the owning or serving instance; and there is the instance that owns the GRD metadata for the particular block number and file number, referred to as the mastering instance. The worst case is when the owning, mastering and requesting instances are all different; the best case is when they are the same instance. All wait events related to the global cache are collected in the Cluster wait class in the V$ views and OEM DC/GC. Wait events for RAC help you analyze what sessions are waiting for. Wait times are attributed to events that reflect the outcome of a request. Global cache waits are summarized in a broader category called the Cluster wait class. These events are used in ADDM and the (G)V$ views to enable Cache Fusion diagnostics.

  • Wait times in RAC are attributed to events that reflect the outcome of a request in (g)v$session_wait :
    • Placeholders while waiting – wait_time = 0
    • Actual events after waiting – wait_time != 0

While a request is outstanding there is a placeholder event and WAIT_TIME is 0. When the block is received, the placeholder event is replaced by the actual event and WAIT_TIME reflects the time waited.

Let’s look at the wait event views as a refresher for those who have not used them for a while.

  • V$SYSTEM_EVENT – total waits for an event
  • V$SESSION_WAIT_CLASS – waits for a wait event class by a session
  • V$SESSION_EVENT – waits for an event by a session
  • V$ACTIVE_SESSION_HISTORY – activity of recent active sessions
  • V$SESSION_WAIT_HISTORY – last 10 wait events for each active session.
  • V$SESSION_WAIT – events for which active sessions are waiting
  • V$SQLSTATS – identify SQL statements impacted by interconnect latencies
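For instance, a sketch of using V$SQLSTATS for the last purpose (CLUSTER_WAIT_TIME is in microseconds):

-- Top 10 SQL statements by cluster wait time
SELECT * FROM (
  SELECT sql_id, cluster_wait_time, elapsed_time, executions
  FROM v$sqlstats
  ORDER BY cluster_wait_time DESC
) WHERE ROWNUM <= 10;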

Global Cache wait events – Overview

There are four possible placeholder wait events; these occur while WAIT_TIME is 0. When a server process makes a request for a block image, it has no way to know whether it will be satisfied by 1, 2 or 3 nodes. All it does is contact the local LMS process and request a particular block image. This can be a current or CR request, and can be a single-block or multiblock request, so the possibilities are as follows:

  • Gc current request
  • Gc current multiblock request
  • Gc cr request
  • Gc cr multiblock request

The placeholder holds one of the events above while the process is waiting to get the block image. Once the wait is over, you will know what has happened: WAIT_TIME will be non-zero and the EVENT column of V$SESSION_WAIT will show the actual event the process has been waiting on instead of the placeholder, with the time waited in the WAIT_TIME column. Let’s look at some of the relevant events.

Gc [current/cr] [2/3]-way – if you have a two-node cluster you cannot get 3-way, as only two RAC instances are available; only 2-way is possible, since there can be at most two hops. With three or more RAC instances, both 2-way and 3-way are possible. Blocks are received immediately after two or three network hops. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing its latency. Investigate if the average latency is above 1 ms or close to disk I/O latency; look at reducing latency.

Gc [current/cr] grant 2-way – the event when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a block image, current or CR, and no instance has the image in its local buffer cache. The requesting instance then has to do I/O against the data file to get the block. The grant is simply permission from LMS for this to happen, that is, for the process to read the block from the data file. A grant can be either CR or current: gc current grant means go read the block from the database files, while gc cr grant means read the block from disk and build a read-consistent version once it is read.

Gc [current/cr] [block/grant] congested – the block or grant was received eventually, but with a delay because of intensive CPU consumption, lack of memory, LMS overload due to long queues, paging or swapping. This is worth investigating, as it provides room for improvement: it indicates that LMS could not dequeue messages fast enough.

Gc [current/cr] block busy – received but not sent immediately, due to high concurrency or contention. The block was busy, for example because somebody issued a block recover command from RMAN. A variety of reasons can make a block busy; it just means the block could not be sent immediately, for Oracle-related rather than memory, LMS or system-related reasons. It is also worth investigating.

Gc current grant busy – the grant was received, but with a delay due to many shared block images or load. For example, you are extending the high water mark and formatting the block images or blocks with block headers.

Gc [current/cr] [failure/retry] – not received because of a failure, usually a checksum error in the protocol of the private interconnect caused by network errors or hardware problems. This is something worth investigating. Failure means the block image cannot be received at all, while retry means the problem resolves and the block image is ultimately received after a retry.

Gc buffer busy – the time between block accesses was less than the buffer pin time. Buffers can be pinned in exclusive or shared mode, depending on whether they are to be modified or only read. Obviously, if there is a lot of contention for the same block by different processes, this event can manifest itself with greater magnitude. Buffer busy waits are global cache events when a request is made from one instance, the block is available in another instance, and the block is busy due to contention.

The key point to understand is that there are separate wait events for the placeholder phase; when the wait is over, the placeholder is replaced in V$SESSION_WAIT with a different event depending on how many hops there were, what kind of request it was and what happened: congestion, busy, failure or retry. Looking at the (G)V$ views or AWR reports, you need to see whether you observe congested, busy, failure or retry events and investigate further.

Examples:

  • GC Current block 2-way

Let’s look at the case whereby the placeholder event is gc current block request and the actual event is gc current block 2-way. A process on the requesting instance asks for a block to be accessed in current mode. The instance mastering the corresponding resource receives the request. The master has the current version of the block. It makes a current copy using the current block and the undo data it has, and sends the current copy of the block to the requestor via the private interconnect. The inter-instance communication is done via LMS messages, as explained above. If the buffer is dirty in the owning instance, the redo is flushed to disk by the log writer before the block is shipped. The placeholder event changes from gc current block request to gc current block 2-way. This event might indicate write/read contention if present in the top 5 events.

  • gc current block 3-way

    Let’s look at the case whereby the placeholder event is gc current block request and the actual event is gc current block 3-way. A process on the requesting instance asks for a block to be accessed in current mode. The instance mastering the corresponding resource receives the request and forwards it to the current holder of the block. The holder makes a current copy using the current block and the undo data it has, and sends a current copy of the block to the requestor via the Cache Fusion mechanism. The inter-instance communication is done via LMS messages, as explained above. If the buffer is dirty in the owning instance, the redo is flushed to disk by the log writer before the block is shipped. The placeholder event changes from gc current block request to gc current block 3-way. This event might indicate write/read contention if present in the top 5 events.

  • gc current grant 2 way

    Let’s look at the case whereby the placeholder event is gc current block request and the actual event is gc current grant 2-way. When an instance requests access to a block in current mode, it sends the request to the master instance. The master instance finds that currently no instance, including itself, has any lock on the requested block. It sends a message back to the requesting instance granting it the lock on the block, which amounts to permission to read the block from the data file on disk. The placeholder event changes from gc current block request to gc current grant 2-way. If present in the top 5 list of events, this event might indicate that the instance is spending a significant amount of time obtaining locks.

Global Enqueues Overview

In a RAC database, the GES is responsible for inter-instance resource coordination and coherency of the dictionary and library caches. GES tracks the status of all Oracle enqueue mechanisms for resources that are accessed by more than one instance. Oracle uses GES to manage concurrency for resources on transactions, tables and other structures within a RAC environment. GES is an integrated RAC component that coordinates global locks between the instances in the cluster.

An enqueue is a memory structure that serializes access to a particular resource. Enqueues in RAC are global, in order to maintain enqueue coherency across all instances, and they are synchronous.

The most common enqueues are listed below:

  • TX – transaction enqueue, representing either row lock waits or ITL waits
  • TM – table manipulation enqueue, taken on any table by DML; for example, updating a block on one instance while truncating, dropping or collecting statistics on the same table from another instance
  • HW – high water mark enqueue, taken when the high water mark of a segment is extended
  • TA – transaction recovery enqueue
  • SQ – sequence generation enqueue
  • US – undo segment enqueue, for managing undo segment extension

The waits on enqueues can constitute serious serialization points. Enqueues cause serialization even in a single instance; in RAC that serialization is exacerbated by the latency and overhead of the private interconnect. If something does not scale well in a single instance, it will not scale well in RAC, due to the same contention problems plus the additional overhead contributed by private interconnect latencies.

Session and System Statistics

There are various dynamic views to look at in order to diagnose performance problems with RAC environments.

  • V$SYSSTAT to characterize the workload.
  • V$SESSTAT to monitor important sessions.
  • V$SEGMENT_STATISTICS, which includes RAC statistics on a per-segment basis, identifying which segments are hot with regard to GES and GCS and helping to identify segments that are contended over the interconnect (see the sketch after this list).
  • V$ENQUEUE_STATISTICS to determine the enqueues with the highest impact.
  • V$INSTANCE_CACHE_TRANSFER, which breaks down GCS statistics into block classes, showing which kind of blocks (data blocks, index blocks, etc.) cause the most contention.
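As an example, a sketch of a per-segment drill-down (statistic names as in 10g/11gR1; in 11gR2 'gc buffer busy' is split into acquire/release variants):

-- Segments with the most global cache activity
SELECT * FROM (
  SELECT owner, object_name, statistic_name, value
  FROM v$segment_statistics
  WHERE statistic_name IN ('gc buffer busy',
                           'gc cr blocks received',
                           'gc current blocks received')
  ORDER BY value DESC
) WHERE ROWNUM <= 10;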

The RAC-relevant statistics groups are:

  • Global Cache Service statistics
  • Global Enqueue Service statistics
  • Statistics for messages sent

IPC configuration

The GCS protocol depends on the IPC protocol and the private interconnect. You can look at configuring the following for IPC:

  • Socket buffer size: net.core.rmem_max, net.core.wmem_max.
  • Interface queue size: to increase the size of the interface queue, i.e. the transmit queue of the network interface, issue the following command using ifconfig:
    • # ifconfig eth0 txqueuelen 10000
  • Input queue size: netdev_max_backlog specifies the maximum length of the input queues for the processors. The default value is 300 packets. Linux may have to wait up to a scheduling interval to flush buffers, so this value can limit the network bandwidth when receiving packets. If you want higher throughput, consider increasing netdev_max_backlog.

GCS processing

Latency of the GCS can be affected by:

  • CPU starvation
  • Memory starvation
  • IPC latencies
  • LMS configuration

In case of LMS congestion, that is, when LMS cannot dequeue messages fast enough, look at the GCS_SERVER_PROCESSES parameter.

In 10g, GCS_SERVER_PROCESSES defaults to 2. In 11g, for one CPU there is one GCS server process; for 2 to 8 CPUs there are 2 GCS server processes; for more than 8 CPUs the number of GCS server processes equals the number of CPUs divided by 4, with any fraction discarded. For example, with 10 CPUs, 10/4 yields 2 GCS server processes.
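A quick check of the configured value on each instance (a sketch):

-- Number of LMS processes configured per instance
SELECT inst_id, name, value
FROM gv$parameter
WHERE name = 'gcs_server_processes';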

Let’s put it all together.

Private interconnect considerations

The network between the nodes of a RAC cluster must be private. The interfaces must have the same name and be in the same subnet across all nodes of the cluster. Configure the interfaces on the fastest PCI bus.
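A quick sanity check from the database side (a sketch, assuming a release where this view is available, 10g onwards):

-- Which interface each instance actually uses for the interconnect,
-- and where the setting came from (OCR, init.ora or OS dependent)
SELECT inst_id, name, ip_address, is_public, source
FROM gv$cluster_interconnects;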

Supported links are Gigabit Ethernet (GbE) and Infiniband (IB).

Supported protocols are UDP and RDS for Linux/UNIX, and TCP for Windows.

Use multiple or dual-ported NICs for redundancy (HA), load balancing and load spreading, and increase bandwidth with NIC bonding/aggregation.

Large (jumbo) frames for GbE are recommended if the global cache workload requires it.

Important settings

  • Full bit rate: ensure that the network interface card (NIC) is configured at its maximum bandwidth, i.e. 100 Mbps, 1 Gbps or 10 Gbps.
  • Full duplex mode: in full-duplex operation, a connection between two devices can send data in both directions simultaneously. In contrast, technologies that employ half-duplex operation can send information in both directions between two nodes, but only in one direction at a time.
  • Flow control: flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from outrunning a slow receiver. It provides a mechanism for the receiver to control the transmission speed, so that the receiving node is not overwhelmed with data from sending nodes. Flow control should be distinguished from congestion control. Oracle recommends rx=on, tx=off for flow control:
    • /sbin/ethtool -A eth0 autoneg on tx off rx on
  • NIC ring buffers: ring buffer sizing is driver dependent, and the defaults are generally acceptable. Look at and investigate ring buffer events.
  • CPU(s) receiving network interrupts: check how network interrupts are distributed across CPUs.

Buffer overflows, congested links and flow control can have severe consequences for performance. Use netstat to obtain a report on fragmentation and/or reassembly errors, and ifconfig to report dropped packets or overruns.

Verify your setup:

  • Use the Cluster Verification Utility
  • Load testing eliminates potential problems
  • AWR and ADDM provide an estimate of link utilization

Block access latencies increase when CPU(s) are busy and run queues are long.

  • Immediate LMS scheduling is critical for predictable block access latencies when CPUs are more than 80% busy
  • LMS runs at elevated priority by default

Fewer and busier LMS processes may be more efficient.

  • Monitor their CPU utilization
  • Caveat: one LMS can be good for runtime performance, but may impact cluster reconfiguration and instance recovery time
  • The default is good for most requirements; the gcs_server_processes init parameter overrides the default

The table below shows some characteristics of available cluster interconnects.

Measurement                  | SMP bus | Memory channel | Myrinet | SUN SCI | Gigabit Ethernet (GbE) | Infiniband (IB)
-----------------------------|---------|----------------|---------|---------|------------------------|----------------------
Latency (microseconds)       | 0.5     | 3              | 7-9     | 10      | 100                    | <10
CPU overhead (microseconds)  | <1      | <1             | <1      |         | ~100                   |
Messages/sec (million)       | >10     | >2             |         |         | <0.1                   |
Bandwidth (MB/sec)           | >500    | >100           | ~250    | ~70     | ~50                    | 3 Gbps, can aggregate

Many cluster vendors have designed very competitive technology. Technical solutions such as Memory Channel, SCI and Myrinet support a virtual shared memory space by implementing inter-nodal memory address space mapping. Connections between nodes are established by mapping part of each node's virtual address space to the interconnect interface. Because of the memory-mapped nature of the interface, the transmit and receive overhead is similar to an access to local main memory. This mechanism is the fundamental reason for the low latencies seen with Memory Channel, SCI and Myrinet.

Aggregation

Aggregation is the ability to have separate physical NICs operate together as one logical interface. The aggregated logical interface can operate in one of two modes:

  • Active – Standby: provides high availability and fault tolerance
  • Active – Active: provides high availability, fault tolerance and increased bandwidth, equal to the sum of the bandwidths of the aggregated interfaces

Some of the aggregation solutions are:

  • Cisco Etherchannel based 802.3ad
  • AIX Etherchannel
  • HPUX Auto Port Aggregation
  • SUN Trunking, IPMP, GLD
  • Linux Bonding (only certain modes)
  • Windows NIC teaming

Wait events worth investigating

Let’s look at some of the wait events that are worth further investigation, as they represent a potential performance problem if the wait time is excessive or the event is among the top 5 in the AWR report.

  1. gc cr/current blocks lost: This event shows block losses during transfers. It almost always represents a severe performance problem and can reveal network congestion involving discarded packets and fragments, packet reassembly failures or timeouts, buffer overflows and flow control. Checksum errors or corrupted headers are also often the reason for this wait event. It is worth investigating the IPC configuration and possible downstream network problems (NIC, switch, etc.). Operating system data needs to be gathered with ifconfig, netstat and sar, to name a few. The ‘cr request retry’ event is likely to be seen when ‘gc cr blocks lost’ shows up. High values indicate network problems; the use of an unreliable IPC protocol such as UDP may result in a non-zero value for global cache blocks lost. A quick check against GV$SYSSTAT appears after this list. Common problems and symptoms are:
  • Interconnect or Switch Problems
  • System load and scheduling
  • Contention
  • Unexpectedly high global cache latencies

Check for misconfigured or faulty interconnects, which can cause:

  • Dropped packets/fragments
  • Buffer overflows
  • Packet reassembly failures or timeouts
  • Ethernet Flow control kicks in
  • TX/RX errors

Solution

  • Fix interconnect NICs and switches
  • Tune IPC buffer sizes
  2. gc blocks corrupt: This event shows whether any blocks were corrupted during transfers. If high values are observed, there is probably an IPC, network or hardware problem.
  3. global cache null to s and global cache null to x: These events are generated by inter-instance block ping across the network. Inter-instance block ping is when two instances exchange the same block back and forth. Processes waiting for global cache null to s events are waiting for a block to be transferred from the instance that last changed it. When one instance repeatedly requests cached data blocks from the other RAC instances, these events consume a greater proportion of the total wait time. One method for reducing these events is to reduce the number of rows per block and thereby the block swapping between instances in the RAC cluster. You can also look at implementing logical partitioning, where the activity for a given data set goes to a selected instance. Frequent commits can also reduce some inter-instance transfers.
  4. global cache cr request: This event is generated when an instance has requested a consistent read data block and the block has not yet arrived at the requesting instance. Other than examining the cluster interconnects for possible problems, there is nothing that can be done about this event.
  5. gc buffer busy: This event can be associated with buffer cache contention. Note, however, that there can be multiple symptoms for the same cause; it can be seen together with the ‘db file scattered read’ event. Global cache access and serialization contribute to this event.
  6. congested: Indicates that LMS cannot dequeue messages fast enough. Events that contain ‘congested’ suggest CPU saturation (runaway or spinning processes), long run queues or network configuration issues, and indicate performance problems. While investigating, maintain a global view and remember that symptom and cause can be on different instances. The gcs_server_processes init parameter controls the number of LMS processes, although in most cases the default value is sufficient. Excessive memory consumption leading to swapping can be another reason.
  7. busy: Events that contain ‘busy’ indicate contention. This needs investigation, drilling down into either the SQL with the highest cluster wait time or the segment statistics with the highest block transfers. Also look at the objects with the highest number of block transfers and global serialization.
  8. gc [current/cr] [2/3]-way: If you have a two-node cluster you cannot get 3-way, as only two RAC instances are available; only 2-way is possible, since there can be at most two hops. With three or more RAC instances, both 2-way and 3-way are possible. Blocks are received immediately after two or three network hops. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing its latency. The objective, if the event is seen in the top 5, is to lower the latency.
    High Latencies are addressed by:
  • Check network configuration
    • Make sure that is private (do not share traffic)
    • Make sure that is running at expected bit rate
  • Find cause of high CPU consumption or high memory consumption
    • Runaway or spinning processes
  9. gc [current/cr] grant 2-way: The event when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a block image, current or CR, and no instance has the image in its local buffer cache. The requesting instance then has to do I/O against the data file to get the block; the grant is simply permission from LMS for the process to read the block from the data file. A grant can be either CR or current: gc current grant means go read the block from the database files, while gc cr grant means read the block from disk and build a read-consistent version. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing its latency.
  10. gc [current/cr] [block/grant] congested: The block or grant was received eventually, but with a delay because of intensive CPU consumption, lack of memory, LMS overload due to long queues, paging or swapping. This is worth investigating, as it provides room for improvement; see ‘congested’ above.
  11. gc [current/cr] block busy: Received but not sent immediately, due to high concurrency or contention. The block was busy, for example because somebody issued a block recover command from RMAN. A variety of reasons can make a block busy; it just means the block could not be sent immediately, for Oracle-related rather than memory, LMS or system-related reasons. Look at reducing concurrency and contention.
  12. gc current grant busy: The grant was received, but with a delay due to many shared block images or load, for example when extending the high water mark and formatting the block images or blocks with block headers.
  13. gc [current/cr] [failure/retry]: Not received because of a failure, usually a checksum error in the protocol of the private interconnect due to network errors or hardware problems. This is something worth investigating. Failure means the block image cannot be received at all, while retry means the problem resolves and the block image is ultimately received after a retry.
  14. gc buffer busy: The time between block accesses was less than the buffer pin time. Buffers can be pinned in exclusive or shared mode, depending on whether they are to be modified or only read. If there is a lot of contention for the same block by different processes, this event manifests itself with greater magnitude. Identify the ‘hot’ blocks in the application and reduce the concurrency on them.
  15. gc current grant busy: An event that can be seen in applications with high insert rates, where there is serialization on the HW enqueue.
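As promised above, a first check for lost or corrupt blocks across all instances (a sketch; both values should ideally be zero):

-- Lost/corrupt global cache blocks per instance
SELECT inst_id, name, value
FROM gv$sysstat
WHERE name IN ('gc blocks lost', 'gc blocks corrupt')
ORDER BY inst_id, name;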

A top-down approach to performance analysis can be helpful. Start with the ADDM analysis, then continue with the AWR detailed statistics and historical data; last but not least, ASH will provide you with finer-grained, session-specific data.
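For the ASH step, a sketch of a finer-grained drill-down into Cluster-class waits (the one-hour window is an arbitrary choice):

-- Recent SQL with the most Cluster wait class samples in the last hour
SELECT * FROM (
  SELECT sql_id, COUNT(*) AS samples
  FROM gv$active_session_history
  WHERE wait_class = 'Cluster'
    AND sample_time > SYSDATE - 1/24
  GROUP BY sql_id
  ORDER BY samples DESC
) WHERE ROWNUM <= 10;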

Practical performance tuning considerations for RAC

Application tuning and modifying the schema design are often the most beneficial performance solutions when migrating to RAC. On RAC, following the single-instance tuning methodology and avoiding serialization reduces the private interconnect overhead.

Resize and tune the buffer cache when migrating from a single-instance database to RAC, to accommodate the GCS, GES and GRD overhead.

Reduce full table scans in OLTP systems while doing updates, and separate batch from OLTP. In RAC, when you do a full table scan, the instance performing it needs to build read-consistent versions of the blocks, and if the modifications were done across many instances, then the undo needs to be shipped from all of those instances. That is a lot of unnecessary traffic. It is recommended, if possible, that OLTP and DSS be scheduled so that they do not overlap, in order to improve performance.

Use Automatic Segment Space Management. ASSM helps RAC insert-intensive applications avoid the issues previously encountered with free lists and free list groups, the older ways Oracle used to manage space in segments. ASSM also provides instance affinity to table blocks.

Use sequence caching in RAC. Caching Oracle sequences can reduce index contention. If your business logic permits the numbers generated by a sequence to be unordered across instances, you can create the sequence with the CACHE n NOORDER clause. When you cache a sequence with CACHE n NOORDER on every instance, you achieve logical partitioning, whereby the index keys inserted by each instance are non-overlapping and each instance accesses a separate portion of the index tree. Successive index block splits can create instance affinity for the index blocks. Thus, you can avoid interconnect overhead on index maintenance by avoiding hot blocks being shipped back and forth across instances.
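A minimal sketch (the sequence name and cache size are hypothetical; size the cache to the insert rate):

-- Large cache, no ordering guarantee across instances
CREATE SEQUENCE order_id_seq CACHE 1000 NOORDER;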

Undo considerations in RAC. Undo is very expensive in RAC. Maintain a global view, understand that cause and symptom can be on different instances, and remember that undo is generated and handled on each instance. Avoid frequent reads of index blocks containing active transactions from many instances, which result in excessive undo shipment across instances and undo contention. Undo block shipment may become a bottleneck if either:

  • An excessive number of index blocks are changed by multiple instances
  • Transaction duration is long

This can often be seen in applications that frequently read recently modified or inserted data but commit infrequently. You can avoid it if you use:

  • Short transactions, to reduce the probability that an index block in the cache has uncommitted data, thereby reducing the need to access undo information for consistent reads
  • Increased Oracle sequence cache sizes, to reduce inter-instance concurrent access to index blocks; CR versions of index blocks modified by only one instance can be fabricated without remote undo block shipping

In RAC, the shipment of undo via the interconnect introduces extra overhead, as the undo is obtained from the instance where the DML took place.

High water mark considerations for high-volume insert applications. For applications with a high rate of inserts, new blocks need to be allocated to the segment. New blocks are made available while serializing on the HW enqueue. The solution is to use large uniform extent sizes for the locally managed, automatic space managed segments that are subject to high-volume inserts.
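A sketch of such a tablespace (name, file specification and sizes are hypothetical):

-- Locally managed, large uniform extents, ASSM
CREATE TABLESPACE hot_inserts
  DATAFILE '+DATA' SIZE 10G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M
  SEGMENT SPACE MANAGEMENT AUTO;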

Concurrent cross-instance call considerations in RAC. Every time a table is truncated there is a checkpoint. In RAC there is a cross-instance call to flush the dirty blocks of the table, which may be spread across all RAC instances. In some applications it is not uncommon to see a lot of truncate operations on tables containing temporary data. In RAC, truncating tables concurrently from different instances does not scale well, because it constitutes a point of serialization: while the first truncate command is running, the second has to wait for it to complete.

Use partitioning to reduce inter-instance traffic. Partitioning combined with services allows you to avoid much of the Cache Fusion overhead by having some partitions accessed by users connected to one set of instances and other partitions accessed by users connected to other instances. It cannot be guaranteed, but it helps under certain circumstances. Table and index partitioning schemes can also reduce the interconnect traffic by reducing the data sets.

Avoid unnecessary parsing. In a single instance, excessive parsing is already a concern; in RAC it additionally keeps GES busy synchronizing the library and row caches. You do not want this overhead.

Minimize locking usage. Contention is exacerbated in RAC because, apart from the contention present in a single-instance database, RAC also adds interconnect overhead, so minimizing locking overhead is very important in RAC environments. Heavy locking typically occurs with third-party applications whose code is kept generic so that it can work on many database platforms. This is worth investigating further. Using tables instead of Oracle sequences to generate unique numbers can be a serious problem, as the blocks will be cached across all instances and transmitted back and forth via the private interconnect, along with the undo, from all instances.

Remove unselective indexes in OLTP, as they are not used by the CBO as access paths but still need to be maintained during DML. That is an overhead in general, but in RAC those index blocks may also end up being transferred via Cache Fusion. Removing unselective indexes will improve performance. Index splits are another important activity to be aware of and watch.

Even in a single-instance database, applications with heavy inserts can suffer poor performance due to a few factors such as:

  • monotonically ordered increasing values in the index (right edge of the tree)
  • frequent index block splits

In RAC, a globally hot index block and index block splits are an even bigger problem once you account for the cluster interconnect overhead. The primary objective should be an approach aiming at an equal and gradual distribution of the concurrency in the index tree. Such a distribution can be achieved if you (short sketches follow the list):

  • Use a global hash partitioned indexes
  • Use reverse key indexes
  • Use cached sequences for logical partitioning
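Sketches of the first two techniques (table and column names are hypothetical; sequence caching was shown earlier):

-- Global hash partitioned index: spreads right-hand growth over 8 partitions
CREATE INDEX orders_id_ix ON orders (order_id)
  GLOBAL PARTITION BY HASH (order_id) PARTITIONS 8;

-- Reverse key index: scatters adjacent key values across leaf blocks
CREATE INDEX orders_ref_ix ON orders (order_ref) REVERSE;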

Configure the interconnect properly, with high bandwidth and low latency. See the private interconnect considerations section above.


Monitor the undo tablespace

select sum(decode(status,'EXPIRED',bytes,0))/sum(bytes) EXPIRED_PERCENT,
sum(decode(status,'UNEXPIRED',bytes,0))/sum(bytes) UNEXPIRED_PERCENT,
sum(decode(status,'ACTIVE',bytes,0))/sum(bytes) ACTIVE_PERCENT
from dba_undo_extents;

 

select t.start_time, t.used_ublk, s.username, r.segment_name
from v$transaction t,v$session s, dba_rollback_segs r
where t.ses_addr= s.saddr and t.xidusn=r.segment_id;

Calculate the optimum undo retention:

SELECT d.undo_size/(1024*1024) "ACTUAL UNDO SIZE [MByte]",
SUBSTR(e.value,1,25) "UNDO RETENTION [Sec]",
(TO_NUMBER(e.value) * TO_NUMBER(f.value) *
g.undo_block_per_sec) / (1024*1024)
"NEEDED UNDO SIZE [MByte]"
FROM (
SELECT SUM(a.bytes) undo_size
FROM v$datafile a,
v$tablespace b,
dba_tablespaces c
WHERE c.contents = 'UNDO'
AND c.status = 'ONLINE'
AND b.name = c.tablespace_name
AND a.ts# = b.ts#
) d,
v$parameter e,
v$parameter f,
(
SELECT MAX(undoblks/((end_time-begin_time)*3600*24))
undo_block_per_sec
FROM v$undostat
) g
WHERE e.name = 'undo_retention'
AND f.name = 'db_block_size';

-- Undo Advisor

DECLARE
tid NUMBER;
tname VARCHAR2(30);
oid NUMBER;
BEGIN
DBMS_ADVISOR.CREATE_TASK('Undo Advisor', tid, tname, 'Undo Advisor Task');
DBMS_ADVISOR.CREATE_OBJECT(tname, 'UNDO_TBS', null, null, null, 'null', oid);
DBMS_ADVISOR.SET_TASK_PARAMETER(tname, 'TARGET_OBJECTS', oid);
DBMS_ADVISOR.SET_TASK_PARAMETER(tname, 'START_SNAPSHOT', 1);
DBMS_ADVISOR.SET_TASK_PARAMETER(tname, 'END_SNAPSHOT', 2);
DBMS_ADVISOR.EXECUTE_TASK(tname);
END;
/

-- Monitor undo of a user session

select ses.username
, substr(ses.program, 1, 19) command
, tra.used_ublk
from v$session ses
, v$transaction tra
where ses.saddr = tra.ses_addr;

Unix Commands

Search for a string in the files under a directory:

find . -type f | xargs grep -l “shawn.o.farley@wv.gov”

 

Search for files with a specific name: find -name

find . -name "whatYouAreLookingFor" -print

How To Read An AWR Report


Prerequisites for an AWR report:

The statistics level should be set to TYPICAL, and timed statistics should be enabled.

SQL> show parameter statistics_level 

NAME                 TYPE     VALUE
-------------------- -------- ---------
statistics_level     string   TYPICAL

SQL> select STATISTICS_NAME,ACTIVATION_LEVEL, SYSTEM_STATUS from v$statistics_level;

STATISTICS_NAME                           ACTIVAT  SYSTEM_S
----------------------------------------  -------  --------
Buffer Cache Advice                       TYPICAL  ENABLED
MTTR Advice                               TYPICAL  ENABLED
Timed Statistics                          TYPICAL  ENABLED
Timed OS Statistics                       ALL      DISABLED
Segment Level Statistics                  TYPICAL  ENABLED
PGA Advice                                TYPICAL  ENABLED
Plan Execution Statistics                 ALL      DISABLED
Shared Pool Advice                        TYPICAL  ENABLED
Modification Monitoring                   TYPICAL  ENABLED
Longops Statistics                        TYPICAL  ENABLED
Bind Data Capture                         TYPICAL  ENABLED
Ultrafast Latch Statistics                TYPICAL  ENABLED
Threshold-based Alerts                    TYPICAL  ENABLED
Global Cache Statistics                   TYPICAL  ENABLED
Active Session History                    TYPICAL  ENABLED
Undo Advisor, Alerts and Fast Ramp up     TYPICAL  ENABLED
Streams Pool Advice                       TYPICAL  ENABLED
Time Model Events                         TYPICAL  ENABLED
Plan Execution Sampling                   TYPICAL  ENABLED
Automated Maintenance Tasks               TYPICAL  ENABLED
SQL Monitoring                            TYPICAL  ENABLED
Adaptive Thresholds Enabled               TYPICAL  ENABLED
V$IOSTAT_* statistics                     TYPICAL  ENABLED

The Header

The header gives you a bird's-eye view of the database.

WORKLOAD REPOSITORY report for

DB Name       DB Id        Instance   Inst Num  Release      RAC  Host
------------  -----------  ---------  --------  -----------  ---  ------------
DDOG          1373267642   DDOG2      2         10.2.0.2.0   YES  lefty

             Snap Id  Snap Time            Sessions  Curs/Sess
             -------  -------------------  --------  ---------
Begin Snap:  2009     09-Jan-07 09:00:12   40        3.7
End Snap:    2010     09-Jan-07 10:00:19   39        2.8
Elapsed:     60.12 (mins)
DB Time:     0.09 (mins)

Cache Sizes
~~~~~~~~~~~
                    Begin   End
                    ------  ------
Buffer Cache:       528M    528M    Std Block Size:  8K
Shared Pool Size:   480M    480M    Log Buffer:      15,152K

Elapsed Time: represents the snapshot window, i.e. the time between the two snapshots.
DB Time: represents the activity on the database.

If DB Time is greater than Elapsed Time, the database has a high workload.

Load Profile

Meaning of the various terms:

  • Redo size – indicates the amount of DML activity happening in the database.
  • Logical and physical reads – represent the number of I/Os (logical and physical) that the database is performing.
  • User calls – indicates how many user calls occurred during the snapshot period. This value can give some indication of whether usage has increased.
  • Parses and hard parses – provide an indication of the efficiency of SQL re-use.
  • Sorts – number of sorts occurring in the database.
  • Logons – number of logons which occurred during the snapshot period.
  • Executes – number of SQL statements that were executed.
  • Transactions – indicates how many transactions occurred during the snapshot period.

Load Profile
~~~~~~~~~~~~                 Per Second   Per Transaction
                             ----------   ---------------
Redo size:                       209.78          1,014.29
Logical reads:                    20.14             97.38
Block changes:                     0.50              2.42
Physical reads:                    0.00              0.01
Physical writes:                   0.10              0.49
User calls:                        1.05              5.08
Parses:                            0.39              1.86
Hard parses:                       0.00              0.01
Sorts:                             0.52              2.53
Logons:                            0.02              0.12
Executes:                          0.95              4.61
Transactions:                      0.21

% Blocks changed per Read:   2.49     Recursive Call %:  84.53
Rollback per transaction %: 97.45     Rows per Sort:     41.43

The percentage of soft parses should always be higher than that of hard parses. Possible reasons for excessive hard parsing are a small shared pool or bind variables not being used.

Instance Efficiency

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %:               99.98    Redo NoWait %:     100.00
Buffer Hit %:                 100.00    In-memory Sort %:  100.00
Library Hit %:                 99.64    Soft Parse %:       99.64
Execute to Parse %:            59.60    Latch Hit %:       100.00
Parse CPU to Parse Elapsd %:  130.00    % Non-Parse CPU:    96.30

Shared Pool Statistics         Begin    End
                               ------   ------
Memory Usage %:                 87.98    87.97
% SQL with executions>1:        97.83    97.86
% Memory for SQL w/exec>1:      85.54    85.52

The following ratios should be above 90% in a database:

Buffer Nowait %
Buffer Hit %
Library Hit %
Redo NoWait %
In-memory Sort %
Soft Parse %
Latch Hit %
% Non-Parse CPU

The execute to parse ratio should be very high in an ideal database. It is basically a measure of the number of times a SQL statement is executed versus the number of times it is parsed. The ratio moves higher as the number of executes goes up while the number of parses goes down or remains the same. It will be close to zero if the number of executes and parses are almost equal, and negative if there are fewer executes than parses.

The % SQL with executions>1 value should also be high. A low value could mean that the database is not re-using shared SQL statements, which in turn could mean that bind variables are not being used.

Top 5 Timed Events

Top 5 Timed Events                                        Avg    %Total
~~~~~~~~~~~~~~~~~~                                       wait     Call
Event                             Waits     Time (s)     (ms)     Time   Wait Class
------------------------------   --------  ---------   ------   ------  ----------
control file parallel write         1,220         18       15    331.7  System I/O
control file sequential read        6,508          6        1    110.9  System I/O
CPU time                                           4              64.4
CGS wait for IPC msg              422,253          1        0     20.9  Other
change tracking file synchrono         60          1       13     14.4  Other

In an ideal database, CPU and I/O should be the top timed events.
Events like enq: TX - row lock contention or latch free mean there is
contention in your database. If the log file sync event is high, check
why it is happening; possible remedies are to increase the size of the
redo log buffer and to move your logfiles to a storage system with high
disk I/O capacity.
Also, db file sequential read waits (index reads) should generally be higher
than db file scattered read waits (full table scans).
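A quick way to compare those two read patterns outside the report is V$SYSTEM_EVENT; for example:

select event, total_waits,
       round(time_waited / 100, 2) time_waited_s,   -- TIME_WAITED is in centiseconds
       round(average_wait * 10, 2) avg_wait_ms      -- AVERAGE_WAIT is in centiseconds
from v$system_event
where event in ('db file sequential read', 'db file scattered read');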

RAC Statistics

If you are running on a RAC cluster, then the AWRRPT.SQL report will provide various
RAC statistics including statistics on the number of RAC instances, as well as global
cache and enqueue related performance statistics. Here is an example of the RAC statistics
part of the report:

RAC Statistics                          DB/Inst: A109/a1092  Snaps: 2009-2010

                                Begin   End
                                -----   -----
           Number of Instances:     2       2

Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second       Per Transaction
                                      ---------------       ---------------
  Global Cache blocks received:                  0.11                  0.52
    Global Cache blocks served:                  0.14                  0.68
     GCS/GES messages received:                  0.88                  4.23
         GCS/GES messages sent:                  0.85                  4.12
            DBWR Fusion writes:                  0.01                  0.04
 Estd Interconnect traffic (KB)                  2.31

Global Cache Efficiency Percentages (Target local+remote 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer access -  local cache %:   99.47
Buffer access - remote cache %:    0.53
Buffer access -         disk %:    0.00

Global Cache and Enqueue Services - Workload Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     Avg global enqueue get time (ms):      0.0
         Avg global cache cr block receive time (ms):       0.2
    Avg global cache current block receive time (ms):       0.3
           Avg global cache cr block build time (ms):       0.0
            Avg global cache cr block send time (ms):       0.0
     Global cache log flushes for cr blocks served %:       1.8
           Avg global cache cr block flush time (ms):       4.0
        Avg global cache current block pin time (ms):       0.0
       Avg global cache current block send time (ms):       0.1
Global cache log flushes for current blocks served %:       0.4
      Avg global cache current block flush time (ms):       0.0

Global Cache and Enqueue Services - Messaging Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Avg message sent queue time (ms):  ########
    Avg message sent queue time on ksxp (ms):       0.1
        Avg message received queue time (ms):       4.6
           Avg GCS message process time (ms):       0.0
           Avg GES message process time (ms):       0.0

                  % of direct sent messages:      45.26
                % of indirect sent messages:      31.59
              % of flow controlled messages:      23.15
          -------------------------------------------------------------

From a performance tuning perspective, the volume of block transfers across
the interconnect should be low compared to localized access; this can be seen
in the "Global Cache Efficiency Percentages" section.

In the "Global Cache and Enqueue Services" section, a reasonable upper limit
for avg global cache cr block receive time is about 4 ms, and for avg global
cache current block receive time about 8 ms.
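These averages can be checked live as well; a hedged sketch against GV$SYSSTAT (the gc time statistics are in centiseconds, hence the multiplication by 10 for milliseconds):

select cr_time.inst_id,
       round(10 * cr_time.value / nullif(cr_blk.value, 0), 2) avg_cr_receive_ms,
       round(10 * cur_time.value / nullif(cur_blk.value, 0), 2) avg_cur_receive_ms
from gv$sysstat cr_time, gv$sysstat cr_blk, gv$sysstat cur_time, gv$sysstat cur_blk
where cr_time.name  = 'gc cr block receive time'
and cr_blk.name     = 'gc cr blocks received'
and cur_time.name   = 'gc current block receive time'
and cur_blk.name    = 'gc current blocks received'
and cr_blk.inst_id  = cr_time.inst_id
and cur_time.inst_id = cr_time.inst_id
and cur_blk.inst_id  = cr_time.inst_id;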

Time Model Statistics

The time model statistics show which operations are consuming most of the database time.

Time Model Statistics                   DB/Inst: A109/a1092  Snaps: 2009-2010
-> Total time in database user-calls (DB Time): 5.5s
-> Statistics including the word "background" measure background process
   time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time                                  4.5         82.8
DB CPU                                                    3.5         64.4
connection management call elapsed time                   0.1          1.6
parse time elapsed                                        0.1          1.3
PL/SQL execution elapsed time                             0.0           .9
hard parse elapsed time                                   0.0           .3
sequence load elapsed time                                0.0           .1
repeated bind elapsed time                                0.0           .0
DB time                                                   5.5          N/A
background elapsed time                                  33.0          N/A
background cpu time                                       9.7          N/A
          -------------------------------------------------------------

If Hard parses or parsing time is very high then further investigation
should be done to resolve the problem. 

Wait class and Wait Event Statistics

Closely associated with the time model section of the report are the wait class and wait
event statistics sections. Within Oracle, the duration of a large number of operations 
(e.g. Writing to disk or to the control file) is metered. These are known as wait events,
because each of these operations requires the system to wait for the event to complete. 
Thus, the execution of some database operation (e.g. a SQL query) will have a number of
wait events associated with it. We can try to determine which wait events are causing us
problems by looking at the wait classes and the wait event reports generated from AWR.
Wait classes define "buckets" that allow for summation of various wait times. Each wait 
event is assigned to one of these buckets (for example System I/O or User I/O). These buckets
allow one to quickly determine which subsystem is likely suspect in performance problems 
(e.g. the network, or the cluster). Here is an example of the wait class report section:

Wait Class                              DB/Inst: A109/a1092  Snaps: 2009-2010
-> s  - second
-> cs - centisecond -     100th of a second
-> ms - millisecond -    1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc

                                                                  Avg
                                       %Time       Total Wait    wait     Waits
Wait Class                      Waits  -outs         Time (s)    (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
System I/O                      8,142     .0               25       3      10.9
Other                         439,596   99.6                3       0     589.3
User I/O                          112     .0                0       3       0.2
Cluster                           443     .0                0       0       0.6
Concurrency                       216     .0                0       0       0.3
Commit                             16     .0                0       2       0.0
Network                         3,526     .0                0       0       4.7
Application                        13     .0                0       0       0.0
          -------------------------------------------------------------

In this report the System I/O wait class has the largest total wait time
(25 seconds) and an average wait of 3 milliseconds.
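The same per-class totals are available live in V$SYSTEM_WAIT_CLASS, so you can sanity-check the report between snapshots:

select wait_class, total_waits,
       round(time_waited / 100, 2) time_waited_s   -- TIME_WAITED is in centiseconds
from v$system_wait_class
order by time_waited desc;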

Wait events are normal occurrences, but if a particular sub-system is having a problem 
performing (e.g. the disk sub-system) this fact will appear in the form of one or more 
wait events with an excessive duration. The wait event report then provides some insight 
into the detailed wait events. Here is an example of the wait event report (we have 
eliminated some of the bulk of this report, because it can get quite long). Note that
this section is sorted by total wait time, shown in seconds.

                                             %Time  Total Wait     Avg     Waits
Event                          Waits         -outs    Time (s) wait (ms)    /txn
---------------------------- -------------- ------ ----------- ------- ---------
control file parallel write           1,220     .0          18      15       1.6
control file sequential read          6,508     .0           6       1       8.7
CGS wait for IPC msg                422,253  100.0           1       0     566.0
change tracking file synchro             60     .0           1      13       0.1
db file parallel write                  291     .0           0       1       0.4
db file sequential read                  90     .0           0       4       0.1
reliable message                        136     .0           0       1       0.2
log file parallel write                 106     .0           0       2       0.1
lms flush message acks                    1     .0           0      60       0.0
gc current block 2-way                  200     .0           0       0       0.3
change tracking file synchro             59     .0           0       1       0.1

In this example our control file parallel write waits (which occur during writes to the
control file) take up 18 seconds in total, with an average wait of 15 milliseconds per
wait. Additionally, we see 1.6 such waits per transaction (15 ms * 1.6 = 24 ms per transaction).

Operating System Statistics
This part of the report provides some basic insight into OS performance, and OS configuration 
too. This report may vary depending on the OS platform that your database is running on. 
Here is an example from a Linux system:

Statistic                                       Total
-------------------------------- --------------------
BUSY_TIME                                     128,749
IDLE_TIME                                   1,314,287
IOWAIT_TIME                                    18,394
NICE_TIME                                          54
SYS_TIME                                       31,633
USER_TIME                                      96,586
LOAD                                                0
RSRC_MGR_CPU_WAIT_TIME                              0
PHYSICAL_MEMORY_BYTES                       3,349,528
NUM_CPUS                                            4

In this example output we can see, for example, that the box has 4 CPUs.
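The same figures come from V$OSSTAT (10gR2 and later) if you want them outside the report; for example:

select stat_name, value
from v$osstat
where stat_name in ('NUM_CPUS', 'PHYSICAL_MEMORY_BYTES', 'BUSY_TIME', 'IDLE_TIME');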

SQL In Need of Tuning
Next in the report we find several reports that present SQL statements that
might be improved by tuning. These reports sort the offending SQL statements
by the following criteria:

•Elapsed time

•CPU time

•Buffer gets

•Physical reads

•Executions

•Parse calls

•Sharable memory

•Version count

•Cluster wait time

While these reports might not help tune specific application problems, they can help you 
find more systemic SQL problems that you might not find when tuning a specific application 
module. Here is an example of the Buffer gets report:

                                                     CPU      Elapsed
  Buffer Gets   Executions Gets per Exec  %Total  Time (s)  Time (s)    SQL Id
-------------- ------------ ------------- ------ --------- --------- -------------
         2,163            7         309.0    3.0      0.03      0.04 c7sn076yz7030
select smontabv.cnt, smontab.time_mp, smontab.scn, smontab.num_mappings, smon
tab.tim_scn_map, smontab.orig_thread from smon_scn_time smontab, (sel
ect max(scn) scnmax, count(*)+sum(NVL2(TIM_SCN_MAP,NUM_MAPPINGS, 0)) cnt
from smon_scn_time where thread=0) smontabv where smon

         1,442          721           2.0    2.0      0.05      0.05 6ssrk2dqj7jbx
select job, nvl2(last_date, 1, 0) from sys.job$ where (((:1 <= next_date) and
(next_date <= :2)) or ((last_date is null) and (next_date < :3))) and (field1
= :4 or (field1 = 0 and 'Y' = :5)) and (this_date is null) order by next_date, job

         1,348            1       1,348.0    1.9      0.04      0.04 bv1djzzmk9bv6
Module: TOAD 9.0.0.160
Select table_name from DBA_TABLES where owner = 'CDOL2_01' order by 1

         1,227            1       1,227.0    1.7      0.07      0.08 d92h3rjp0y217
begin prvt_hdm.auto_execute( :db_id, :inst_id, :end_snap ); end;

           896            4         224.0    1.2      0.03      0.03 6hszmvz1wjhbt
Module: TOAD 9.0.0.160
Select distinct Cons.constraint_name, cons.status, cons.table_name,
cons.constraint_type, cons.last_change from sys.user_constraints cons
where 1=1 and cons.status='DISABLED'

In this report we find a SQL statement that is churning through 309 buffers per execution.
While the execution times are not terrible, we might want to look closer at the statement
and see whether it can be tuned (in fact, this is Oracle-issued SQL that we would not tune anyway).
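If the statement of interest falls outside the report window, DBA_HIST_SQLSTAT can rank SQL across any pair of snapshots; a hedged sketch (:begin_snap and :end_snap are bind variables you supply):

select *
from (select sql_id,
             sum(buffer_gets_delta) buffer_gets,   -- per-interval deltas
             sum(executions_delta)  executions
      from dba_hist_sqlstat
      where snap_id > :begin_snap and snap_id <= :end_snap
      group by sql_id
      order by buffer_gets desc)
where rownum <= 5;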

Instance Activity Stats

This section provides us with a number of various statistics (such as, how many DBWR Checkpoints 
occurred, or how many consistent gets occurred during the snapshot). Here is a partial example of the report:

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
consistent changes                                9            0.0           0.0
consistent gets                              70,445           19.5          94.4
consistent gets - examination                 8,728            2.4          11.7
consistent gets direct                            0            0.0           0.0
consistent gets from cache                   70,445           19.5          94.4
cursor authentications                            2            0.0           0.0
data blocks consistent reads - u                  5            0.0           0.0
db block changes                              1,809            0.5           2.4
db block gets                                 2,197            0.6           3.0
db block gets direct                              0            0.0           0.0
db block gets from cache                      2,033            0.6           2.7

Tablespace and Data File IO Stats

The tablespace and data file IO stats report provides information on tablespace IO performance. 
From this report you can determine if the tablespace datafiles are suffering from sub-standard 
performance in terms of IO response from the disk sub-system. Here is a partial example of the 
tablespace report:

Tablespace
------------------------------
                 Av      Av     Av                       Buffer  Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s  Waits  Wt(ms)
-------------- ------- ------ ------- ------------ -------- ------ ------
SYSAUX
             1       0    0.0     1.0          159        0     13    0.8
UNDOTBS2
             1       0   10.0     1.0           98        0      0    0.0
SYSTEM
             1       0   10.0     1.0           46        0      0    0.0
AUD
             1       0    0.0     1.0            1        0      0    0.0
CDOL2_INDEX
             1       0   10.0     1.0            1        0      0    0.0
CDOL_DATA
             1       0   10.0     1.0            1        0      0    0.0
DBA_DEF
             1       0   10.0     1.0            1        0      0    0.0
UNDOTBS1
             1       0   10.0     1.0            1        0      0    0.0
USERS
             1       0   10.0     1.0            1        0      0    0.0
USER_DEF
             1       0   10.0     1.0            1        0      0    0.0

If the tablespace IO report indicates that a tablespace has IO problems, we can then use
the file IO stat report to drill into the datafiles of the tablespace in question and
determine what the problem might be. Here is an example of the File IO stat report:

Tablespace     Filename
-------------- ----------------------------------------------------
                 Av      Av     Av                       Buffer  Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s  Waits  Wt(ms)
-------------- ------- ------ ------- ------------ -------- ------ ------
AUD            +ASM01/a109/datafile/aud.296.604081931
             1       0    0.0     1.0            1        0      0    0.0
CDOL2_INDEX    +ASM01/a109/datafile/cdol2_index_001.dbf
             1       0   10.0     1.0            1        0      0    0.0
CDOL_DATA      +ASM01/a109/datafile/cdol_data_001.dbf
             1       0   10.0     1.0            1        0      0    0.0
DBA_DEF        +ASM01/a109/datafile/dba_def.294.604081931
             1       0   10.0     1.0            1        0      0    0.0
SYSAUX         +ASM01/a109/datafile/sysaux.299.604081927
             1       0    0.0     1.0          159        0     13    0.8
SYSTEM         +ASM01/a109/datafile/system.301.604081919
             1       0   10.0     1.0           46        0      0    0.0
UNDOTBS1       +ASM01/a109/datafile/undotbs1.300.604081925
             1       0   10.0     1.0            1        0      0    0.0
UNDOTBS2       +ASM01/a109/datafile/undotbs2.292.604081931
             1       0   10.0     1.0           98        0      0    0.0
USERS          +ASM01/a109/datafile/users.303.604081933
             1       0   10.0     1.0            1        0      0    0.0
USER_DEF       +ASM01/a109/datafile/user_def.291.604081933
             1       0   10.0     1.0            1        0      0    0.0
          -------------------------------------------------------------

Buffer Pool Statistics

The buffer pool statistics report follows. It provides a summary of the buffer pool 
configuration and usage statistics as seen in this example:

                                                         Free Writ  Buffer
     Number of Pool   Buffer     Physical    Physical  Buff Comp    Busy
P      Buffers  Hit%     Gets        Reads      Writes  Wait Wait   Waits
--- ---------- ---- -------------- ------------ ----------- ---- ---- ----------
D       64,548  100         72,465            0         355    0    0         13
          -------------------------------------------------------------

In this case we have a database where all buffer requests were satisfied from the buffer
pool and no physical reads were required. We also see a few (probably very insignificant in
our case) buffer busy waits.

Instance Recovery Stats

The instance recovery stats report provides information related to instance recovery. By analyzing 
this report, you can determine roughly how long your database would have required to perform 
crash recovery during the reporting period. Here is an example of this report:

-> B: Begin snapshot, E: End snapshot

  Targt Estd                                    Log File  Log Ckpt  Log Ckpt
  MTTR  MTTR   Recovery    Actual    Target       Size    Timeout   Interval
   (s)   (s)   Estd IOs  Redo Blks Redo Blks  Redo Blks  Redo Blks Redo Blks
- ----- ----- ---------- --------- --------- ---------- --------- ------------
B     0    19        196       575       183      92160       183          N/A
E     0    19        186       258        96      92160        96          N/A
          -------------------------------------------------------------

Buffer Pool Advisory

The buffer pool advisory report answers the question: how big should the database
buffer cache be? It provides an extrapolation of the benefit or detriment that would
result if you added memory to, or removed memory from, the database buffer cache.
These estimates are based on the current size of the buffer cache and the number of
logical and physical IOs encountered during the reporting period. This report can be
very helpful in "rightsizing" your buffer cache. Here is an example of the output of
this report:

                                        Est Phys
    Size for   Size      Buffers for      Read          Estimated
P    Est (M)  Factor        Estimate    Factor     Physical Reads
--- -------- ------ ---------------- ------ ------------------
D         48     .1            5,868    4.9            803,496
D         96     .2           11,736    4.0            669,078
D        144     .3           17,604    3.3            550,831
D        192     .4           23,472    2.8            462,645
D        240     .5           29,340    2.3            379,106
D        288     .5           35,208    1.8            305,342
D        336     .6           41,076    1.4            238,729
D        384     .7           46,944    1.2            200,012
D        432     .8           52,812    1.1            183,694
D        480     .9           58,680    1.0            172,961
D        528    1.0           64,548    1.0            165,649
D        576    1.1           70,416    1.0            161,771
D        624    1.2           76,284    1.0            159,728
D        672    1.3           82,152    1.0            158,502
D        720    1.4           88,020    1.0            157,723
D        768    1.5           93,888    0.9            157,124
D        816    1.5           99,756    0.9            156,874
D        864    1.6          105,624    0.9            156,525
D        912    1.7          111,492    0.9            156,393
D        960    1.8          117,360    0.9            155,388
          -------------------------------------------------------------

In this example we currently have 528MB allocated to the buffer cache (represented by the
size factor column with a value of 1.0). It appears that if we were to reduce the memory
allocated to the buffer cache to roughly half of its current size (freeing the memory to
the OS for other processes), we would incur about 1.8 times the current number of physical
IOs in the process.
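The same extrapolation is exposed live in V$DB_CACHE_ADVICE, which this report section is built from; a minimal sketch for the DEFAULT pool:

select size_for_estimate size_mb, size_factor, estd_physical_reads
from v$db_cache_advice
where name = 'DEFAULT'
and advice_status = 'ON'
order by size_for_estimate;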

PGA Reports

The PGA reports provide some insight into the health of the PGA. The PGA Aggr Target Stats report
provides information on the configuration of the PGA Aggregate Target parameter during the 
reporting period. 

The PGA Aggregate Target Histogram report provides information on the size of various operations 
(e.g. sorts). It will indicate if PGA sort operations occurred completely in memory, or if some 
of those operations were written out to disk. 

Finally the PGA Memory Advisor, much like the buffer pool advisory report, provides some insight 
into how to properly size your PGA via the PGA_AGGREGATE_TARGET database parameter. The PGA Memory 
Advisor report is shown here:

                                       Estd Extra    Estd PGA   Estd PGA
PGA Target    Size           W/A MB   W/A MB Read/      Cache  Overalloc
  Est (MB)   Factr        Processed Written to Disk      Hit %     Count
---------- ------- ---------------- ---------------- -------- ----------
        44     0.1        289,899.2          7,844.9     97.0      1,124
        88     0.3        289,899.2          7,576.9     97.0      1,073
       176     0.5        289,899.2              3.3    100.0          0
       263     0.8        289,899.2              3.3    100.0          0
       351     1.0        289,899.2              3.3    100.0          0
       421     1.2        289,899.2              0.0    100.0          0
       491     1.4        289,899.2              0.0    100.0          0
       562     1.6        289,899.2              0.0    100.0          0
       632     1.8        289,899.2              0.0    100.0          0
       702     2.0        289,899.2              0.0    100.0          0
     1,053     3.0        289,899.2              0.0    100.0          0
     1,404     4.0        289,899.2              0.0    100.0          0
     2,106     6.0        289,899.2              0.0    100.0          0
     2,808     8.0        289,899.2              0.0    100.0          0
          -------------------------------------------------------------
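V$PGA_TARGET_ADVICE carries the same estimates if you want to query them directly:

select round(pga_target_for_estimate / 1024 / 1024) target_mb,
       pga_target_factor,
       estd_pga_cache_hit_percentage cache_hit_pct,
       estd_overalloc_count
from v$pga_target_advice
order by pga_target_for_estimate;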

Shared Pool Advisory

The shared pool advisory report provides assistance in right sizing the Oracle shared pool. 
Much like the PGA Memory Advisor or the Buffer Pool advisory report, it provides some insight 
into what would happen should you add or remove memory from the shared pool. This can help 
you reclaim much needed memory if you have over allocated the shared pool, and can significantly 
improve performance if you have not allocated enough memory to the shared pool. Here is an 
example of the shared pool advisory report:

                                           Est LC  Est LC   Est LC Est LC
 Shared    SP   Est LC                       Time    Time     Load   Load       Est LC
   Pool  Size     Size       Est LC         Saved   Saved     Time   Time      Mem Obj
Size(M) Factr      (M)      Mem Obj           (s)   Factr      (s)  Factr         Hits
------- ----- -------- ------------ ------------- ------ -------- ------ ------------
    192    .4       54        3,044       #######     .8  ####### 382.1    22,444,274
    240    .5       92        5,495       #######     .9  ####### 223.7    22,502,102
    288    .6      139        8,122       #######     .9   53,711 102.5    22,541,782
    336    .7      186       12,988       #######    1.0   17,597  33.6    22,562,084
    384    .8      233       17,422       #######    1.0    7,368  14.1    22,569,402
    432    .9      280       23,906       #######    1.0    3,553   6.8    22,571,902
    480   1.0      327       28,605       #######    1.0      524   1.0    22,573,396
    528   1.1      374       35,282       #######    1.0        1    .0    22,574,164
    576   1.2      421       40,835       #######    1.0        1    .0    22,574,675
    624   1.3      468       46,682       #######    1.0        1    .0    22,575,055
    672   1.4      515       52,252       #######    1.0        1    .0    22,575,256
    720   1.5      562       58,181       #######    1.0        1    .0    22,575,422
    768   1.6      609       64,380       #######    1.0        1    .0    22,575,545
    816   1.7      656       69,832       #######    1.0        1    .0    22,575,620
    864   1.8      703       75,168       #######    1.0        1    .0    22,575,668
    912   1.9      750       78,993       #######    1.0        1    .0    22,575,695
    960   2.0      797       82,209       #######    1.0        1    .0    22,575,719
          -------------------------------------------------------------

SGA Target Advisory

The SGA target advisory report is somewhat of a summation of all the advisory reports previously 
presented in the AWR report. It helps you determine the impact of changing the settings of the 
SGA target size in terms of overall database performance. The report uses a value called DB Time 
as a measure of the increase or decrease in performance relative to the memory change made. Also 
the report will summarize an estimate of physical reads associated with the listed setting for 
the SGA. Here is an example of the SGA target advisory report:

SGA Target   SGA Size       Est DB     Est Physical
  Size (M)     Factor     Time (s)            Reads
---------- ---------- ------------ ----------------
       528        0.5       25,595          769,539
       792        0.8       20,053          443,095
     1,056        1.0       18,443          165,649
     1,320        1.3       18,354          150,476
     1,584        1.5       18,345          148,819
     1,848        1.8       18,345          148,819
     2,112        2.0       18,345          148,819

In this example, our SGA Target size is currently set at 1056MB. We can see from this report that 
if we increased the SGA target size to 2112MB, we would see almost no performance improvement 
(about a 98 second improvement overall). In this case, we may determine that adding so much 
memory to the database is not cost effective, and that the memory can be better used elsewhere.
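The underlying figures are in V$SGA_TARGET_ADVICE (10gR2 and later), so this trade-off can also be queried directly:

select sga_size, sga_size_factor, estd_db_time, estd_physical_reads
from v$sga_target_advice
order by sga_size;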

Memory Advisory

Memory advisory reports for the streams pool and the java pool also appear in the report 
(assuming you are using the streams pool). These reports take on the same general format as 
the other memory advisor reports.

Buffer Wait Statistics

The buffer wait statistics report helps you drill down on specific buffer wait events, and where 
the waits are occurring. In the following report we find that the 13 buffer busy waits we saw in 
the buffer pool statistics report earlier are attributed to data block waits. We might then want 
to pursue tuning remedies to these waits if the waits are significant enough. Here is an example 
of the buffer wait statistics report:

Class                    Waits Total Wait Time (s)  Avg Time (ms)
------------------ ----------- ------------------- --------------
data block                  13                   0              1
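The cumulative buffer-wait breakdown is also visible live in V$WAITSTAT:

select class, count, time
from v$waitstat
where count > 0
order by time desc;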

Enqueue Activity

The Enqueue activity report provides information on enqueues (higher level Oracle locking) that occur. 
As with other reports, if you see high levels of wait times in these reports, you might dig further 
into the nature of the enqueue and determine the cause of the delays. Here is an example of this report section:
Enqueue Type (Request Reason)
------------------------------------------------------------------------------
    Requests    Succ Gets Failed Gets       Waits  Wt Time (s) Av Wt Time(ms)
------------ ------------ ----------- ----------- ------------ --------------
PS-PX Process Reservation
         386          358          28         116            0            .43
US-Undo Segment
         276          276           0         228            0            .18
TT-Tablespace
          90           90           0          42            0            .71
WF-AWR Flush
          12           12           0           7            0           1.43
MW-MWIN Schedule
           2            2           0           2            0           5.00
TA-Instance Undo
          12           12           0          12            0            .00
UL-User-defined
           7            7           0           7            0            .00
CF-Controlfile Transaction
       5,737        5,737           0           5            0            .00
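A quick live view of the same enqueue statistics is V$ENQUEUE_STAT; for example:

select eq_type, total_req#, total_wait#, cum_wait_time
from v$enqueue_stat
where total_wait# > 0
order by cum_wait_time desc;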

Undo Segment Summary

The undo segment summary report provides basic information on the performance of undo tablespaces.

Latch Activity

The latch activity report provides information on Oracle's low-level locking mechanism
called a latch. From this report you can determine whether Oracle is suffering from
latching problems and, if so, which latches are causing the greatest amount of contention
on the system. Here is a partial example of the latch activity report (it is quite long):

                                              Pct    Avg   Wait                 Pct
                              Get             Get   Slps   Time       NoWait NoWait
Latch Name               Requests            Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
ASM allocation                      122     0.0    N/A      0            0    N/A
ASM map headers                      60     0.0    N/A      0            0    N/A
ASM map load waiting lis             11     0.0    N/A      0            0    N/A
ASM map operation freeli             30     0.0    N/A      0            0    N/A
ASM map operation hash t         45,056     0.0    N/A      0            0    N/A
ASM network background l          1,653     0.0    N/A      0            0    N/A
AWR Alerted Metric Eleme         14,330     0.0    N/A      0            0    N/A
Consistent RBA                      107     0.0    N/A      0            0    N/A
FAL request queue                    75     0.0    N/A      0            0    N/A
FAL subheap alocation                75     0.0    N/A      0            0    N/A
FIB s.o chain latch                  14     0.0    N/A      0            0    N/A
FOB s.o list latch                   93     0.0    N/A      0            0    N/A
JS broadcast add buf lat            826     0.0    N/A      0            0    N/A
JS broadcast drop buf la            826     0.0    N/A      0            0    N/A

In this example our database does not seem to be experiencing any major latch problems, as the 
wait times on the latches are 0, and our get miss pct (Pct Get Miss) is 0 also.

There is also a latch sleep breakdown report which provides some additional detail if a latch is 
being constantly moved into the sleep cycle, which can cause additional performance issues.

The latch miss sources report provides a list of latches that encountered sleep conditions. This 
report can be of further assistance when trying to analyze which latches are causing problems 
with your database.
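Between reports, V$LATCH gives the same cumulative picture; latches with a high sleep count are the ones to investigate:

select name, gets, misses, sleeps
from v$latch
where sleeps > 0
order by sleeps desc;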

Segments by Logical Reads and Segments by Physical Reads

The segments by logical reads and segments by physical reads reports provide information on the 
database segments (tables, indexes) that are receiving the largest number of logical or physical 
reads. These reports can help you find objects that are "hot" objects in the database. You may 
want to review the objects and determine why they are hot, and if there are any tuning 
opportunities available on those objects (e.g. partitioning), or on SQL accessing those objects. 
For example, if an object is showing up on the physical reads report, it may be that an index 
is needed on that object. Here is an example of the segments by logical reads report:

Segments by Logical Reads               DB/Inst: A109/a1092  Snaps: 2009-2010
-> Total Logical Reads:          72,642
-> Captured Segments account for   96.1% of Total

           Tablespace                      Subobject  Obj.       Logical
Owner         Name       Object Name         Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SYS        SYSAUX     SYS_IOT_TOP_8813                INDEX       52,192   71.85
SYS        SYSTEM     SMON_SCN_TIME                   TABLE        4,704    6.48
SYS        SYSTEM     I_JOB_NEXT                      INDEX        2,432    3.35
SYS        SYSTEM     OBJ$                            TABLE        1,344    1.85
SYS        SYSTEM     TAB$                            TABLE        1,008    1.39
          -------------------------------------------------------------
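V$SEGMENT_STATISTICS exposes the same per-segment counters live; a hedged sketch for the hottest segments by logical reads:

select *
from (select owner, object_name, object_type, value logical_reads
      from v$segment_statistics
      where statistic_name = 'logical reads'
      order by value desc)
where rownum <= 5;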

Additional Reports

Several segment related reports appear providing information on:
•Segments with ITL waits

•Segments with Row lock waits

•Segments with buffer busy waits

•Segments with global cache buffer waits

•Segments with CR Blocks received

•Segments with current blocks received

These reports help provide more detailed information on specific segments that might be 
experiencing performance problems.

The dictionary cache and library cache statistics reports provide performance information 
on the various areas in the data dictionary cache and the library cache.

The process memory summary, SGA memory summary, and the SGA breakdown difference reports 
provide summary information on how memory allocated to the database is allocated amongst 
the various components. Other memory summary reports may occur if you have certain optional 
components installed (such as streams).

The database parameter summary report provides a summary of the setting of all the database 
parameters during the snapshot report. If the database parameters changed during the period 
of the report, then the old and new parameters will appear on the report.

Reference: ToadWorld

Schedule AWR Report Generation

set echo off
set head off
set feed off

spool /orasys/home/oracle/Kishi/Logs/Awr_report_daily.lst;

select 'Define begin_snap= ' || snap_id
from dba_hist_snapshot
where EXTRACT(HOUR FROM BEGIN_INTERVAL_TIME) = 9
and EXTRACT(year FROM BEGIN_INTERVAL_TIME)  = TO_CHAR(sysdate,'YYYY')
and EXTRACT(month FROM BEGIN_INTERVAL_TIME) = TO_CHAR(sysdate,'MM')
and EXTRACT(day FROM BEGIN_INTERVAL_TIME)   = TO_CHAR(sysdate,'DD');

SELECT 'Define end_snap= ' || snap_id
from dba_hist_snapshot
where EXTRACT(HOUR FROM BEGIN_INTERVAL_TIME) = 18
and EXTRACT(year FROM BEGIN_INTERVAL_TIME)  = TO_CHAR(sysdate,'YYYY')
and EXTRACT(month FROM BEGIN_INTERVAL_TIME) = TO_CHAR(sysdate,'MM')
and EXTRACT(day FROM BEGIN_INTERVAL_TIME)   = TO_CHAR(sysdate,'DD');

select 'define report_type = ' || '''html''' from dual;
select 'define report_name = /orasys/home/oracle/Kishi/Logs/AWR_REPORT_Daily.html' from dual;
select 'define inst_name = ' || INSTANCE_NAME from v$instance;
select 'define db_name = ' || name from v$database;
select 'define dbid = ' || dbid from v$database;
select 'define inst_num = ' || INSTANCE_NUMBER from v$instance;
select 'define num_days = 3' from dual;
select '@$ORACLE_HOME/rdbms/admin/awrrpti.sql' from dual;
spool off;

Database_Space_Scripts

-- Free/used space at tablespace level

rem -----------------------------------------------------------------------
rem Filename: tsspace.sql
rem Purpose:  Show used/free space in Meg by tablespace name
rem Author:   Kishore
rem -----------------------------------------------------------------------

tti "Space Usage for Database in Meg"

SELECT Total.name "Tablespace Name",
nvl(Free_space, 0) Free_space,
nvl(total_space - Free_space, 0) Used_space,
total_space
FROM
(select tablespace_name, sum(bytes/1024/1024) Free_Space
from sys.dba_free_space
group by tablespace_name
) Free,
(select b.name, sum(bytes/1024/1024) TOTAL_SPACE
from sys.v_$datafile a, sys.v_$tablespace b
where a.ts# = b.ts#
group by b.name
) Total
WHERE Free.Tablespace_name(+) = Total.name
ORDER BY Total.name
/

tti off

-- Free/used space at datafile level

REM ------------------------------------------------------------------------------
REM Filename: dffree.sql
REM Purpose:  Shows current size, used & free space within the datafiles.
REM Author:   Kishore
REM ------------------------------------------------------------------------------

TTI "Allocated, Used & Free space within datafiles"

COLUMN free_space_mb format 999999.90
COLUMN allocated_mb format 999999.90
COLUMN used_mb format 999999.90

SELECT SUBSTR (df.NAME, 1, 40) file_name, df.bytes / 1024 / 1024 allocated_mb,
((df.bytes / 1024 / 1024) - NVL (SUM (dfs.bytes) / 1024 / 1024, 0))
used_mb,
NVL (SUM (dfs.bytes) / 1024 / 1024, 0) free_space_mb
FROM v$datafile df, dba_free_space dfs
WHERE df.file# = dfs.file_id(+)
GROUP BY dfs.file_id, df.NAME, df.file#, df.bytes
ORDER BY file_name;

TTI off

-- Used space at segment level

rem -----------------------------------------------------------------------
rem Filename: usedsp.sql
rem Purpose:  Used space in Meg by segment type
rem Author:   Frank Naude, Oracle FAQ
rem -----------------------------------------------------------------------

set pagesize 50000
set line 80

col "Total Used Meg" format 999,999,990
col "Data part" format 999,999,990
col "Index part" format 999,999,990
col "LOB part" format 999,999,990
col "RBS part" format 999,999,990

tti 'Used space in Meg by segment type'

select sum(bytes)/1024/1024 "Total Used",
sum( decode( substr(segment_type,1,5), 'TABLE', bytes/1024/1024, 0))
"Data part",
sum( decode( substr(segment_type,1,5), 'INDEX', bytes/1024/1024, 0))
"Index part",
sum( decode( substr(segment_type,1,3), 'LOB', bytes/1024/1024, 0))
"LOB part",
sum( decode(segment_type, 'ROLLBACK', bytes/1024/1024, 0))
"RBS part",
sum( decode(segment_type, 'TEMPORARY', bytes/1024/1024, 0))
"TEMP part"
from sys.dba_segments
/
tti off

tti "Total database size"

select sum(bytes)/1024/1024 "Total DB size in Meg"
from sys.v_$datafile
/
tti off
tti off

-- Summary of database space history over a period of time

rem -----------------------------------------------------------------------
rem Filename: spacehist.sql
rem Purpose:  Save summary of database space history over time
rem Notes:    Set JOB_QUEUE_PROCESSES to a value > 0 or schedule from
rem           an external scheduler (cron, at, ...)
rem Date:     15-May-2002
rem Author:   Frank Naude, Oracle FAQ
rem -----------------------------------------------------------------------

---------------------------------------------------------------------------
-- Create history table...
---------------------------------------------------------------------------
drop table db_space_hist;
create table db_space_hist (
timestamp date,
total_space number(8),
used_space number(8),
free_space number(8),
pct_inuse number(5,2),
num_db_files number(5)
);

---------------------------------------------------------------------------
-- Stored proc to populate table...
---------------------------------------------------------------------------
create or replace procedure db_space_hist_proc as
begin
-- Delete records older than a year...
delete from db_space_hist where timestamp < SYSDATE - 364;
— Insert current utilization values…
insert into db_space_hist
select sysdate, total_space,
total_space-nvl(free_space,0) used_space,
nvl(free_space,0) free_space,
((total_space – nvl(free_space,0)) / total_space)*100 pct_inuse,
num_db_files
from ( select sum(bytes)/1024/1024 free_space
from sys.DBA_FREE_SPACE ) FREE,
( select sum(bytes)/1024/1024 total_space,
count(*) num_db_files
from sys.DBA_DATA_FILES) FULL;
commit;
end;
/
show errors

---------------------------------------------------------------------------
-- Schedule the job using the DB Job System. This section can be removed if
-- the job is scheduled via an external scheduler.
---------------------------------------------------------------------------
declare
v_job number;
begin
select job into v_job from user_jobs where what like 'db_space_hist_proc%';
dbms_job.remove(v_job);
dbms_job.submit(v_job, 'db_space_hist_proc;', sysdate,
'sysdate+7'); -- Run every 7 days
dbms_job.run(v_job);
dbms_output.put_line('Job '||v_job||' re-submitted.');
exception
when NO_DATA_FOUND then
dbms_job.submit(v_job, 'db_space_hist_proc;', sysdate,
'sysdate+7'); -- Run every 7 days
dbms_job.run(v_job);
dbms_output.put_line('Job '||v_job||' submitted.');
end;
/

---------------------------------------------------------------------------
-- Generate a space history report...
---------------------------------------------------------------------------
select to_char(timestamp, 'DD Mon RRRR HH24:MI') "Timestamp",
total_space "DBSize (Meg)",
used_space "Used (Meg)",
free_space "Free (Meg)",
pct_inuse "% Used",
num_db_files "Num DB Files"
from db_space_hist
order by timestamp;

-- List segments that cannot extend

rem -----------------------------------------------------------------------
rem Filename: exterror.sql
rem Purpose:  Segments that will cause errors when they try to extend!!!
rem Author:   Frank Naude, Oracle FAQ
rem -----------------------------------------------------------------------

prompt Objects that cannot extend (no space in TS)

column Sname form a40 heading 'Object Name'
column Stype form a15 heading 'Type'
column Size form 9,999 heading 'Size'
column Next form 99,999 heading 'Next'
column Tname form a15 heading 'TsName'

select a.owner||'.'||a.segment_name "Sname",
a.segment_type "Stype",
a.bytes/1024/1024 "Size",
a.next_extent/1024/1024 "Next",
a.tablespace_name "TName"
from sys.dba_segments a
where a.tablespace_name not like 'T%MP%' -- Exclude TEMP tablespaces
and next_extent * 1 > ( -- Cannot extend 1x, can change to 2x...
select max(b.bytes)
from dba_free_space b
where a.tablespace_name = b.tablespace_name)
order by 3 desc
/

-- THIS QUERY GIVES THE SAME RESULTS, BUT IS WAY TOO SLOW

-- select a.owner, a.segment_name, b.tablespace_name,
--        decode(ext.extents,1,b.next_extent,
--               a.bytes*(1+b.pct_increase/100)) nextext,
--        freesp.largest
-- from dba_extents a,
--      dba_segments b,
--      (select owner, segment_name, max(extent_id) extent_id,
--              count(*) extents
--       from dba_extents
--       group by owner, segment_name
--      ) ext,
--      (select tablespace_name, max(bytes) largest
--       from dba_free_space
--       group by tablespace_name
--      ) freesp
-- where a.owner=b.owner
-- and a.segment_name=b.segment_name
-- and a.owner=ext.owner
-- and a.segment_name=ext.segment_name
-- and a.extent_id=ext.extent_id
-- and b.tablespace_name = freesp.tablespace_name
-- and decode(ext.extents,1,b.next_extent, a.bytes*(1+b.pct_increase/100)) > freesp.largest
-- /

-- Some segment sizing recommendations

rem -----------------------------------------------------------------------
rem Filename: sizing.sql
rem Purpose:  Give some segment sizing recommendations
rem Date:     04-Jul-1999
rem Author:   Frank Naude, Oracle FAQ
rem -----------------------------------------------------------------------

prompt Database block size:

select to_number(value) "Block size in bytes"
from sys.v_$parameter
where name = 'db_block_size'
/

prompt Max number of possible extents (if not set to UNLIMITED)
prompt is db_block_size/16-7

select to_number(value)/16-7 "MaxExtents"
from sys.v_$parameter
where name = 'db_block_size'
/

prompt The recommended min extent size is a multiple of
prompt db_block_size * db_file_multiblock_read_count. This gives
prompt the chunks Oracle asks from the OS when doing read-ahead
prompt with full table scans.

select to_number(a.value) * to_number(b.value) / 1024 "Min extent size in K"
from sys.v_$parameter a, sys.v_$parameter b
where a.name = 'db_block_size'
and b.name = 'db_file_multiblock_read_count'
/

-- Show database growth in Meg per month for the last year

rem -----------------------------------------------------------------------
rem Filename: dbgrowth.sql
rem Purpose:  Show database growth in Meg per month for the last year
rem DB Version: 8.0 or above
rem Note:     File extending is not factored in, as it is not available
rem           in the dictionary.
rem Date:     19-Mar-2000
rem Author:   Frank Naude, Oracle FAQ
rem -----------------------------------------------------------------------

set pagesize 50000
tti "Database growth per month for last year"

select to_char(creation_time, 'RRRR Month') "Month",
sum(bytes)/1024/1024 "Growth in Meg"
from sys.v_$datafile
where creation_time > SYSDATE-365
group by to_char(creation_time, 'RRRR Month')
/

tti off

-- Tables whose high water mark <> no# of used blocks

REM ########################################################################
REM ## Author : Sunil Kumar
REM ## Senior DBA
REM ## Email : sunilagarwal@hotmail.com
REM ## sunil.kumar@thehartford.com
REM ##
REM ## Script to determine the high water mark of tables. It is essential
REM ## to run the ANALYZE_SCHEMA utility or ANALYZE TABLE commands before
REM ## running this script for accurate statistics.
REM ##
REM ## It displays all the tables whose high water mark <> no# of used
REM ## blocks.
REM ##
REM ## How To Run:
REM ## SQL>@highwtr
REM ## It will ask for the owner (press enter for all owners) and
REM ## the table name (press enter for all tables).
REM ##
REM ########################################################################

set verify off
column owner format a10
column alcblks heading 'Allocated|Blocks' just c
column usdblks heading 'Used|Blocks' just c
column hgwtr heading 'High|Water' just c
break on owner skip page

select
a.owner,
a.table_name,
b.blocks alcblks,
a.blocks usdblks,
(b.blocks-a.empty_blocks-1) hgwtr
from
dba_tables a,
dba_segments b
where
a.table_name=b.segment_name
and a.owner=b.owner
and a.owner not in ('SYS','SYSTEM')
and a.blocks <> (b.blocks-a.empty_blocks-1)
and a.owner like upper('&owner')||'%'
and a.table_name like upper('&table_name')||'%'
order by 1,2
/

set verify on
clear columns
clear breaks

-- Check whether a TS is fragmented and the type of fragmentation

REM ------------------------------------------------------------------------------
REM Filename: tsfrag.sql
REM Purpose:  Shows whether the TBS is fragmented and the type of fragmentation.
REM Author:   Kam Muhamad Sirajdin [mdsirajdin@yahoo.com]
REM ------------------------------------------------------------------------------

TTI "Tablespace Fragmentation Details"

SELECT dfsc.tablespace_name tablespace_name,
DECODE (
dfsc.percent_extents_coalesced,
100,
(DECODE (
GREATEST ((SELECT COUNT (1)
FROM dba_free_space dfs
WHERE dfs.tablespace_name = dfsc.tablespace_name), 1),
1,
'No Frag',
'Bubble Frag'
)
),
'Possible Honey Comb Frag'
)
fragmentation_status
FROM dba_free_space_coalesced dfsc
ORDER BY dfsc.tablespace_name;

TTI off

-- Index fragmentation status for a schema

/* ************************************************************* */
/* Index Fragmentation Status (idxfrag.sql):                     */
/*                                                               */
/* This script will report the index fragmentation status        */
/* for a schema.                                                 */
/*                                                               */
/* Note: - Do not run this script during peak processing hours!! */
/*       - This script will fail for locked tables.              */
/*                                                               */
/* ************************************************************* */

prompt -- Drop and create temporary table to hold stats...
drop table my_index_stats
/
create table my_index_stats (
index_name varchar2(30),
height number(8),
del_lf_rows number(8),
distinct_keys number(8),
rows_per_key number(10,2),
blks_gets_per_access number(10,2)
)
/

prompt -- Save script which we will later use to populate the above table...
insert into my_index_stats
select NAME, HEIGHT, DEL_LF_ROWS, DISTINCT_KEYS, ROWS_PER_KEY,
BLKS_GETS_PER_ACCESS
from INDEX_STATS
-- Note this open line...

save /tmp/save_index_stats.sql replace

prompt
prompt -- Spool listing with validate commands...
col line1 newline
col line2 newline
col line3 newline
set pagesize 0
set echo off
set termout off
set trimspool on
set feed off
set linesize 200
spool /tmp/validate_indexes.sql
select 'prompt Process table '||owner||'.'||table_name||
'...'||', index '||index_name||'...' line1,
'validate index '||owner||'.'||index_name||';' line2,
'@/tmp/save_index_stats.sql' line3
from sys.dba_indexes where owner = 'SCOTT'
order by table_name, index_name
/
spool off
set termout on
set feed on

prompt
prompt -- Run script to validate indexes...
@/tmp/validate_indexes.sql

prompt -- Print nice report...
set pagesize 50000
set trimspool on
col height format 99999
col del_rows format 9999999
col rows/key format 999999.9
spool idxfrag.lst
select INDEX_NAME, HEIGHT, DEL_LF_ROWS "DEL_ROWS", DISTINCT_KEYS "DIST KEYS",
ROWS_PER_KEY "ROWS/KEY",
BLKS_GETS_PER_ACCESS "BLKS/ACCESS"
from MY_INDEX_STATS
/
spool off

-- Cleanup
drop table my_index_stats
/
! rm /tmp/validate_indexes.sql
! rm /tmp/save_index_stats.sql

prompt
prompt Report is in idxfrag.lst
prompt Done!!!

Hidden Parameters

Hidden Parameters in Oracle. How to Change Hidden Parameter

Hidden parameter names start with an underscore ("_"). They cannot be seen in the
output of SHOW PARAMETER or by querying V$PARAMETER unless they have been set
explicitly in the init.ora. To view all hidden parameters and their default
values, the following query can help:

SELECT
a.ksppinm "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value"
FROM x$ksppi a, x$ksppcv b, x$ksppsv c
WHERE
a.indx = b.indx AND a.indx = c.indx AND a.ksppinm LIKE '/_%' escape '/'
/

In order to see the listing of all hidden parameters query,

select *
from SYS.X$KSPPI
where substr(KSPPINM,1,1) = '_';

Change Hidden Parameters in Oracle
—————————————
It is never recommended to modify hidden parameters without the assistance of Oracle Support. Changing them may lead to severe performance degradation and other problems in the database.
To change a hidden parameter:
1) If you use a pfile, add an entry for the hidden parameter to your initSID.ora and start the database.

2) For the current session only, use ALTER SESSION SET ...

3) To set it permanently when using an spfile, use ALTER SYSTEM SET ... SCOPE=SPFILE; with SCOPE=SPFILE the database must be restarted for the change to take effect. SCOPE=BOTH can be used if the parameter can also be set at session level.

Because hidden parameter names start with an underscore (_), the name must be enclosed in double quotes ("), as below.

SQL> alter system set "_offline_rollback_segments"='_SYSSMU1$' scope=spfile;

System altered.

General_Health_Check

Oracle Database Health check scripts
Hi All,
DBAs are often asked to check the health of the database. This can be done in various ways, including:

SL No  Monitoring Scope                                 Current Status
-----  ------------------------------------------------ ------------------------------------------------
OS level
  1    Physical memory / load                            Load normal; load averages: 0.35, 0.37, 0.36
  2    OS space threshold (archive, ora_dump etc.)       Sufficient space available
  3    Top 10 processes consuming memory                 No process using exceptionally high memory
  4    Free volumes available                            Sufficient disk space available on the mount points
  5    Filesystem space                                  Under normal threshold
Database level
  6    Check extents / proactive space addition          Space is being regularly added
  7    Check alert log for ORA- and warning messages
  8    Major wait events (latch/enqueue/lib cache pin)   No major wait events
  9    Max sessions
 10    Long running jobs                                 6 inactive sessions running for more than 8 hrs
 11    Invalid objects                                   185
 12    Analyze jobs (once a week)                        Done on 20-JAN-2008 at 06:00:06
 13    Temp usage / rollback segment usage               Normal
 14    Nologging indexes
 15    Hotbackup/coldbackup                              Went fine
 16    Redo generation                                   Normal
 17    PQ processes                                      Normal
 18    I/O generation                                    Under normal threshold
 19    2PC pending transactions                          0
DR / backup
  1    Sync arch                                         Normal
  2    Purge arch                                        Normal
  3    Recovery status                                   Normal
 20    Database health check script                      Showing locks and archive generation details

In detail, the database health check looks as follows.

OPERATING SYSTEM:

1) Physical memory / load:

1) free: the free command displays the amount of total, free, and used physical memory (RAM) in the system, as well as information on shared memory, buffers, cached memory, and swap space used by the Linux kernel.
Usage:
$ free -m

2) vmstat: vmstat reports virtual memory statistics, with information about processes, swap, free, buffer and cache memory, paging space, disk I/O activity, traps, interrupts, context switches, and CPU activity.
Usage:
$ vmstat 5

3) top: the top command displays a dynamic real-time view of the running tasks managed by the kernel. Its memory usage stats include live total, used, and free physical memory and swap memory, with their buffer and cache sizes respectively.
Usage:
$ top

4) ps: the ps command reports a snapshot of the currently active processes, including the percentage of memory used by each process or task. With this command, the top memory-hogging processes can be identified.
Usage:
$ ps aux

2) OS space threshold (archive, ora_dump etc.):
Check that OS space is available in all filesystems, especially the locations holding the archive logs and the Oracle database files. We can use the below OS commands:
$ df -h
$ du -csh *

3) Top 10 processes consuming memory:
We can display the top 10 memory-consuming processes as follows (sorting the ps output numerically on the %MEM column):

ps aux | head -1; ps aux | sort -rn -k4 | head -10

We can also use the top command and press M, which orders the process list by memory usage.

4) Free volumes available:

We have to make sure sufficient disk space is available on the mount points of each OS server where the database is up and running.

$ df -h

5) Filesystem space:

Check the filesystems on the OS side to confirm that sufficient space is available at all mount points (under normal threshold).

DATABASE:

6) Check extents / proactive space addition:
Check each of the data, index, and temporary tablespaces for extent and block
allocation details.

SET LINES 1000
SELECT SEGMENT_NAME, TABLESPACE_NAME, EXTENTS, BLOCKS
FROM DBA_SEGMENTS;

SELECT SEGMENT_NAME, TABLESPACE_NAME, EXTENTS, BLOCKS
FROM DBA_SEGMENTS WHERE TABLESPACE_NAME = 'STAR01D';

7) Check alert log for ORA- and warning messages:

Checking the alert log file regularly is a vital task. In the alert log we have to look for the following things:

1) Any Oracle-related errors.
Open the alert log file with the less or more command and search for ORA-.
This gives the error details and the time of occurrence.

2) Database-level or tablespace-level changes.
Monitor the alert log file and search it for each day's activities happening
in the database: database bounces, increases in the size of tablespaces, changes to database parameters. In an 11g database we can also look for TNS errors in the alert log file.

8) Major wait events (latch/enqueue/Lib cache pin):

We can check the wait events details with the help of below queries:

SELECT s.saddr, s.SID, s.serial#, s.audsid, s.paddr, s.user#, s.username,
s.command, s.ownerid, s.taddr, s.lockwait, s.status, s.server,
s.schema#, s.schemaname, s.osuser, s.process, s.machine, s.terminal,
UPPER (s.program) program, s.TYPE, s.sql_address, s.sql_hash_value,
s.sql_id, s.sql_child_number, s.sql_exec_start, s.sql_exec_id,
s.prev_sql_addr, s.prev_hash_value, s.prev_sql_id,
s.prev_child_number, s.prev_exec_start, s.prev_exec_id,
s.plsql_entry_object_id, s.plsql_entry_subprogram_id,
s.plsql_object_id, s.plsql_subprogram_id, s.module, s.module_hash,
s.action, s.action_hash, s.client_info, s.fixed_table_sequence,
s.row_wait_obj#, s.row_wait_file#, s.row_wait_block#,
s.row_wait_row#, s.logon_time, s.last_call_et, s.pdml_enabled,
s.failover_type, s.failover_method, s.failed_over,
s.resource_consumer_group, s.pdml_status, s.pddl_status, s.pq_status,
s.current_queue_duration, s.client_identifier,
s.blocking_session_status, s.blocking_instance, s.blocking_session,
s.seq#, s.event#, s.event, s.p1text, s.p1, s.p1raw, s.p2text, s.p2,
s.p2raw, s.p3text, s.p3, s.p3raw, s.wait_class_id, s.wait_class#,
s.wait_class, s.wait_time, s.seconds_in_wait, s.state,
s.wait_time_micro, s.time_remaining_micro,
s.time_since_last_wait_micro, s.service_name, s.sql_trace,
s.sql_trace_waits, s.sql_trace_binds, s.sql_trace_plan_stats,
s.session_edition_id, s.creator_addr, s.creator_serial#
FROM v$session s
WHERE ( (s.username IS NOT NULL)
AND (NVL (s.osuser, 'x') <> 'SYSTEM')
AND (s.TYPE <> 'BACKGROUND') AND STATUS='ACTIVE'
)
ORDER BY "PROGRAM";

The following query provides clues about whether Oracle has been waiting for library cache activities:

Select sid, event, p1raw, seconds_in_wait, wait_time
From v$session_wait
Where event = 'library cache pin'
And state = 'WAITING';

The below query gives details of user sessions' wait time and state:

SELECT NVL (s.username, '(oracle)') AS username, s.SID, s.serial#, sw.event,
sw.wait_time, sw.seconds_in_wait, sw.state
FROM v$session_wait sw, v$session s
WHERE s.SID = sw.SID
ORDER BY sw.seconds_in_wait DESC;

9) Max Sessions:
There should not be more than 6 inactive sessions running for more than 8 hours in a Database in order to minimize the consumption of CPU and I/O resources.

a) Users' and sessions' CPU consumption can be obtained by the below query:

Set lines 1000
select ss.username, se.SID, VALUE/100 cpu_usage_seconds
from v$session ss, v$sesstat se, v$statname sn
where se.STATISTIC# = sn.STATISTIC#
and NAME like '%CPU used by this session%'
and se.SID = ss.SID and ss.status = 'ACTIVE'
and ss.username is not null
order by VALUE desc;

b) Users' and sessions' CPU and I/O consumption can be obtained by the below query:

-- shows day wise, user wise, server process id wise CPU and I/O consumption
set linesize 140
col spid for a6
col program for a35 trunc
select p.spid SPID, to_char(s.LOGON_TIME,'DDMonYY HH24:MI') date_login, s.username,
decode(nvl(p.background,0),1,bg.description, s.program ) program,
ss.value/100 CPU, physical_reads disk_io, (trunc(sysdate,'J')-trunc(logon_time,'J')) days,
round((ss.value/100)/(decode((trunc(sysdate,'J')-trunc(logon_time,'J')),0,1,(trunc(sysdate,'J')-trunc(logon_time,'J')))),2) cpu_per_day
from V$PROCESS p, V$SESSION s, V$SESSTAT ss, V$SESS_IO si, V$BGPROCESS bg
where s.paddr=p.addr and ss.sid=s.sid
and ss.statistic#=12 and si.sid=s.sid
and bg.paddr(+)=p.addr
and round((ss.value/100),0) > 10
order by 8;

10) Long running Jobs:

We can find out long running jobs with the help of the below query:

col username for a20
col message for a50
col remaining for 9999
select username, to_char(start_time, 'hh24:mi:ss dd/mm/yy') started,
time_remaining remaining, message
from v$session_longops
where time_remaining > 0
order by time_remaining desc;

11) Invalid objects:

We can check the invalid objects with the help of the below query:

select owner||' '||object_name||' '||created||' '||status from dba_objects where status='INVALID';

12) Analyze Jobs ( once in a week ):

As a golden rule, we need to analyze the jobs that are running once a week.
The below steps can be considered for analyzing jobs.

Analyzing a Running Job
The status of a job or a task changes several times during its life cycle. A job can have the following as its status:
Scheduled: The job is created and will run at the specified time.
Running: The job is being executed and is in progress.
Initialization Error: The job or step could not be run successfully. If a step in a job fails initialization, the job status is Initialization Error.
Failed: The job was executed but failed.
Succeeded: The job was executed completely.
Stopped: The user canceled the job.
Stop Pending: The user has stopped the job. The already running steps are completing execution.
Suspended: This indicates that the execution of the job is deferred.
Inactive: This status indicates that the target has been deleted.
Reassigned: The owner of the job has changed.
Skipped: The job was not executed at the specified time and has been omitted.
The running jobs can be found out by the help of below query:

select sid, job,instance from dba_jobs_running;

We can find out the failed jobs and Broken jobs details with the help of the Below query:

select job||' '||schema_user||' '||Broken||' '||failures||' '||what||' '||last_date||' '||last_sec from dba_jobs;

13) Temp usage / Rollback segment/PGA usage:

We can get information of temporary tablespace usage details with the help of below query:
Set lines 1000
SELECT b.tablespace,
ROUND(((b.blocks*p.value)/1024/1024),2)||'M' "SIZE",
a.sid||','||a.serial# SID_SERIAL,
a.username,
a.program
FROM sys.v_$session a,
sys.v_$sort_usage b,
sys.v_$parameter p
WHERE p.name = 'db_block_size'
AND a.saddr = b.session_addr
ORDER BY b.tablespace, b.blocks;

We can get information of Undo tablespace usage details with the help of the below query:
set lines 1000
SELECT TO_CHAR(s.sid)||','||TO_CHAR(s.serial#) sid_serial,
NVL(s.username, 'None') orauser,
s.program,
r.name undoseg,
t.used_ublk * TO_NUMBER(x.value)/1024||'K' "Undo"
FROM sys.v_$rollname r,
sys.v_$session s,
sys.v_$transaction t,
sys.v_$parameter x
WHERE s.taddr = t.addr
AND r.usn = t.xidusn(+)
AND x.name = 'db_block_size';

We can get the PGA usage details with the help of the below query:
select st.sid "SID", sn.name "TYPE",
ceil(st.value / 1024 / 1024 / 1024) "GB"
from v$sesstat st, v$statname sn
where st.statistic# = sn.statistic#
and sid in
(select sid from v$session where username like UPPER('&user'))
and upper(sn.name) like '%PGA%'
order by st.sid, st.value desc;
Enter value for user: STARTXNAPP
14) Validating the backup (hotbackup/coldbackup):

We have to verify that the hot backup/cold backup (or any physical or logical backup) of all production and non-production databases completed successfully. Make sure you have valid backups of all databases, and check the backup locations to confirm that the backup completed on time with the data required for restore and recovery, should it be needed.

15) Redo generation / archive log generation details:
We should make sure there are not frequent log switches in the database. Frequent log switches generate more archive logs, which may decrease database performance; in a production database the log switch rate could vary between 5 and 20 depending on the server configuration.

We can get the log switch details with the help of the below query:

Redolog switch date-wise and hour-wise:
---------------------------------------
set lines 120;
set pages 999;
select to_char(first_time,'DD-MON-RR') "Date",
to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
from v$log_history
group by to_char(first_time,'DD-MON-RR')
order by 1
/
Archive log generation is directly proportional to the number of log switches happening in a database. Frequent log switches generate more archive logs, which can affect the performance of the database.

We can use the below queries for archive logs generation details:

a) Archive logs by date (hourly): the same v$log_history query shown above under
"Redolog switch date-wise and hour-wise" gives the hourly archive log generation
for each day.
b) Archive log generation details, day-wise:

select to_char(COMPLETION_TIME,'DD-MON-YYYY'), count(*)
from v$archived_log
group by to_char(COMPLETION_TIME,'DD-MON-YYYY')
order by to_char(COMPLETION_TIME,'DD-MON-YYYY');

c) Archive log count of the day:

select count(*)
from v$archived_log
where trunc(completion_time)=trunc(sysdate);

Count of archived logs generated today on an hourly basis:
-------------------------------------------------------
select to_char(first_time,'DD-MON-RR') "Date",
to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
from v$log_history
where to_char(first_time,'DD-MON-RR')=to_char(sysdate,'DD-MON-RR')  -- substitute a literal such as '16-AUG-10' for a past day
group by to_char(first_time,'DD-MON-RR')
order by 1
/

16) I/O generation:
We can find CPU and I/O usage details for all users in the database with the help of the below query:
-- Show IO per session, CPU in seconds (sessionIOS.sql)
set linesize 140
col spid for a6
col program for a35 trunc
select p.spid SPID,to_char(s.LOGON_TIME,'DDMonYY HH24:MI') date_login,s.username,decode(nvl(p.background,0),1,bg.description, s.program ) program,
ss.value/100 CPU,physical_reads disk_io,(trunc(sysdate,'J')-trunc(logon_time,'J')) days,
round((ss.value/100)/(decode((trunc(sysdate,'J')-trunc(logon_time,'J')),0,1,(trunc(sysdate,'J')-trunc(logon_time,'J')))),2) cpu_per_day
from V$PROCESS p,V$SESSION s,V$SESSTAT ss,V$SESS_IO si,V$BGPROCESS bg
where s.paddr=p.addr and ss.sid=s.sid
and ss.statistic#=12 and si.sid=s.sid  -- statistic# 12 is assumed to be 'CPU used by this session'
and bg.paddr(+)=p.addr
and round((ss.value/100),0) > 10
order by 8;
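The statistic# for 'CPU used by this session' is not guaranteed to be 12 across versions; a minimal sketch that resolves it by name instead of hardcoding the number:

-- CPU seconds per session, version-independent
select s.sid, s.username, ss.value/100 cpu_seconds
from v$session s, v$sesstat ss, v$statname sn
where ss.sid = s.sid
and ss.statistic# = sn.statistic#
and sn.name = 'CPU used by this session'
order by cpu_seconds desc;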
To see what a session is doing and what kind of SQL it is running:

-- what kind of sql a session is using
set lines 9999
set pages 9999

select s.sid, q.sql_text from v$sqltext q, v$session s
where q.address = s.sql_address
and s.sid = &sid order by piece;

eg: sid=1853
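On 10g and later the same lookup is often easier through sql_id; a small sketch, assuming the session currently has an active statement:

-- current statement of a session via sql_id
select s.sid, q.sql_text
from v$session s, v$sql q
where s.sql_id = q.sql_id
and s.sql_child_number = q.child_number
and s.sid = &sid;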

17) Sync arch:
In a Data Guard environment we have to check that the primary is in sync with the standby database. This can be checked as follows:
The V$MANAGED_STANDBY view on the standby database site shows the activities performed by
both the redo transport and Redo Apply processes in a Data Guard environment:
SELECT PROCESS, CLIENT_PROCESS, SEQUENCE#, STATUS FROM V$MANAGED_STANDBY;
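To quantify the lag, a minimal sketch run on the standby compares the last sequence received with the last sequence applied, per thread:

-- last archived sequence received on the standby
select thread#, max(sequence#) last_received
from v$archived_log
group by thread#;

-- last archived sequence applied on the standby
select thread#, max(sequence#) last_applied
from v$archived_log
where applied = 'YES'
group by thread#;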
In some situations, automatic gap recovery may not take place and you will need to perform gap recovery manually. For example, you will need to perform gap recovery manually if you are using logical standby databases and the primary database is not available.
The following sections describe how to query the appropriate views to determine which log files are missing and perform manual recovery.
On a physical standby database
To determine if there is an archive gap on your physical standby database, query the V$ARCHIVE_GAP view as shown in the following example:
SQL> SELECT * FROM V$ARCHIVE_GAP;

If it returns no rows, the primary database is in sync with the standby database. If it returns any rows, we have to apply the missing archive logs manually.

After you identify the gap, issue the following SQL statement on the primary database to locate the archived redo log files on your primary database (assuming the local archive destination on the primary database is LOG_ARCHIVE_DEST_1):
Eg:
SELECT NAME FROM V$ARCHIVED_LOG WHERE THREAD#=1 AND DEST_ID=1 AND SEQUENCE# BETWEEN 7 AND 10;
Copy these log files to your physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE statement on your physical standby database. For example:
SQL> ALTER DATABASE REGISTER LOGFILE
'/physical_standby1/thread1_dest/arcr_1_7.arc';
SQL> ALTER DATABASE REGISTER LOGFILE
'/physical_standby1/thread1_dest/arcr_1_8.arc';

After you register these log files on the physical standby database, you can restart Redo Apply. The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next gap that is currently blocking Redo Apply from continuing. After resolving the gap and starting Redo Apply, query the V$ARCHIVE_GAP fixed view again on the physical standby database to determine the next gap sequence, if there is one. Repeat this process until there are no more gaps.

On a logical standby database:
To determine if there is an archive gap, query the DBA_LOGSTDBY_LOG view on the logical standby database. For example, the following query indicates there is a gap in the sequence of archived redo log files because it displays two files for THREAD 1 on the logical standby database. (If there are no gaps, the query will show only one file for each thread.) The output shows that the highest registered file is sequence number 10, but there is a gap at the file shown as sequence number 6:
SQL> COLUMN FILE_NAME FORMAT a55
SQL> SELECT THREAD#, SEQUENCE#, FILE_NAME FROM DBA_LOGSTDBY_LOG L
2> WHERE NEXT_CHANGE# NOT IN
3> (SELECT FIRST_CHANGE# FROM DBA_LOGSTDBY_LOG WHERE L.THREAD# = THREAD#)
4> ORDER BY THREAD#,SEQUENCE#;

THREAD#  SEQUENCE#  FILE_NAME
-------  ---------  ---------------------------------------
      1          6  /disk1/oracle/dbs/log-1292880008_6.arc
      1         10  /disk1/oracle/dbs/log-1292880008_10.arc

Copy the missing log files, with sequence numbers 7, 8, and 9, to the logical standby system and register them using the ALTER DATABASE REGISTER LOGICAL LOGFILE statement on your logical standby database. For example:
SQL> ALTER DATABASE REGISTER LOGICAL LOGFILE '/disk1/oracle/dbs/log-1292880008_7.arc';

After you register these log files on the logical standby database, you can restart SQL Apply.

The DBA_LOGSTDBY_LOG view on a logical standby database only returns the next gap that is currently blocking SQL Apply from continuing. After resolving the identified gap and starting SQL Apply, query the DBA_LOGSTDBY_LOG view again on the logical standby database to determine the next gap sequence, if there is one. Repeat this process until there are no more gaps.
Monitoring Log File Archival Information:
Step 1 Determine the current archived redo log file sequence numbers.
Enter the following query on the primary database to determine the current archived redo log file sequence numbers:
SQL> SELECT THREAD#, SEQUENCE#, ARCHIVED, STATUS FROM V$LOG
WHERE STATUS='CURRENT';
Step 2 Determine the most recent archived redo log file.
Enter the following query at the primary database to determine which archived redo log file contains the most recently transmitted redo data:
SQL> SELECT MAX(SEQUENCE#), THREAD# FROM V$ARCHIVED_LOG GROUP BY THREAD#;
Step 3 Determine the most recent archived redo log file at each destination.
Enter the following query at the primary database to determine which archived redo log file was most recently transmitted to each of the archiving destinations:
SQL> SELECT DESTINATION, STATUS, ARCHIVED_THREAD#, ARCHIVED_SEQ#
2> FROM V$ARCHIVE_DEST_STATUS
3> WHERE STATUS <> 'DEFERRED' AND STATUS <> 'INACTIVE';

DESTINATION         STATUS  ARCHIVED_THREAD#  ARCHIVED_SEQ#
------------------  ------  ----------------  -------------
/private1/prmy/lad  VALID                  1            947
standby1            VALID                  1            947
The most recently written archived redo log file should be the same for each archive destination listed. If it is not, a status other than VALID might identify an error encountered during the archival operation to that destination.
Step 4 Find out if archived redo log files have been received.
You can issue a query at the primary database to find out if an archived redo log file was not received at a particular site. Each destination has an ID number associated with it. You can query the DEST_ID column of the V$ARCHIVE_DEST fixed view on the primary database to identify each destination’s ID number.
Assume the current local destination is 1, and one of the remote standby destination IDs is 2. To identify which log files are missing at the standby destination, issue the following query:
SQL> SELECT LOCAL.THREAD#, LOCAL.SEQUENCE# FROM
2> (SELECT THREAD#, SEQUENCE# FROM V$ARCHIVED_LOG WHERE DEST_ID=1)
3> LOCAL WHERE
4> LOCAL.SEQUENCE# NOT IN
5> (SELECT SEQUENCE# FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND
6> THREAD# = LOCAL.THREAD#);

THREAD#  SEQUENCE#
-------  ---------
      1         12
      1         13
      1         14

18) Purge arch:
We have to make sure that archive log files are purged safely, or moved to a tape drive or another location, in order to make space for new archive log files in the archive log destinations.
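A minimal RMAN sketch for a safe purge, assuming the logs are first backed up to tape (device type sbt); the seven-day window is just an illustrative assumption:

RMAN> DELETE ARCHIVELOG ALL
      BACKED UP 1 TIMES TO DEVICE TYPE sbt
      COMPLETED BEFORE 'SYSDATE-7';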

19) Recovery status:
To be able to recover, make sure you have the latest archive logs available so that you can restore and perform recovery if required.
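A quick sketch to check whether any datafile currently needs media recovery (no rows means nothing is waiting on recovery):

-- datafiles needing media recovery, if any
select file#, online_status, error, change#, time
from v$recover_file;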

20) MY DATABASE HEALTH CHECK SCRIPT:
/* SCRIPT FOR MONITORING AND CHECKING HEALTH OF DATABASE-USEFUL FOR PRODUCTION DATABASES */

-- SHOWS RUNNING JOBS
select 'RUNNING JOBS', sid, job, instance from dba_jobs_running;
set lines 1000
-- SHOWS ARCHIVE LOG GENERATION DETAILS, HOURLY AND DATE-WISE
select 'ARCHIVE LOG REPORT',to_char(first_time,'DD-MON-RR') "Date",
to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
from v$log_history
group by to_char(first_time,'DD-MON-RR')
order by 1
/
-- WHICH SESSIONS ARE GETTING BLOCKED
select 'SESSIONS BLOCKED',process,sid, blocking_session from v$session where blocking_session is not null;
-- WHICH SESSION IS BLOCKING WHICH SESSION
set lines 9999
set pages 9999
select s1.username || '@' || s1.machine
|| ' ( SID=' || s1.sid || ' ) is blocking '
|| s2.username || '@' || s2.machine || ' ( SID=' || s2.sid || ' ) ' AS blocking_status
from v$lock l1, v$session s1, v$lock l2, v$session s2
where s1.sid=l1.sid and s2.sid=l2.sid
and l1.block=1 and l2.request > 0
and l1.id1 = l2.id1
and l1.id2 = l2.id2;  -- fixed: the original compared l2.id2 to itself
-- SHOWS BLOCK CHANGES AND PHYSICAL READS DETAILS
select a.sid,b.username,block_gets,consistent_gets,physical_reads,block_changes
from V$SESS_IO a,V$SESSION b
where a.sid=b.sid and block_changes > 10000 order by block_changes desc;
-- show IO per session / CPU in seconds (sessionIOS.sql)
set linesize 140
col spid for a6
col program for a35 trunc
select p.spid SPID,to_char(s.LOGON_TIME,'DDMonYY HH24:MI') date_login,s.username,
decode(nvl(p.background,0),1,bg.description, s.program ) program,
ss.value/100 CPU,physical_reads disk_io,(trunc(sysdate,'J')-trunc(logon_time,'J')) days,
round((ss.value/100)/(decode((trunc(sysdate,'J')-trunc(logon_time,'J')),0,1,(trunc(sysdate,'J')-trunc(logon_time,'J')))),2) cpu_per_day
from V$PROCESS p,V$SESSION s,V$SESSTAT ss,V$SESS_IO si,V$BGPROCESS bg
where s.paddr=p.addr and ss.sid=s.sid
and ss.statistic#=12 and si.sid=s.sid
and bg.paddr(+)=p.addr
and round((ss.value/100),0) > 10
order by 8;
-- SCRIPT TO IDENTIFY LONG RUNNING STATEMENTS
rem LONGOPS.SQL
rem Long Running Statements
rem Helmut Pfau, Oracle Deutschland GmbH
set linesize 120
col opname format a20
col target format a15
col units format a10
col time_remaining format 99990 heading Remaining[s]
col bps format 9990.99 heading [Units/s]
col fertig format 90.99 heading "complete[%]"
select sid,
opname,
target,
sofar,
totalwork,
units,
(totalwork-sofar)/time_remaining bps,
time_remaining,
sofar/totalwork*100 fertig
from v$session_longops
where time_remaining > 0
/
-- ACTIVE SESSIONS IN DATABASE
select 'ACTIVE SESSION', sid, serial#,machine, osuser,username,status from v$session where username is not null and status='ACTIVE';
-- WHAT SQL A SESSION IS USING
set lines 9999
set pages 9999
select s.sid, q.sql_text from v$sqltext q, v$session s
where q.address = s.sql_address
and s.sid = &sid order by piece;

eg: SID=1844

I would like to add one more script that reports the size of the database (used, occupied, and available space) and tablespace usage details, along with the hit ratios of various SGA components, which can be very helpful for monitoring the performance of the databases.

Database_monitor.sql:

ttitle "1. :============== Tablespace Usage Information ==================:" skip 2
set linesize 140
col Total format 99999.99 heading "Total space(MB)"
col Used format 99999.99 heading "Used space(MB)"
col Free format 99999.99 heading "Free space(MB)"
break on report
compute sum of Total on report
compute sum of Used on report
compute sum of Free on report
select a.tablespace_name, round(a.bytes/1024/1024,2) Total,
round(nvl(b.bytes,0)/1024/1024,2) Used,
round(nvl(c.bytes,0)/1024/1024,2) Free,
round(nvl(b.bytes,0)*100/nvl(a.bytes,0),2) "% Used"
from sys.sm$ts_avail a, sys.sm$ts_used b, sys.sm$ts_free c
where a.tablespace_name=b.tablespace_name(+)
and b.tablespace_name=c.tablespace_name(+);
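The sys.sm$ views used above are legacy SYS-owned summaries; a roughly equivalent sketch against the documented dictionary views, in case those views are unavailable:

-- tablespace usage from dba_data_files / dba_free_space
select d.tablespace_name,
round(sum(d.bytes)/1024/1024,2) total_mb,
round((sum(d.bytes) - nvl(f.free_bytes,0))/1024/1024,2) used_mb,
round(nvl(f.free_bytes,0)/1024/1024,2) free_mb
from dba_data_files d,
(select tablespace_name, sum(bytes) free_bytes
from dba_free_space
group by tablespace_name) f
where d.tablespace_name = f.tablespace_name(+)
group by d.tablespace_name, f.free_bytes
order by d.tablespace_name;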

ttitle "2. :============== Hit Ratio Information ==================:" skip 2
set linesize 80
clear columns
clear breaks
set pagesize 60 heading off termout off echo off verify off
REM
col val1 new_val lib noprint
select 100*(1-(SUM(Reloads)/SUM(Pins))) val1
from V$LIBRARYCACHE;

ttitle off
col val2 new_val dict noprint
select 100*(1-(SUM(Getmisses)/SUM(Gets))) val2
from V$ROWCACHE;

ttitle off
col val3 new_val phys_reads noprint
select Value val3
from V$SYSSTAT
where Name = 'physical reads';

ttitle off
col val4 new_val log1_reads noprint
select Value val4
from V$SYSSTAT
where Name = 'db block gets';

ttitle off
col val5 new_val log2_reads noprint
select Value val5
from V$SYSSTAT
where Name = 'consistent gets';

ttitle off
col val6 new_val chr noprint
select 100*(1-(&phys_reads / (&log1_reads + &log2_reads))) val6
from DUAL;

ttitle off
col val7 new_val avg_users_cursor noprint
col val8 new_val avg_stmts_exe noprint
select SUM(Users_Opening)/COUNT(*) val7,
SUM(Executions)/COUNT(*) val8
from V$SQLAREA;
ttitle off
set termout on
set heading off
ttitle -
center 'SGA Cache Hit Ratios' skip 2

select 'Data Block Buffer Hit Ratio : '||&chr db_hit_ratio,
' Shared SQL Pool ',
' Dictionary Hit Ratio : '||&dict dict_hit,
' Shared SQL Buffers (Library Cache) ',
' Cache Hit Ratio : '||&lib lib_hit,
' Avg. Users/Stmt : '||
&avg_users_cursor||' ',
' Avg. Executes/Stmt : '||
&avg_stmts_exe||' '
from DUAL;

ttitle "3. :============== Sort Information ==================:" skip 2

select A.Value Disk_Sorts,
B.Value Memory_Sorts,
ROUND(100*A.Value/
DECODE((A.Value+B.Value),0,1,(A.Value+B.Value)),2)
Pct_Disk_Sorts
from V$SYSSTAT A, V$SYSSTAT B
where A.Name = 'sorts (disk)'
and B.Name = 'sorts (memory)';

ttitle "4. :============== Database Size Information ==================:" skip 2

select sum(bytes/1024/1024/1024) Avail from sm$ts_avail
union all
select sum(bytes/1024/1024/1024) Used from sm$ts_used
union all
select sum(bytes/1024/1024/1024) Free from sm$ts_free;