OSWatcher Analysis

So you installed OSWatcher as advised by Oracle for an SR, but can it do more than just generate SR data? Yes it can!

Make sure you’ve got GUI access either natively or using an X Server then invoke OSWBB for analysis and graphs. It needs Java to show the GUI but you can do some things without it:

[oracle@server oswbb]$ java -jar /oswbba.jar -i /archive

Starting OSW Analyzer V7.3.3
 OSWatcher Analyzer Written by Oracle Center of Expertise
 Copyright (c) 2014 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...

<snipped for brevity>
Parsing file server.com_iostat_17.02.22.0100.dat ...
Parsing file server.com_netstat_17.02.22.1000.dat ...
Parsing file server.com_netstat_17.02.22.1100.dat ...

Parsing file server.com_top_17.02.22.0100.dat ...
Parsing file server.com_top_17.02.22.0200.dat ...

Parsing file server.com_ps_17.02.22.1100.dat ...
<end of snipping>
Parsing Completed.

Enter 1 to Display CPU Process Queue Graphs
 Enter 2 to Display CPU Utilization Graphs
 Enter 3 to Display CPU Other Graphs
 Enter 4 to Display Memory Graphs
 Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
 Enter 7 to Generate All Memory Gif Files
 Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location of Gif Directory
 Enter T to Alter Graph Time Scale Only (Does not change analysis dataset)
 Enter D to Return to Default Graph Time Scale
 Enter R to Remove Currently Displayed Graphs

Enter A to Analyze Data
 Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)

Enter P to Generate A Profile
 Enter X to Export Parsed Data to File
 Enter Q to Quit Program

Please Select an Option:

Selection any of the Graph options loads a java-applet and gives images like these:

If you need a more thorough analysis then option ‘A’ gives you a comprehensive text based report. Here’s the first few setions of a repot I just ran on one of DBs that was hitting severe performance issues at around 4am every day:

This report is best viewed in a fixed font editor like textpad...

OSWatcher Analyzer

Input Archive:       /dboracle/orabase/oswbb/archive
Archive Source Dest: /dboracle/orabase/oswbb/archive
Archive Start Time:  Feb 22 04:00:29 2017
Archive Stop Time:   Feb 22 14:00:14 2017
Hostname:            SERVER.COM
OS Version:          Linux
Snapshot Freq:       60
CPU COUNT:           16

############################################################################
# Contents Of This Report:
#
# Section 1: System Status
# Section 2: System Slowdowns 
#   Section 2.1: System Slowdown RCA Process Level Ordered By Impact
# Section 3: System General Findings
# Section 4: CPU Detailed Findings
#   Section 4.1: CPU Run Queue:
#   Section 4.2: CPU Utilization: Percent Busy
#   Section 4.3: CPU Utilization: Percent Sys
# Section 5: Memory Detailed Findings
#   Section 5.1: Memory: Process Swap Queue 
#   Section 5.2: Memory: Scan Rate 
#   Section 5.3  Memory: Page In: 
#   Section 5.4  Memory: Page Tables (Linux only): 
#   Section 5.5: Top 5 Memory Consuming Processes Beginning
#   Section 5.6: Top 5 Memory Consuming Processes Ending
# Section 6: Disk Detailed Findings
#   Section 6.1: Disk Percent Utilization Findings
#   Section 6.2: Disk Service Times Findings
#   Section 6.3: Disk Wait Queue Times Findings
#   Section 6.4: Disk Throughput Findings
#   Section 6.5: Disk Reads Per Second
#   Section 6.6: Disk Writes Per Second
#   Section 6.7: Disk Percent CPU waiting on I/O
# Section 7: Network Detailed Findings
#   Section 7.1  Network Data Link Findings
#   Section 7.2: Network IP Findings
#   Section 7.3: Network UDP Findings
#   Section 7.4: Network TCP Findings
# Section 8: Process Detailed Findings
#   Section 8.1: PS Process Summary Ordered By Time
#   Section 8.2: PS for Processes With Status = D or T Ordered By Time
#   Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
#   Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
#   Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
#


############################################################################
# Section 1: System Status
#
# This section lists the status of each major subsystem. Status values are:
# Critical: The subsystem requires immediate attention
# Warning:  The subsystem detailed findings should be reviewed
# OK:       The subsystem was found to be okay
# Unknown:  The status of the subsystem could not be determined
#
#
Subsystem       Status
------------------------
CPU             OK
MEMORY          WARNING
I/O             WARNING 
NET             OK

############################################################################
# Section 2.0: System Slowdown Summary Ordered By Impact
#
# This section lists the times when the OS started to slowdown. oswbba is
# able to measure this by looking at the timestamps in the individual files
# it collects. It compares the time between the snapshots and looks to see 
# how this time differs from the expected timestamp which will be the oswbb 
# $ Snapshot Freq value listed at the top of this file. Any slowdowns listed
# in this section will be ordered by the slowdown Secs column.The subsystem
# most likely responsible for the slowdown will be identified here along with.
# possible reason codes
#

SnapTime        Variance   Secs        Flags   SubSystem    Reasons (Most Likely)
--------------------------------------------------------------------------------
Wed Feb 22 04:08:50  3.3    200  0000-00-220   MEMORY       High Paging In Rate
Wed Feb 22 05:15:35  2.1    128  0000-00-220   MEMORY       High Paging In Rate
Wed Feb 22 05:13:27  2.0    125  0000-00-220   I/O          High Service Time, High Percent Busy
Wed Feb 22 05:20:25  1.7    107  0000-00-220   I/O          High Service Time, High Percent Busy
Wed Feb 22 05:11:22  1.4     88  0000-00-220   I/O          High Service Time, High Percent Busy
Wed Feb 22 05:24:54  1.4     88  0000-00-220   I/O          High Service Time, High Percent Busy

############################################################################
# Section 2.1: System Slowdown RCA Process Level Ordered By Impact
#
# This section lists any processes that could be identified in the OSWatcher data 
# which caused the slowdown in Section 2.0 above.
#

   No identifiable processes found

############################################################################
# Section 3: System General Findings
#
# This section lists all general findings that require attention. Each 
# finding has a status along with a subsystem. Further advice may also
# available regarding the finding.
#

WARNING: Memory high paging rate observed.
 Advise: The OS is low on free memory.
 Check:  The system is under stress with respect to memory

WARNING : Disk heavy utilization observed.
 Advise: Check disks to see why utilization is so high.
 Check:  Hot disk: I/O distribution should be evaluated
 Check:  The system is undersized with respect to I/O capacity
 Check:  AWR for SQL regression causing more I/O

WARNING : Disk high service time observed.
 Advise: Check disks to see why service time is so high.
 Check:  Hot disk: I/O distribution should be evaluated
 Check:  Disk may be defective


############################################################################
# Section 4: CPU Detailed Findings
#
# This section list cpu run queue, cpu percent utilization (busy) and cpu 
# percent running in SYSTEM mode statistics
#
############################################################################
# Section 4.1: CPU RUN QUEUE:
# Run queue should not exceed (Value/#CPU > 3) for any long period of time. 
# Below lists the number of times (NUMBER) and percent of the number of times
# (PERCENT) that run queue was High (>3) or Very High (>6). Pay attention to 
# high spanning multiple snaps as this represents the number of times run 
# queue remained high in back to back snapshots
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 594   100.00
High (>3)                                   2     0.34
Very High (>6)                              1     0.17
High spanning multiple snaps                0        0

The following snaps recorded very high run queue values:
SnapTime                      Value   Value/#CPU
------------------------------------------------
Wed Feb 22 12:38:11 GMT 2017    177           11

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher 
# the cpu utilization the longer it will take processes to run.  Below lists 
# the number of times (NUMBER) and percent of the number of times (PERCENT) 
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention 
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 594   100.00
High (>95%)                                 3     0.51
Very High (100%)                            1     0.17
High spanning multiple snaps                0        0

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime                      
------------------------------
Wed Feb 22 11:31:09 GMT 2017


############################################################################
# Section 4.3:CPU UTILIZATION: PERCENT SYS
# CPU utilization running in SYSTEM mode should not be greater than 30% over
# long periods of time. The higher system cpu utilization the longer it will
# take processes to run. Pay attention to high spanning multiple snaps as it
# is important that cpu utilization not stay persistently high (>30%)
#
                                       NUMBER  PERCENT
Snaps captured in archive                 594   100.00
High (>30%)                                 0        0
Very High (50%)                             0        0
High spanning multiple snaps                0        0

CPU UTILIZATION: The following snaps recorded ROOT processes using high percent cpu:
SnapTime                          Pid   CPU   Command
-----------------------------------------------------
Wed Feb 22 04:16:52 GMT 2017    11602  39.3   perl
Wed Feb 22 08:30:03 GMT 2017      495  47.4   parse_deepsight
Wed Feb 22 09:30:05 GMT 2017    32256  81.5   parse_deepsight
Wed Feb 22 10:24:07 GMT 2017     8429  50.5   java
Wed Feb 22 12:11:10 GMT 2017    31112  47.4   grep

CPU UTILIZATION: The following snaps recorded ORACLE Background processes using high percent cpu:
SnapTime                          Pid   CPU   Command
-----------------------------------------------------
Wed Feb 22 11:08:08 GMT 2017    31258  71.1   emagent
Wed Feb 22 11:14:08 GMT 2017    31258  65.5   emagent
Wed Feb 22 11:16:08 GMT 2017    31258  67.1   emagent
Wed Feb 22 11:17:09 GMT 2017    31258  91.8   emagent

The report has plenty more details if you run it. Interestingly it appears our issue is down to Cloud Control Agent! Perhaps another blog post coming up about how to stop emagent hogging all the CPU.

Leave a Reply