I tried to diagnose a performance issue but found the AWR snapshots were missing. AWR was configured as standard, with hourly snapshots and retention set to 28 days. Nothing had changed, and snapshots had been working since the DB was created.
MMON was up and available.
I couldn’t take a manual snapshot either; the command just hangs.
The MMON trace file shows:
kgskmetricupd: cg[OTHER_GROUPS] used_quanta greater than threshold
(Google and Metalink reveal nothing about this error...)
Unable to schedule a MMON slave at: Auto Flush Main 1
Slave action has been temporarily suspended - Slave action had prior policy violations.
Unknown return code: 101
– Clearly something is making MMON unhappy. Let’s look into MMON policy violations. Metalink has an article that seems to fit: 761298.1
– It recommends killing the MMON process, as it will respawn
[root@server kateh]# ps -ef|grep mmon
root 3567 1889 0 16:00 pts/1 00:00:00 grep mmon
oracle 20761 1 0 Feb23 ? 00:00:13 asm_mmon_+ASM
oracle 29275 1 0 11:23 ? 00:00:23 ora_mmon_ORCL
[root@server kateh]# kill -9 29275
[root@server kateh]# ps -ef|grep mmon
root 3567 1889 0 16:00 pts/1 00:00:00 grep mmon
oracle 20761 1 0 Feb23 ? 00:00:13 asm_mmon_+ASM
oracle 21354 1 0 11:25 ? 00:00:23 ora_mmon_ORCL
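The grep above also matches the ASM instance's MMON and the grep process itself, so it is easy to pick the wrong PID by eye. A minimal sketch of a narrower lookup, assuming the ORCL instance name from the listing; the helper name mmon_pid is made up for illustration and is not part of any Oracle tooling:

```shell
# Print the PID of the ora_mmon_<SID> background process.
# Reads `ps -ef` style output on stdin; the SID is the first argument.
# mmon_pid is a hypothetical helper name used only for this sketch.
mmon_pid() {
    # In `ps -ef` output, field 2 is the PID and the last field is the command.
    awk -v name="ora_mmon_$1" '$NF == name { print $2 }'
}

# Usage: check the PID first, and only then kill it.
# pid=$(ps -ef | mmon_pid ORCL)
# echo "MMON for ORCL is PID $pid"
# kill -9 "$pid"
```

Matching the whole process name means asm_mmon_+ASM is never selected, which matters when the next step is kill -9.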
However, I still couldn’t generate a snapshot manually: the command hangs and the same error appears in the MMON trace file.
The Oracle docs say that MMON will suspend if Oracle hits certain system resource limits. Understandable, but is that what’s happening here?
Looking at top shows that we are currently using swap space, and therefore MMON will not work! It turns out Oracle itself was using swap (not ideal, and something we are switching to hugepages to resolve). Restarting Oracle cleared the swap usage and MMON suddenly sprang into life, generating AWR snaps.
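top only gives the system-wide swap figure, so it helps to confirm which processes are actually swapped out. A rough sketch, assuming a Linux host where /proc/<pid>/status reports a VmSwap line and where the Oracle processes run as the OS user oracle (as in the ps listing); the function names swap_kb and list_oracle_swap are invented for this example:

```shell
# Print the VmSwap value (in kB) from a /proc/<pid>/status style file.
# Prints 0 when the file has no VmSwap line (e.g. kernel threads).
swap_kb() {
    awk '/^VmSwap:/ { kb = $2 } END { print kb + 0 }' "$1"
}

# List processes owned by the oracle user that currently have pages in swap.
# Assumes the OS user is called "oracle"; adjust if yours differs.
list_oracle_swap() {
    for pid in $(pgrep -u oracle); do
        f="/proc/$pid/status"
        [ -r "$f" ] || continue          # process may have exited already
        kb=$(swap_kb "$f")
        if [ "$kb" -gt 0 ]; then
            echo "$pid ${kb} kB $(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done
}

# Usage (run as root or oracle):
# list_oracle_swap
```

If Oracle background processes show up in that list, it points to the same swap problem that stalled MMON here.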
If the recommended action of restarting MMON doesn’t work, then check your system limits, as hitting them can also cause MMON to hang.