Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Prompt

Users said the database was slow for about 10 minutes sometime in the last hour. Use appm MCP to
check what happened and tell me the likely window, evidence, root cause, and next steps. 

Database Selection

Which database should I investigate for the slowness in the last hour?

  1. oci_19c
     Investigate the oci_19c database
❯ 2. oci_26ai
     Investigate the oci_26ai database
  3. Both
     Check both databases for the slow window

Response

Likely slow window

2026-05-14 09:03:00–09:14:50 UTC (~12 minutes). DB activity jumped from ~1 sample/min (steady-state) to 60–122 samples/min during this range, then dropped back. Matches the user’s “about 10 minutes” description.

Evidence

Wait breakdown for the window:

Wait classEventSamples
User I/Odb file scattered read663
Applicationenq: TX - row lock contention120
CPU ForegroundON CPU48
System I/Olog file sequential read12

Two FINSYS_CORE DBMS_SCHEDULER jobs on inst 1 / us-db01 drive almost all of it:

  • sid 498 (J000) — 711 samples (09:03–09:14) running three reporting queries against PAYMENT_TRANSACTIONS, all on plan 228481044:

    • 6cvtf8r8mny3xSELECT SUM(amount) FROM payment_transactions WHERE status='HELD' AND amount>5582.85 AND currency='EUR'
    • fara30aac18vtSELECT COUNT(*) FROM PAYMENT_TRANSACTIONS WHERE STATUS='PENDING' AND VALUE_DATE < TRUNC(SYSDATE)
    • 6rjja38fb6xrqSELECT COUNT(*), SUM(AMOUNT) FROM PAYMENT_TRANSACTIONS WHERE TXN_TYPE IN ('WIRE','SEPA') AND STATUS != 'SETTLED'
    • All three plans are TABLE ACCESS FULL on PAYMENT_TRANSACTIONS (cost 27K–987K, est. 1.6M–9.9M rows) → that’s the db file scattered read storm.
  • sid 199 (J002) — 120 samples (09:06–09:07:59) stuck on enq: TX - row lock contention waiting for 9hkqfa1ht9cvr: UPDATE SETTLEMENT_LOCKS SET ACQUIRED_BY=…, EXPECTED_RELEASE=SYSDATE+1/1440 WHERE RESOURCE_ID=1 Blocker = sid 32 (inst 1, J001, FINSYS_CORE) — also a scheduler job, idle-in-transaction (no ASH samples while it held the row).

Root cause

  1. Concurrent full scans of PAYMENT_TRANSACTIONS. Three FINSYS_CORE scheduler jobs kicked off ~09:03 and ran reporting aggregates with no supporting indexes for the predicates, so the optimizer chose FTS for all of them. The combined multi-million-row scans saturated buffer cache / I/O for the duration.
  2. Stuck application-level mutex. SETTLEMENT_LOCKS is used as a row-lock mutex (RESOURCE_ID=1); the SQL sets EXPECTED_RELEASE = SYSDATE + 1/1440 (60 s), but the holder (sid 32) kept it ~2 min, blocking sid 199. Likely the protected critical section is slower than budgeted (probably because of the I/O storm from issue 1), or the holder isn’t committing promptly.

Next steps

  1. Add indexes on PAYMENT_TRANSACTIONS to eliminate the FTS path:
    • (STATUS, VALUE_DATE) — for the PENDING/VALUE_DATE query
    • (STATUS, CURRENCY, AMOUNT) — for the HELD/EUR query (or a partial/function-based index on STATUS='HELD')
    • (TXN_TYPE, STATUS) — or a partial index WHERE STATUS != 'SETTLED' for the WIRE/SEPA query (likely the highest-value index — non-settled rows are a small fraction)
  2. Stagger the FINSYS_CORE scheduler jobs. They all start near :03 and pile onto the same table; spread them out or consolidate the three aggregates into a single pass.
  3. Investigate the SETTLEMENT_LOCKS holder (sid 32 / J001). Find the calling job, verify it commits/rolls back promptly. The 60-second EXPECTED_RELEASE budget is being exceeded by 2×; that may be a symptom of issue 1, but the pattern itself (single-row mutex with no timeout enforcement) is fragile.
  4. Refresh stats / verify the optimizer picks the new indexes after they exist — cardinality estimates (9.9M rows for STATUS='PENDING') should be validated against reality.
  5. Confirm the I/O subsystem wasn’t itself degraded — but the per-SQL evidence is already strong; this is application-induced, not a storage incident.