How to Handle IguanaX Crashes

Summary

This document lists common IguanaX crash and hang (unresponsive) issues. Each case includes:

  • Typical symptoms and error messages

  • Known or likely root causes

  • Step‑by‑step troubleshooting and resolution procedures

  • Prevention and best practices for production environments

List of Common IguanaX Crashes

Case 1 - IguanaX Service Running but UI / API Completely Unresponsive

Typical Symptoms

  • Web dashboard does not load or hangs indefinitely, even on / or /status

  • curl to IguanaX HTTP port returns nothing and times out

  • No new log entries even though traffic should be flowing

  • Service appears running at OS level; stopping the service takes a very long time

  • No crash dump files generated
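
The symptoms above separate a hang from a crash: a hung service accepts the connection and then never answers, whereas a stopped service fails immediately (see Case 2). A minimal probe sketch of that distinction, assuming a plain TCP check from a monitoring host is permitted. The function name probe and the host and port values are placeholders for illustration, not IguanaX defaults.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify how a TCP endpoint reacts to a minimal HTTP request."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            request = b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
            sock.sendall(request)
            data = sock.recv(1)  # wait for the first byte of any reply
            return "responding" if data else "closed-without-reply"
    except ConnectionRefusedError:
        return "refused"  # nothing is listening: the process is likely down
    except (socket.timeout, TimeoutError):
        return "hung"  # connection accepted or pending, but no reply in time
    except OSError:
        return "error"  # DNS failure, reset, unreachable network, etc.
```

A result of "hung" matches the pattern in this case, while "refused" points toward Case 2. The probe only checks whether any byte comes back; it does not validate the HTTP response or speak TLS.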

Error / Log Patterns

  • ServiceErrorLog may be sparse or show:

    • Long‑running DB calls

    • Timeouts to external systems

    • Possible deadlock‑like behavior (threads waiting)

  • OS may show high CPU, high I/O wait, or high context switches.

Root Cause (Typical Patterns)

  1. Resource exhaustion or blocking

    • Long‑running DB queries saturating worker threads

    • External APIs or SFTP endpoints hanging connections

    • Thread pool starvation or internal locking (contention on configuration/log resources)

  2. OS / environment interaction

    • Minimal OS images (e.g., Amazon Linux 2023 (AL2023) with missing libraries) causing delayed responses or blocking I/O

    • Network firewall or security groups slowing or stalling connections

  3. IguanaX bug

    • Race condition or deadlock in IguanaX itself (requires DEV/QA analysis of crash/hang traces)

Possible Solutions / Troubleshooting Steps

  1. Immediate recovery

    • Attempt graceful restart:

      • Linux: systemctl stop iguana (or appropriate service name), wait for clean stop; if it never exits, capture diagnostics and then force‑kill as last resort.

      • Windows: net stop Iguana or stop via Services console; if it hangs, capture dump then terminate.

    • After restart, monitor ServiceErrorLog closely.

  2. Diagnostics collection

    • Capture full ServiceErrorLog.txt and IguanaX logs for the time window.

    • On Linux, run:

      • top -H -p <iguana_pid> to inspect per‑thread CPU

      • strace -p <iguana_pid> -tt -T -f -o iguana_strace.log for a short period (if permissible)

    • On Windows, generate a process dump via Task Manager or ProcDump.

  3. Check external dependencies

    • DB:

      • Review connection pool size, query performance, and timeouts.

      • Confirm DB firewall/security groups allow stable connections.

    • File system:

      • Check for slow or blocked NFS/SAN paths (log / config directories).

    • Network:

      • Validate LB/Firewall configuration isn't stalling connections.

  4. Configuration optimization

    • Reduce simultaneous heavy tasks (bulk imports, log exports) during peak hours.

    • Add timeouts and retries around external calls in Components.

    • Ensure IguanaX working directory is on a local, fast disk.
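
The advice to add timeouts and retries around external calls can be sketched as a bounded retry loop with exponential backoff. This is a language-neutral illustration written in Python, not IguanaX component code; the name call_with_retries and its parameters are hypothetical. The timeout itself must still be set on the underlying call (socket, HTTP client, or DB driver); the wrapper only bounds how many times a failed call is repeated.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Keeping the attempt count small matters here: unbounded retries against a stalled endpoint are one way worker threads end up saturated.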

Case 2 - IguanaX Crash with Crash Dump Generated

Typical Symptoms

  • IguanaX service terminates unexpectedly

  • Operating system shows service stopped; attempts to access web UI fail immediately

  • Crash dump file(s) created in IguanaX working/config folder (similar behavior to Iguana 6 crash dumps, see Starter Guides)

Error / Log Patterns

  • Sudden stop in ServiceErrorLog and IguanaX logs

  • OS may log application fault (Windows Event Viewer, Linux journalctl / syslog)

  • Crash dump may include:

    • Access violation / segmentation fault

    • Assertion failure

    • Memory allocation or resource errors

Root Cause (Typical Patterns)

  • Core IguanaX bug (null pointers, bad memory access, unhandled exception)

  • Third‑party library failures (DB drivers, SSL libraries)

  • Edge cases in configuration or Component execution that trigger core engine issues

Possible Solutions / Troubleshooting Steps

  1. Immediate actions

    • Confirm no data loss or recovery needs (check HA/cold‑standby, log queues).

    • If safe, restart IguanaX and observe:

      • Does it crash again immediately?

      • Does it only crash under specific traffic or user actions?

  2. Collect artifacts for analysis

    • Compress and upload:

      • Crash dump file(s)

      • ServiceErrorLog.txt

      • IguanaX logs around the crash

      • OS logs (Event Viewer, journalctl, etc.)

  3. Analyze patterns

    • Does the crash correlate with:

      • Specific channel / component?

      • High concurrency / peak load?

      • Specific payload content?

  4. Permanent fix

    • DEV analyzes crash dumps and source.

    • Once a fixed version is released:

      • Recommend upgrade to the fixed IguanaX version.

      • Add that version and ICS/DEV references under this case.
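
The "Compress and upload" step above can be sketched as a small bundling helper. The function name bundle_artifacts and every file path passed to it are hypothetical; actual dump and log locations depend on the installation.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def bundle_artifacts(paths: Iterable[Path], archive: Path) -> List[str]:
    """Zip the files that exist; return the paths that were missing."""
    missing = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            if path.is_file():
                # Store under the bare file name; same-named files from
                # different folders would collide and need distinct arcnames.
                zf.write(path, arcname=path.name)
            else:
                missing.append(str(path))
    return missing
```

Returning the missing paths makes it obvious when an expected artifact, such as the crash dump, was never produced, which is itself useful information for the analysis.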

Case 3 - IguanaX Hang / Corruption After Ungraceful Shutdown (Post‑Patching / Server Events)

Typical Symptoms

  • IguanaX behaves strangely or fails to start after:

    • OS patching / server reboot

    • Hypervisor/VM events

  • ServiceErrorLog shows no orderly stop; long gap between logs indicating an ungraceful shutdown.

  • Channels may disappear, or configuration may appear inconsistent/corrupt.

Error / Log Patterns

  • Large jump in ServiceErrorLog timestamps between last normal operation and next startup.

  • No stop event logs.

  • Subsequent log entries may show:

    • Start‑up issues

    • Corrupted repos or configuration errors
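
The timestamp jump described above can be located mechanically. The sketch below assumes, purely for illustration, that each log entry begins with a timestamp of the form YYYY-MM-DD HH:MM:SS; the real ServiceErrorLog layout may differ, in which case TIMESTAMP_FORMAT and the slice length need adjusting. The name find_gaps is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout, not confirmed
TIMESTAMP_LENGTH = 19  # characters occupied by the assumed timestamp


def find_gaps(
    lines: Iterable[str], threshold: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) timestamp pairs separated by more than threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # continuation line or no leading timestamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

A gap alone does not prove an ungraceful shutdown; it becomes meaningful when the last entry before the gap is not a stop event.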

Root Cause

  • Ungraceful server shutdown (power loss, forced reboot, kill -9, VMware/Hyper‑V host reboot, cloud instance termination) leading to:

    • Partial writes

    • Corrupted configuration or log indexes

    • Inconsistent state on disk

Possible Solutions / Troubleshooting Steps

  1. Immediate recovery

    • Attempt to start IguanaX and observe errors.

    • If startup fails or IguanaX behaves abnormally, consider rebuilding the working directories, similar to Iguana 6 practice.

      Typical steps (adapt for IguanaX):

      1. Back up the entire working directory.

      2. Stop the IguanaX service.

      3. Rename or remove specific sub‑folders (e.g., the IguanaX equivalents of edit, run, and IguanaConfigurationRepo).

      4. Restart IguanaX to allow it to rebuild from the main repo.

  2. Data and configuration validation

    • Verify:

      • Channels, users, server settings, and certificates are present.

      • Logs and queues are readable.

    • If corruption is severe, restore from a known‑good backup.

  3. Process / runbook update

    • Update customer's support plan:

      • Before scheduled patching or maintenance:

        • Gracefully stop IguanaX service.

      • Only after OS and hardware are stable, start IguanaX again.

    • Provide doc link or internal KBA that explains graceful shutdown process for Windows and Linux:

      • (Add actual IguanaX‑specific shutdown doc link here once available.)

  4. Prevention

    • Implement standard ops practice:

      • systemctl stop / net stop before patch windows.

      • Confirm IguanaX is fully stopped before reboot.

    • Consider HA or redundant instances for planned maintenance:

      IguanaX High Availability
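
The backup and rename steps from the immediate-recovery procedure in this case can be sketched as two helpers. The function names are hypothetical, and the sub-folder names passed in must be the IguanaX equivalents identified for the installation; none are hard-coded here. Renaming rather than deleting keeps the old folders available if the rebuild does not help. The service should be stopped before set_aside is called.

```python
import shutil
import time
from pathlib import Path
from typing import Iterable, List


def backup_working_dir(working_dir: Path, backup_root: Path) -> Path:
    """Copy the whole working directory to a timestamped folder."""
    source = working_dir.resolve()
    root = backup_root.resolve()
    if root == source or source in root.parents:
        raise ValueError("backup_root must be outside working_dir")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = root / f"{source.name}-backup-{stamp}"
    shutil.copytree(source, target)  # raises if the target already exists
    return target


def set_aside(working_dir: Path, subfolders: Iterable[str]) -> List[Path]:
    """Rename the given sub-folders so they are rebuilt on next start."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    renamed = []
    for name in subfolders:
        folder = working_dir / name
        if folder.is_dir():
            target = folder.with_name(f"{name}.old-{stamp}")
            folder.rename(target)
            renamed.append(target)
    return renamed
```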

Case 4 - Severe Slowness / Freezing on Save or Configuration Change

Typical Symptoms

  • Whenever users save changes, the UI freezes and the entire server becomes sluggish.

  • Other users cannot log in or operations stall for several minutes.

  • After a restart or rebuild of certain directories, performance returns to normal.

Error / Log Patterns

  • High latency or blocking when committing configuration.

  • No major CPU spikes, but operations stall.

  • Potential internal repo corruption or heavy locking.

Root Cause

  • Corruption or internal inconsistency in configuration repositories.

  • Lock contention, or expensive migrations that run on every save.

Possible Solutions / Troubleshooting Steps

  1. Back up first

    • Back up:

      • IguanaX working directory.

      • Repositories holding configuration.

  2. Rebuild config‑related directories

    • Stop IguanaX.

    • Rename/remove IguanaX equivalents of:

      • edit

      • run

      • IguanaConfigurationRepo

    • Start IguanaX and allow it to rebuild from the main repo.

    • Re‑test saving configuration.

  3. Check repository location and performance

    • Ensure configuration and logs live on fast local storage, not slow network shares.

    • Validate disk health (SMART, I/O latency).

  4. Prevention

    • Regular backups of repos.

    • Avoid abrupt interruptions during config save operations (no mid‑save restarts or crashes).

    • For large environments, coordinate configuration changes during lower‑traffic windows.
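
For the storage check in step 3 above, one rough way to compare a candidate directory against a known-good local disk is to time small synchronous writes. A minimal sketch, assuming write permission in the directory; the name fsync_latency_ms and the probe file name are hypothetical, and no particular threshold is implied.

```python
import os
import statistics
import time
from pathlib import Path


def fsync_latency_ms(directory: Path, samples: int = 20, size: int = 4096) -> float:
    """Return the median time in milliseconds to write and fsync a small file."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    probe = directory / ".latency_probe.tmp"
    payload = b"\0" * size
    timings = []
    try:
        for _ in range(samples):
            start = time.perf_counter()
            with open(probe, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # force the data to the device
            timings.append((time.perf_counter() - start) * 1000.0)
    finally:
        if probe.exists():
            probe.unlink()  # leave no probe file behind
    return float(statistics.median(timings))
```

Running it once against the configuration directory and once against a local scratch directory gives a relative comparison; a network share typically stands out by a wide margin.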

Case 5 - IguanaX terminated due to Failed precondition: Place != NULL ../COL/COLhashmap.h:316 Assertion failed: Place != __null

Typical Symptoms

  • IguanaX will not start up.

Error / Log Patterns

  • Crash dump contains: IguanaX terminated due to Failed precondition: Place != NULL ../COL/COLhashmap.h:316 Assertion failed: Place != __null

  • Iguana Service logs contain several "Saving messages found in journal log files..." entries, indicating a crash loop on startup.

  • Iguana Service logs contain several "Iguana has detected a crash loop and has started in safe mode. No components have been autostarted." entries.

Root Cause

  • This is a bug in versions earlier than 117.

  • An edge case in journal recovery (on startup) where a component has only a special 'delete-queue-consumer' position to be written to the log file. The old logic removed this position and then tried to write an empty position map to the component log, which caused the crash.

  • One possible sequence: two components are unlinked, Iguana crashes, and on the next startup Iguana enters a crash loop because of the edge case described above.

Possible Solutions / Troubleshooting Steps

  1. Back up first

    • Back up:

      • IguanaX working directory.

      • Repositories holding configuration.

      • IguanaX binary.

  2. Upgrade to the latest version. This bug was fixed as of version 117, so any version from 117 onward includes the fix.

Case 6 - IguanaX crashes when linking components

Typical Symptoms

  • Iguana crashes when you try to link two components together.

Error / Log Patterns

  • One or more components have 'invalid' Component States:

    • In the Iguana Service logs, under the 'Component States' log entry, a Component with LastDataId = [-0-0 | 18446744073709551615 | 18446744073709551615] is considered 'invalid'.
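
The value 18446744073709551615 in the pattern above is the largest unsigned 64-bit integer (2^64 - 1). A minimal sketch for scanning exported log text for that pattern follows; only the LastDataId = [...] fragment comes from this document, so the rest of the line layout is an assumption and the function simply returns whole matching lines. The name invalid_state_lines is hypothetical.

```python
import re
from typing import Iterable, List

UINT64_MAX = 2**64 - 1  # 18446744073709551615, the value quoted above
LAST_DATA_ID = re.compile(r"LastDataId\s*=\s*\[([^\]]*)\]")


def invalid_state_lines(lines: Iterable[str]) -> List[str]:
    """Return lines whose LastDataId contains the uint64 maximum value."""
    flagged = []
    for line in lines:
        match = LAST_DATA_ID.search(line)
        if match is None:
            continue
        # Fields inside the brackets are separated by "|".
        fields = [field.strip() for field in match.group(1).split("|")]
        if str(UINT64_MAX) in fields:
            flagged.append(line.rstrip("\n"))
    return flagged
```

Any flagged component is a candidate for the crash described in this case and should not be linked until the server is upgraded.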

Root Cause

  • This is a bug in versions earlier than 116.

  • A Component's LastDataId was not set properly if the component had only non-DATA (info, warn, error, etc.) messages in its logs.

  • Because of this uninitialized LastDataId, linking such a component to downstream consumer components caused Iguana to crash.

Possible Solutions / Troubleshooting Steps

  1. Back up first

    • Back up:

      • IguanaX working directory.

      • Repositories holding configuration.

      • IguanaX binary.

  2. Upgrade to the latest version. This bug was fixed as of version 116, so any version from 116 onward includes the fix.
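
The version thresholds in Cases 5 and 6 can be turned into a quick triage check. A minimal sketch, assuming the installed version can be expressed as a single integer in the same numbering this document uses (116, 117); the names FIXED_IN and affected_cases are hypothetical.

```python
from typing import Dict, List

# First fixed version for each case, as stated in Cases 5 and 6.
FIXED_IN: Dict[str, int] = {
    "Case 5 (COLhashmap assertion on startup)": 117,
    "Case 6 (crash when linking components)": 116,
}


def affected_cases(installed_version: int) -> List[str]:
    """Return the cases whose fix is newer than the installed version."""
    return sorted(
        case for case, fixed in FIXED_IN.items() if installed_version < fixed
    )
```

An empty result only rules out these two known bugs; the other cases in this document are not tied to a version.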