IguanaX High Availability
Overview
High Availability (HA) applies to the entire integration stack, not just IguanaX itself. Designing IguanaX for HA requires understanding networking, operating systems, storage, load balancing, application behavior, and message processing semantics.
IguanaX HA focuses on keeping integration services available in the presence of predictable failures such as:
- Server crashes
- Network isolation
- Planned maintenance
- IguanaX process failures
This document explains core HA concepts, key terminology, and availability guarantees in IguanaX. Detailed architecture patterns are covered separately in the IguanaX HA Designs pages.
Key Terminology
To better understand IguanaX HA strategies and designs, this section defines some key concepts and terminology:
Term | Definition |
|---|---|
High Availability (HA) | The ability of IguanaX to remain available and responsive despite predictable failures such as server crashes, process failures, or planned maintenance. HA focuses on fast recovery, not zero downtime. |
Application / Interface / Service | The externally visible endpoint that clients connect to, typically defined by an IP address and port. In IguanaX, this is usually a From LLP or From HTTP(S) listener. |
Active Instance | The IguanaX instance currently responsible for processing messages, running pollers, and accepting traffic for a given service. |
Passive Instance | A fully running IguanaX instance that does not process messages until it is promoted during failover. It continuously monitors the Active instance. |
Active–Passive (AP) | An HA model where one IguanaX instance is Active and another is on standby. Used when message sequencing, queue recovery, or polling exclusivity is required. |
Active–Active (AA) | An HA model where multiple IguanaX instances simultaneously accept inbound traffic for the same service. Suitable for HTTP/REST; generally not suitable for polling or strict HL7 sequencing. |
Load Balancer (LB) | A network component that routes LLP or HTTP traffic to healthy IguanaX instances and removes unhealthy ones using health checks. |
Health Check | A probe (for example, an HTTP status request or a TCP connection test) used by a load balancer or monitoring tool to determine whether an IguanaX instance is healthy and should receive traffic (see the sketch after this table). |
Heartbeat | A continuous liveness signal exchanged between IguanaX instances to detect failures and coordinate promotion or demotion. |
Failover | The automatic or controlled promotion of a Passive IguanaX instance to Active when the current Active instance becomes unavailable. |
Failback | The optional process of returning the Active role to the original node after recovery, usually performed manually during low-traffic periods, although it can be automated. |
Polling Components | IguanaX components such as From DB, From File, or From FTP that actively fetch data. These must run on only one Active instance to avoid duplicate processing. |
Listener Components | Inbound components such as From LLP and From HTTP(S) that receive messages pushed from external systems. These may operate in Active–Passive or Active–Active mode depending on design. |
Message Queue | The internal IguanaX structure where inbound messages are stored before processing. Queues may be local or shared depending on HA architecture. |
Queue Recovery | The ability for a backup IguanaX instance to resume message processing from the exact point where the Active instance failed, without loss or duplication. |
Message Sequencing | Guarantee that messages are processed in the same order they are received. Requires coordinated queue ownership and recovery. |
Working Directory (WorkingDir) | The filesystem location where IguanaX stores queues, logs, and configuration artifacts. It also contains application data such as cached resources. May be local or shared depending on HA design. |
Shared WorkingDir | A working directory accessible by multiple IguanaX instances, typically via NAS or cloud file services, used to enable queue recovery and sequencing. |
Fault Tolerance | The ability of a system to continue operating when a single component fails, often through redundancy. HA systems are composed of fault-tolerant components. |
Disaster Recovery (DR) | The process of restoring IguanaX services after a major outage such as site loss or regional failure. DR is geographically separated and distinct from HA. |
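To make the Health Check and Load Balancer terms concrete, the sketch below shows the kind of probe a load balancer could run against each IguanaX instance. The host names, port, and check style are assumptions for illustration; a real deployment would use whatever TCP or HTTP probe its load balancer supports.

```python
import socket

def llp_listener_is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """TCP-level health check: can a connection be opened to the LLP listener?

    A load balancer typically runs an equivalent TCP or HTTP probe on a
    schedule and removes instances that fail consecutive checks.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical instances; replace with the actual hosts and ports in your design.
INSTANCES = [("iguanax-node-a.example.internal", 6001),
             ("iguanax-node-b.example.internal", 6001)]

if __name__ == "__main__":
    for host, port in INSTANCES:
        state = "healthy" if llp_listener_is_healthy(host, port) else "unhealthy"
        print(f"{host}:{port} -> {state}")
```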
Understanding IguanaX High Availability
Server Failures vs Component Failures
IguanaX HA is designed to handle server-level failures, not Component logic errors.
- If a Component fails due to bad data or code bugs, failover will not fix the issue
- The same failure would occur on the backup IguanaX instance
IguanaX HA handles:
- Server crashes
- IguanaX instance crashes
- Network issues
- Storage failure
Component behavior must still be designed defensively, for example by adding retry logic and error notifications (see the sketch below).
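As an illustration of that defensive pattern, here is a minimal sketch of retry-with-backoff followed by an operator notification. It is expressed in Python for readability; the handler and notification hook are placeholders, not IguanaX APIs.

```python
import time

def process_with_retry(process, message, attempts: int = 3, base_delay: float = 2.0):
    """Retry transient failures with backoff; escalate if every attempt fails.

    `process` stands in for the Component's own handler and `notify_operator`
    for whatever alerting mechanism the deployment uses.
    """
    for attempt in range(1, attempts + 1):
        try:
            return process(message)
        except Exception as err:  # in practice, catch only known transient errors
            if attempt == attempts:
                notify_operator(f"Message failed after {attempts} attempts: {err}")
                raise
            time.sleep(base_delay * attempt)  # simple linear backoff

def notify_operator(text: str) -> None:
    # Placeholder: send an email, page, or post to a monitoring system.
    print(f"ALERT: {text}")
```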
Service Failover vs Server Failover
IguanaX is not a single application — it is a platform that hosts many services.
Each inbound listener (LLP, HTTP):
- Has its own availability requirements
- May behave differently during failover
HA design must consider:
- Which services can be Active-Active
- Which must remain Active-Passive
- How pollers are fenced
Failover planning must happen per interface, not just per server.
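One practical way to capture per-interface planning is a simple inventory of each inbound interface, its HA mode, and the reason for it. The sketch below is illustrative only; the interface names and modes are assumptions, not recommendations.

```python
# Hypothetical per-interface failover plan. Polling components are pinned to
# the Active instance only; listener components may be Active-Active or
# Active-Passive depending on sequencing requirements.
FAILOVER_PLAN = {
    "adt_from_llp":    {"mode": "active-passive", "reason": "strict HL7 sequencing"},
    "orders_from_db":  {"mode": "active-passive", "reason": "poller must run on one node"},
    "fhir_from_https": {"mode": "active-active",  "reason": "stateless REST requests"},
}

def may_run(interface: str, role: str) -> bool:
    """Return True if an instance in the given role ('active' or 'passive')
    is allowed to serve this interface under the plan above."""
    mode = FAILOVER_PLAN[interface]["mode"]
    return role == "active" or mode == "active-active"
```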
Guaranteed Availability in IguanaX
HA guarantees service availability, not zero downtime.
What IguanaX HA Can Guarantee
- An endpoint is always listening at a known address
- Traffic is routed to a healthy IguanaX instance
- Failover occurs automatically for server failures
Common Customer Expectations
Instant Failover
- Achieved via load balancers and hot-standby instances
- Requires both instances to be running
- Typical detection and failover time is measured in seconds (see the heartbeat sketch below)
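The detection time is roughly the heartbeat interval multiplied by the number of consecutive misses tolerated before promotion. The minimal Passive-side monitor below illustrates that arithmetic; the interval, threshold, and promotion hook are assumptions for illustration, not IguanaX configuration.

```python
import time

HEARTBEAT_INTERVAL = 2.0   # seconds between liveness checks (assumed)
MISSED_THRESHOLD = 3       # consecutive misses before promotion (assumed)
# Worst-case detection time is roughly HEARTBEAT_INTERVAL * MISSED_THRESHOLD = 6 s.

def monitor_active(check_active_alive, promote_self):
    """Passive-side loop: watch the Active instance and promote on failure.

    `check_active_alive` and `promote_self` are placeholders for the actual
    liveness probe and promotion mechanism chosen in a given HA design.
    """
    misses = 0
    while True:
        if check_active_alive():
            misses = 0
        else:
            misses += 1
            if misses >= MISSED_THRESHOLD:
                promote_self()  # take over listeners, pollers, and queue ownership
                return
        time.sleep(HEARTBEAT_INTERVAL)
```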
Queue Recovery
- Requires shared storage for logs
- Enables automated recovery of IguanaX message queues
- Guarantees that no messages are lost
Message Sequence
- Shared logs and queue files ensure that messages are processed in the same order in which they are received by IguanaX
- Messages continue to be processed in the same order after failover to another IguanaX instance
- A brief downtime may occur during failover while queue positions are re-synchronized (see the checkpoint sketch below)
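A useful mental model for queue recovery and sequencing is a checkpoint (last processed position) persisted on the shared WorkingDir, which the promoted instance reads before resuming in order. The sketch below illustrates that idea only; the file layout and function names are hypothetical and are not IguanaX internals.

```python
from pathlib import Path

# Hypothetical checkpoint file on the shared WorkingDir (e.g. a NAS mount).
CHECKPOINT = Path("/shared/workingdir/channel_adt/last_processed.txt")

def load_checkpoint() -> int:
    """Return the position of the last successfully processed message."""
    return int(CHECKPOINT.read_text()) if CHECKPOINT.exists() else 0

def save_checkpoint(position: int) -> None:
    CHECKPOINT.write_text(str(position))

def resume_processing(read_queue_from, handle):
    """On promotion, continue in order from the recorded position so that
    nothing is lost or duplicated and sequencing is preserved."""
    position = load_checkpoint()
    for position, message in read_queue_from(position):
        handle(message)
        save_checkpoint(position)
```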
High Availability vs Disaster Recovery
High Availability and Disaster Recovery solve different problems.
High Availability | Disaster Recovery |
|---|---|
Handles single failures | Handles site-wide disasters |
Short outages | Long outages |
Automatic failover | Manual recovery |
Same region/datacenter | Separate region/site |
Technology-focused | People + process + technology |
In summary, High Availability (HA) keeps services running, while Disaster Recovery (DR) restores services after a catastrophic failure. A production environment typically requires both.
It is not recommended to combine HA and DR into a single solution, such as placing the Passive IguanaX instance in a different region. Doing so would require a global load balancer and shared storage across regions, which can significantly reduce performance and introduce high operational and infrastructure costs.
HA in IguanaX: Key Design Considerations
Area | What Must Be Considered |
|---|---|
Redundancy | IguanaX instances, configuration storage, queue and log storage, and load balancers must be duplicated to eliminate single points of failure. |
Fault Detection | Health checks and heartbeats are required to detect failures quickly and distinguish between planned and unplanned outages. |
Traffic Routing | LLP and HTTP traffic routing must be controlled, and polling components restricted to a single Active instance, to ensure correct processing and avoid duplicate messages. |
Data Consistency | Message ordering and queue recovery guarantees must be defined and preserved during failover. |
Different IguanaX HA designs make different trade-offs between these considerations.