IguanaX High Availability
Overview
High Availability (HA) applies to the entire integration stack, not just IguanaX itself. Designing IguanaX for HA requires understanding networking, operating systems, storage, load balancing, application behavior, and message processing semantics.
IguanaX HA focuses on keeping integration services available in the presence of predictable failures such as:
- Server crashes
- Network isolation
- Planned maintenance
- IguanaX process failures
This document explains core HA concepts, key terminology, and availability guarantees in IguanaX. Detailed architecture patterns are covered separately in the IguanaX HA Designs pages.
Key Terminology
To better understand IguanaX HA strategies and designs, this section defines some key concepts and terminology:
Term | Definition |
|---|---|
High Availability (HA) | The ability of IguanaX to remain available and responsive despite predictable failures such as server crashes, process failures, or planned maintenance. HA focuses on fast recovery, not zero downtime. |
Application / Interface / Service | The externally visible endpoint that clients connect to, typically defined by an IP address and port. In IguanaX, this is usually a From LLP or From HTTP(S) listener. |
Active Instance | The IguanaX instance currently responsible for processing messages, running pollers, and accepting traffic for a given service. |
Passive Instance | A fully running IguanaX instance that does not process messages until it is promoted during failover. It continuously monitors the Active instance. |
Active–Passive (AP) | An HA model where one IguanaX instance is Active and another is on standby. Used when message sequencing, queue recovery, or polling exclusivity is required. |
Active–Active (AA) | An HA model where multiple IguanaX instances simultaneously accept inbound traffic for the same service. Suitable for HTTP/REST; generally not suitable for polling or strict HL7 sequencing. |
Load Balancer (LB) | A network component that routes LLP or HTTP traffic to healthy IguanaX instances and removes unhealthy ones using health checks. |
Health Check | A probe (for example, an HTTP status request or a TCP connection test) used by a load balancer or monitoring tool to determine whether an IguanaX instance is healthy and should receive traffic (see the sketch after this table). |
Heartbeat | A continuous liveness signal exchanged between IguanaX instances to detect failures and coordinate promotion or demotion. |
Failover | The automatic or controlled promotion of a Passive IguanaX instance to Active when the current Active instance becomes unavailable. |
Failback | The optional process of returning the Active role to the original node after recovery, usually performed manually during low-traffic periods, although it can be automated. |
Polling Components | IguanaX components such as From DB, From File, or From FTP that actively fetch data. These must run on only one Active instance to avoid duplicate processing. |
Listener Components | Inbound components such as From LLP and From HTTP(S) that receive messages pushed from external systems. These may operate in Active–Passive or Active–Active mode depending on design. |
Message Queue | The internal IguanaX structure where inbound messages are stored before processing. Queues may be local or shared depending on HA architecture. |
Queue Recovery | The ability for a backup IguanaX instance to resume message processing from the exact point where the Active instance failed, without loss or duplication. |
Message Sequencing | Guarantee that messages are processed in the same order they are received. Requires coordinated queue ownership and recovery. |
Working Directory (WorkingDir) | The filesystem location where IguanaX stores queues, logs, and configuration artifacts. It also contains application data such as cached resources. May be local or shared depending on HA design. |
Shared WorkingDir | A working directory accessible by multiple IguanaX instances, typically via NAS or cloud file services, used to enable queue recovery and sequencing. |
Fault Tolerance | The ability of a system to continue operating when a single component fails, often through redundancy. HA systems are composed of fault-tolerant components. |
Disaster Recovery (DR) | The process of restoring IguanaX services after a major outage such as site loss or regional failure. DR is geographically separated and distinct from HA. |
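To make the Health Check and Load Balancer terms concrete, the sketch below shows the kind of probe a load balancer could run against each IguanaX instance. The host names, port, and check style are assumptions for illustration; a real deployment would use whatever TCP or HTTP probe its load balancer supports.

```python
import socket

def llp_listener_is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """TCP-level health check: can a connection be opened to the LLP listener?

    A load balancer typically runs an equivalent TCP or HTTP probe on a
    schedule and removes instances that fail consecutive checks.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical instances; replace with the actual hosts and ports in your design.
INSTANCES = [("iguanax-node-a.example.internal", 6001),
             ("iguanax-node-b.example.internal", 6001)]

if __name__ == "__main__":
    for host, port in INSTANCES:
        state = "healthy" if llp_listener_is_healthy(host, port) else "unhealthy"
        print(f"{host}:{port} -> {state}")
```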
Understanding IguanaX High Availability
Server Failures vs Component Failures
IguanaX HA is designed to handle server-level failures, not Component logic errors.
- If a Component fails due to bad data or code bugs, failover will not fix the issue
- The same failure would occur on the backup IguanaX instance
IguanaX HA handles:
- Server crashes
- IguanaX instance crashes
- Network issues
- Storage failure
Component behavior must still be designed defensively, for example by adding retry logic and error notifications (see the sketch below).
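As an illustration of that defensive pattern, here is a minimal sketch of retry-with-backoff followed by an operator notification. It is expressed in Python for readability; the handler and notification hook are placeholders, not IguanaX APIs.

```python
import time

def process_with_retry(process, message, attempts: int = 3, base_delay: float = 2.0):
    """Retry transient failures with backoff; escalate if every attempt fails.

    `process` stands in for the Component's own handler and `notify_operator`
    for whatever alerting mechanism the deployment uses.
    """
    for attempt in range(1, attempts + 1):
        try:
            return process(message)
        except Exception as err:  # in practice, catch only known transient errors
            if attempt == attempts:
                notify_operator(f"Message failed after {attempts} attempts: {err}")
                raise
            time.sleep(base_delay * attempt)  # simple linear backoff

def notify_operator(text: str) -> None:
    # Placeholder: send an email, page, or post to a monitoring system.
    print(f"ALERT: {text}")
```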
Service Failover vs Server Failover
IguanaX is not a single application — it is a platform that hosts many services.
Each inbound listener (LLP, HTTP):
- Has its own availability requirements
- May behave differently during failover
HA design must consider:
- Which services can be Active-Active
- Which must remain Active-Passive
- How pollers are fenced
Failover planning must happen per interface, not just per server.
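One practical way to capture per-interface planning is a simple inventory of each inbound interface, its HA mode, and the reason for it. The sketch below is illustrative only; the interface names and modes are assumptions, not recommendations.

```python
# Hypothetical per-interface failover plan. Polling components are pinned to
# the Active instance only; listener components may be Active-Active or
# Active-Passive depending on sequencing requirements.
FAILOVER_PLAN = {
    "adt_from_llp":    {"mode": "active-passive", "reason": "strict HL7 sequencing"},
    "orders_from_db":  {"mode": "active-passive", "reason": "poller must run on one node"},
    "fhir_from_https": {"mode": "active-active",  "reason": "stateless REST requests"},
}

def may_run(interface: str, role: str) -> bool:
    """Return True if an instance in the given role ('active' or 'passive')
    is allowed to serve this interface under the plan above."""
    mode = FAILOVER_PLAN[interface]["mode"]
    return role == "active" or mode == "active-active"
```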
Guaranteed Availability in IguanaX
HA guarantees service availability, not zero downtime.
What IguanaX HA Can Guarantee
- An endpoint is always listening at a known address
- Traffic is routed to a healthy IguanaX instance
- Failover occurs automatically for server failures
Common Customer Expectations
Instant Failover
- Achieved via load balancers and hot-standby instances
- Requires both instances to be running
- Typical detection and failover time is measured in seconds (see the heartbeat sketch below)
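The detection time is roughly the heartbeat interval multiplied by the number of consecutive misses tolerated before promotion. The minimal Passive-side monitor below illustrates that arithmetic; the interval, threshold, and promotion hook are assumptions for illustration, not IguanaX configuration.

```python
import time

HEARTBEAT_INTERVAL = 2.0   # seconds between liveness checks (assumed)
MISSED_THRESHOLD = 3       # consecutive misses before promotion (assumed)
# Worst-case detection time is roughly HEARTBEAT_INTERVAL * MISSED_THRESHOLD = 6 s.

def monitor_active(check_active_alive, promote_self):
    """Passive-side loop: watch the Active instance and promote on failure.

    `check_active_alive` and `promote_self` are placeholders for the actual
    liveness probe and promotion mechanism chosen in a given HA design.
    """
    misses = 0
    while True:
        if check_active_alive():
            misses = 0
        else:
            misses += 1
            if misses >= MISSED_THRESHOLD:
                promote_self()  # take over listeners, pollers, and queue ownership
                return
        time.sleep(HEARTBEAT_INTERVAL)
```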
Queue Recovery
- Requires shared storage for logs
- Enables automated recovery of IguanaX message queues
- Guarantees that no messages are lost
Message Sequence
- Shared logs and queue files ensure that messages are processed in the same order in which they are received by IguanaX
- Messages continue to be processed in the same order after failover to another IguanaX instance
- A brief downtime may occur during failover while queue positions are re-synchronized (see the checkpoint sketch below)
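A useful mental model for queue recovery and sequencing is a checkpoint (last processed position) persisted on the shared WorkingDir, which the promoted instance reads before resuming in order. The sketch below illustrates that idea only; the file layout and function names are hypothetical and are not IguanaX internals.

```python
from pathlib import Path

# Hypothetical checkpoint file on the shared WorkingDir (e.g. a NAS mount).
CHECKPOINT = Path("/shared/workingdir/channel_adt/last_processed.txt")

def load_checkpoint() -> int:
    """Return the position of the last successfully processed message."""
    return int(CHECKPOINT.read_text()) if CHECKPOINT.exists() else 0

def save_checkpoint(position: int) -> None:
    CHECKPOINT.write_text(str(position))

def resume_processing(read_queue_from, handle):
    """On promotion, continue in order from the recorded position so that
    nothing is lost or duplicated and sequencing is preserved."""
    position = load_checkpoint()
    for position, message in read_queue_from(position):
        handle(message)
        save_checkpoint(position)
```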
High Availability vs Disaster Recovery
High Availability and Disaster Recovery solve different problems.
High Availability | Disaster Recovery |
|---|---|
Handles single failures | Handles site-wide disasters |
Short outages | Long outages |
Automatic failover | Manual recovery |
Same region/datacenter | Separate region/site |
Technology-focused | People + process + technology |
In summary, High Availability (HA) keeps services running, while Disaster Recovery (DR) restores services after a catastrophic failure. A production environment typically requires both.
It is not recommended to combine HA and DR into a single solution, such as placing the Passive IguanaX instance in a different region. Doing so would require a global load balancer and shared storage across regions, which can significantly reduce performance and introduce high operational and infrastructure costs.
HA in IguanaX: Key Design Considerations
Area | What Must Be Considered |
|---|---|
Redundancy | IguanaX instances, configuration storage, queue and log storage, and load balancers must be duplicated to eliminate single points of failure. |
Fault Detection | Health checks and heartbeats are required to detect failures quickly and distinguish between planned and unplanned outages. |
Traffic Routing | LLP and HTTP traffic routing must be controlled, and polling components restricted to a single Active instance, to ensure correct processing and avoid duplicate messages. |
Data Consistency | Message ordering and queue recovery guarantees must be defined and preserved during failover. |
Different IguanaX HA designs make different trade-offs between these considerations.