StatisticalAnomalyDetection@1

The StatisticalAnomalyDetection@1 node detects anomalies in numeric data streams using one of several statistical methods: Z-Score, IQR, percent change, or moving average.

Adapter Prerequisites

Node Configuration

For the fields path, targetPath, targetValueWriteMode, and targetValueKind, see Overview.

transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.Items[*] # Path to array of items to analyze
    targetPath: $.anomalies # Path where anomaly results will be stored
    resetStatistics: false # Reset statistics on each run (true = stateless, false = stateful)
    detectors:
      - path: $.Attributes.GrossTotal # JSONPath to the numeric value to monitor
        groupByPath: $.Attributes.Issuer.Attributes.CompanyName # Optional: Group statistics by this path
        contextPath: $.Attributes.DocumentNumber # Optional: Include context in anomaly results
        method: PercentChange # Detection method: ZScore, Iqr, PercentChange, MovingAverage
        threshold: 50.0 # Threshold for anomaly detection (interpretation depends on method)
        minSamples: 2 # Minimum samples required before detection starts
        maxSamples: 1000 # Maximum samples to keep in memory (0 = unlimited)
        windowSize: 10 # Window size for moving average method

Detection Methods

Z-Score Method

The Z-Score method detects anomalies by measuring how many standard deviations a data point is from the mean of the distribution. It assumes data follows a normal distribution.

How it works:

  • Calculates the mean (μ) and standard deviation (σ) from collected samples
  • For each new value, computes Z-Score = |value - μ| / σ
  • Values exceeding the threshold are flagged as anomalies
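
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation: the function name `z_score` is hypothetical, and the use of the population standard deviation and the handling of a zero standard deviation are assumptions.

```python
from statistics import mean, pstdev

def z_score(value, samples):
    """Return |value - mean| / stddev for the collected samples, or None if undefined."""
    if len(samples) < 2:
        return None  # not enough history to estimate spread
    mu = mean(samples)
    sigma = pstdev(samples)  # assumption: population form; the node may use the sample form
    if sigma == 0:
        return None  # all samples identical, so the Z-Score is undefined
    return abs(value - mu) / sigma

history = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0]
score = z_score(20.0, history)
is_anomaly = score is not None and score > 3.0  # threshold: 3.0
```

With this history (mean 11), the value 20.0 lies well beyond three standard deviations and would be flagged.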

Parameters:

  • threshold: Number of standard deviations (e.g., 3.0 for 3σ rule)
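    • 2.0 = ~95% of normally distributed values fall within the threshold (~5% flagged as outliers)
    • 3.0 = ~99.7% of normally distributed values fall within the threshold (~0.3% flagged as outliers)
    • 4.0 = ~99.99% of normally distributed values fall within the threshold (only very rare outliers flagged)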
  • minSamples: Recommended minimum 30 for statistical validity
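
The coverage percentages quoted for each threshold follow from the normal distribution and can be checked with the standard library. The helper name `normal_coverage` is made up for this sketch; it relies only on the identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k / √2).

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2.0, 3.0, 4.0):
    inside = normal_coverage(k)
    print(f"threshold={k}: {inside:.4%} inside, {1 - inside:.4%} flagged")
```

These figures hold only if the monitored values are approximately normal; for skewed data the share of flagged values can differ considerably.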

When to use:

  • Data follows normal/Gaussian distribution
  • Consistent, stable processes
  • Manufacturing quality control
  • Server response time monitoring

IQR (Interquartile Range) Method

The IQR method uses quartiles to detect outliers, making it robust against extreme values and suitable for non-normal distributions.

How it works:

  • Calculates Q1 (25th percentile) and Q3 (75th percentile)
  • Computes IQR = Q3 - Q1
  • Defines bounds: Lower = Q1 - (threshold × IQR), Upper = Q3 + (threshold × IQR)
  • Values outside these bounds are anomalies
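
As a rough sketch of the bounds calculation above, assuming quartiles are computed with linear interpolation (the node's exact quartile method is not stated here, and different methods give slightly different bounds). The function name `iqr_bounds` is hypothetical.

```python
from statistics import quantiles

def iqr_bounds(samples, threshold=1.5):
    """Return (lower, upper) bounds: Q1 - threshold * IQR and Q3 + threshold * IQR."""
    # quantiles() with n=4 returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - threshold * iqr, q3 + threshold * iqr

samples = [10, 12, 11, 13, 9, 11, 12, 10]
lower, upper = iqr_bounds(samples)  # Q1 = 10, Q3 = 12, IQR = 2

def is_outlier(value):
    return value < lower or value > upper
```

Here the bounds are 7 and 15, so a value of 20 would be reported while 12 would not.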

Parameters:

  • threshold: IQR multiplier
    • 1.5 = Standard outlier detection (Tukey's method)
    • 3.0 = Extreme outlier detection
    • Custom values for domain-specific needs
  • minSamples: Recommended minimum 10-20 for stable quartiles

When to use:

  • Skewed or non-normal distributions
  • Data with natural outliers
  • Financial data analysis
  • Customer behavior metrics

Percent Change Method

Detects anomalies based on the percentage change from the previous value, ideal for detecting sudden jumps or drops in sequential data.

How it works:

  • Calculates: Change = |current_value - last_value| / |last_value| × 100
  • Flags as anomaly if change exceeds threshold percentage
  • Simple but effective for trend monitoring
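
The formula above, written out as a small Python function. The name `percent_change` is hypothetical, and returning `None` for a zero previous value is an assumption of this sketch; how the node itself treats that case is not documented here.

```python
def percent_change(current, last):
    """Return |current - last| / |last| * 100, or None when last is zero."""
    if last == 0:
        return None  # assumption: treat the change as undefined rather than infinite
    return abs(current - last) / abs(last) * 100

threshold = 50.0
change = percent_change(2880, 288)  # a tenfold jump is a 900% change
is_anomaly = change is not None and change > threshold
```

Note the asymmetry of the measure: doubling from 100 to 200 is a 100% change, while halving from 100 to 50 is only a 50% change.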

Parameters:

  • threshold: Maximum allowed percentage change
    • 10.0 = 10% change triggers anomaly
    • 50.0 = 50% change triggers anomaly
    • 100.0 = Doubling triggers anomaly (halving is only a 50% change under this formula)
  • minSamples: Can work with just 1 previous sample

When to use:

  • Stock price monitoring
  • Sales volume tracking
  • Traffic pattern analysis
  • Resource utilization monitoring

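Note: Not suitable for values that can be zero or near zero. The formula divides by the previous value, so the change is undefined when that value is zero and greatly inflated when it is very small.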

Moving Average Method

Detects anomalies by comparing values against a moving average, effectively identifying deviations from recent trends.

How it works:

  • Maintains a sliding window of recent values
  • Calculates the mean of the window (moving average)
  • Computes deviation: |value - moving_avg| / moving_avg × 100
  • Flags anomaly if deviation exceeds threshold
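
A minimal sketch of the sliding-window logic described above. The class name `MovingAverageDetector` is hypothetical, and several details are assumptions rather than documented behavior: detection starts only once the window is full, every value (anomalous or not) is added to the window, and the absolute value of the average is used as the divisor.

```python
from collections import deque

class MovingAverageDetector:
    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # oldest values drop out automatically
        self.threshold = threshold

    def check(self, value):
        """Return the percentage deviation if it exceeds the threshold, else None."""
        deviation = None
        if len(self.window) == self.window.maxlen:
            avg = sum(self.window) / len(self.window)
            if avg != 0:
                deviation = abs(value - avg) / abs(avg) * 100
        self.window.append(value)  # assumption: the value joins the window either way
        if deviation is not None and deviation > self.threshold:
            return deviation
        return None

detector = MovingAverageDetector(window_size=3, threshold=25.0)
results = [detector.check(v) for v in [100, 100, 100, 150, 110]]
```

In this run only the fourth value (150 against an average of 100) is reported, with a deviation of 50%.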

Parameters:

  • threshold: Maximum percentage deviation from moving average
    • 10.0 = 10% deviation from average
    • 25.0 = 25% deviation from average
  • windowSize: Number of recent values for average calculation
    • Smaller (5-10): More responsive to changes
    • Larger (20-50): More stable, less sensitive to noise
  • minSamples: Must be at least equal to windowSize

When to use:

  • Trending data with seasonal patterns
  • Network traffic analysis
  • Temperature monitoring
  • Business metrics with weekly/daily cycles

Output Format

The node produces an array of anomaly results at the target path. Each anomaly object contains the following fields:

Output Fields

| Field | Type | Description |
| --- | --- | --- |
| path | string | The JSONPath that was monitored (from detector configuration) |
| value | number | The actual numeric value that triggered the anomaly |
| isAnomaly | boolean | Always true in output (non-anomalies are not included) |
| score | number | Anomaly severity score. Interpretation varies by method. Z-Score: number of standard deviations from mean; IQR: distance from bounds divided by IQR; PercentChange: actual percentage change; MovingAverage: percentage deviation from average |
| method | string | Detection method used: "ZScore", "Iqr", "PercentChange", or "MovingAverage" |
| reason | string | Human-readable explanation of why the anomaly was detected, includes calculated values and thresholds |
| context | any | Additional context data from contextPath (if configured). Can be any JSON type depending on source data |

Example Output

[
  {
    "path": "$.Attributes.GrossTotal",
    "value": 2880,
    "isAnomaly": true,
    "score": 900.0,
    "method": "PercentChange",
    "reason": "Change: 900.00% (threshold: 50.0%)",
    "context": "Document-5"
  },
  {
    "path": "$.Attributes.Temperature",
    "value": 45.2,
    "isAnomaly": true,
    "score": 3.5,
    "method": "ZScore",
    "reason": "Z-Score: 3.50 (threshold: 3.0)",
    "context": "Sensor-A1"
  },
  {
    "path": "$.Attributes.ResponseTime",
    "value": 1250,
    "isAnomaly": true,
    "score": 2.1,
    "method": "Iqr",
    "reason": "Value outside IQR bounds [150.00, 850.00]",
    "context": "Server-02"
  },
  {
    "path": "$.Attributes.Traffic",
    "value": 15000,
    "isAnomaly": true,
    "score": 35.5,
    "method": "MovingAverage",
    "reason": "Deviation from MA: 35.50% (threshold: 25.0%)",
    "context": "2024-01-15T10:00:00"
  }
]

Interpreting the Score

The score field interpretation depends on the detection method:

| Method | Score Interpretation | Typical Anomaly Thresholds |
| --- | --- | --- |
| ZScore | Standard deviations from mean | > 2.0 mild, > 3.0 strong, > 4.0 extreme |
| Iqr | Multiple of IQR distance from bounds | > 0 outlier, > 1.0 strong outlier |
| PercentChange | Actual percentage change | Depends on domain (e.g., > 50% for prices) |
| MovingAverage | Percentage deviation from average | > 20% mild, > 50% strong deviation |
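
When consuming the node's output downstream, the typical thresholds above can be turned into a severity label. The following is only a sketch: `SEVERITY_BANDS` and `severity` are names invented for this example, the bands are copied from the table, and PercentChange is deliberately left unclassified because its thresholds depend on the domain.

```python
# Cutoffs per method, highest first, taken from the typical thresholds above.
SEVERITY_BANDS = {
    "ZScore": [(4.0, "extreme"), (3.0, "strong"), (2.0, "mild")],
    "Iqr": [(1.0, "strong"), (0.0, "outlier")],
    "MovingAverage": [(50.0, "strong"), (20.0, "mild")],
}

def severity(anomaly):
    """Map one anomaly result (a dict with 'method' and 'score') to a label."""
    for cutoff, label in SEVERITY_BANDS.get(anomaly["method"], []):
        if anomaly["score"] > cutoff:
            return label
    return "unclassified"  # no bands defined, or score below every cutoff

labels = [
    severity({"method": "ZScore", "score": 3.5}),
    severity({"method": "Iqr", "score": 2.1}),
    severity({"method": "MovingAverage", "score": 35.5}),
    severity({"method": "PercentChange", "score": 900.0}),
]
```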

Output Behavior

  • Empty array: No anomalies detected in the current batch
  • Stateful mode (resetStatistics: false): Statistics accumulate across runs, improving accuracy over time
  • Stateless mode (resetStatistics: true): Each run starts fresh, suitable for independent batches
  • Grouping: When using groupByPath, separate statistics are maintained for each group

Examples

Example 1: Detect Invoice Amount Anomalies by Issuer

transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.documents.Items[*]
    targetPath: $.invoiceAnomalies
    resetStatistics: false
    detectors:
      - path: $.Attributes.GrossTotal
        groupByPath: $.Attributes.Issuer.Attributes.CompanyName
        contextPath: $.Attributes.DocumentNumber
        method: PercentChange
        threshold: 50.0
        minSamples: 2

Example 2: Multiple Detector Configuration

transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.measurements[*]
    targetPath: $.detectedAnomalies
    detectors:
      # Detect sudden temperature spikes
      - path: $.temperature
        method: ZScore
        threshold: 3.0
        minSamples: 10
      # Detect pressure changes
      - path: $.pressure
        method: PercentChange
        threshold: 20.0
        minSamples: 1
      # Detect flow rate deviations from moving average
      - path: $.flowRate
        method: MovingAverage
        threshold: 15.0
        windowSize: 20
        minSamples: 20

Example 3: Stateless Anomaly Detection

transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.sensorData[*]
    targetPath: $.alerts
    resetStatistics: true # Each run starts with fresh statistics
    detectors:
      - path: $.value
        method: Iqr
        threshold: 1.5
        minSamples: 5
        maxSamples: 100 # Keep only last 100 samples in memory

Example 4: Financial Monitoring with IQR

transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.transactions[*]
    targetPath: $.suspiciousTransactions
    resetStatistics: false
    detectors:
      - path: $.amount
        groupByPath: $.accountType # Separate statistics per account type
        contextPath: $.transactionId
        method: Iqr
        threshold: 3.0 # Detect extreme outliers only
        minSamples: 20
        maxSamples: 500 # Sliding window of 500 transactions

Notes

  • Stateful vs Stateless: When resetStatistics is false, the node maintains statistics across pipeline runs, allowing it to learn from historical data. When true, each run starts fresh.
  • GroupBy: The groupByPath parameter allows separate statistical tracking for different groups (e.g., per customer, per sensor).
  • Memory Management: Use maxSamples to limit memory usage for long-running pipelines. Older samples are removed in FIFO order.
  • Context Data: The contextPath parameter includes additional information with anomaly results for better debugging and analysis.
  • Method Selection Guide:
    • Use Z-Score for normally distributed data with stable patterns
    • Use IQR for skewed data or when robust outlier detection is needed
    • Use PercentChange for detecting sudden changes in sequential data
    • Use MovingAverage for trending data with expected variations
  • Minimum Samples: Each method requires different minimum samples for reliable detection. Z-Score needs more samples (30+) for statistical validity, while PercentChange can work with just 1 previous value.