StatisticalAnomalyDetection@1
Node `StatisticalAnomalyDetection@1` detects anomalies in numeric data streams using one of four statistical methods: Z-Score, IQR, Percent Change, or Moving Average.
Adapter Prerequisites
Node Configuration
For the fields `path`, `targetPath`, `targetValueWriteMode`, and `targetValueKind`, see Overview.
```yaml
transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.Items[*] # Path to array of items to analyze
    targetPath: $.anomalies # Path where anomaly results will be stored
    resetStatistics: false # Reset statistics on each run (true = stateless, false = stateful)
    detectors:
      - path: $.Attributes.GrossTotal # JSONPath to the numeric value to monitor
        groupByPath: $.Attributes.Issuer.Attributes.CompanyName # Optional: Group statistics by this path
        contextPath: $.Attributes.DocumentNumber # Optional: Include context in anomaly results
        method: PercentChange # Detection method: ZScore, Iqr, PercentChange, MovingAverage
        threshold: 50.0 # Threshold for anomaly detection (interpretation depends on method)
        minSamples: 2 # Minimum samples required before detection starts
        maxSamples: 1000 # Maximum samples to keep in memory (0 = unlimited)
        windowSize: 10 # Window size for moving average method
```
Detection Methods
Z-Score Method
The Z-Score method detects anomalies by measuring how many standard deviations a data point is from the mean of the distribution. It assumes data follows a normal distribution.
How it works:
- Calculates the mean (μ) and standard deviation (σ) from collected samples
- For each new value, computes Z-Score = |value - μ| / σ
- Values exceeding the threshold are flagged as anomalies
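The following is a minimal Python sketch of the Z-Score test described above, not the node's actual implementation. It makes three assumptions:

- The mean and the population standard deviation are computed from the samples collected before the current value.
- A zero standard deviation yields no anomaly.
- "Exceeding" means strictly greater than the threshold.

`zscore_check` is a hypothetical name.

```python
import statistics

def zscore_check(samples, value, threshold=3.0):
    """Return (is_anomaly, score) for `value` against previously collected samples.

    Assumptions (hypothetical sketch): mean and population standard deviation
    come from earlier samples only; the node may compute them differently.
    """
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    if sigma == 0:
        # All samples identical: the Z-Score is undefined, so nothing is flagged here.
        return False, 0.0
    score = abs(value - mu) / sigma
    return score > threshold, score

history = [20.0, 21.0, 19.0, 20.0, 20.0]  # mean 20.0, population std dev ~0.632
print(zscore_check(history, 25.0))  # flagged: score is about 7.91
print(zscore_check(history, 20.5))  # not flagged: score is about 0.79
```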
Parameters:
- threshold: Number of standard deviations (e.g., 3.0 for the 3σ rule)
- 2.0 = ~95% of normally distributed values fall within the bound (~5% flagged)
- 3.0 = ~99.7% fall within the bound (~0.3% flagged)
- 4.0 = ~99.99% fall within the bound (only very rare outliers flagged)
- minSamples: Recommended minimum 30 for statistical validity
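The percentages in the threshold list come from the standard normal distribution. The fraction of values within k standard deviations of the mean is erf(k/√2). This short Python check uses only `math.erf`. The figures apply only to the extent that the monitored data really are normal and the mean and standard deviation are well estimated.

```python
import math

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2.0, 3.0, 4.0):
    inside = share_within(k)
    # Roughly 95.45%, 99.73% and 99.994% inside for k = 2, 3, 4
    print(f"k={k}: {inside:.4%} inside, {1 - inside:.4%} flagged")
```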
When to use:
- Data follows normal/Gaussian distribution
- Consistent, stable processes
- Manufacturing quality control
- Server response time monitoring
Further reading:
IQR (Interquartile Range) Method
The IQR method uses quartiles to detect outliers, making it robust against extreme values and suitable for non-normal distributions.
How it works:
- Calculates Q1 (25th percentile) and Q3 (75th percentile)
- Computes IQR = Q3 - Q1
- Defines bounds: Lower = Q1 - (threshold × IQR), Upper = Q3 + (threshold × IQR)
- Values outside these bounds are anomalies
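The following is a minimal Python sketch of these IQR steps. It rests on two assumptions:

- Quartiles come from `statistics.quantiles(..., method="inclusive")`; the node's own percentile interpolation is not documented and may differ.
- The score is the distance beyond the violated bound divided by the IQR, as the output table describes.

`iqr_check` is a hypothetical name.

```python
import statistics

def iqr_check(samples, value, threshold=1.5):
    """Return (is_anomaly, score, (lower, upper)) for `value` (hypothetical sketch)."""
    q1, _median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - threshold * iqr
    upper = q3 + threshold * iqr
    if value < lower:
        distance = lower - value
    elif value > upper:
        distance = value - upper
    else:
        return False, 0.0, (lower, upper)
    # With a zero IQR any outside value is infinitely far in IQR units.
    score = distance / iqr if iqr > 0 else float("inf")
    return True, score, (lower, upper)

samples = list(range(1, 10))  # 1..9: Q1 = 3, Q3 = 7, IQR = 4
print(iqr_check(samples, 21))  # (True, 2.0, (-3.0, 13.0))
print(iqr_check(samples, 10))  # (False, 0.0, (-3.0, 13.0))
```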
Parameters:
- threshold: IQR multiplier
- 1.5 = Standard outlier detection (Tukey's method)
- 3.0 = Extreme outlier detection
- Custom values for domain-specific needs
- minSamples: Recommended minimum 10-20 for stable quartiles
When to use:
- Skewed or non-normal distributions
- Data with natural outliers
- Financial data analysis
- Customer behavior metrics
Further reading:
Percent Change Method
Detects anomalies based on the percentage change from the previous value, ideal for detecting sudden jumps or drops in sequential data.
How it works:
- Calculates: Change = |current_value - last_value| / |last_value| × 100
- Flags as anomaly if change exceeds threshold percentage
- Simple but effective for trend monitoring
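The formula translates directly into Python. The sketch below makes two assumptions:

- "Exceeds" means strictly greater than the threshold.
- A previous value of zero skips detection, since the change is undefined there.

The node's actual zero handling is not documented, and `percent_change_check` is a hypothetical name.

```python
def percent_change_check(last_value, value, threshold=50.0):
    """Return (is_anomaly, change_percent) comparing `value` with the previous value."""
    if last_value == 0:
        # Division by zero: treated as "cannot score" in this sketch (assumption).
        return False, None
    change = abs(value - last_value) / abs(last_value) * 100
    return change > threshold, change

print(percent_change_check(288, 2880))                  # (True, 900.0)
print(percent_change_check(100, 200, threshold=100.0))  # (False, 100.0): exactly doubling does not exceed 100
print(percent_change_check(100, 50, threshold=100.0))   # (False, 50.0): halving is only a 50% change
```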
Parameters:
- threshold: Maximum allowed percentage change
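- 10.0 = a change of more than 10% triggers an anomaly
- 50.0 = a change of more than 50% triggers an anomaly
- 100.0 = a rise of more than 100% (more than doubling) triggers an anomaly; a drop to half is only a 50% change, so for positive values a decrease can never exceed 100%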
- minSamples: Can work with just 1 previous sample
When to use:
- Stock price monitoring
- Sales volume tracking
- Traffic pattern analysis
- Resource utilization monitoring
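Note: Not suitable for values that can be zero or near zero. The change is divided by the previous value, so it is undefined when that value is zero and greatly inflated when it is close to zero.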
Moving Average Method
Detects anomalies by comparing values against a moving average, effectively identifying deviations from recent trends.
How it works:
- Maintains a sliding window of recent values
- Calculates the mean of the window (moving average)
- Computes deviation: |value - moving_avg| / moving_avg × 100
- Flags anomaly if deviation exceeds threshold
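The following is a minimal Python sketch of the sliding-window check. It makes three assumptions:

- The window holds the `window_size` values that precede the current one: each value is scored first, then appended.
- Nothing is scored until the window is full, which mirrors the advice that `minSamples` be at least `windowSize`.
- The deviation is divided by the absolute moving average.

`MovingAverageDetector` is a hypothetical name, not part of the node's API.

```python
from collections import deque

class MovingAverageDetector:
    """Hypothetical sketch: flags values deviating from the mean of recent values."""

    def __init__(self, window_size=10, threshold=25.0):
        self.window = deque(maxlen=window_size)  # oldest value drops out automatically
        self.threshold = threshold

    def observe(self, value):
        """Return (is_anomaly, deviation_percent), or None while the window fills."""
        result = None
        if len(self.window) == self.window.maxlen:
            moving_avg = sum(self.window) / len(self.window)
            if moving_avg != 0:  # deviation is undefined for a zero average (skipped here)
                deviation = abs(value - moving_avg) / abs(moving_avg) * 100
                result = (deviation > self.threshold, deviation)
        self.window.append(value)
        return result

detector = MovingAverageDetector(window_size=4, threshold=25.0)
for v in [100, 100, 100, 100]:
    detector.observe(v)       # None while the window fills
print(detector.observe(130))  # flagged: about 30% above the average of 100
print(detector.observe(110))  # window is now [100, 100, 100, 130]: about 2.3% deviation, not flagged
```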
Parameters:
- threshold: Maximum percentage deviation from moving average
- 10.0 = 10% deviation from average
- 25.0 = 25% deviation from average
- windowSize: Number of recent values for average calculation
- Smaller (5-10): More responsive to changes
- Larger (20-50): More stable, less sensitive to noise
- minSamples: Must be at least equal to windowSize
When to use:
- Trending data with seasonal patterns
- Network traffic analysis
- Temperature monitoring
- Business metrics with weekly/daily cycles
Further reading:
Output Format
The node produces an array of anomaly results at the target path. Each anomaly object contains the following fields:
Output Fields
Field | Type | Description |
---|---|---|
path | string | The JSONPath that was monitored (from detector configuration) |
value | number | The actual numeric value that triggered the anomaly |
isAnomaly | boolean | Always true in output (non-anomalies are not included) |
score | number | Anomaly severity score; its interpretation varies by method. Z-Score: number of standard deviations from the mean. IQR: distance from the bounds divided by the IQR. PercentChange: actual percentage change. MovingAverage: percentage deviation from the moving average. |
method | string | Detection method used: "ZScore", "Iqr", "PercentChange", or "MovingAverage" |
reason | string | Human-readable explanation of why the anomaly was detected, includes calculated values and thresholds |
context | any | Additional context data from contextPath (if configured). Can be any JSON type depending on source data |
Example Output
```json
[
  {
    "path": "$.Attributes.GrossTotal",
    "value": 2880,
    "isAnomaly": true,
    "score": 900.0,
    "method": "PercentChange",
    "reason": "Change: 900.00% (threshold: 50.0%)",
    "context": "Document-5"
  },
  {
    "path": "$.Attributes.Temperature",
    "value": 45.2,
    "isAnomaly": true,
    "score": 3.5,
    "method": "ZScore",
    "reason": "Z-Score: 3.50 (threshold: 3.0)",
    "context": "Sensor-A1"
  },
  {
    "path": "$.Attributes.ResponseTime",
    "value": 1250,
    "isAnomaly": true,
    "score": 2.1,
    "method": "Iqr",
    "reason": "Value outside IQR bounds [150.00, 850.00]",
    "context": "Server-02"
  },
  {
    "path": "$.Attributes.Traffic",
    "value": 15000,
    "isAnomaly": true,
    "score": 35.5,
    "method": "MovingAverage",
    "reason": "Deviation from MA: 35.50% (threshold: 25.0%)",
    "context": "2024-01-15T10:00:00"
  }
]
```
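The first entry of the example output can be reproduced by hand. A score of 900.0 implies a previous value of 288, since (2880 − 288) / 288 × 100 = 900. That previous value is inferred, not taken from real data. The Python sketch below builds a record with the same fields. Its reason-string format is copied from the example, and the node's wording for other cases is an assumption. `percent_change_result` is a hypothetical helper.

```python
import json

def percent_change_result(path, last_value, value, threshold, context=None):
    """Build an anomaly record shaped like the example output, or return None."""
    if last_value == 0:
        return None  # change undefined (assumption: skipped)
    change = abs(value - last_value) / abs(last_value) * 100
    if change <= threshold:
        return None  # non-anomalies are not included in the output
    return {
        "path": path,
        "value": value,
        "isAnomaly": True,
        "score": change,
        "method": "PercentChange",
        "reason": f"Change: {change:.2f}% (threshold: {threshold:.1f}%)",
        "context": context,
    }

# Hypothetical previous GrossTotal of 288 for the same issuer, then 2880 on Document-5.
record = percent_change_result("$.Attributes.GrossTotal", 288, 2880, 50.0, "Document-5")
print(json.dumps(record, indent=2))
```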
Interpreting the Score
The interpretation of the `score` field depends on the detection method:
Method | Score Interpretation | Typical Anomaly Thresholds |
---|---|---|
ZScore | Standard deviations from mean | > 2.0 mild, > 3.0 strong, > 4.0 extreme |
Iqr | Multiple of IQR distance from bounds | > 0 outlier, > 1.0 strong outlier |
PercentChange | Actual percentage change | Depends on domain (e.g., > 50% for prices) |
MovingAverage | Percentage deviation from average | > 20% mild, > 50% strong deviation |
Output Behavior
- Empty array: No anomalies were detected in the current batch.
- Stateful mode (`resetStatistics: false`): Statistics accumulate across runs, improving accuracy over time.
- Stateless mode (`resetStatistics: true`): Each run starts fresh, which suits independent batches.
- Grouping: When `groupByPath` is set, separate statistics are maintained for each group.
Examples
Example 1: Detect Invoice Amount Anomalies by Issuer
```yaml
transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.documents.Items[*]
    targetPath: $.invoiceAnomalies
    resetStatistics: false
    detectors:
      - path: $.Attributes.GrossTotal
        groupByPath: $.Attributes.Issuer.Attributes.CompanyName
        contextPath: $.Attributes.DocumentNumber
        method: PercentChange
        threshold: 50.0
        minSamples: 2
```
Example 2: Multiple Detector Configuration
```yaml
transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.measurements[*]
    targetPath: $.detectedAnomalies
    detectors:
      # Detect sudden temperature spikes
      - path: $.temperature
        method: ZScore
        threshold: 3.0
        minSamples: 10
      # Detect pressure changes
      - path: $.pressure
        method: PercentChange
        threshold: 20.0
        minSamples: 1
      # Detect flow rate deviations from moving average
      - path: $.flowRate
        method: MovingAverage
        threshold: 15.0
        windowSize: 20
        minSamples: 20
```
Example 3: Stateless Anomaly Detection
```yaml
transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.sensorData[*]
    targetPath: $.alerts
    resetStatistics: true # Each run starts with fresh statistics
    detectors:
      - path: $.value
        method: Iqr
        threshold: 1.5
        minSamples: 5
        maxSamples: 100 # Keep only last 100 samples in memory
```
Example 4: Financial Monitoring with IQR
```yaml
transformations:
  - type: StatisticalAnomalyDetection@1
    path: $.transactions[*]
    targetPath: $.suspiciousTransactions
    resetStatistics: false
    detectors:
      - path: $.amount
        groupByPath: $.accountType # Separate statistics per account type
        contextPath: $.transactionId
        method: Iqr
        threshold: 3.0 # Detect extreme outliers only
        minSamples: 20
        maxSamples: 500 # Sliding window of 500 transactions
```
Notes
- Stateful vs Stateless: When `resetStatistics` is `false`, the node maintains statistics across pipeline runs, allowing it to learn from historical data. When it is `true`, each run starts fresh.
- GroupBy: The `groupByPath` parameter enables separate statistical tracking for different groups (e.g., per customer, per sensor).
- Memory Management: Use `maxSamples` to limit memory usage in long-running pipelines. Older samples are removed in FIFO order.
- Context Data: The `contextPath` parameter adds information to anomaly results for easier debugging and analysis.
- Method Selection Guide:
  - Use Z-Score for normally distributed data with stable patterns
  - Use IQR for skewed data or when robust outlier detection is needed
  - Use PercentChange for detecting sudden changes in sequential data
  - Use MovingAverage for trending data with expected variations
- Minimum Samples: Each method needs a different number of samples for reliable detection. Z-Score needs more (30+) for statistical validity, while PercentChange can work with just one previous value.
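The notes on state, grouping, memory, and minimum samples can be pictured as one sample history per detector and group. The Python sketch below is hypothetical: `SampleStore` and its methods are illustrative names, not the node's API. It assumes the following behavior:

- Eviction is oldest-first via `collections.deque(maxlen=...)`.
- `maxSamples: 0` means unlimited, as the configuration comment states.
- `minSamples` gates detection separately for each group.
- `reset()` stands in for `resetStatistics: true`.

```python
from collections import defaultdict, deque

class SampleStore:
    """Hypothetical per-(detector path, group key) sample history."""

    def __init__(self, min_samples=2, max_samples=1000):
        self.min_samples = min_samples
        maxlen = max_samples or None  # 0 -> unlimited (deque with no maxlen)
        self._histories = defaultdict(lambda: deque(maxlen=maxlen))

    def history(self, detector_path, group_key):
        """Return (creating if needed) the history for one detector and group."""
        return self._histories[(detector_path, group_key)]

    def ready(self, detector_path, group_key):
        """True once enough samples exist for detection to start in this group."""
        return len(self._histories.get((detector_path, group_key), ())) >= self.min_samples

    def add(self, detector_path, group_key, value):
        # When full, the deque drops its oldest entry (FIFO eviction).
        self.history(detector_path, group_key).append(value)

    def reset(self):
        """Discard all statistics, as a stateless run would."""
        self._histories.clear()

store = SampleStore(min_samples=2, max_samples=3)
for amount in [10, 20, 30, 40]:
    store.add("$.amount", "Savings", amount)
print(list(store.history("$.amount", "Savings")))  # [20, 30, 40]: oldest sample evicted
print(store.ready("$.amount", "Checking"))         # False: no samples for this group yet
store.reset()
```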