MachineLearningAnomalyDetection@1
Node MachineLearningAnomalyDetection@1 uses ML.NET algorithms to detect spikes and change points in time series data.
Adapter Prerequisites
Node Configuration
For the fields path, targetPath, targetValueWriteMode, and targetValueKind, see Overview.
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.Items[*] # Path to array of items to analyze
    targetPath: $.anomalies # Path where anomaly results will be stored
    resetStatistics: false # Reset time series data on each run (true = stateless, false = stateful)
    detectors:
      - path: $.Attributes.Value # JSONPath to the numeric value in time series
        groupByPath: $.Attributes.SensorId # Optional: Group time series by this path
        contextPath: $.Attributes.Timestamp # Optional: Include context in anomaly results
        detectSpikes: true # Enable spike detection
        detectChangePoints: true # Enable change point detection
        minDataPoints: 10 # Minimum data points required before detection starts
        maxDataPoints: 1000 # Maximum data points to keep in memory (0 = unlimited)
        spikeConfidence: 95 # Confidence level for spike detection (0-100, e.g. 95 = 95%)
        changePointConfidence: 95 # Confidence level for change point detection (0-100, e.g. 95 = 95%)
        pValueHistoryLength: 100 # History length for spike p-value calculation
        changeHistoryLength: 10 # History length for change point detection
Detection Types
Spike Detection (IID Spike)
Spike detection identifies sudden, temporary anomalies in time series data that deviate significantly from the expected pattern. The algorithm uses the IID (Independent and Identically Distributed) assumption with adaptive thresholds based on statistical properties of the data.
How it works:
- The algorithm maintains a sliding window of historical p-values (probability values) that indicate how likely each data point is given the historical distribution
- When a new value arrives, it calculates its p-value based on the learned distribution
- If the p-value falls below a threshold (derived from the confidence level), it's marked as a spike
- Uses martingale scores for improved sequential anomaly detection (Exchangeability Martingales)
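The steps above can be sketched in a few lines of Python. This is only an illustration, not the ML.NET implementation: it assumes a two-sided normal approximation for the p-value, and the names `SpikeSketch`, `history_length`, `min_points`, and `update` are hypothetical.

```python
import math
from collections import deque


class SpikeSketch:
    """Toy p-value spike detector (hypothetical, not the ML.NET code)."""

    def __init__(self, confidence=95.0, history_length=100, min_points=10):
        self.threshold = 1.0 - confidence / 100.0  # 95 -> flag when p < 0.05
        self.history = deque(maxlen=history_length)  # sliding window of values
        self.min_points = min_points

    def update(self, value):
        """Return (is_spike, p_value) for the new value, then store it."""
        is_spike, p_value = False, 1.0
        if len(self.history) >= self.min_points:
            n = len(self.history)
            mean = sum(self.history) / n
            var = sum((x - mean) ** 2 for x in self.history) / (n - 1)
            std = math.sqrt(var)
            if std > 0:
                z = abs(value - mean) / std
                # Assumption: two-sided p-value under a normal distribution
                p_value = math.erfc(z / math.sqrt(2))
                is_spike = p_value < self.threshold
        self.history.append(value)
        return is_spike, p_value


detector = SpikeSketch(confidence=95, history_length=100, min_points=10)
readings = [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 20.3, 20.0, 19.7, 20.4, 20.1, 35.0]
flags = [detector.update(v)[0] for v in readings]  # only the last reading is flagged
```

Raising `confidence` lowers the p-value threshold, so fewer points are flagged; a longer `history_length` makes the mean and spread estimates more stable but slower to follow a changing pattern.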
Parameters:
- spikeConfidence (0-100): Statistical confidence level for detection, given as a percentage. 95 means 95% confidence that the point is anomalous (5% false positive rate)
- pValueHistoryLength: Size of the sliding window for p-value calculation. Larger values provide more stable detection but slower adaptation to pattern changes
Output values:
- score: The anomaly score (higher = more anomalous)
- pValue: The probability of observing this value (lower = more unlikely/anomalous)
When to use:
- Fraud detection in financial transactions
- Network intrusion detection
- Sensor malfunction identification
- Quality control in manufacturing
Change Point Detection (IID Change Point)
Change point detection identifies persistent shifts in the statistical properties of time series data, indicating that the underlying data generation process has fundamentally changed.
How it works:
- The algorithm monitors the distribution parameters (mean, variance) of the time series
- Uses a sliding window approach to compare recent data distribution with historical patterns
- Employs martingale-based methods and cumulative sum (CUSUM) techniques to detect distribution changes
- When the martingale value exceeds a threshold, it signals a change point
- Unlike spikes, change points indicate lasting changes rather than temporary deviations
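As a rough illustration of the martingale step, the sketch below accumulates a power martingale over a stream of p-values and raises an alarm once it crosses a threshold. It is a minimal sketch under stated assumptions, not the node's code: the function name, the `epsilon` parameter, and the choice of threshold 1 / (1 - confidence / 100), which follows from Ville's inequality, are all assumptions made for this example.

```python
import math


def power_martingale_alarm(p_values, confidence=95.0, epsilon=0.5):
    """Return the index at which the power martingale first reaches its
    threshold, or None. All parameter names here are hypothetical."""
    # Ville's inequality: with no change, P(max M >= 1/alpha) <= alpha.
    log_threshold = -math.log(1.0 - confidence / 100.0)
    log_m = 0.0  # log of the martingale; M starts at 1
    for i, p in enumerate(p_values):
        p = max(p, 1e-12)  # guard against log(0)
        # Each step multiplies M by epsilon * p ** (epsilon - 1)
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        if log_m >= log_threshold:
            return i
    return None


# Ordinary p-values followed by a run of very small ones
alarm_index = power_martingale_alarm([0.6, 0.4, 0.8, 0.5] + [0.01] * 5)
```

A single small p-value is not enough to trigger the alarm here; the evidence has to accumulate over several consecutive points, which is what separates a lasting change from a one-off spike.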
Parameters:
- changePointConfidence (0-100): Statistical confidence level, given as a percentage. 95 means 95% confidence that a significant change occurred
- changeHistoryLength: Number of points used to establish the baseline distribution for comparison
Output values:
- score: The magnitude of the change detected
- pValue: Statistical significance of the change (lower = more significant)
- martingaleValue: Cumulative evidence for change (higher = stronger evidence)
When to use:
- Detecting seasonal transitions
- Identifying system configuration changes
- Market regime shifts in trading
- Process drift in industrial monitoring
- Customer behavior pattern changes
Mathematical foundation: The algorithm uses the Power Martingale method and CUSUM (Cumulative Sum Control Chart) for sequential change detection.
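For readers unfamiliar with CUSUM, a textbook two-sided version looks roughly like the sketch below. It is not taken from ML.NET; `target_mean`, `slack`, and `threshold` are hypothetical parameters chosen for illustration.

```python
def cusum_alarm(values, target_mean, slack=0.5, threshold=5.0):
    """Two-sided CUSUM: return the first index where either cumulative sum
    exceeds `threshold`, or None. Parameter names are hypothetical."""
    s_high = s_low = 0.0
    for i, x in enumerate(values):
        # Accumulate deviations beyond the slack, never dropping below zero
        s_high = max(0.0, s_high + (x - target_mean - slack))
        s_low = max(0.0, s_low + (target_mean - x - slack))
        if s_high > threshold or s_low > threshold:
            return i
    return None


# Mean shifts from 10 to 12 at index 5
shift_index = cusum_alarm([10.0] * 5 + [12.0] * 10, target_mean=10.0)
```

The alarm fires a few points after the shift, once enough deviation has accumulated, rather than at the first shifted value.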
Output Format
The node produces an array of detected anomalies at the target path. Each anomaly object contains the following fields:
Common Fields (all anomaly types)
Field | Type | Description |
---|---|---|
type | string | Anomaly type: "spike" or "changePoint" |
confidence | number | The confidence level used for detection (0-100). Copied from the detector configuration (e.g. 95 for 95% confidence) |
level | number | Detection level indicator. Value > 0 indicates an anomaly was detected. For spikes and change points, this is typically 1.0 when detected, 0.0 when not |
score | number | Anomaly severity score. Higher values indicate stronger anomalies. For spikes: typically 0-10+, for change points: typically 0-5+ |
pValue | number | Statistical p-value (0-1). The probability of observing this data point given the historical distribution. Lower values indicate higher anomaly confidence. See detailed interpretation table below |
timestamp | ISO 8601 | UTC timestamp when the anomaly was detected (not the data point's timestamp) |
seriesKey | string | Group identifier from groupByPath. Empty string if no grouping is used. Allows tracking which series the anomaly belongs to |
currentValue | number | The actual numeric value that triggered the anomaly detection |
context | any | Additional context data from contextPath (if configured). Can be string, number, or object depending on the source data |
Change Point Specific Fields
Field | Type | Description |
---|---|---|
martingaleValue | number | Cumulative evidence score for change point detection. Higher values (typically > 10) indicate stronger evidence of a persistent change. This value accumulates over time until a change is detected |
Example Output
[
  {
    "type": "spike",
    "confidence": 95,
    "level": 1.0,
    "score": 3.2,
    "pValue": 0.001,
    "timestamp": "2024-01-15T10:30:00Z",
    "seriesKey": "Sensor-A1",
    "currentValue": 145.7,
    "context": "2024-01-15T10:30:00"
  },
  {
    "type": "changePoint",
    "confidence": 95,
    "level": 1.0,
    "score": 2.8,
    "pValue": 0.002,
    "martingaleValue": 12.5,
    "timestamp": "2024-01-15T11:00:00Z",
    "seriesKey": "Sensor-A1",
    "currentValue": 98.3,
    "context": "2024-01-15T11:00:00"
  }
]
Interpreting the Output
- Empty array: No anomalies detected in the current batch
- Multiple entries: Each entry represents a distinct anomaly. A single data point might trigger both spike and change point detection
- seriesKey grouping: When using groupByPath, anomalies from different groups will have different seriesKey values
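A downstream consumer might organize the array by series and anomaly type before acting on it. The sketch below assumes the output has been serialized to a JSON string with the field names shown in the example output; the helper name `group_anomalies` is hypothetical.

```python
import json
from collections import defaultdict


def group_anomalies(raw_json):
    """Group anomaly entries by seriesKey, then by type."""
    grouped = defaultdict(lambda: defaultdict(list))
    for anomaly in json.loads(raw_json):
        # seriesKey is an empty string when no groupByPath is configured
        grouped[anomaly.get("seriesKey", "")][anomaly["type"]].append(anomaly)
    return grouped


sample = """[
  {"type": "spike", "seriesKey": "Sensor-A1", "pValue": 0.001, "currentValue": 145.7},
  {"type": "changePoint", "seriesKey": "Sensor-A1", "pValue": 0.002, "currentValue": 98.3}
]"""
by_series = group_anomalies(sample)
spikes_for_a1 = by_series["Sensor-A1"]["spike"]
```

An empty array simply yields an empty mapping, so the same code path handles batches without anomalies.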
Understanding p-Values
The p-value is the probability of observing a value at least this extreme under normal conditions, that is, given the historical distribution. Lower p-values indicate stronger anomalies:
p-Value Range | Scientific Notation | Interpretation | Anomaly Strength | Standard Deviations | Action Required |
---|---|---|---|---|---|
> 0.1 | > 1E-01 | Normal variation | None | < 1.6σ | No action |
0.05 - 0.1 | 5E-02 to 1E-01 | Borderline | Weak | ~1.6-2σ | Monitor |
0.01 - 0.05 | 1E-02 to 5E-02 | Significant | Moderate | ~2-2.6σ | Investigate |
0.001 - 0.01 | 1E-03 to 1E-02 | Highly significant | Strong | ~2.6-3.3σ | Alert required |
0.0001 - 0.001 | 1E-04 to 1E-03 | Very highly significant | Very strong | ~3.3-3.9σ | Immediate attention |
0.00001 - 0.0001 | 1E-05 to 1E-04 | Extremely significant | Extreme | ~3.9-4.4σ | Critical issue |
< 0.00001 | < 1E-05 | Exceptionally rare | Exceptional | > 4.4σ | Emergency response |
< 1E-08 | < 1E-08 | Almost impossible | Maximum | > 5.5σ | System failure likely |
Real-world examples of p-values:
- p = 0.05: Normal daily fluctuation in website traffic
- p = 0.001: Unusual spike in server response time
- p = 1E-05: Major sensor malfunction or data corruption
- p = 1E-08: Critical system failure, hardware defect, or extreme external event
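The Standard Deviations column in the table can be reproduced approximately by converting a p-value into a distance from the mean. The sketch below assumes a two-sided test on a normal distribution, which is an assumption of this example rather than a statement about how the node computes p-values; `p_to_sigma` is a hypothetical helper.

```python
from statistics import NormalDist


def p_to_sigma(p_value):
    """Two-sided p-value -> equivalent number of standard deviations,
    assuming a standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - p_value / 2.0)


sigma_at_5_percent = p_to_sigma(0.05)    # roughly 1.96
sigma_at_1_permille = p_to_sigma(0.001)  # roughly 3.29
```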
Relationship with confidence level:
- If spikeConfidence = 95, the threshold is p < 0.05
- If spikeConfidence = 99, the threshold is p < 0.01
- A p-value of 1E-08 is 5,000,000 times smaller than the 95% confidence threshold of 0.05
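The conversion from a confidence level to a p-value threshold is a single subtraction, sketched here with a hypothetical helper `p_threshold`. It assumes the 0-100 scale used by spikeConfidence and changePointConfidence.

```python
def p_threshold(confidence):
    """Convert a 0-100 confidence level into a p-value threshold."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return 1.0 - confidence / 100.0


threshold_95 = p_threshold(95)  # about 0.05
threshold_99 = p_threshold(99)  # about 0.01
```

On this scale a fractional value such as 0.99 would be read as 0.99% confidence rather than 99%, giving a threshold close to 1 and flagging almost every point.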
Score Interpretation
- Spike scores:
  - 1-2: Mild deviation
  - 2-3: Moderate anomaly
  - 3-5: Strong anomaly
  - Above 5: Severe anomaly
- Change point scores:
  - 1-2: Minor shift
  - 2-3: Significant change
  - Above 3: Major trend change
- Martingale values (change points only):
  - 5-10: Evidence accumulating
  - 10-20: Strong evidence
  - Above 20: Definitive change
## Examples
### Example 1: IoT Sensor Anomaly Detection
```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.sensorReadings[*]
    targetPath: $.detectedAnomalies
    resetStatistics: false
    detectors:
      - path: $.temperature
        groupByPath: $.sensorId
        contextPath: $.timestamp
        detectSpikes: true
        detectChangePoints: true
        minDataPoints: 20
        maxDataPoints: 500
        spikeConfidence: 95
        changePointConfidence: 90
        pValueHistoryLength: 100
        changeHistoryLength: 20
```

### Example 2: Financial Transaction Monitoring

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.transactions[*]
    targetPath: $.suspiciousActivities
    detectors:
      - path: $.amount
        groupByPath: $.accountId
        contextPath: $.transactionId
        detectSpikes: true
        detectChangePoints: false # Only detect spikes for fraud detection
        minDataPoints: 50
        spikeConfidence: 99 # High confidence to reduce false positives
        pValueHistoryLength: 200
```

### Example 3: Multi-Metric Monitoring

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.metrics[*]
    targetPath: $.alerts
    resetStatistics: false
    detectors:
      # CPU usage spike detection
      - path: $.cpuUsage
        groupByPath: $.serverId
        detectSpikes: true
        detectChangePoints: false
        minDataPoints: 10
        spikeConfidence: 90
        pValueHistoryLength: 50
      # Memory trend changes
      - path: $.memoryUsage
        groupByPath: $.serverId
        detectSpikes: false
        detectChangePoints: true
        minDataPoints: 30
        changePointConfidence: 95
        changeHistoryLength: 15
      # Network traffic anomalies (both types)
      - path: $.networkThroughput
        groupByPath: $.serverId
        detectSpikes: true
        detectChangePoints: true
        minDataPoints: 20
        maxDataPoints: 1000
```

### Example 4: Stateless Detection for Batch Processing

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.batchData[*]
    targetPath: $.batchAnomalies
    resetStatistics: true # Each batch starts fresh
    detectors:
      - path: $.value
        detectSpikes: true
        detectChangePoints: true
        minDataPoints: 10
        maxDataPoints: 100 # Limit memory for batch processing
        spikeConfidence: 95
        changePointConfidence: 95
```
## Notes
- Machine Learning Models: Uses ML.NET's IID (Independent and Identically Distributed) spike and change point detection algorithms, as described under Detection Types
- Minimum Data Requirements: The algorithms require at least minDataPoints observations before producing reliable results
- Stateful Processing: When resetStatistics is false, the node maintains time series history across pipeline runs, improving detection accuracy over time
- GroupBy Functionality: The groupByPath parameter enables independent time series tracking for different entities (e.g., per sensor, per user)
- Memory Management: Use maxDataPoints to implement a sliding window and limit memory usage in long-running pipelines
- Confidence vs Sensitivity: Higher confidence values reduce false positives but may miss subtle anomalies; adjust based on your use case
- Real-time vs Batch: The node works for both real-time streaming (stateful) and batch processing (stateless) scenarios
- IID Assumption: Both algorithms assume data points are Independent and Identically Distributed within their respective windows, which works well for most stationary time series but may need parameter tuning for highly seasonal or trending data
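How grouping, the minimum data requirement, and the sliding window fit together can be pictured with a small in-memory store. This is a conceptual sketch only; `SeriesStore` and its methods are hypothetical and do not reflect the node's internal data structures.

```python
from collections import defaultdict, deque


class SeriesStore:
    """Per-group sliding windows (hypothetical helper, not the node's code)."""

    def __init__(self, min_data_points=10, max_data_points=1000):
        self.min_data_points = min_data_points
        # maxlen=None keeps everything, mirroring maxDataPoints: 0 (unlimited)
        maxlen = max_data_points or None
        self.windows = defaultdict(lambda: deque(maxlen=maxlen))

    def add(self, series_key, value):
        """Store a value; return True once the series has enough points."""
        window = self.windows[series_key]
        window.append(value)  # the oldest value drops out when the window is full
        return len(window) >= self.min_data_points

    def reset(self):
        """Drop all history, as resetStatistics: true would on each run."""
        self.windows.clear()


store = SeriesStore(min_data_points=3, max_data_points=5)
ready = [store.add("Sensor-A1", v) for v in range(7)]
```

Each series key gets its own window, so a burst of readings from one sensor neither fills nor evicts the history of another.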