
MachineLearningAnomalyDetection@1

The MachineLearningAnomalyDetection@1 node uses ML.NET algorithms to detect spikes and change points in time series data.

Adapter Prerequisites

Node Configuration

For fields path, targetPath, targetValueWriteMode, and targetValueKind, see Overview.

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.Items[*] # Path to array of items to analyze
    targetPath: $.anomalies # Path where anomaly results will be stored
    resetStatistics: false # Reset time series data on each run (true = stateless, false = stateful)
    detectors:
      - path: $.Attributes.Value # JSONPath to the numeric value in the time series
        groupByPath: $.Attributes.SensorId # Optional: Group time series by this path
        contextPath: $.Attributes.Timestamp # Optional: Include context in anomaly results
        detectSpikes: true # Enable spike detection
        detectChangePoints: true # Enable change point detection
        minDataPoints: 10 # Minimum data points required before detection starts
        maxDataPoints: 1000 # Maximum data points to keep in memory (0 = unlimited)
        spikeConfidence: 95 # Confidence level for spike detection (0-100, e.g. 95 = 95%)
        changePointConfidence: 95 # Confidence level for change point detection (0-100, e.g. 95 = 95%)
        pValueHistoryLength: 100 # History length for spike p-value calculation
        changeHistoryLength: 10 # History length for change point detection
```

Detection Types

Spike Detection (IID Spike)

Spike detection identifies sudden, temporary anomalies in time series data that deviate significantly from the expected pattern. The algorithm uses the IID (Independent and Identically Distributed) assumption with adaptive thresholds based on statistical properties of the data.

How it works:

  • The algorithm keeps a sliding window of recent values (pValueHistoryLength of them) and estimates their distribution
  • When a new value arrives, it computes the value's p-value: the probability of seeing a value at least this extreme under the learned distribution
  • If the p-value falls below the threshold derived from the confidence level (e.g. 0.05 for a confidence of 95), the point is marked as a spike
  • ML.NET's IID spike detector is based on adaptive kernel density estimation; martingale scores are used only by change point detection, which is why martingaleValue appears only in change point results
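The steps above can be sketched in Python. This is a simplified stand-in, not ML.NET's implementation. ML.NET's IID spike detector uses adaptive kernel density estimation. This sketch instead assumes the window is roughly normal and derives a two-sided p-value from a z-score. The class `SimpleSpikeDetector` and its parameter names are hypothetical; they only mirror spikeConfidence, pValueHistoryLength, and minDataPoints.

```python
from collections import deque
from statistics import NormalDist, fmean, pstdev


class SimpleSpikeDetector:
    """Hypothetical IID spike detector: z-score p-value against a sliding window.

    Simplified stand-in for illustration; ML.NET uses adaptive kernel density
    estimation rather than the normal approximation used here.
    """

    def __init__(self, confidence=95, p_value_history_length=100, min_data_points=10):
        # p-value threshold implied by the confidence level (95 -> 0.05)
        self.alpha = (100 - confidence) / 100
        # Sliding window of recent values (plays the role of pValueHistoryLength)
        self.window = deque(maxlen=p_value_history_length)
        self.min_data_points = min_data_points

    def update(self, value):
        """Score value against the window, then add it; returns (is_spike, p_value)."""
        is_spike, p_value = False, 1.0
        if len(self.window) >= self.min_data_points:
            mean = fmean(self.window)
            std = pstdev(self.window) or 1e-9  # guard against a perfectly flat window
            z = abs(value - mean) / std
            p_value = 2 * (1 - NormalDist().cdf(z))  # two-sided normal p-value
            is_spike = p_value < self.alpha
        self.window.append(value)
        return is_spike, p_value


detector = SimpleSpikeDetector(confidence=95, p_value_history_length=50, min_data_points=10)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.2, 9.9, 10.0, 10.1, 25.0]
print([detector.update(x)[0] for x in readings][-2:])  # [False, True]
```

The first ten readings only fill the window, because detection waits for minDataPoints. The final 25.0 lies far outside the learned spread, so its p-value drops below 0.05.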

Parameters:

  • spikeConfidence (0-100): Confidence level as a percentage. The p-value threshold is 1 - spikeConfidence/100, so 95 flags points with p < 0.05, meaning roughly 5% of normal points may still be flagged (false positive rate)
  • pValueHistoryLength: Size of the sliding window for p-value calculation. Larger values provide more stable detection but slower adaptation to pattern changes

Output values:

  • score: The anomaly score (higher = more anomalous)
  • pValue: The probability of observing a value at least this extreme under the learned distribution (lower = more unlikely/anomalous)

When to use:

  • Fraud detection in financial transactions
  • Network intrusion detection
  • Sensor malfunction identification
  • Quality control in manufacturing

Further reading:

Change Point Detection (IID Change Point)

Change point detection identifies persistent shifts in the statistical properties of time series data, indicating that the underlying data generation process has fundamentally changed.

How it works:

  • The algorithm monitors the distribution parameters (mean, variance) of the time series
  • Uses a sliding window approach to compare recent data distribution with historical patterns
  • Employs martingale-based methods and cumulative sum (CUSUM) techniques to detect distribution changes
  • When the martingale value exceeds a threshold, it signals a change point
  • Unlike spikes, change points indicate lasting changes rather than temporary deviations
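The martingale step can be illustrated with a small Python sketch. It feeds a stream of p-values into a power martingale and alerts when the martingale reaches 1/α, a threshold motivated by Ville's inequality. ML.NET documents its change-history length as the sliding window of p-values used for the martingale score. The sketch assumes the node passes changeHistoryLength through to that setting and uses ML.NET's default power-martingale epsilon of 0.1. `PowerMartingaleChangeDetector` is a hypothetical name, and the alert rule is an assumption rather than the node's documented behavior.

```python
import math
from collections import deque


class PowerMartingaleChangeDetector:
    """Hypothetical change point detector: windowed power martingale over p-values."""

    def __init__(self, confidence=95, change_history_length=10, epsilon=0.1):
        alpha = (100 - confidence) / 100
        # Assumption: alert when the martingale reaches 1/alpha (Ville's inequality); 20 for 95
        self.log_threshold = math.log(1 / alpha)
        self.epsilon = epsilon
        # Assumption: the martingale is the product of the last change_history_length bets
        self.log_bets = deque(maxlen=change_history_length)

    def update(self, p_value):
        """Feed one p-value; returns (is_change_point, martingale_value)."""
        p = min(max(p_value, 1e-12), 1.0)  # clamp so log(p) is finite
        # Power betting function eps * p**(eps - 1): above 1 for small p, below 1 for large p
        self.log_bets.append(math.log(self.epsilon) + (self.epsilon - 1) * math.log(p))
        log_martingale = sum(self.log_bets)
        martingale = math.exp(min(log_martingale, 700.0))  # cap to avoid float overflow
        return log_martingale >= self.log_threshold, martingale


detector = PowerMartingaleChangeDetector(confidence=95, change_history_length=10)
stable = [detector.update(0.5)[0] for _ in range(10)]    # p-values typical of no change
shifted = [detector.update(0.001)[0] for _ in range(5)]  # persistently small p-values
print(any(stable), shifted)  # False [False, False, False, True, True]
```

A single small p-value is not enough. The alert fires only after a sustained run of small p-values has pushed out enough of the earlier evidence. This matches the distinction between lasting changes and temporary deviations.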

Parameters:

  • changePointConfidence (0-100): Confidence level as a percentage. 95 means 95% confidence that a significant change occurred
  • changeHistoryLength: Number of points used to establish the baseline distribution for comparison

Output values:

  • score: The magnitude of the change detected
  • pValue: Statistical significance of the change (lower = more significant)
  • martingaleValue: Cumulative evidence for change (higher = stronger evidence)

When to use:

  • Detecting seasonal transitions
  • Identifying system configuration changes
  • Market regime shifts in trading
  • Process drift in industrial monitoring
  • Customer behavior pattern changes

Mathematical foundation: The algorithm uses the Power Martingale method and CUSUM (Cumulative Sum Control Chart) for sequential change detection.
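For reference, the textbook forms of the two techniques named above are given below. The symbols ε, k, h, and μ₀ are standard notation, not parameters of this node, and this page does not specify how the node combines the two methods. Under the IID assumption the p-values are (approximately) uniform on (0, 1). Each factor ε p^(ε-1) then has expected value 1, so M_t is a martingale. Ville's inequality bounds the chance that it ever reaches 1/α; with a confidence of 95, α = 0.05 and 1/α = 20.

```latex
\begin{aligned}
M_t &= \prod_{i=1}^{t} \epsilon\, p_i^{\epsilon - 1}, \qquad M_0 = 1, \quad 0 < \epsilon < 1 \\
\Pr\Big(\sup_{t \ge 0} M_t \ge \tfrac{1}{\alpha}\Big) &\le \alpha, \qquad \alpha = 1 - \tfrac{\text{confidence}}{100} \\
S_t &= \max\big(0,\; S_{t-1} + x_t - \mu_0 - k\big), \qquad S_0 = 0, \quad \text{signal when } S_t > h
\end{aligned}
```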

Further reading:

Output Format

The node produces an array of detected anomalies at the target path. Each anomaly object contains the following fields:

Common Fields (all anomaly types)

| Field | Type | Description |
|---|---|---|
| type | string | Anomaly type: "spike" or "changePoint" |
| confidence | number | The confidence level used for detection (0-100), copied from the detector configuration (e.g. 95 for 95% confidence) |
| level | number | Detection level indicator. A value > 0 indicates an anomaly was detected; for spikes and change points this is typically 1.0 when detected and 0.0 when not |
| score | number | Anomaly severity score. Higher values indicate stronger anomalies. For spikes: typically 0-10+; for change points: typically 0-5+ |
| pValue | number | Statistical p-value (0-1): the probability of observing a value at least this extreme given the historical distribution. Lower values indicate higher anomaly confidence. See the detailed interpretation table below |
| timestamp | ISO 8601 string | UTC timestamp when the anomaly was detected (not the data point's timestamp) |
| seriesKey | string | Group identifier from groupByPath; empty string if no grouping is used. Shows which series the anomaly belongs to |
| currentValue | number | The actual numeric value that triggered the anomaly detection |
| context | any | Additional context data from contextPath (if configured). Can be a string, number, or object depending on the source data |

Change Point Specific Fields

| Field | Type | Description |
|---|---|---|
| martingaleValue | number | Cumulative evidence score for change point detection. Higher values (typically > 10) indicate stronger evidence of a persistent change. This value accumulates over time until a change is detected |

Example Output

```json
[
  {
    "type": "spike",
    "confidence": 95,
    "level": 1.0,
    "score": 3.2,
    "pValue": 0.001,
    "timestamp": "2024-01-15T10:30:00Z",
    "seriesKey": "Sensor-A1",
    "currentValue": 145.7,
    "context": "2024-01-15T10:30:00"
  },
  {
    "type": "changePoint",
    "confidence": 95,
    "level": 1.0,
    "score": 2.8,
    "pValue": 0.002,
    "martingaleValue": 12.5,
    "timestamp": "2024-01-15T11:00:00Z",
    "seriesKey": "Sensor-A1",
    "currentValue": 98.3,
    "context": "2024-01-15T11:00:00"
  }
]
```

Interpreting the Output

  • Empty array: No anomalies detected in the current batch
  • Multiple entries: Each entry represents a distinct anomaly. A single data point might trigger both spike and change point detection
  • seriesKey grouping: When using groupByPath, anomalies from different groups will have different seriesKey values
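A downstream step might consume the anomaly array as follows. This minimal Python sketch parses a trimmed copy of the example output and counts anomalies per seriesKey. `summarize` is a hypothetical helper, and the sketch assumes type is always "spike" or "changePoint", as documented.

```python
import json
from collections import defaultdict


def summarize(anomalies):
    """Count spikes and change points per seriesKey (hypothetical helper)."""
    summary = defaultdict(lambda: {"spike": 0, "changePoint": 0})
    for record in anomalies:
        key = record.get("seriesKey", "")  # "" when groupByPath is not configured
        summary[key][record["type"]] += 1
    return dict(summary)


# Trimmed copy of the example output (context fields omitted)
raw = """[
  {"type": "spike", "confidence": 95, "level": 1.0, "score": 3.2, "pValue": 0.001,
   "timestamp": "2024-01-15T10:30:00Z", "seriesKey": "Sensor-A1", "currentValue": 145.7},
  {"type": "changePoint", "confidence": 95, "level": 1.0, "score": 2.8, "pValue": 0.002,
   "martingaleValue": 12.5, "timestamp": "2024-01-15T11:00:00Z",
   "seriesKey": "Sensor-A1", "currentValue": 98.3}
]"""

print(summarize(json.loads(raw)))  # {'Sensor-A1': {'spike': 1, 'changePoint': 1}}
print(summarize([]))               # {} (empty array: no anomalies in this batch)
```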

Understanding p-Values

The p-value is the probability of observing a value at least as extreme as this data point under normal conditions (the historical distribution). Lower p-values indicate stronger anomalies:

| p-Value Range | Scientific Notation | Interpretation | Anomaly Strength | Standard Deviations (two-sided, normal) | Action Required |
|---|---|---|---|---|---|
| > 0.1 | > 1E-01 | Normal variation | None | < 1.6σ | No action |
| 0.05 - 0.1 | 5E-02 to 1E-01 | Borderline | Weak | ~1.6-2σ | Monitor |
| 0.01 - 0.05 | 1E-02 to 5E-02 | Significant | Moderate | ~2-2.6σ | Investigate |
| 0.001 - 0.01 | 1E-03 to 1E-02 | Highly significant | Strong | ~2.6-3.3σ | Alert required |
| 0.0001 - 0.001 | 1E-04 to 1E-03 | Very highly significant | Very strong | ~3.3-3.9σ | Immediate attention |
| 0.00001 - 0.0001 | 1E-05 to 1E-04 | Extremely significant | Extreme | ~3.9-4.4σ | Critical issue |
| 0.00000001 - 0.00001 | 1E-08 to 1E-05 | Exceptionally rare | Exceptional | ~4.4-5.7σ | Emergency response |
| < 0.00000001 | < 1E-08 | Almost impossible | Maximum | > 5.7σ | System failure likely |
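The Standard Deviations column can be reproduced from the p-value. The following Python sketch assumes a two-sided test under a normal distribution, which is the convention the column's numbers match. `p_to_sigma` and `action_for` are hypothetical helpers. In `action_for`, a value exactly on a row boundary falls into the less severe row, which is an assumption because the table's ranges share endpoints.

```python
from statistics import NormalDist


def p_to_sigma(p_value):
    """Two-sided z-score (in standard deviations) for a p-value, assuming normality."""
    return NormalDist().inv_cdf(1 - p_value / 2)


# Upper p-value bound of each table row (exclusive) and its "Action Required" label
ACTION_TIERS = [
    (1e-8, "System failure likely"),
    (1e-5, "Emergency response"),
    (1e-4, "Critical issue"),
    (1e-3, "Immediate attention"),
    (1e-2, "Alert required"),
    (5e-2, "Investigate"),
    (1e-1, "Monitor"),
]


def action_for(p_value):
    """Return the table's suggested action for a p-value (hypothetical helper)."""
    for upper, action in ACTION_TIERS:
        if p_value < upper:
            return action
    return "No action"


print(round(p_to_sigma(1e-5), 2), action_for(1e-5))  # 4.42 Critical issue
```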

Real-world examples of p-values:

  • p = 0.05: Normal daily fluctuation in website traffic
  • p = 0.001: Unusual spike in server response time
  • p = 1E-05: Major sensor malfunction or data corruption
  • p = 1E-08: Critical system failure, hardware defect, or extreme external event

Relationship with confidence level:

  • If spikeConfidence = 95, the threshold is p < 0.05
  • If spikeConfidence = 99, the threshold is p < 0.01
  • A p-value of 1E-08 is 5,000,000 times (5 × 10^6) smaller than the 0.05 threshold used at 95% confidence
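The threshold arithmetic can be checked directly. This minimal sketch assumes the threshold is (100 - confidence) / 100, as stated above; `p_threshold` is a hypothetical helper name.

```python
def p_threshold(confidence):
    """p-value threshold implied by a 0-100 confidence level (e.g. 95 -> 0.05)."""
    return (100 - confidence) / 100


for c in (95, 99):
    print(c, p_threshold(c))
# 95 0.05
# 99 0.01

# How many times smaller a p-value of 1E-08 is than the 95% threshold
print(round(p_threshold(95) / 1e-8))  # 5000000
```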

Score Interpretation

  • Spike scores:
    • 1-2: Mild deviation
    • 2-3: Moderate anomaly
    • 3-5: Strong anomaly
    • > 5: Severe anomaly

  • Change point scores:
    • 1-2: Minor shift
    • 2-3: Significant change
    • > 3: Major trend change

  • Martingale values (change points only):
    • 5-10: Evidence accumulating
    • 10-20: Strong evidence
    • > 20: Definitive change
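To apply these bands consistently in a downstream alerting step, the lists can be encoded as lookup tables. This is a hypothetical helper. Because adjacent ranges share endpoints, it assumes each band includes its lower bound, so a spike score of exactly 5 counts as severe.

```python
from bisect import bisect_right

# Lower bound of each band (inclusive) and its label, taken from the lists above
BANDS = {
    "spike": ([1, 2, 3, 5], ["Mild deviation", "Moderate anomaly", "Strong anomaly", "Severe anomaly"]),
    "changePoint": ([1, 2, 3], ["Minor shift", "Significant change", "Major trend change"]),
    "martingale": ([5, 10, 20], ["Evidence accumulating", "Strong evidence", "Definitive change"]),
}


def band(kind, value):
    """Return the interpretation band for a score or martingale value (hypothetical helper)."""
    bounds, labels = BANDS[kind]
    index = bisect_right(bounds, value) - 1
    return labels[index] if index >= 0 else "Below listed bands"


# Values from the Example Output section
print(band("spike", 3.2), "|", band("changePoint", 2.8), "|", band("martingale", 12.5))
# Strong anomaly | Significant change | Strong evidence
```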


## Examples

### Example 1: IoT Sensor Anomaly Detection

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.sensorReadings[*]
    targetPath: $.detectedAnomalies
    resetStatistics: false
    detectors:
      - path: $.temperature
        groupByPath: $.sensorId
        contextPath: $.timestamp
        detectSpikes: true
        detectChangePoints: true
        minDataPoints: 20
        maxDataPoints: 500
        spikeConfidence: 95
        changePointConfidence: 90
        pValueHistoryLength: 100
        changeHistoryLength: 20
```

### Example 2: Financial Transaction Monitoring

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.transactions[*]
    targetPath: $.suspiciousActivities
    detectors:
      - path: $.amount
        groupByPath: $.accountId
        contextPath: $.transactionId
        detectSpikes: true
        detectChangePoints: false # Only detect spikes for fraud detection
        minDataPoints: 50
        spikeConfidence: 99 # High confidence to reduce false positives
        pValueHistoryLength: 200
```

### Example 3: Multi-Metric Monitoring

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.metrics[*]
    targetPath: $.alerts
    resetStatistics: false
    detectors:
      # CPU usage spike detection
      - path: $.cpuUsage
        groupByPath: $.serverId
        detectSpikes: true
        detectChangePoints: false
        minDataPoints: 10
        spikeConfidence: 90
        pValueHistoryLength: 50
      # Memory trend changes
      - path: $.memoryUsage
        groupByPath: $.serverId
        detectSpikes: false
        detectChangePoints: true
        minDataPoints: 30
        changePointConfidence: 95
        changeHistoryLength: 15
      # Network traffic anomalies (both types)
      - path: $.networkThroughput
        groupByPath: $.serverId
        detectSpikes: true
        detectChangePoints: true
        minDataPoints: 20
        maxDataPoints: 1000
```

### Example 4: Stateless Detection for Batch Processing

```yaml
transformations:
  - type: MachineLearningAnomalyDetection@1
    path: $.batchData[*]
    targetPath: $.batchAnomalies
    resetStatistics: true # Each batch starts fresh
    detectors:
      - path: $.value
        detectSpikes: true
        detectChangePoints: true
        minDataPoints: 10
        maxDataPoints: 100 # Limit memory for batch processing
        spikeConfidence: 95
        changePointConfidence: 95
```

Notes

  • Machine Learning Models: Uses ML.NET's IID spike and change point detectors, which are based on adaptive kernel density estimation (not the separate SSA-based detectors that ML.NET also offers)
  • Minimum Data Requirements: Detection does not start for a series until it has at least minDataPoints observations; a longer history generally produces more reliable results
  • Stateful Processing: When resetStatistics is false, the node maintains time series history across pipeline runs, improving detection accuracy over time
  • GroupBy Functionality: The groupByPath parameter enables independent time series tracking for different entities (e.g., per sensor, per user)
  • Memory Management: Use maxDataPoints to implement a sliding window and limit memory usage in long-running pipelines
  • Confidence vs Sensitivity: Higher confidence values reduce false positives but may miss subtle anomalies; adjust based on your use case
  • Real-time vs Batch: The node works for both real-time streaming (stateful) and batch processing (stateless) scenarios
  • IID Assumption: Both algorithms assume data points are Independent and Identically Distributed within their respective windows, which works well for most stationary time series but may need parameter tuning for highly seasonal or trending data
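The notes on groupByPath, maxDataPoints, minDataPoints, and resetStatistics all describe per-series state. This sketch is a hypothetical model of that bookkeeping, not the node's implementation; `SeriesStore` and its methods are illustrative names. It assumes one bounded buffer per seriesKey and that a run in stateless mode begins by clearing every buffer.

```python
from collections import defaultdict, deque


class SeriesStore:
    """Hypothetical per-series history illustrating groupByPath, minDataPoints,
    maxDataPoints, and resetStatistics."""

    def __init__(self, min_data_points=10, max_data_points=1000, reset_statistics=False):
        self.min_data_points = min_data_points
        self.reset_statistics = reset_statistics
        # maxDataPoints = 0 means unlimited, per the configuration reference
        maxlen = max_data_points or None
        # One independent sliding window per series key
        self.series = defaultdict(lambda: deque(maxlen=maxlen))

    def start_run(self):
        """Call at the start of each pipeline run; stateless mode drops all history."""
        if self.reset_statistics:
            self.series.clear()

    def add(self, series_key, value):
        """Append a value; returns True once the series has enough points for detection."""
        buffer = self.series[series_key]
        buffer.append(value)
        return len(buffer) >= self.min_data_points


store = SeriesStore(min_data_points=3, max_data_points=5, reset_statistics=True)
store.start_run()
print([store.add("Sensor-A1", v) for v in (1.0, 2.0, 3.0)])  # [False, False, True]
```

Keeping a bounded deque per key keeps memory proportional to the number of series times maxDataPoints. This is the trade-off the Memory Management note describes.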