MachineLearningAnomalyDetection@1

Node MachineLearningAnomalyDetection@1 uses ML.NET machine learning algorithms to detect spikes and change points in time series data.

Adapter Prerequisites

Mesh Adapter

Node Configuration

For fields path, targetPath, targetValueWriteMode, and targetValueKind, see Overview.

```yaml
transformations:
- type: MachineLearningAnomalyDetection@1
  path: $.Items[*]  # Path to array of items to analyze
  targetPath: $.anomalies  # Path where anomaly results will be stored
  resetStatistics: false  # Reset time series data on each run (true = stateless, false = stateful)
  detectors:
    - path: $.Attributes.Value  # JSONPath to the numeric value in the time series
      groupByPath: $.Attributes.SensorId  # Optional: track a separate time series per value at this path
      contextPath: $.Attributes.Timestamp  # Optional: include context in anomaly results
      detectSpikes: true  # Enable spike detection
      detectChangePoints: true  # Enable change point detection
      minDataPoints: 10  # Minimum data points required before detection starts
      maxDataPoints: 1000  # Maximum data points to keep in memory (0 = unlimited)
      spikeConfidence: 95  # Confidence level for spike detection (0-100, e.g. 95 = 95%)
      changePointConfidence: 95  # Confidence level for change point detection (0-100, e.g. 95 = 95%)
      pValueHistoryLength: 100  # History length for spike p-value calculation
      changeHistoryLength: 10  # History length for change point detection
```
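
The following Python sketch shows one plausible way the per-series bookkeeping behind `groupByPath`, `minDataPoints`, `maxDataPoints` and `resetStatistics` could work. It is an assumption made for illustration, not the node's implementation. The `SeriesBuffer` class is hypothetical, and plain dictionary keys stand in for JSONPath lookups.

```python
from collections import defaultdict, deque
from typing import Optional


class SeriesBuffer:
    """Hypothetical per-series history store; not the node's actual code."""

    def __init__(self, min_data_points: int, max_data_points: int) -> None:
        self.min_data_points = min_data_points
        # maxDataPoints = 0 means unlimited, which maps to deque(maxlen=None).
        maxlen = max_data_points if max_data_points > 0 else None
        self.history = defaultdict(lambda: deque(maxlen=maxlen))

    def reset(self) -> None:
        """Drop all history, mimicking resetStatistics: true at the start of a run."""
        self.history.clear()

    def add(self, item: dict, value_key: str, group_key: Optional[str] = None) -> bool:
        """Store one value; return True once its series has enough points for detection."""
        # Without grouping, every item shares one series keyed by "" (like seriesKey).
        series_key = str(item.get(group_key, "")) if group_key else ""
        series = self.history[series_key]
        series.append(float(item[value_key]))  # oldest value is evicted once maxlen is reached
        return len(series) >= self.min_data_points


buffer = SeriesBuffer(min_data_points=3, max_data_points=5)
readings = [{"SensorId": "A1", "Value": v} for v in range(7)]
ready = [buffer.add(r, "Value", "SensorId") for r in readings]
print(ready)                       # [False, False, True, True, True, True, True]
print(list(buffer.history["A1"]))  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Using a bounded `deque` per series gives the sliding-window behaviour of `maxDataPoints` for free. Keying the dictionary by the group value keeps each sensor's history independent.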

Detection Types

Spike Detection (IID Spike)

Spike detection identifies sudden, temporary anomalies in time series data that deviate significantly from the expected pattern. The algorithm uses the IID (Independent and Identically Distributed) assumption with adaptive thresholds based on statistical properties of the data.

How it works:

- The algorithm maintains a sliding window of historical p-values (probability values) that indicate how likely each data point is given the historical distribution.
- When a new value arrives, it calculates that value's p-value based on the learned distribution.
- If the p-value falls below a threshold derived from the confidence level, the point is marked as a spike.
- Martingale scores (exchangeability martingales) are used to improve sequential anomaly detection.
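
The steps above can be sketched in a few lines of Python. This sketch does not reproduce ML.NET's own p-value estimator. Instead, it fits a normal distribution to a sliding window of recent values, computes a two-sided p-value, and flags the point when that p-value is below `1 - confidence/100`. The martingale step is omitted. `detect_spikes` and the Gaussian assumption are illustrative only.

```python
import statistics
from collections import deque
from typing import Dict, List, Sequence


def detect_spikes(values: Sequence[float], confidence: float = 95.0,
                  history_length: int = 100, min_points: int = 10) -> List[Dict[str, float]]:
    """Flag values whose two-sided Gaussian p-value is below 1 - confidence/100.

    Simplified illustration: the p-value comes from a normal fit of a sliding
    window of recent values, not from ML.NET's internal estimator.
    """
    threshold = 1.0 - confidence / 100.0
    window: deque = deque(maxlen=history_length)  # plays the role of pValueHistoryLength
    normal = statistics.NormalDist()
    spikes = []
    for i, x in enumerate(values):
        if len(window) >= min_points:  # plays the role of minDataPoints
            mu = statistics.fmean(window)
            sigma = statistics.pstdev(window) or 1e-9  # guard against a flat history
            z = abs(x - mu) / sigma
            p_value = 2.0 * (1.0 - normal.cdf(z))  # two-sided tail probability
            if p_value < threshold:
                spikes.append({"index": i, "currentValue": x, "pValue": p_value})
        window.append(x)  # the new value joins the history either way
    return spikes


series = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.4, 9.6, 30.0, 10.1]
print([s["index"] for s in detect_spikes(series)])  # [12]
```

Only the jump to 30.0 is flagged. The point right after it is not flagged, because the spike has widened the window's spread.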

Parameters:

- spikeConfidence (0-100): Statistical confidence level for detection, given as a percentage (not a decimal). A point is flagged when its p-value falls below 1 - spikeConfidence/100, so 95 flags p < 0.05 and accepts that about 5% of normal points may be flagged (5% false positive rate).
- pValueHistoryLength: Size of the sliding window for p-value calculation. Larger values provide more stable detection but slower adaptation to pattern changes.

Output values:

- score: The anomaly score (higher = more anomalous)
- pValue: The probability of observing a value at least this extreme under the learned distribution (lower = more unlikely/anomalous)

When to use:

- Fraud detection in financial transactions
- Network intrusion detection
- Sensor malfunction identification
- Quality control in manufacturing

Change Point Detection (IID Change Point)

Change point detection identifies persistent shifts in the statistical properties of time series data, indicating that the underlying data generation process has fundamentally changed.

How it works:

- The algorithm monitors the distribution parameters (mean, variance) of the time series.
- It uses a sliding window approach to compare the recent data distribution with historical patterns.
- It employs martingale-based methods and cumulative sum (CUSUM) techniques to detect distribution changes.
- When the martingale value exceeds a threshold, it signals a change point.
- Unlike spikes, change points indicate lasting changes rather than temporary deviations.
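
A minimal Python sketch of martingale-based change point detection follows. It is not ML.NET's implementation, and several details are assumptions. Each value gets a two-sided Gaussian p-value against a baseline window. The last `change_history_length` p-values are combined into a sliding-window power martingale (product of `eps * p**(eps - 1)`). A change is reported when the martingale exceeds `1 / (1 - confidence/100)`, a threshold motivated by Ville's inequality. The parameters `baseline_length`, `epsilon` and `min_p_value` are hypothetical.

```python
import math
import statistics
from collections import deque
from typing import Dict, List, Sequence


def detect_change_points(values: Sequence[float], confidence: float = 95.0,
                         change_history_length: int = 10, baseline_length: int = 30,
                         epsilon: float = 0.1, min_p_value: float = 1e-3) -> List[Dict[str, float]]:
    """Illustrative sliding-window power-martingale change point detector (not ML.NET's code)."""
    log_threshold = math.log(1.0 / (1.0 - confidence / 100.0))  # e.g. log(20) for 95
    baseline: deque = deque(maxlen=baseline_length)
    log_factors: deque = deque(maxlen=change_history_length)  # plays the role of changeHistoryLength
    normal = statistics.NormalDist()
    changes = []
    for i, x in enumerate(values):
        if len(baseline) >= 2:
            mu = statistics.fmean(baseline)
            sigma = statistics.pstdev(baseline) or 1e-9
            p = 2.0 * (1.0 - normal.cdf(abs(x - mu) / sigma))
            # Flooring p caps each betting factor, so one extreme point (a spike)
            # cannot trigger a change on its own; several low p-values in a row must.
            p = max(p, min_p_value)
            log_factors.append(math.log(epsilon) + (epsilon - 1.0) * math.log(p))
            log_martingale = sum(log_factors)  # work in log space to avoid overflow
            if log_martingale > log_threshold:
                changes.append({"index": i, "currentValue": x,
                                "martingaleValue": math.exp(log_martingale)})
                baseline.clear()  # start a fresh baseline after the change
                log_factors.clear()
        baseline.append(x)
    return changes


# 30 points alternating around 10, then a persistent shift to 20 starting at index 30.
series = [9.0, 11.0] * 15 + [20.0] * 10
print([c["index"] for c in detect_change_points(series)])
```

On this input, a single change is reported a few points after the shift. The first shifted value alone cannot push the martingale over the threshold. This reflects the spike/change point distinction described above.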

Parameters:

- changePointConfidence (0-100): Statistical confidence level, given as a percentage (not a decimal). 95 means 95% confidence that a significant change occurred.
- changeHistoryLength: Number of points used to establish the baseline distribution for comparison.

Output values:

- score: The magnitude of the change detected
- pValue: Statistical significance of the change (lower = more significant)
- martingaleValue: Cumulative evidence for change (higher = stronger evidence)

When to use:

- Detecting seasonal transitions
- Identifying system configuration changes
- Market regime shifts in trading
- Process drift in industrial monitoring
- Customer behavior pattern changes

Mathematical foundation: The algorithm uses the Power Martingale method and CUSUM (Cumulative Sum Control Chart) for sequential change detection.
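
For reference, here is the textbook power martingale over a stream of p-values $p_1, p_2, \ldots$, together with Ville's inequality, which is commonly used to threshold it. This is the standard formulation and not necessarily the exact variant or threshold used by this node or by ML.NET. On normal data the p-values are roughly uniform, so each betting factor has expectation 1 and $M_n$ stays small. A run of small p-values makes it grow quickly.

$$
\begin{aligned}
M_n &= \prod_{i=1}^{n} \varepsilon\, p_i^{\,\varepsilon - 1}, \qquad 0 < \varepsilon < 1,\quad M_0 = 1,\\
\mathbb{E}\!\left[\varepsilon\, p^{\,\varepsilon - 1}\right] &= \int_0^1 \varepsilon\, p^{\,\varepsilon - 1}\,dp = 1 \quad \text{if } p \sim \mathcal{U}(0,1),\\
\Pr\!\left(\sup_{n} M_n \ge \lambda\right) &\le \frac{1}{\lambda}.
\end{aligned}
$$

Consider choosing $\lambda = 1 / (1 - \text{changePointConfidence}/100)$, e.g. $\lambda = 20$ for 95. For the full (non-windowed) martingale on stationary data, this would bound the probability of ever raising a false change alarm by 5%. Treat that mapping as an assumption rather than documented node behaviour.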

Output Format

The node produces an array of detected anomalies at the target path. Each anomaly object contains the following fields:

Common Fields (all anomaly types)

| Field | Type | Description |
|---|---|---|
| `type` | string | Anomaly type: `"spike"` or `"changePoint"` |
| `confidence` | number | The confidence level used for detection (0-100), copied from the detector configuration (e.g. 95 for 95% confidence) |
| `level` | number | Detection level indicator. A value > 0 indicates that an anomaly was detected; for spikes and change points this is typically 1.0 when detected and 0.0 when not |
| `score` | number | Anomaly severity score. Higher values indicate stronger anomalies; typically 0-10+ for spikes and 0-5+ for change points |
| `pValue` | number | Statistical p-value (0-1): the probability of observing a value at least as extreme as this data point given the historical distribution. Lower values indicate higher anomaly confidence. See the detailed interpretation table below |
| `timestamp` | ISO 8601 | UTC timestamp when the anomaly was detected (not the data point's own timestamp) |
| `seriesKey` | string | Group identifier from `groupByPath`; an empty string if no grouping is used. Identifies the series the anomaly belongs to |
| `currentValue` | number | The actual numeric value that triggered the anomaly detection |
| `context` | any | Additional context data from `contextPath` (if configured). Can be a string, number, or object depending on the source data |

Change Point Specific Fields

| Field | Type | Description |
|---|---|---|
| `martingaleValue` | number | Cumulative evidence score for change point detection. Higher values (typically > 10) indicate stronger evidence of a persistent change. This value accumulates over time until a change is detected |
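
To consume the output in a downstream Python service, the documented fields can be mirrored in a dataclass. `Anomaly` and `parse_anomalies` are hypothetical names. Field types follow the Type column of the tables in this section, and `martingaleValue` is optional because only change points carry it. Unknown keys are ignored, so the sketch tolerates extra fields.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, List, Optional


@dataclass
class Anomaly:
    """One entry of the node's output array (hypothetical client-side model)."""
    type: str                                # "spike" or "changePoint"
    confidence: float                        # 0-100, copied from the detector configuration
    level: float
    score: float
    pValue: float
    timestamp: str                           # ISO 8601 detection time (UTC)
    seriesKey: str                           # "" when groupByPath is not used
    currentValue: float
    context: Any = None                      # only present when contextPath is configured
    martingaleValue: Optional[float] = None  # change points only


_KNOWN_FIELDS = {f.name for f in fields(Anomaly)}


def parse_anomalies(payload: str) -> List[Anomaly]:
    """Parse the JSON array, ignoring any keys this sketch does not model."""
    return [Anomaly(**{k: v for k, v in obj.items() if k in _KNOWN_FIELDS})
            for obj in json.loads(payload)]


# The example output from this page.
sample = '''[
  {"type": "spike", "confidence": 95, "level": 1.0, "score": 3.2, "pValue": 0.001,
   "timestamp": "2024-01-15T10:30:00Z", "seriesKey": "Sensor-A1",
   "currentValue": 145.7, "context": "2024-01-15T10:30:00"},
  {"type": "changePoint", "confidence": 95, "level": 1.0, "score": 2.8, "pValue": 0.002,
   "martingaleValue": 12.5, "timestamp": "2024-01-15T11:00:00Z", "seriesKey": "Sensor-A1",
   "currentValue": 98.3, "context": "2024-01-15T11:00:00"}
]'''
anomalies = parse_anomalies(sample)
print([(a.type, a.martingaleValue) for a in anomalies])  # [('spike', None), ('changePoint', 12.5)]
```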

Example Output

```json
[
  {
    "type": "spike",
    "confidence": 95,
    "level": 1.0,
    "score": 3.2,
    "pValue": 0.001,
    "timestamp": "2024-01-15T10:30:00Z",
    "seriesKey": "Sensor-A1",
    "currentValue": 145.7,
    "context": "2024-01-15T10:30:00"
  },
  {
    "type": "changePoint",
    "confidence": 95,
    "level": 1.0,
    "score": 2.8,
    "pValue": 0.002,
    "martingaleValue": 12.5,
    "timestamp": "2024-01-15T11:00:00Z",
    "seriesKey": "Sensor-A1",
    "currentValue": 98.3,
    "context": "2024-01-15T11:00:00"
  }
]
```

Interpreting the Output

- Empty array: No anomalies were detected in the current batch.
- Multiple entries: Each entry represents a distinct anomaly. A single data point might trigger both spike and change point detection.
- seriesKey grouping: When using groupByPath, anomalies from different groups have different seriesKey values.
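
These rules translate directly into a small post-processing step. The sketch below counts anomalies per series and type, and finds data points reported by both detectors. It assumes `contextPath` is configured and uniquely identifies a data point (for example, a timestamp). The helper names and the trimmed sample entries are made up for illustration.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def count_by_series_and_type(anomalies: List[dict]) -> Counter:
    """Count anomalies per (seriesKey, type); an empty array yields an empty Counter."""
    return Counter((a.get("seriesKey", ""), a["type"]) for a in anomalies)


def points_flagged_by_both(anomalies: List[dict]) -> Set[Tuple[str, str]]:
    """Return (seriesKey, context) pairs reported as both a spike and a change point.

    Assumes context uniquely identifies a data point; entries without context are skipped.
    """
    types_by_point: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for a in anomalies:
        if "context" in a:
            types_by_point[(a.get("seriesKey", ""), str(a["context"]))].add(a["type"])
    return {point for point, kinds in types_by_point.items()
            if {"spike", "changePoint"} <= kinds}


# Hypothetical, trimmed anomaly entries (other fields omitted).
batch = json.loads('''[
  {"type": "spike", "seriesKey": "Sensor-A1", "context": "2024-01-15T10:30:00"},
  {"type": "changePoint", "seriesKey": "Sensor-A1", "context": "2024-01-15T10:30:00"},
  {"type": "spike", "seriesKey": "Sensor-B2", "context": "2024-01-15T10:45:00"}
]''')
print(count_by_series_and_type(batch))
print(points_flagged_by_both(batch))  # {('Sensor-A1', '2024-01-15T10:30:00')}
```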

Understanding p-Values

The p-value is the probability of observing a value at least as extreme as this data point under normal conditions. Lower p-values indicate stronger anomalies:

| p-Value Range | Scientific Notation | Interpretation | Anomaly Strength | Standard Deviations (two-sided, normal) | Action Required |
|---|---|---|---|---|---|
| > 0.1 | > 1E-01 | Normal variation | None | < 1.6σ | No action |
| 0.05 - 0.1 | 5E-02 to 1E-01 | Borderline | Weak | ~1.6-2σ | Monitor |
| 0.01 - 0.05 | 1E-02 to 5E-02 | Significant | Moderate | ~2-2.6σ | Investigate |
| 0.001 - 0.01 | 1E-03 to 1E-02 | Highly significant | Strong | ~2.6-3.3σ | Alert required |
| 0.0001 - 0.001 | 1E-04 to 1E-03 | Very highly significant | Very strong | ~3.3-3.9σ | Immediate attention |
| 0.00001 - 0.0001 | 1E-05 to 1E-04 | Extremely significant | Extreme | ~3.9-4.4σ | Critical issue |
| < 0.00001 | < 1E-05 | Exceptionally rare | Exceptional | > 4.4σ | Emergency response |
| < 1E-08 | < 1E-08 | Almost impossible | Maximum | > 5.5σ | System failure likely |
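
The bands in the table can be expressed as a small lookup, and the standard-deviation column can be reproduced from the normal quantile function. This sketch assumes the σ column is the two-sided z-score of a standard normal distribution, which matches the listed values (for example, p = 0.05 corresponds to about 1.96σ). `interpret_p_value` and `two_sided_sigma` are hypothetical helpers.

```python
from statistics import NormalDist

# (exclusive upper bound, label) pairs from the table, checked from the rarest band up.
BANDS = [
    (1e-8, "Almost impossible"),
    (1e-5, "Exceptionally rare"),
    (1e-4, "Extremely significant"),
    (1e-3, "Very highly significant"),
    (1e-2, "Highly significant"),
    (5e-2, "Significant"),
    (1e-1, "Borderline"),
]


def interpret_p_value(p: float) -> str:
    """Return the table's interpretation label for a p-value."""
    for upper, label in BANDS:
        if p < upper:
            return label
    return "Normal variation"


def two_sided_sigma(p: float) -> float:
    """z such that P(|Z| >= z) = p for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


for p in (0.2, 0.03, 1e-3, 1e-9):
    print(p, interpret_p_value(p), round(two_sided_sigma(p), 2))
```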

Real-world examples of p-values:

- p = 0.05: Normal daily fluctuation in website traffic
- p = 0.001: Unusual spike in server response time
- p = 1E-05: Major sensor malfunction or data corruption
- p = 1E-08: Critical system failure, hardware defect, or extreme external event

Relationship with confidence level:

- If spikeConfidence = 95, the threshold is p < 0.05
- If spikeConfidence = 99, the threshold is p < 0.01
- A p-value of 1E-08 is 5,000,000x smaller than the 95% confidence threshold (0.05 / 1E-08 = 5 × 10^6)
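
The mapping from the confidence setting to the p-value cut-off, and the ratio for a p-value of 1E-08, can be checked with a few lines of Python. `p_value_threshold` is a hypothetical helper that assumes the cut-off is exactly `1 - confidence/100`, as the two threshold bullets state. The last line shows a common pitfall: entering a decimal such as 0.95 on this 0-100 scale.

```python
def p_value_threshold(confidence: float) -> float:
    """Map a 0-100 confidence level to the p-value cut-off (flag when p < threshold)."""
    if not 0.0 < confidence < 100.0:
        raise ValueError("confidence must lie strictly between 0 and 100")
    return 1.0 - confidence / 100.0


print(p_value_threshold(95))         # about 0.05
print(p_value_threshold(99))         # about 0.01
print(p_value_threshold(95) / 1e-8)  # about 5,000,000
# Pitfall: on the 0-100 scale, 0.95 means 0.95 %, giving a cut-off near 0.99
# that would flag almost every point.
print(p_value_threshold(0.95))       # about 0.9905
```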

Score Interpretation

Spike scores:
- 1-2: Mild deviation
- 2-3: Moderate anomaly
- 3-5: Strong anomaly
- 5+: Severe anomaly

Change point scores:
- 1-2: Minor shift
- 2-3: Significant change
- 3+: Major trend change

Martingale values (change points only):
- 5-10: Evidence accumulating
- 10-20: Strong evidence
- 20+: Definitive change
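
For alerting rules, the score bands above can be encoded as a lookup. In this sketch, a value exactly on a boundary falls into the lower band. That is an arbitrary choice, because the documented ranges share their endpoints. The band tables and `classify` are hypothetical names.

```python
from typing import List, Tuple

# (exclusive lower bound, label) pairs, checked from the highest band down.
SPIKE_BANDS = [(5.0, "Severe anomaly"), (3.0, "Strong anomaly"),
               (2.0, "Moderate anomaly"), (1.0, "Mild deviation")]
CHANGE_POINT_BANDS = [(3.0, "Major trend change"), (2.0, "Significant change"),
                      (1.0, "Minor shift")]
MARTINGALE_BANDS = [(20.0, "Definitive change"), (10.0, "Strong evidence"),
                    (5.0, "Evidence accumulating")]


def classify(value: float, bands: List[Tuple[float, str]]) -> str:
    """Return the label of the first band whose lower bound the value exceeds."""
    for lower, label in bands:
        if value > lower:
            return label
    return "Below the documented bands"


# Values taken from the example output on this page.
print(classify(3.2, SPIKE_BANDS))         # Strong anomaly
print(classify(2.8, CHANGE_POINT_BANDS))  # Significant change
print(classify(12.5, MARTINGALE_BANDS))   # Strong evidence
```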

## Examples

### Example 1: IoT Sensor Anomaly Detection

```yaml
transformations:
- type: MachineLearningAnomalyDetection@1
  path: $.sensorReadings[*]
  targetPath: $.detectedAnomalies
  resetStatistics: false
  detectors:
    - path: $.temperature
      groupByPath: $.sensorId
      contextPath: $.timestamp
      detectSpikes: true
      detectChangePoints: true
      minDataPoints: 20
      maxDataPoints: 500
      spikeConfidence: 95
      changePointConfidence: 90
      pValueHistoryLength: 100
      changeHistoryLength: 20
```

### Example 2: Financial Transaction Monitoring

```yaml
transformations:
- type: MachineLearningAnomalyDetection@1
  path: $.transactions[*]
  targetPath: $.suspiciousActivities
  detectors:
    - path: $.amount
      groupByPath: $.accountId
      contextPath: $.transactionId
      detectSpikes: true
      detectChangePoints: false  # Only detect spikes for fraud detection
      minDataPoints: 50
      spikeConfidence: 99  # High confidence (0-100 scale) to reduce false positives
      pValueHistoryLength: 200
```

### Example 3: Multi-Metric Monitoring

```yaml
transformations:
- type: MachineLearningAnomalyDetection@1
  path: $.metrics[*]
  targetPath: $.alerts
  resetStatistics: false
  detectors:
    # CPU usage spike detection
    - path: $.cpuUsage
      groupByPath: $.serverId
      detectSpikes: true
      detectChangePoints: false
      minDataPoints: 10
      spikeConfidence: 90
      pValueHistoryLength: 50
    # Memory trend changes
    - path: $.memoryUsage
      groupByPath: $.serverId
      detectSpikes: false
      detectChangePoints: true
      minDataPoints: 30
      changePointConfidence: 95
      changeHistoryLength: 15
    # Network traffic anomalies (both types)
    - path: $.networkThroughput
      groupByPath: $.serverId
      detectSpikes: true
      detectChangePoints: true
      minDataPoints: 20
      maxDataPoints: 1000
```

### Example 4: Stateless Detection for Batch Processing

```yaml
transformations:
- type: MachineLearningAnomalyDetection@1
  path: $.batchData[*]
  targetPath: $.batchAnomalies
  resetStatistics: true  # Each batch starts fresh
  detectors:
    - path: $.value
      detectSpikes: true
      detectChangePoints: true
      minDataPoints: 10
      maxDataPoints: 100  # Limit memory for batch processing
      spikeConfidence: 95
      changePointConfidence: 95
```

Notes

- Machine Learning Models: Uses ML.NET's IID (Independent and Identically Distributed) spike and change point detection algorithms for robust anomaly detection
- Minimum Data Requirements: The algorithms require at least minDataPoints observations before producing reliable results
- Stateful Processing: When resetStatistics is false, the node maintains time series history across pipeline runs, improving detection accuracy over time
- GroupBy Functionality: The groupByPath parameter enables independent time series tracking for different entities (e.g., per sensor, per user)
- Memory Management: Use maxDataPoints to implement a sliding window and limit memory usage in long-running pipelines
- Confidence vs Sensitivity: Higher confidence values reduce false positives but may miss subtle anomalies; adjust them to your use case
- Real-time vs Batch: The node works for both real-time streaming (stateful) and batch processing (stateless) scenarios
- IID Assumption: Both algorithms assume that data points are Independent and Identically Distributed within their respective windows. This works well for most stationary time series, but highly seasonal or trending data may need parameter tuning
