Project Detail · AI/ML System

Enhanced RNN-Autoencoder for Time-Series Anomaly Detection in 5G Networks

This project develops a self-attention-enhanced LSTM autoencoder for real-time anomaly detection over high-volume 5G spectrum data. Instead of relying on static signatures, the model learns the normal temporal structure of waveform-level in-phase and quadrature (I/Q) sequences, compresses them into a latent representation, reconstructs expected behavior, and flags abnormal spectral activity through reconstruction error. From an AI/ML engineering perspective, the project covers the full workflow: data generation, data collection, feature engineering, streaming pipeline design with Apache Kafka, model training, and deployment-oriented inference. The system was validated on an SDR-based 5G testbed built with srsRAN 5G, Open5GS, USRPs, and configurable jammer scenarios.

Encoder
LSTM + Multi-Head Attention
Latent Space
Compressed sequence embedding
Decoder
Sequence reconstruction
Inference Rule
Error > threshold → anomaly

Data Generation, Collection, and Feature Engineering

AI/ML Data Stage

To build a realistic anomaly detection workflow, the data was not treated as an abstract benchmark. Instead, it was generated and collected from a practical SDR-based 5G environment. The setup included a machine running the 5G core and gNB stack, a separate machine acting as the UE-side receiver and analysis node, and a jammer machine used to inject controllable attack scenarios. This made it possible to observe both normal RF behavior and attack-driven spectral deviations under realistic over-the-air conditions.

From the collected I/Q streams, the data engineering stage focused on transforming raw signal captures into model-ready sequential inputs. This included windowing, sequence construction, normalization, and organization of temporal slices for training and inference. In other words, this stage plays the role of feature engineering for time-series RF learning, where the objective is to preserve temporal behavior while making the sequences consistent and learnable for the encoder-decoder model.

Experimental components

  • Machine 1: srsRAN gNB + 5G core stack
  • Machine 2: UE-side reception and analysis workflow
  • Machine 3: jammer generation with GNU Radio
  • RF hardware: USRP / SDR front-end devices
  • Captured data: waveform-level I/Q streams

Feature engineering steps

  • Signal collection: normal and jammer-affected RF activity
  • Temporal slicing: conversion of raw samples into fixed sequences
  • Preprocessing: normalization and batching for training stability
  • Labeling logic: normal vs intrusion intervals for evaluation
  • Objective: preserve temporal structure for sequence modeling
5G SDR-based testbed and data generation setup for anomaly detection
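The temporal slicing and normalization steps above can be sketched in NumPy. The window length, stride, and per-channel z-score normalization below are illustrative assumptions, and the function name is hypothetical, not the project's exact preprocessing code:

```python
import numpy as np

def make_sequences(iq: np.ndarray, seq_len: int = 20, stride: int = 20) -> np.ndarray:
    """Slice a 1-D stream of complex I/Q samples into fixed-length windows.

    Returns an array of shape (num_windows, seq_len, 2), with the I and Q
    components stacked as two real-valued features per time step.
    """
    # Split complex samples into real (I) and imaginary (Q) channels.
    features = np.stack([iq.real, iq.imag], axis=-1)                # (N, 2)
    # Per-channel z-score normalization for training stability.
    features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    # Temporal slicing into fixed-length windows.
    windows = [features[i:i + seq_len]
               for i in range(0, len(features) - seq_len + 1, stride)]
    return np.asarray(windows)

# Example: 1000 synthetic complex samples -> 50 windows of length 20.
iq_stream = np.random.randn(1000) + 1j * np.random.randn(1000)
X = make_sequences(iq_stream, seq_len=20, stride=20)
print(X.shape)  # (50, 20, 2)
```

A non-overlapping stride is shown for simplicity; an overlapping stride would trade more training windows for higher redundancy between them.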

Data Pipeline

Streaming + Deployment

After collection and preprocessing, the project uses a streaming-oriented data pipeline to move sequential signal data into the anomaly detection workflow. The core engineering goal here was not only model accuracy, but also deployment readiness: the system needed to support continuous ingestion, structured buffering, and model-side processing under high-volume RF data conditions.

To support this, I designed the pipeline around Apache Kafka so that incoming time-series sequences could be transported between stages in a reliable streaming format. In the portfolio context, this section highlights the AI/ML engineering layer of the project: moving data from live signal capture, through preprocessing and batching, into model inference and anomaly decision logic. The deployed workflow combines signal ingestion, temporal feature construction, model serving, and alert-oriented inference.

Pipeline flow

Stage 1 — Signal ingestion: collect waveform-level I/Q samples from the SDR testbed.
Stage 2 — Streaming transport: push data through Apache Kafka for buffering and delivery.
Stage 3 — Sequence construction: apply normalization, windowing, and temporal batching.
Stage 4 — Inference: route the processed sequence to the encoder-decoder anomaly model.
Stage 5 — Decision output: compute reconstruction error and emit anomaly flags for monitoring and intrusion detection.
Architecture and flow diagram for time-series anomaly detection system
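The streaming transport stage can be sketched with a simple payload format. The JSON serialization and the `kafka-python` client shown here are assumptions for illustration; the broker-facing calls are commented out because they require a running Kafka instance:

```python
import json
import numpy as np

def encode_window(window: np.ndarray) -> bytes:
    """Serialize one sequence window into a JSON payload for Kafka transport."""
    return json.dumps({"seq": window.tolist()}).encode("utf-8")

def decode_window(payload: bytes) -> np.ndarray:
    """Deserialize a payload back into a model-ready (seq_len, 2) array."""
    return np.asarray(json.loads(payload.decode("utf-8"))["seq"], dtype=np.float32)

# Producer side (illustration only; requires a running broker):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# producer.send("iq-sequences", encode_window(window))

# Round-trip check of the serialization logic.
window = np.random.randn(20, 2).astype(np.float32)
roundtrip = decode_window(encode_window(window))
print(np.allclose(window, roundtrip))  # True
```

In practice a binary format (e.g. raw float32 buffers) would be more compact for high-volume I/Q streams; JSON is used here only to keep the sketch readable.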

Model Architecture

At the model level, the system follows a sequence-to-sequence reconstruction framework. The encoder maps the input signal sequence into a compressed latent representation, the attention mechanism reweights informative temporal regions, and the decoder reconstructs the expected normal behavior. The anomaly score is then computed from the distance between the original sequence and the reconstructed sequence. This makes the architecture suitable for unsupervised anomaly detection in settings where threat patterns are dynamic and not easily expressed through fixed signatures.

Core mathematical formulation

Encoder hidden state
h_t = f(W_hh h_(t-1) + W_hx x_t + b_h)

Captures temporal state progression of the input sequence through recurrent encoding.

Alignment score
e_tj = a(h_t, h_j)

Measures the relevance of one temporal state with respect to the other states.

Attention weights
α_tj = exp(e_tj) / Σ_k exp(e_tk)

Normalizes alignment scores into attention coefficients over the sequence.

Context vector
c_t = Σ_j α_tj h_j

Aggregates weighted temporal information before latent compression and decoding.

Decoder output
x̂_t = g(W_xh h_t + b_x)

Maps the decoder hidden state h_t back to the signal domain to reconstruct the expected normal sequence.

Reconstruction loss
L(x, x̂) = (1 / T) Σ_t ||x_t - x̂_t||²

Mean-squared reconstruction error used both for optimization and anomaly scoring.
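The attention and loss formulas above can be traced numerically with a small NumPy example. The dot-product score a(h_t, h_j) = h_t · h_j is one common choice of alignment function and an assumption here:

```python
import numpy as np

# Toy hidden states h_j for a sequence of T = 4 steps, hidden dim 3.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))

# Alignment scores e_tj with a dot-product score: a(h_t, h_j) = h_t . h_j
E = H @ H.T                                        # (T, T)

# Attention weights: row-wise softmax, alpha_tj = exp(e_tj) / sum_k exp(e_tk)
A = np.exp(E - E.max(axis=1, keepdims=True))       # max-shift for stability
A = A / A.sum(axis=1, keepdims=True)

# Context vectors: c_t = sum_j alpha_tj h_j
C = A @ H                                          # (T, 3)

# Reconstruction loss: L(x, x_hat) = (1/T) sum_t ||x_t - x_hat_t||^2
x, x_hat = H, C                                    # toy stand-ins for the sequences
loss = np.mean(np.sum((x - x_hat) ** 2, axis=1))

print(A.sum(axis=1))  # each row sums to 1
```

Each attention row is a proper probability distribution over the sequence, so every context vector c_t is a convex combination of the hidden states.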

Architectural interpretation

Temporal encoder: stacked LSTM layers learn the sequential behavior of RF signals and preserve time-dependent structure across the input window.

Attention mechanism: the self-attention block assigns larger weights to more informative time steps, improving sensitivity to localized abnormal behavior.

Latent space modeling: the encoder compresses normal temporal dynamics into a compact embedding that represents expected waveform behavior.

Decoder reconstruction: the decoder maps the latent representation back to the signal domain and reconstructs the sequence under the learned normal manifold.

Anomaly inference: when abnormal spectral activity shifts the sequence away from that learned manifold, reconstruction error rises and the thresholding logic flags the sequence.

Representative configuration
LSTM units: 50 / 25
Attention heads: 4
Key dimension: 50
Optimizer: Adam
Loss: MSE
Sequence lengths: 10 / 20 / 100
Self-attention LSTM autoencoder architecture for RF anomaly detection
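A minimal PyTorch sketch of this encoder-attention-decoder structure, loosely following the representative configuration. Layer names are hypothetical, and because `nn.MultiheadAttention` requires the embedding dimension to be divisible by the head count, 2 heads are used over the 50-dimensional encoder output instead of 4:

```python
import torch
import torch.nn as nn

class AttnLSTMAutoencoder(nn.Module):
    """Sketch of a self-attention LSTM autoencoder for (B, T, 2) I/Q windows."""
    def __init__(self, n_features: int = 2, seq_len: int = 20):
        super().__init__()
        self.seq_len = seq_len
        self.enc1 = nn.LSTM(n_features, 50, batch_first=True)   # temporal encoder
        self.attn = nn.MultiheadAttention(50, num_heads=2, batch_first=True)
        self.enc2 = nn.LSTM(50, 25, batch_first=True)           # latent compression
        self.dec1 = nn.LSTM(25, 50, batch_first=True)           # sequence decoder
        self.out = nn.Linear(50, n_features)                    # x_hat_t = g(W h_t + b)

    def forward(self, x):
        h, _ = self.enc1(x)                        # (B, T, 50)
        h, _ = self.attn(h, h, h)                  # self-attention reweighting
        _, (z, _) = self.enc2(h)                   # z: compressed latent embedding
        # Repeat the latent vector across time and decode the sequence.
        z_seq = z[-1].unsqueeze(1).repeat(1, self.seq_len, 1)
        d, _ = self.dec1(z_seq)
        return self.out(d)

model = AttnLSTMAutoencoder()
x = torch.randn(8, 20, 2)                          # batch of 8 windows
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)            # MSE objective (trained with Adam)
print(x_hat.shape)  # torch.Size([8, 20, 2])
```

At inference time the same per-window MSE serves as the anomaly score, so no separate scoring head is needed.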

Inference Results and Evaluation

The evaluation stage focuses on how well the trained model separates normal and anomalous RF behavior during inference. Since the model is trained to reconstruct normal temporal patterns, the most important signal at inference time is the reconstruction error profile. When the incoming sequence follows normal behavior, the original and reconstructed I/Q trajectories remain closely aligned. When jammer-driven or abnormal spectral behavior appears, the reconstructed output diverges and the sequence produces a noticeably larger anomaly score.

From a machine learning perspective, this section demonstrates the transition from training-time representation learning to deployment-time inference. The plots below show the practical effect of the model: abnormal intervals generate stronger error excursions, while normal intervals stay near the expected reconstruction manifold.

Inference signal

Reconstruction error serves as the anomaly score for each sequence window.

Intrusion behavior

Attack-driven intervals produce larger deviations between the original and reconstructed I/Q trajectories.

Normal behavior

Normal sequences remain close to the learned manifold and are reconstructed with lower error.

Inference results for reconstruction error and original versus reconstructed I/Q behavior
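The threshold-based decision step can be sketched as follows. The mean-plus-k-sigma rule over errors observed on normal traffic is one common convention and an assumption here, not the project's stated rule:

```python
import numpy as np

def anomaly_flags(errors: np.ndarray, normal_errors: np.ndarray, k: float = 3.0):
    """Flag windows whose reconstruction error exceeds mean + k*std of the
    errors measured on known-normal traffic."""
    threshold = normal_errors.mean() + k * normal_errors.std()
    return errors > threshold, threshold

# Example: low errors on normal windows, one jammer-driven excursion.
normal = np.array([0.10, 0.12, 0.09, 0.11, 0.10])
live = np.array([0.11, 0.10, 0.95, 0.12])
flags, thr = anomaly_flags(live, normal)
print(flags)  # [False False  True False]
```

Choosing k trades false alarms against missed detections; a percentile of the normal-error distribution is an equally valid alternative to the Gaussian-style rule shown here.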

Tech Stack and Deployment

Modeling

PyTorch-based LSTM autoencoder with self-attention, latent-space representation learning, and reconstruction-driven anomaly inference.

Data Engineering

Temporal sequence construction, signal preprocessing, normalization, windowing, and batching for high-volume I/Q time-series learning.

Streaming Pipeline

Apache Kafka-based transport layer for moving collected sequences from signal capture into model-facing inference stages.

Wireless Stack

srsRAN 5G, Open5GS, GNU Radio jammer workflows, and SDR / USRP hardware for practical RF experimentation.

Deployment

Dockerized and reproducible components supporting experiment portability, repeatable inference runs, and system integration.

Evaluation

Reconstruction-error plots, original-vs-reconstructed signal comparisons, and threshold-based anomaly analysis under jammer scenarios.

PyTorch · LSTM Autoencoder · Self-Attention · Latent Space Modeling · Apache Kafka · Docker · SDR / USRP · srsRAN 5G · Open5GS · I/Q Signal Processing · Feature Engineering · Inference Pipeline · Anomaly Detection