Steven Cao

EEG Biometric ID

Neural pattern recognition for biometric identification

2022 · Published Paper · Hero Lab, UC Irvine

Python · Deep Learning · EEGNet · InceptionTime · ResNet · PyTorch · TensorFlow · Signal Processing

Overview

This research project developed an EEG-based biometric identification system that reaches 86.74% accuracy, a 3.23-percentage-point improvement over the previous state of the art (83.51%). The system uses deep learning to identify individuals from their brainwave patterns, demonstrating the feasibility of EEG as a secure biometric modality: one that is resilient to physical injury, extremely hard to reproduce, and impossible to capture at a distance.

Conducted at UC Irvine's Hero Lab under Prof. Hung Cao, this work represents a first step toward a fully functional portable system that identifies subjects by processing neural signals in real time with trained deep learning models.

The Problem

Current EEG-based biometric systems are impractical for real-world use because their accuracy degrades to 20-30% over long periods of time. While previous research achieved good results over 1-2 days, extending reliability to several weeks or more is crucial for practical deployment.

The challenge stems from EEG signal variability across different psychological and physiological conditions. For a biometric system to be useful, for example to guard access to a bank account or a secure facility, it must reliably identify users weeks or months after initial enrollment, not just days later.

Why EEG Biometrics?

EEG signals present major advantages over traditional biometric modalities (iris, face, fingerprint):

  • Resilient to physical injuries: Unlike fingerprints that can be damaged or faces that can be altered, EEG signals remain stable despite physical trauma
  • Extremely hard to reproduce: Brain patterns are unique and cannot be forged like fingerprints or spoofed with photos like facial recognition
  • Cannot be captured at a distance: Unlike iris scans or facial recognition, EEG requires direct contact with electrodes, preventing covert surveillance
  • Continuous authentication: Can verify identity even when a person is unconscious or unable to provide active cooperation

Dataset & Hardware

The study used the BED (Brain EEG Dataset) with data from 21 subjects collected using the consumer-grade Emotiv EPOC+ headset:

  • 14-channel EEG signals at 256 Hz sampling rate (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4)
  • 12 different stimuli: affective, cognitive, visual evoked potentials, and resting-state
  • 3 chronologically disjoint sessions, spaced one week apart, to test temporal persistence
  • Consumer-grade device: Used affordable hardware to demonstrate practical real-world applicability

Training dataset: 2,070 samples from the first two sessions (230 per subject across a 9-subject subset). Testing dataset: 1,033 samples from the third session.
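Before training, each session recording is segmented into fixed-length, heavily overlapping epochs (the 500-sample windows and 90% overlap rate listed in the preprocessing pipeline). A minimal NumPy sketch of that slicing, with the function name my own:

```python
import numpy as np

def epoch_windows(recording, win=500, overlap=0.9):
    """Slice a (channels, time) recording into overlapping windows.

    win=500 samples and overlap=0.9 mirror the 500-sample epochs and
    90% overlap rate described in the preprocessing pipeline.
    """
    step = max(1, int(win * (1 - overlap)))          # 50-sample hop at 90% overlap
    n = recording.shape[-1]
    return np.stack([recording[:, s:s + win]
                     for s in range(0, n - win + 1, step)])
```

With a 14-channel, 1,000-sample recording this yields 11 windows of shape (14, 500), which is how a modest number of raw recordings expands into thousands of training samples.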

Data Preprocessing Pipeline

Developed a sophisticated preprocessing pipeline to handle EEG signal variability:

  • CLT-based data enhancement: Leveraged the Central Limit Theorem for normalization, creating 471 epochs per subject with statistical standardization
  • Epoch-based linear regression: 500 samples per epoch to remove linear trends and reduce noise
  • PREP Pipeline: 1 Hz high-pass filter → 60 Hz notch filter (line noise removal) → true mean reference → 50 Hz low-pass filter → 90% overlapping rate → StandardScaler
  • Artifact removal: Independent component analysis for heartbeat artifacts
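The detrending, filtering, and standardization steps above can be sketched with SciPy; this is a minimal approximation, assuming a fourth-order Butterworth design and per-channel z-scoring, not the exact filter specifications used in the study:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, detrend

FS = 256  # Hz, the EPOC+ sampling rate

def preprocess_epoch(epoch):
    """epoch: (channels, samples) array for one 500-sample window."""
    # 1) epoch-based linear regression: remove the least-squares linear trend
    x = detrend(epoch, axis=-1, type="linear")
    # 2) 1 Hz high-pass to drop slow drift
    b, a = butter(4, 1.0, btype="highpass", fs=FS)
    x = filtfilt(b, a, x, axis=-1)
    # 3) 60 Hz notch for line-noise removal
    b, a = iirnotch(60.0, Q=30.0, fs=FS)
    x = filtfilt(b, a, x, axis=-1)
    # 4) 50 Hz low-pass
    b, a = butter(4, 50.0, btype="lowpass", fs=FS)
    x = filtfilt(b, a, x, axis=-1)
    # 5) z-score each channel (what StandardScaler does per feature)
    return (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)
```

Zero-phase filtering via `filtfilt` avoids shifting the signal in time, which matters when epochs will later be compared across sessions.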

Deep Learning Models

Compared three state-of-the-art deep learning architectures modified for time-series EEG classification:

  • EEGNet (Best: 86.74% accuracy): Compact CNN specifically designed for EEG signals using depthwise and separable convolutions. Incorporates well-known EEG feature extraction concepts including optimal spatial filtering and frequency-specific processing
  • InceptionTime (70.18% accuracy): Modified Inception-v4 architecture with multiple filter sizes for capturing features at different time scales. Added dropout layers and changed activation from ReLU to ELU
  • ResNet (63.21% accuracy): Deep residual network with skip connections to prevent vanishing gradients. Added additional residual blocks for time-series adaptation
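The depthwise/separable-convolution idea behind EEGNet can be sketched in PyTorch. The hyperparameters below (F1, D, F2, kernel sizes, dropout rate) are illustrative defaults, not necessarily the values used in this study:

```python
import torch
import torch.nn as nn

class EEGNetSketch(nn.Module):
    def __init__(self, n_channels=14, n_samples=500, n_classes=21,
                 F1=8, D=2, F2=16):
        super().__init__()
        self.block1 = nn.Sequential(
            # temporal convolution: learns frequency-specific filters
            nn.Conv2d(1, F1, (1, 64), padding=(0, 32), bias=False),
            nn.BatchNorm2d(F1),
            # depthwise convolution: one spatial filter per temporal filter
            nn.Conv2d(F1, F1 * D, (n_channels, 1), groups=F1, bias=False),
            nn.BatchNorm2d(F1 * D),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
            nn.Dropout(0.25),
        )
        self.block2 = nn.Sequential(
            # separable convolution = depthwise temporal + pointwise mixing
            nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8),
                      groups=F1 * D, bias=False),
            nn.Conv2d(F1 * D, F2, 1, bias=False),
            nn.BatchNorm2d(F2),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
            nn.Dropout(0.25),
        )
        self.classify = nn.LazyLinear(n_classes)

    def forward(self, x):  # x: (batch, 1, channels, samples)
        x = self.block2(self.block1(x))
        return self.classify(x.flatten(1))
```

Grouped convolutions keep the parameter count small, which is why EEGNet remains compact enough for embedded inference while still encoding spatial-filtering and frequency-specific processing priors.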

All models were trained on the first two weeks of data and tested on the third week; training took roughly 3 hours on GPU versus 28 hours on CPU.

Hardware Implementation

Built a real-time embedded system using Raspberry Pi 4 to simulate practical deployment:

  • Digital-to-Analog Converter (DAC): 12-bit MCP4725 chip to simulate real EEG acquisition
  • Analog-to-Digital Converter (ADC): 10-bit MCP3008 chip for signal digitization
  • Multi-threaded processing: Parallel threads for data acquisition, preprocessing, and real-time classification
  • Low latency: Processing time of 9-10ms per sample

Dataset generator fidelity: a Mean Squared Error of 27.57 (0.67% of the dataset's range) between the original and regenerated signals, demonstrating high-fidelity reproduction.
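The multi-threaded acquisition → preprocessing → classification flow can be sketched with Python's `threading` and `queue` modules. The stage bodies below are placeholders standing in for the real ADC reads and model inference, not the actual implementation:

```python
import queue
import threading

def run_pipeline(n_samples=100):
    """Producer-consumer sketch of the three-stage real-time pipeline."""
    raw_q, clean_q = queue.Queue(), queue.Queue()
    results = []

    def acquire():                       # stands in for the ADC read loop
        for i in range(n_samples):
            raw_q.put(float(i))
        raw_q.put(None)                  # sentinel: acquisition finished

    def preprocess():                    # filtering/scaling would happen here
        while (x := raw_q.get()) is not None:
            clean_q.put(x * 0.5)
        clean_q.put(None)

    def classify():                      # stands in for model inference
        while (x := clean_q.get()) is not None:
            results.append(int(x) % 21)  # dummy "subject ID"

    threads = [threading.Thread(target=f)
               for f in (acquire, preprocess, classify)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Decoupling the stages with queues lets acquisition keep pace with the ADC even when a preprocessing or inference step momentarily runs long.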

Results & Impact

EEGNet achieved 86.74% accuracy, surpassing the previous best model (83.51%) by 3.23 percentage points. Detailed metrics:

  • Accuracy: 86.74%
  • Precision: 89.13%
  • Recall: 86.68%
  • F1 Score: 86.69%
  • Processing Time: 10.28ms per sample
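Multi-class precision, recall, and F1 figures like these are typically macro-averaged over the subject classes; a toy scikit-learn sketch with illustrative labels (not the study's data):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# illustrative subject labels and predictions for a 3-class toy case
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc  = accuracy_score(y_true, y_pred)
# "macro" averages each per-class score equally, regardless of class size
prec = precision_score(y_true, y_pred, average="macro")
rec  = recall_score(y_true, y_pred, average="macro")
f1   = f1_score(y_true, y_pred, average="macro")
```

Macro averaging is a natural choice here because each enrolled subject should count equally toward the reported score.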

This represents a significant advancement in EEG-based biometric identification over long periods (3+ weeks), demonstrating that deep learning models can automatically extract robust features that remain stable across time, outperforming traditional machine learning approaches using manually extracted features.

Key Technical Insights

Several critical learnings emerged from this research:

  • Preprocessing is crucial: Data preprocessing plays as important a role as model architecture—ResNet's initially low accuracy (6.15%) improved dramatically to 63.21% with proper preprocessing
  • Automatic feature extraction wins: Deep learning's automatic feature representation outperformed manually extracted features (MFCC, ARRC, SPEC) used in traditional ML models
  • EEG-specific architectures matter: EEGNet's domain-specific design (depthwise/separable convolutions) significantly outperformed general-purpose architectures
  • Consumer-grade hardware is viable: Achieving high accuracy with affordable Emotiv EPOC+ headset demonstrates practical real-world applicability

Applications

  • High-security access control: Government facilities, classified information systems requiring anti-spoofing biometrics
  • Medical authentication: Hospital systems where accurate patient identification is critical, even when patients are unconscious
  • Accessibility technology: Enabling people with disabilities to access computers using brain signals instead of physical input devices
  • Continuous authentication: Systems requiring ongoing identity verification without active user participation

Challenges & Solutions

The primary challenge was handling EEG signal variability across different psychological and physiological states. We addressed this through CLT-based normalization to focus on "common" signals and epoch-based regression to remove temporal trends. Another challenge was computational efficiency—training deep models on time-series data required careful hyperparameter tuning and GPU acceleration.

Weight initialization also proved critical, as different random initializations could lead to convergence at local minima versus global minima, resulting in significant accuracy variations across training runs.
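One common mitigation for run-to-run initialization variance, assuming a PyTorch setup like this project's, is to pin every random seed so that each run starts from identical weights; a minimal sketch:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    """Pin every RNG so runs with the same seed start from the same weights."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(0)
a = torch.nn.Linear(4, 2).weight.clone()
set_seed(0)
b = torch.nn.Linear(4, 2).weight
assert torch.equal(a, b)  # identical initialization across "runs"
```

Seeding does not remove the local-minima problem, but it makes accuracy differences attributable to hyperparameters rather than to initialization luck.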

What I Learned

This research taught me that successful deep learning is not just about model architecture—data preprocessing is equally crucial. I gained extensive experience in signal processing techniques including noise filtering, artifact removal, spectral feature extraction, and statistical normalization.

I also learned to read academic literature critically, track experiments systematically using GitHub and TensorBoard, and apply mathematical concepts from high school (Central Limit Theorem, least-squares regression) to solve real research problems. Working with Prof. Cao and the Hero Lab team showed me how collaborative research operates and the importance of rigorous experimental methodology.

Future Directions

  • Expand subject diversity: Include more subjects across different demographics (gender, age, race) to ensure model generalization
  • Longer temporal testing: Extend evaluation to 6+ months to validate long-term stability
  • Robustness testing: Examine system performance under various psychological and physiological states (stress, fatigue, emotion)
  • End-to-end system: Develop a complete low-cost device integrating EEG acquisition, preprocessing, and real-time identification
  • Minimal channel optimization: Determine the minimum number of EEG channels needed while maintaining high accuracy