Anila Ghazanfar

Biography

Anila Ghazanfar is a Doctoral Candidate specializing in Security and Fault Tolerance in High-Performance Computing (HPC). With a strong foundation in Information Security and Software Engineering, she is pursuing research focused on integrating reliability and security mechanisms in heterogeneous computing environments. Her goal is to develop a unified framework that addresses both reliability and security challenges in modern HPC systems.

Research Interests

  • eBPF
  • Security
  • Fault-Tolerance

Teaching

Open Thesis Topics

Checkpoint Integrity Verification in Distributed ML Training

Checkpoints are essential to reliable distributed ML training, enabling recovery from faults and fault-induced data corruption. However, as data science workloads are increasingly deployed in shared, untrusted data center environments, checkpoints held in shared storage become vulnerable to tampering, unauthorized modification, and rollback attacks. These threats undermine both reliability and security guarantees. This work investigates the intersection of reliability and security by asking: how can we detect checkpoint tampering and ensure the integrity of critical recovery data in distributed ML pipelines? We design and implement a checkpoint integrity verification system that combines cryptographic integrity verification with provenance tracking to detect unauthorized modifications and corruption in checkpoint files (for example, in HDF5 or NetCDF format). The system integrates with containerized environments (Apptainer) and is evaluated against adversarial fault injection attacks. We demonstrate detection efficacy under realistic threat models and provide empirical evidence of how security hardening can protect reliability mechanisms in data center ML infrastructure. Publishing the results in a peer-reviewed journal will give you the opportunity to establish yourself as a published author early in your career.
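To make the idea of cryptographic checkpoint verification concrete, the following is a minimal sketch and not the design of the proposed system. It assumes a shared secret key, a checkpoint available as raw bytes, and a trusted record of the latest training step; the function names `sign_checkpoint` and `verify_checkpoint` and the manifest layout are hypothetical. An HMAC over the file digest and step number detects modification, and comparing the step against the trusted minimum detects a rollback to an older, validly signed checkpoint.

```python
import hashlib
import hmac
import json


def _payload(sha256_hex: str, step: int) -> bytes:
    # Canonical byte encoding of the fields covered by the HMAC tag.
    return json.dumps({"sha256": sha256_hex, "step": step}, sort_keys=True).encode()


def sign_checkpoint(data: bytes, step: int, key: bytes) -> dict:
    """Build a manifest that binds the checkpoint bytes to a training step.

    Hypothetical helper: the manifest layout is an assumption for illustration.
    """
    digest = hashlib.sha256(data).hexdigest()
    tag = hmac.new(key, _payload(digest, step), hashlib.sha256).hexdigest()
    return {"sha256": digest, "step": step, "tag": tag}


def verify_checkpoint(data: bytes, manifest: dict, key: bytes, min_step: int) -> bool:
    """Return True only if the manifest is authentic, the data matches it,
    and the checkpoint is not older than the trusted minimum step."""
    expected = hmac.new(
        key, _payload(manifest["sha256"], manifest["step"]), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, manifest["tag"]):
        return False  # manifest was forged or altered
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        return False  # checkpoint bytes were modified or corrupted
    return manifest["step"] >= min_step  # reject rollback to a stale checkpoint
```

In this sketch the trusted `min_step` value has to live outside the shared storage (for example, with the job scheduler), since an attacker who can rewrite it could still roll the training state back.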

eBPF-Based Runtime Fault Injection Framework for HPC Workloads

Understanding how machine learning workloads behave under realistic hardware faults is critical to designing resilient data science systems for data centers. However, existing fault injection tools often require source code modification or specialized hardware, or are tied to particular I/O libraries, which limits their applicability to diverse HPC applications. This work investigates how runtime fault injection can characterize ML workload resilience without invasive instrumentation. We present an eBPF-based fault injection framework that enables controlled injection of I/O faults, network delays, and memory corruption into containerized HPC applications at runtime, targeting HDF5 parallel I/O operations. The framework integrates with checkpoint-enabled training pipelines and is evaluated on Kubernetes clusters. We demonstrate how different fault types degrade training progress and checkpoint reliability, and provide a systematic characterization of failure modes that informs both reliability and security strategies for ML workloads in shared data center environments. Your findings will form the basis of a peer-reviewed publication, giving you the chance to publish your first paper.

LLM Poisoning Detection via Training Data Attribution

Data poisoning attacks, in which malicious training samples introduce backdoors, adversarial behaviors, or degraded model performance, represent a critical security threat to machine learning systems deployed in data centers. Unlike traditional security defenses, poisoning detection requires understanding the influence of individual training examples on model behavior. This work investigates whether gradient-based data attribution can effectively identify poisoned training data in large language models before deployment. We develop a poisoning detection system that uses influence scoring and attribution methods to rank training examples by their impact on model outputs, so that suspicious samples that could introduce backdoors or adversarial behaviors can be flagged. The system is prototyped on smaller language models and systematically evaluated against benchmark poisoning attacks including backdoor injection, trojan insertion, and adversarial fine-tuning. We provide empirical evidence on detection accuracy, false positive rates, and robustness to adaptive attacks, contributing to the understanding of how data integrity verification can be embedded into ML training pipelines. Your findings will form the basis of a peer-reviewed publication, giving you the chance to publish your first paper.
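As a rough illustration of gradient-based influence scoring, the sketch below ranks training examples by the dot product between their loss gradient and the loss gradient of a probe example. This is a simplified, single-checkpoint gradient-similarity score on a toy logistic regression model, chosen for brevity; it is an assumption made for illustration and not the attribution method the thesis will necessarily use. All function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(w, x, y):
    """Gradient of the logistic loss with respect to weights w for one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def influence_scores(w, train, probe):
    """Dot product of each training gradient with the probe gradient.

    A positive score means a gradient step on that training example would
    reduce the probe loss; a negative score means it would increase it.
    """
    g_probe = loss_gradient(w, probe[0], probe[1])
    return [
        sum(a * b for a, b in zip(loss_gradient(w, x, y), g_probe))
        for x, y in train
    ]


def rank_by_influence(w, train, probe):
    """Indices of training examples, most influential (by magnitude) first."""
    scores = influence_scores(w, train, probe)
    return sorted(range(len(train)), key=lambda i: abs(scores[i]), reverse=True)
```

For a large language model, the same ranking step would operate on per-example gradients of the model parameters (or a projection of them), and the highest-ranked examples for a suspicious output would be the candidates to inspect for poisoning.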

Publications

2025

2016

  • A survey revealing path towards service life cycle management in COBIT 5 (Umara Noor, Anila Ghazanfar), In 2016 Eleventh International Conference on Digital Information Management (ICDIM), pp. 68–73, IEEE, 2016