Aasish Kumar Sharma is a researcher (scientific employee) at Göttingen University, working under Professor Dr. Julian Kunkel with a focus on performance optimization in high-performance computing (HPC). His work includes the development of efficient task-scheduling algorithms for HPC systems, published in leading journals. He is particularly interested in scalable solutions for heterogeneous architectures; he has authored several publications and received the NHR Research Scholarship Award for his contributions. His further interests include optimization with a range of intelligent algorithms and emerging technologies such as Artificial Intelligence/Machine Learning (AI/ML) and Quantum Computing (QC). His previous work covers data engineering, big data analysis, and SQL query optimization as a Database Administrator. He is also a Microsoft Certified Trainer for Microsoft SQL query optimization (2025).
Graph transformer models, which combine attention mechanisms with graph structure, consistently outperform standard message-passing GNNs on combinatorial optimization tasks in recent benchmarks (2024-2026). Applied to workflow scheduling, graph transformers can capture global task-dependency patterns and long-range interactions that locality-limited GNNs miss, a limitation documented in the Grapheon RL benchmark at large scales (rnc5000, DOI 10.5281/zenodo.20432418). This thesis replaces the GNN backbone of the Grapheon RL architecture with a graph transformer encoder (e.g., GraphGPS or Exphormer) and evaluates whether attention-based representations improve scheduling quality on rnc300-rnc1000 workflow instances from the published STG dataset. The student will compare objective gap, inference speed, and training convergence against the published GNN-RL baselines under the same homogeneous and heterogeneous system configurations. The thesis contributes a systematically evaluated architectural extension to the open benchmark.
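The locality argument can be made concrete with a small count. The Python sketch below compares how many tasks lie inside the receptive field of a k-layer message-passing GNN (nodes within k hops, with dependency edges assumed to carry messages in both directions) against one dense global-attention layer, in which every task attends to every other task. The chain-shaped workflow and the helper names are hypothetical, and Grapheon RL's actual message-passing scheme may differ.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def k_hop_receptive_field(edges: Iterable[Tuple[str, str]], source: str, k: int) -> Set[str]:
    """Nodes whose features can reach `source` after k message-passing layers.

    Dependency edges are treated as undirected, i.e. messages are assumed to
    flow both from predecessors and from successors.
    """
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:  # breadth-first search truncated at depth k
        node, dist = frontier.popleft()
        if dist == k:
            continue
        for nbr in adj.get(node, set()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def global_attention_field(nodes: List[str]) -> Set[str]:
    """One dense self-attention layer lets every task attend to every task."""
    return set(nodes)


# Hypothetical toy workflow: a 10-task chain t0 -> t1 -> ... -> t9.
tasks = [f"t{i}" for i in range(10)]
chain = [(f"t{i}", f"t{i + 1}") for i in range(9)]

for layers in (1, 3, 5):
    field = k_hop_receptive_field(chain, "t0", layers)
    print(f"{layers}-layer MPNN: t0 sees {len(field)} of {len(tasks)} tasks")
print(f"1 attention layer: t0 sees {len(global_attention_field(tasks))} of {len(tasks)} tasks")
```

On a chain the message-passing receptive field grows by one task per layer, so information from the far end of a deep workflow needs roughly as many layers as the workflow is deep. A single attention layer removes that dependence on depth, at quadratic cost for dense attention, which sparse designs such as Exphormer aim to reduce.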
Data centers in Europe are now subject to the EU Energy Efficiency Directive (2023/1791) and are under growing pressure from funders and institutions to report and minimize operational carbon emissions. HPC schedulers that factor in the grid's carbon intensity alongside performance are emerging as a key tool, but dedicated benchmarks combining workflow-level quality metrics with carbon cost remain rare. This thesis extends the Grapheon RL framework to a three-objective scheduler minimizing makespan, energy consumption, and carbon cost simultaneously. The student will integrate synthetic carbon intensity traces (modeled after real-world grid data from the European Energy Exchange or carbon-aware open APIs), reformulate the RL reward to include a carbon penalty term, and retrain and evaluate on the published STG workflow dataset. The evaluation reports Pareto frontier trade-offs and compares carbon savings against performance-only Grapheon RL and HEFT baselines. The thesis outcome serves as a reproducible carbon-aware scheduling baseline for the group's benchmark.
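The carbon penalty and the Pareto evaluation can be sketched in a few lines. The following is a minimal sketch under assumptions that do not come from Grapheon RL: carbon cost is the energy drawn in each time slot multiplied by the grid carbon intensity in that slot (kWh × gCO2/kWh), the reward is a negative weighted sum of the three objectives normalized by a reference schedule (for example HEFT), and the Pareto frontier is extracted with a plain dominance filter. All weights, traces, and candidate schedules are hypothetical placeholders.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, float]  # (makespan, energy, carbon), all minimized


def carbon_cost(energy_kwh: Sequence[float], intensity_g_per_kwh: Sequence[float]) -> float:
    """Carbon cost in gCO2: energy drawn in each time slot x grid intensity in that slot."""
    if len(energy_kwh) != len(intensity_g_per_kwh):
        raise ValueError("energy and intensity traces must have the same length")
    return sum(e * c for e, c in zip(energy_kwh, intensity_g_per_kwh))


def reward(obj: Objectives, ref: Objectives,
           weights: Objectives = (1.0, 0.5, 0.5)) -> float:
    """Hypothetical scalarized reward: negative weighted sum of the three objectives,
    each divided by a reference schedule's value so that the units cancel."""
    return -sum(w * o / r for w, o, r in zip(weights, obj, ref))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep the non-dominated schedules (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 4-slot trace: energy per slot (kWh) and grid intensity (gCO2/kWh).
print(carbon_cost([2.0, 3.0, 1.0, 0.5], [300.0, 250.0, 120.0, 100.0]))  # 1520.0

# Hypothetical candidate schedules as (makespan, energy kWh, carbon gCO2).
candidates = [(10.0, 6.5, 1520.0), (12.0, 6.0, 1100.0), (11.0, 7.0, 1600.0), (10.0, 6.5, 1400.0)]
print(pareto_front(candidates))  # [(12.0, 6.0, 1100.0), (10.0, 6.5, 1400.0)]
print(reward(candidates[1], ref=(10.0, 6.0, 1500.0)))
```

A fixed weight vector can only reach points on the convex hull of the Pareto frontier, so sweeping the weights (or training one policy per weight vector) is one common way to trace the full trade-off curve.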
A practical requirement for production GNN-RL schedulers is the ability to generalize beyond the workflow family used for training. The Grapheon RL model (DOI 10.1109/COMPSAC65507.2025.00341) is trained on Standard Task Graph (STG) instances. It is an open question how well it transfers to other workflow families such as Pegasus CyberShake, Montage, or synthetic BLAST pipelines, which differ in graph structure, depth, parallelism ratio, and task heterogeneity. This BSc thesis systematically evaluates how Grapheon RL transfers to at least two non-STG workflow families under three regimes: no fine-tuning, lightweight fine-tuning (10-50 additional episodes), and full retraining. The student will report normalized objective gap, schedule feasibility, and inference speed under the homogeneous 3-node configuration from the benchmark. The thesis produces a transfer-learning guide stating which structural properties of a workflow predict successful generalization and how many fine-tuning samples suffice for a new family.
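Comparing simple structural descriptors of the STG training instances with those of a target family is one way to look for the properties that predict transfer. The sketch below computes a few such descriptors for a workflow DAG, plus a relative objective gap. The definition of `parallelism_ratio` (tasks divided by depth) and of the gap (relative to a reference such as the best-known makespan) are assumptions for illustration, not the benchmark's official metrics, and the fork-join workflow is hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def structural_features(runtimes: Dict[str, float],
                        edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """Simple DAG descriptors that could be correlated with transfer success.

    depth: tasks on the longest dependency chain
    parallelism_ratio: tasks / depth (assumed definition: mean tasks per level)
    max_width: most tasks sharing one level
    runtime_cv: coefficient of variation of task runtimes (heterogeneity)
    """
    succ: Dict[str, List[str]] = defaultdict(list)
    indeg = {t: 0 for t in runtimes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm; level[t] = number of tasks on the longest chain ending at t.
    level = {t: 1 for t in runtimes}
    queue = deque(t for t, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(runtimes):
        raise ValueError("workflow graph contains a cycle")

    depth = max(level.values())
    width: Dict[int, int] = defaultdict(int)
    for lv in level.values():
        width[lv] += 1
    rts = list(runtimes.values())
    return {
        "tasks": len(rts),
        "depth": depth,
        "parallelism_ratio": len(rts) / depth,
        "max_width": max(width.values()),
        "runtime_cv": pstdev(rts) / mean(rts),
    }


def objective_gap(value: float, reference: float) -> float:
    """Relative gap to a reference objective (assumed: best-known or optimal makespan)."""
    return (value - reference) / reference


# Hypothetical fork-join workflow: entry -> {a, b, c} -> exit.
rt = {"entry": 1.0, "a": 4.0, "b": 6.0, "c": 2.0, "exit": 1.0}
deps = [("entry", "a"), ("entry", "b"), ("entry", "c"), ("a", "exit"), ("b", "exit"), ("c", "exit")]
print(structural_features(rt, deps))
print(objective_gap(value=11.0, reference=10.0))
```

Computing these descriptors per instance and plotting them against the measured gap after zero-shot transfer would give a first, rough indication of which properties matter.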
Production HPC clusters experience workload distribution drift over time: new workflow types appear, system configurations change, and peak load periods vary by season or funding cycle. A GNN-RL scheduler trained on a fixed dataset degrades as the deployment distribution shifts away from the training distribution, requiring costly full retraining. Continual reinforcement learning methods (Elastic Weight Consolidation, PackNet, progressive networks) address the underlying stability-plasticity dilemma by enabling an agent to learn new tasks without catastrophic forgetting of earlier ones. This thesis implements an online continual learning wrapper for the Grapheon RL scheduler that updates model weights on a rolling window of recent scheduling decisions from the GWDG SCC cluster workload logs. The student will evaluate the forgetting rate on previous workflow sizes, the adaptation speed to new workflow families, and the compute cost per update step. A comparison with periodic full retraining quantifies the practical case for online adaptation. Access to anonymized SCC job logs is available through the GWDG HPS group.
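Elastic Weight Consolidation, one of the methods named above, keeps parameters that were important for earlier workloads close to their old values by adding a quadratic penalty weighted by a diagonal Fisher-information estimate: L(θ) = L_new(θ) + (λ/2) · Σ_i F_i (θ_i − θ*_i)². The plain-Python sketch below shows that penalty, the corresponding gradient step, and a fixed-size rolling window of recent decisions. Parameters held as lists, the quadratic toy loss, the window size, and all function names are hypothetical stand-ins for a real policy network and RL loss; this is not the Grapheon RL code.

```python
from collections import deque
from typing import Deque, List, Sequence


def diagonal_fisher(grad_samples: Sequence[Sequence[float]]) -> List[float]:
    """Diagonal Fisher estimate: mean of squared per-sample gradients of
    log pi(a|s), evaluated at the parameters theta_star learned on earlier data."""
    n = len(grad_samples)
    return [sum(g[i] ** 2 for g in grad_samples) / n for i in range(len(grad_samples[0]))]


def ewc_penalty(theta: Sequence[float], theta_star: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """(lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2"""
    return 0.5 * lam * sum(f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher))


def ewc_sgd_step(theta: Sequence[float], grad_new: Sequence[float],
                 theta_star: Sequence[float], fisher: Sequence[float],
                 lam: float, lr: float) -> List[float]:
    """One SGD step on new-task loss + EWC penalty; the penalty's gradient is
    lam * F_i * (theta_i - theta_star_i)."""
    return [t - lr * (g + lam * f * (t - s))
            for t, g, s, f in zip(theta, grad_new, theta_star, fisher)]


# Rolling window of the most recent scheduling decisions (window size is hypothetical).
window: Deque[str] = deque(maxlen=3)
for decision in ["d1", "d2", "d3", "d4", "d5"]:
    window.append(decision)  # the oldest entry is evicted automatically
print(list(window))  # ['d3', 'd4', 'd5']

# Toy demo: parameter 0 mattered for old workloads (F=10), parameter 1 did not (F=0).
theta_star = [1.0, 1.0]
fisher = [10.0, 0.0]
theta = list(theta_star)
for _ in range(500):
    grad_new = list(theta)  # gradient of a toy new-task loss 0.5 * ||theta||^2
    theta = ewc_sgd_step(theta, grad_new, theta_star, fisher, lam=1.0, lr=0.05)
print([round(t, 3) for t in theta])  # [0.909, 0.0]
```

The unprotected parameter moves freely to the new optimum, while the protected one settles at λF/(1 + λF) = 10/11 instead of 0. Tuning λ is how such a wrapper would trade plasticity for stability.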
Machine-readable modeling of HPC system resources and workflow task dependencies is a prerequisite for both exact optimization and learning-based scheduling. Current approaches use flat JSON or ad-hoc formats that lack semantic expressiveness. Knowledge graphs and linked-data ontologies offer a structured alternative that enables reasoning, constraint checking, and integration with LLM-based planners via graph query languages such as SPARQL or via schema retrieval. This thesis designs and implements a knowledge graph schema for heterogeneous HPC systems and Standard Task Graph workflows, building on the node and workflow JSON definitions from the Stage 2 benchmark (DOI 10.5281/zenodo.20432418). The student will populate a reference knowledge graph instance, evaluate its expressiveness against the scheduling constraints used in Grapheon RL, and demonstrate two end-use cases: constraint validation at submission time and semantic queries for suitable node allocation. The outcome is a reusable schema with open-source tooling for populating the knowledge graph from existing JSON workflow descriptions.
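The two end-use cases can be illustrated with a tiny knowledge graph held as plain Python subject-predicate-object triples: a node-suitability query and a submission-time constraint check. The `hpc:`, `node:`, `task:` and `wf:` identifiers, the properties (`hpc:memoryGB`, `hpc:requiresAccelerator`, `hpc:dependsOn`, and so on) and the helper functions are hypothetical and are not taken from the Stage 2 JSON definitions. A real implementation would more likely store RDF and use SPARQL queries, with the W3C SHACL standard as one option for expressing validation rules declaratively.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, object]


class TripleStore:
    """Minimal in-memory subject-predicate-object store (stand-in for an RDF graph)."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self.triples: Set[Triple] = set(triples)

    def objects(self, s: str, p: str) -> List[object]:
        return [o for (s2, p2, o) in self.triples if s2 == s and p2 == p]

    def subjects(self, p: str, o: object) -> List[str]:
        return [s for (s, p2, o2) in self.triples if p2 == p and o2 == o]


def suitable_nodes(kg: TripleStore, task: str) -> List[str]:
    """Semantic query: nodes with enough memory and every accelerator the task needs."""
    need_mem = max(kg.objects(task, "hpc:requiresMemoryGB"), default=0)
    need_acc = set(kg.objects(task, "hpc:requiresAccelerator"))
    matches = []
    for node in sorted(kg.subjects("rdf:type", "hpc:ComputeNode")):
        mem = max(kg.objects(node, "hpc:memoryGB"), default=0)
        if mem >= need_mem and need_acc <= set(kg.objects(node, "hpc:hasAccelerator")):
            matches.append(node)
    return matches


def validate_submission(kg: TripleStore, workflow: str) -> List[str]:
    """Constraint check at submission time; returns human-readable violations."""
    tasks = set(kg.objects(workflow, "hpc:hasTask"))
    errors = []
    for task in sorted(tasks):
        if not suitable_nodes(kg, task):
            errors.append(f"{task}: no node satisfies its resource requirements")
        for pred in sorted(kg.objects(task, "hpc:dependsOn")):
            if pred not in tasks:
                errors.append(f"{task}: depends on {pred}, which is not in {workflow}")
    return errors


# Hypothetical instance: two nodes and a three-task workflow.
kg = TripleStore([
    ("node:n1", "rdf:type", "hpc:ComputeNode"),
    ("node:n1", "hpc:memoryGB", 256),
    ("node:n1", "hpc:hasAccelerator", "hpc:GPU"),
    ("node:n2", "rdf:type", "hpc:ComputeNode"),
    ("node:n2", "hpc:memoryGB", 1024),
    ("wf:w1", "hpc:hasTask", "task:t1"),
    ("wf:w1", "hpc:hasTask", "task:t2"),
    ("wf:w1", "hpc:hasTask", "task:t3"),
    ("task:t1", "hpc:requiresMemoryGB", 128),
    ("task:t1", "hpc:requiresAccelerator", "hpc:GPU"),
    ("task:t2", "hpc:requiresMemoryGB", 512),
    ("task:t2", "hpc:dependsOn", "task:t1"),
    ("task:t3", "hpc:requiresMemoryGB", 2048),
    ("task:t3", "hpc:dependsOn", "task:t9"),
])
print(suitable_nodes(kg, "task:t1"))  # ['node:n1']
print(validate_submission(kg, "wf:w1"))
```

Here `task:t3` is rejected twice: no node offers 2048 GB, and it depends on a task outside the submitted workflow. Both are checks that a flat JSON description cannot express without custom code.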