Yikun (Aiden) Han
I am a second-year Master's student in Data Science at the University of Michigan.
My research interests lie at the intersection of geometric deep learning, natural language processing, and AI for healthcare.
I am fortunate to be part of the CASI Lab at the University of Michigan, supervised by Prof. Ambuj Tewari. I also collaborate closely with the AI Health Lab at the University of Texas at Austin.
Feel free to contact me via email: yikunhan [at] umich.edu
Email / GitHub / Curriculum Vitae / Google Scholar / LinkedIn
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
Yijun Tian*, Yikun Han*, Xiusi Chen*, Wei Wang, Nitesh V. Chawla
arXiv, 2024
arxiv / code
We present TinyLLM, a knowledge distillation approach that transfers reasoning abilities from multiple large language models (LLMs) to smaller ones. TinyLLM enables smaller models to generate both accurate answers and rationales, achieving superior performance despite a significantly reduced model size.
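A minimal sketch of the core idea, multi-teacher rationale distillation (not the paper's code; the student model, optimizer settings, and training example below are placeholder assumptions):

```python
# Minimal sketch: fine-tune a small student on answers plus rationales
# collected from several (hypothetical) teacher LLMs.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

student_name = "google/flan-t5-small"            # placeholder choice of student
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForSeq2SeqLM.from_pretrained(student_name)

# Rationales from multiple teachers for one training question (placeholder data).
question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
teacher_rationales = [
    "Speed is distance divided by time: 60 / 1.5 = 40 miles per hour.",   # teacher A
    "In 1.5 hours the train covers 60 miles, so in 1 hour it covers 40.", # teacher B
]
answer = "40 miles per hour"

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
student.train()
for rationale in teacher_rationales:
    # The student learns to produce both the rationale and the final answer.
    inputs = tokenizer(f"question: {question}", return_tensors="pt")
    labels = tokenizer(f"rationale: {rationale} answer: {answer}",
                       return_tensors="pt").input_ids
    loss = student(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```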
Mapping from Meaning: Addressing the Miscalibration of Prompt-Sensitive Language Models
Kyle Cox, Jiawei Xu, Yikun Han, Abby Xu, Tianhao Li, Chi-Yang Hsu, Tianlong Chen, Walter Gerych, Ying Ding
arXiv, 2024
code
We explore prompt sensitivity in large language models (LLMs), where semantically identical prompts can yield vastly different outputs. By modeling this sensitivity as generalization error, we improve uncertainty calibration using paraphrased prompts. Additionally, we propose a new metric to quantify uncertainty caused by prompt variations, offering insights into how LLMs handle semantic continuity in natural language.
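A minimal sketch of the paraphrase-based calibration idea under simplified assumptions (the per-paraphrase confidences are placeholder values, and the spread statistic stands in for the paper's proposed metric):

```python
# Minimal sketch: estimate answer confidence by averaging over paraphrases of
# the same question, and treat the spread across paraphrases as an
# uncertainty signal coming from prompt wording alone.
import numpy as np

# Hypothetical probabilities the model assigns to the same answer under
# five semantically equivalent paraphrases of one prompt.
paraphrase_confidences = np.array([0.91, 0.62, 0.88, 0.70, 0.95])

calibrated_confidence = paraphrase_confidences.mean()   # marginalize over phrasing
prompt_sensitivity = paraphrase_confidences.std()       # variation due to wording

print(f"calibrated confidence: {calibrated_confidence:.2f}")
print(f"prompt-sensitivity uncertainty: {prompt_sensitivity:.2f}")
```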
When Large Language Models Meet Vector Databases: A Survey
Zhi Jing*, Yongye Su*, Yikun Han*, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, Min Zhang
arXiv, 2024
arxiv
We survey the integration of Large Language Models (LLMs) and Vector Databases (VecDBs), highlighting VecDBs’ role in addressing LLM challenges like hallucinations, outdated knowledge, and memory inefficiencies. This review outlines foundational concepts and explores how VecDBs enhance LLM performance by efficiently managing vector data, paving the way for future advancements in data handling and knowledge extraction.
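A minimal sketch of the basic retrieval pattern VecDBs enable for LLMs, under placeholder assumptions (random embeddings stand in for a real embedding model and vector database):

```python
# Minimal sketch: retrieve the most similar stored text by vector similarity
# and prepend it to the LLM prompt, grounding the model in external knowledge.
import numpy as np

documents = [
    "Vector databases index embeddings for fast similarity search.",
    "LLMs can hallucinate when asked about facts outside their training data.",
]
rng = np.random.default_rng(0)
doc_vectors = rng.random((len(documents), 8))           # placeholder embeddings
query_vector = doc_vectors[0] + 0.01 * rng.random(8)    # placeholder query embedding

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

best = max(range(len(documents)), key=lambda i: cosine(query_vector, doc_vectors[i]))
prompt = f"Context: {documents[best]}\n\nQuestion: How do vector databases help LLMs?"
print(prompt)
```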
A Community Detection and Graph-Neural-Network-Based Link Prediction Approach for Scientific Literature
Chunjiang Liu*, Yikun Han*, Haiyun Xu, Shihan Yang, Kaidi Wang, Yongye Su
Mathematics, 2024
paper
We integrate the Louvain community detection algorithm with various GNN models to improve link prediction in scientific literature networks. The approach consistently boosts performance; GAT, for example, improves from an AUC of 0.777 to 0.823, demonstrating the effectiveness of combining community structure with GNNs.
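A minimal sketch of the general recipe under stated assumptions (NetworkX's Louvain implementation, PyTorch Geometric's GATConv, and a toy graph standing in for a citation network; not the paper's exact pipeline):

```python
# Minimal sketch: feed Louvain community membership to a GAT encoder as node
# features, then score candidate links from the learned node embeddings.
import networkx as nx
import torch
from networkx.algorithms.community import louvain_communities
from torch_geometric.nn import GATConv

G = nx.karate_club_graph()                        # stand-in for a citation network
communities = louvain_communities(G, seed=0)

# One-hot community membership as node features.
comm_id = {node: i for i, comm in enumerate(communities) for node in comm}
x = torch.eye(len(communities))[[comm_id[n] for n in G.nodes]]

# Undirected edge list in COO format for the GNN.
edge_index = torch.tensor(list(G.edges), dtype=torch.long).t().contiguous()
edge_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)

encoder = GATConv(in_channels=x.size(1), out_channels=16, heads=2, concat=False)
z = encoder(x, edge_index)                        # community-aware node embeddings

def link_score(u: int, v: int) -> torch.Tensor:
    # Score a candidate link by the inner product of its endpoint embeddings.
    return (z[u] * z[v]).sum()
```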
A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge
Yikun Han, Chunjiang Liu, Pengfei Wang
arXiv, 2023
arxiv
We review key algorithms for solving approximate nearest neighbor search in vector databases, categorizing them into hash-based, tree-based, graph-based, and quantization-based methods. Additionally, we discuss challenges and explore how vector databases can integrate with large language models for new opportunities.
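A minimal example of one of the four families, graph-based search, using FAISS's HNSW index as an illustrative library choice (an assumption for this sketch, not a recommendation from the survey):

```python
# Minimal sketch: build a graph-based approximate nearest neighbor index and
# query it, the workload vector databases are built around.
import faiss
import numpy as np

dim, n_vectors = 128, 10_000
rng = np.random.default_rng(0)
database = rng.random((n_vectors, dim), dtype=np.float32)
query = rng.random((1, dim), dtype=np.float32)

index = faiss.IndexHNSWFlat(dim, 32)   # HNSW graph with 32 neighbors per node
index.add(database)                    # build the proximity graph
distances, ids = index.search(query, 5)
print(ids[0], distances[0])
```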
DREAM Olfactory Mixtures Prediction Challenge
Yikun Han, Zehua Wang, Stephen Yang, Ambuj Tewari
RECOMB/ISCB Conference on Regulatory & Systems Genomics with DREAM Challenges, 2024
writeup / code / website / news
We use pre-trained graph neural networks and boosting techniques to predict odor mixture discriminability, transforming single-molecule embeddings into mixture-level predictions with improved robustness and accuracy.
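A minimal sketch of the pipeline shape under placeholder assumptions (random vectors stand in for the pre-trained GNN's molecule embeddings, and scikit-learn's gradient boosting stands in for the boosting stage):

```python
# Minimal sketch: pool per-molecule embeddings into mixture-level features,
# then fit a boosted regressor on mixture-pair discriminability targets.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
emb_dim = 64

def mixture_embedding(molecule_embeddings: np.ndarray) -> np.ndarray:
    # Permutation-invariant pooling over the molecules in a mixture.
    return molecule_embeddings.mean(axis=0)

# Hypothetical training set: pairs of mixtures with a discriminability score.
n_pairs = 200
X = np.stack([
    np.concatenate([
        mixture_embedding(rng.normal(size=(rng.integers(2, 10), emb_dim))),
        mixture_embedding(rng.normal(size=(rng.integers(2, 10), emb_dim))),
    ])
    for _ in range(n_pairs)
])
y = rng.random(n_pairs)   # placeholder discriminability labels

model = GradientBoostingRegressor().fit(X, y)
```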
Advisor: Prof. Ambuj Tewari
Research Topics:
[1] Graph Neural Networks
[2] Molecular Property Prediction
[3] Protein-Ligand Affinity Prediction
Advisors: Prof. Ying Ding, Prof. Jiliang Tang
Research Topics:
[1] Graph Retrieval-Augmented Generation
[2] Medical AI
[3] Collaborator Recommendation
Advisor: Prof. Nitesh V. Chawla
Research Topics:
[1] Knowledge Distillation
[2] Multi-Teacher Collaboration
[3] In-Context Learning
Advisor: Prof. Gang Chen
Research Topics:
[1] LAPACK Optimization
[2] Parallel Computation for Large-Scale Matrices
[3] High-Performance Matrix Factorization and Back Substitution
Master's
Data Science
GPA: 3.894/4.0
Bachelor's
Information Resources Management
GPA: 3.87/4.0
Rank: 2/76
RSGDREAM Travel Award, 2024
Outstanding Graduate, 2023
Second Prize Scholarship, 2022
Outstanding Student, 2021
Outstanding Student, 2020