About me

Hello! I am Runkang Yang, from Jiaozuo, Henan, China 🐾. I am currently a senior at ShanghaiTech University, majoring in Computer Science and Technology 🖥️.

During my undergraduate studies, I had the honor of being a research intern at the LINs Lab at Westlake University, where I focused on data-efficient AI 🚀 and generative AI 🎨 under the guidance of Prof. Tao Lin. I was also privileged to work as a research intern at TeleAI, focusing on visual understanding for Multimodal Large Language Models (MLLMs) 🤖, supervised by Prof. Dell Zhang.

I am also a candidate for the joint Ph.D. program between Fudan University and BYD. Currently, I am a research intern at the FD-LAMT Lab, supervised by Prof. Wei Li at Fudan University. My research topics include Computer Audition 🔊 and Music AI 🎶.

News

[Dec. 2025] Awarded the Outstanding Student title for the 2024-2025 academic year (Top 10%). 🦾
[Sep. 2025] Started my research internship at the FD-LAMT Lab! 🥳
[Jul. 2025] Started my research internship at TeleAI! 🎉
[Jul. 2025] Won the Third Prize in the 2025 China Physiological Signal Challenge.

Academic Background

As a student at ShanghaiTech University, I have had the privilege of engaging with a diverse array of courses and projects that have solidified my foundation in computer science. 🐳 My coursework includes Computer Programming (A), Probability and Statistics (A), Artificial Intelligence (A-), Machine Learning (A+), Natural Language Processing (A-), Computer Architecture (A-), Software Engineering (A), and Database (A+), among others, giving me a broad understanding of the field. 📚

Publications

[Preprint] Equally Critical: Samples, Targets, and Their Mappings in Datasets. Runkang Yang, Peng Sun, Xinyi Shang, Yi Tang, Tao Lin*. 2025.05.17 [Arxiv] [GitHub]
- Proposes a unified view of training efficiency from samples, targets, and their mapping relationships for data-efficient learning and knowledge distillation, showing how sample-target correspondence shapes optimization dynamics.
- Organizes supervised learning paradigms into three sample-to-target mapping strategies, and shows that mapping multiple augmented views of one sample to a shared soft target can balance target informativeness and label consistency while improving final accuracy.
[Preprint] VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction. Shaobo Wang, Tianle Niu, Runkang Yang, Deshan Liu, Xu He, Zichen Wen, Conghui He, Xuming Hu, Linfeng Zhang*. 2025.11.24 [Arxiv] [GitHub]
- Proposes VideoCompressa for large-scale video understanding, modeling video compression as a joint temporal compression and spatial reconstruction process to address high storage cost, training cost, and severe temporal redundancy.
- Develops an end-to-end video synthesis pipeline using a lightweight ConvNet with Gumbel-Softmax for differentiable keyframe selection, together with a frozen pretrained VAE for compact latent reconstruction.
- On UCF101, achieves a 2.34-point improvement over full-data training with only 0.13% of the original data and more than 5800× speedup over conventional synthesis methods; on HMDB51, reaches full-data performance with only 0.41% training data when fine-tuning Qwen2.5-VL-7B, outperforming the zero-shot baseline by 10.61%.
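
For readers unfamiliar with Gumbel-Softmax, which the VideoCompressa entry above names for keyframe selection, the following is a minimal standard-library sketch of the sampling step only. It is not the paper's implementation: the function name `gumbel_softmax`, the temperature `tau`, and the example frame scores are all illustrative assumptions.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) candidates."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)); clamp U away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-frame scores, e.g. produced by a small scoring network.
frame_logits = [0.2, 2.5, -1.0, 0.7]
weights = gumbel_softmax(frame_logits, tau=0.5, rng=random.Random(0))
# Index of the frame that received the largest relaxed weight.
keyframe = max(range(len(weights)), key=weights.__getitem__)
```

A lower `tau` pushes the weights toward a one-hot choice. In an autograd framework the same expression is differentiable with respect to the logits, which is what makes the selection trainable end to end.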

Research Experience

Westlake University, Learning and Inference System (LINs) Lab 🧐 Aug. 2024 - Aug. 2025
- Revisited how samples, supervision targets, and their mappings jointly affect training efficiency in data-efficient learning, addressing the noise issue brought by soft-target training in conventional knowledge distillation and proposing a strategy that improves both early convergence and final accuracy.
- Studied inference-time and test-time scaling for diffusion models by treating initial noise and random seeds as optimizable variables, and analyzed their effects on composition, detail quality, and prompt alignment.
- Explored seed selection and recommendation mechanisms, showing that multi-seed candidate generation with lightweight quality evaluation can improve first-round generation quality without changing model parameters.
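
The multi-seed idea in the last item can be illustrated with a toy best-of-N loop. This is only a sketch under stated assumptions: `generate` and `quality` are hypothetical stand-ins for a diffusion sampler and a lightweight quality evaluator, not the models or metrics used in the actual project.

```python
import random

def generate(prompt, seed):
    """Hypothetical stand-in for a diffusion sampler: the seed fixes the output."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(8)]

def quality(candidate):
    """Hypothetical lightweight scorer; higher is better."""
    return -sum(x * x for x in candidate)

def best_of_n(prompt, seeds):
    """Generate one candidate per seed and keep the top-scoring seed."""
    scored = [(quality(generate(prompt, s)), s) for s in seeds]
    best_score, best_seed = max(scored)
    return best_seed, best_score

seed, score = best_of_n("a cat on a sofa", seeds=range(4))
```

Only the seed varies between candidates, so the model parameters stay untouched. The cost is one extra generation and one cheap evaluation per additional seed.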

Internship Experience

Institute of Artificial Intelligence, China Telecom (TeleAI) 🧑‍💻 Jun. 2025 - Sep. 2025
- Multimodal Real-Time Interaction Pipeline Optimization for TeleAI Glasses
  - Built and analyzed the end-to-end pipeline for AI glasses, covering speech input, Qwen3-0.6B intent recognition, MiniCPM-o-7B visual understanding, SSE streaming output, TTS over WebSocket, and real-time audio playback; conducted performance evaluation and bottleneck analysis for photo and video recognition scenarios.
  - Optimized the visual-understanding-to-speech pipeline by converting MiniCPM-o from one-shot output to SSE token streaming and triggering TTS from the first token with sentence-level segmentation, enabling streaming generation and playback; reduced latency from photo capture to first-token audio playback to 1.09–1.32 s, about 1 s faster than the baseline.
- Long-Memory Conversational System for AI Toys Based on FoloToy
  - Integrated FoloToy self-hosted services for smart-toy scenarios, completed protocol adaptation and connectivity between devices and custom servers, and organized the full MQTT-ASR-LLM-TTS invocation pipeline; independently built a Flask-based Mem0 proxy service compatible with the OpenAI API, with one-command deployment via Docker Compose and persistent storage volumes.
  - Improved memory management for LLM agents by introducing retrieval-based deduplication, threshold-based update control, and strong-model verification for memory updates, substantially improving long-term memory quality.
  - Developed a TeleMem visualization prototype supporting chat history, memory item browsing, login/registration, and SQLite storage for debugging and demo purposes.
- Preliminary Research and Experimental Support for TeleMem
  - Supported early-stage investigation and baseline experiments for the TeleMem direction by deploying Qwen3-8B with vLLM and building evaluation code for RAG baselines, Mem0, and other memory systems on the LoCoMo dataset; wrote evaluation scripts to compare semantic accuracy and generation quality across long-dialogue memory tasks.
  - Investigated Mem0’s memory types, custom categorization, timeline management, and Graph Memory capability, and summarized optimization directions for memory writing, updating, and conflict resolution to support later engineering deployment of TeleMem.
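
The sentence-level segmentation mentioned in the TeleAI Glasses item can be sketched as a generator that turns a token stream into complete sentences, each of which could be handed to TTS as soon as it closes. This is an illustrative sketch, not the production code: the function name, the punctuation set, and the sample token stream are assumptions.

```python
SENTENCE_END = set(".!?。！？")

def sentences_from_tokens(tokens):
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every finished sentence currently held in the buffer.
        while True:
            cut = next((i for i, ch in enumerate(buffer) if ch in SENTENCE_END), -1)
            if cut < 0:
                break
            sentence, buffer = buffer[:cut + 1].strip(), buffer[cut + 1:]
            if sentence:
                yield sentence
    # Flush trailing text that never received closing punctuation.
    if buffer.strip():
        yield buffer.strip()

stream = ["This", " is", " a", " cat.", " It", " looks", " calm", "!", " Bye"]
chunks = list(sentences_from_tokens(stream))
# chunks == ["This is a cat.", "It looks calm!", "Bye"]
```

Splitting on every period is deliberately naive and would also break decimals such as "1.09". A real pipeline would need a more careful boundary rule.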

Exchange Experience

National University of Singapore (NUS) - School of Computing (SoC) 👣 July 2024
- Served as the primary lead of the project team in the Deep Learning course (labs and final project), earning an A+ overall grade (Top 5%).
- In the baseline project, collected images and built the dataset using Python and Selenium, then designed a two-stage YOLO + Inception pipeline for object detection followed by transfer-learning-based classification of five cat breeds, ranking first in both accuracy and runtime.
- In the advanced project, helped design an intelligent delivery system for a pet community using InceptionV3 for fruit and animal recognition, OpenCV-based box and corner detection to simplify visual recognition, and coordination with a robotic delivery cart for pickup, delivery, and queue management based on pet preferences; also contributed to Raspberry Pi deployment, Flask interfaces, path following, obstacle avoidance, and color-block localization.

Awards and Honors

Outstanding Student for the 2024-2025 academic year at ShanghaiTech University (Top 10%).
Third Prize in the 2025 China Physiological Signal Challenge.
Outstanding Student for the 2023-2024 academic year at ShanghaiTech University (Top 10%).
Third Prize in the 2024 China Undergraduate Mathematical Contest in Modeling (CUMCM).
Third Prize in the 2024 National Undergraduate Electronic Design Contest (NUEDC).
Third Prize in the 15th Chinese Mathematics Competitions (CMC).

Personal Skills

Technical Skills
- Programming Languages: Python, C, C++, RISC-V assembly
- Tools and Frameworks: PyTorch, TensorFlow
- Databases: SQL
- Data Analysis: Pandas, NumPy
- Data Visualization: Matplotlib
- Version Control: Git, GitHub
- Hardware Skills: Multisim, Logisim
Other Skills
- Documentation: Microsoft Office, LaTeX
- Languages: English (CET-4, CET-6)

Student Work Experience

Teaching assistant
- CS150A (Database) Fall 2024
Student assistant
- [Sep. 2022 - present] News Center at ShanghaiTech University. Primarily work on editing and managing content for the university’s official WeChat account, promoting online campus communication.

Personal Interests

Beyond academics, I am an avid reader and enjoy staying updated with the latest research and technological developments.🐬 In my free time, I enjoy singing, dancing, photography, traveling, watching dramas like Descendants of the Sun (태양의 후예) (Viki), and studying Korean (한국어) & Japanese (にほんご), which provide a creative outlet and help me maintain a balanced lifestyle. 🥸

Last updated: Apr. 2026