Hello, Academic World!

Self Introduction

I’m Zhangyi Hu, a Bachelor of Engineering student majoring in Computer Science and Technology at Wuhan University’s School of Computer Science. I currently hold a GPA of 3.87/4.0, with an average score of around 91/100, and have been recognized with several scholarships, including the prestigious National Scholarship.

Education

Bachelor of Engineering in Computer Science and Technology, Wuhan University, School of Computer Science, September 2021 - June 2025

Research Interest

My research focuses on applying foundation models, such as Language Vision Models (LVMs) and Large Language Models (LLMs), to Visible-Infrared Person Re-Identification. This independent project is supervised by Prof. Mang Ye at the Multimedia Analysis and ReaSoning (MARS) LAB.

Future Plans

Moving forward, I am eager to delve deeper into foundational research on large models and their application in downstream tasks, with a particular focus on enhancing their capabilities. My goal is to translate my research findings into practical products, such as advanced vision systems for cameras and drones, addressing real-world needs effectively.
The advent of foundation models holds significant promise for the development of Artificial General Intelligence (AGI). In pursuit of an intelligent system that truly embodies the original definition of AI, I plan to engage in research on multimodal agents. These agents will be capable of comprehensively perceiving complex environments, performing analysis, planning, and reasoning, and maintaining long-term memory. Moreover, these systems will be able to use external tool APIs, moving closer to the realization of AGI.
By integrating the power of foundation models with multimodal data processing, I aim to build robust systems that not only perceive and interact with their surroundings more effectively but also contribute to the broader goals of AI, making substantial progress towards intelligent, adaptive systems that benefit society at large.

Keywords: Computer Vision, Multimodal, Foundation Models, AGI, Agents