Xiaoxu Zhu - Speech AI Researcher

About Me

Xiaoxu Zhu

I am Xiaoxu Zhu (朱晓旭), a Speech AI researcher focusing on large speech models and speech generation. I have been working on speech algorithm research and development at SenseTime since 2021. I have also worked or interned at Siemens, Cheetah Mobile, and Shanghai AI Lab.

I expect to receive my Master's degree in Engineering Management (Information Management) from Tsinghua University in 2025, which I am completing while working. I received my Bachelor's degree from Harbin Institute of Technology in 2016 and my Master's degree in Informatics and Computing Technology from Peter the Great St. Petersburg Polytechnic University in 2019.

Education

  • Tsinghua University

    Master's Degree in Engineering Management (Information Management)

    2023 - 2025
  • Peter the Great St. Petersburg Polytechnic University

    Master's Degree in Informatics and Computing Technology

    2017 - 2019
  • Harbin Institute of Technology

    Bachelor's Degree in Materials Forming and Control Engineering

    2012 - 2016

Work Experience

  • SenseTime

    Speech AI Researcher

    Large speech models, speech generation algorithms, and TTS systems.

    2021.09 - Present
  • Shanghai AI Lab

    Algorithm Consultant

    Speech processing and machine learning techniques for speech synthesis.

    2022.06 - 2023.06
  • Cheetah Mobile (OrionStar)

    Speech AI Researcher

    Emotional speech synthesis and neural vocoder optimization.

    2019.10 - 2021.09
  • Siemens (St. Petersburg)

    Algorithm Intern & Algorithm Engineer

    Deep learning-based speech synthesis systems and image recognition algorithms.

    2018.05 - 2019.09

Publications

  • Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability

    Xiaoxu Zhu, Junhua Li

    arXiv, 2025

  • A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese

    Song Zhang, Ken Zheng, Xiaoxu Zhu, Baoxiang Li

    Interspeech, 2022

  • Multimodal Sentiment Analysis via Efficient Multimodal Transformer and Modality-Aware Adaptive Training Strategy

    C. Ding, D. Zong, B. Li, S. Zhang, X. Zhu, G. Zhong, D. Zhou

    IEEE/ACM MuSe-Mimic, 2023

Contributions

Research Projects

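  • Early-Warning Technologies and Methods for Sudden Highway Subgrade Disasters Based on Generative Large Models [National Key Project]
    Sub-project Lead | High Technology Research and Development Center, National Natural Science Foundation of China | SQ2024YFB2600035
  • Common Key Technologies for the Internationalization of Construction Engineering Standards Based on Semantic Knowledge Graphs [National Key Project]
    Core Team Member | The Administrative Centre for China's Agenda 21 | SQ2024YFC3800085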

Patents

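  • Training Method for a Polyphonic Character Pronunciation Prediction Network, Speech Generation Method, and Apparatus [CN115273809B]
  • Residual Network Training and Speech Synthesis Method, Apparatus, Device, and Medium [CN112562655A]
  • Model Training and Speech Synthesis Method, Apparatus, Device, and Medium [CN116206591A]
  • A Model Training and Speech Synthesis Method, Apparatus, Device, and Medium [CN115294955B]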

Competitions & Awards

  • 2024 Intel Mini Hackathon - Excellent Work Award
  • 2023 ACM MuSe-Mimic Subchallenge - Second Place

Open Source Contributions

LPCNet - Pre-compute GRU B Conditioning

Performance Optimization: Implemented pre-computation of the GRU B conditioning contribution, yielding an approximately 10% speedup in LPCNet inference. The frame-rate conditioning vector is identical for every sample in a frame, so its projection into the GRU B input is computed once per frame and reused for each sample instead of being recomputed at the sample rate, which lowers the per-sample cost of real-time speech synthesis.
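The linear-algebra identity behind this kind of optimization can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LPCNet's C implementation: the GRU B input projection is modeled as one integer matrix applied to the concatenation of a per-sample vector (standing in for the GRU A output) and the frame-level conditioning vector, and the recurrent part of the GRU is omitted. Because W·[x; c] = W_x·x + W_c·c, the W_c·c term only needs to be computed once per frame. All function names, matrix sizes, and the 160-sample frame length are hypothetical choices for illustration.

```python
import random

Vector = list[int]
Matrix = list[list[int]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def naive_inputs(w_state: Matrix, w_cond: Matrix,
                 samples: list[Vector], cond: Vector) -> list[Vector]:
    """Project the concatenated input [x; cond] separately for every sample."""
    # W = [W_state | W_cond]; rows stand in for the stacked GRU gate rows.
    w_full = [rs + rc for rs, rc in zip(w_state, w_cond)]
    return [matvec(w_full, x + cond) for x in samples]


def precomputed_inputs(w_state: Matrix, w_cond: Matrix,
                       samples: list[Vector], cond: Vector) -> list[Vector]:
    """Compute W_cond @ cond once per frame, then reuse it for every sample."""
    cond_proj = matvec(w_cond, cond)  # constant for the whole frame
    return [[a + b for a, b in zip(matvec(w_state, x), cond_proj)]
            for x in samples]


def mults_per_frame(rows: int, d_state: int, d_cond: int,
                    n: int) -> tuple[int, int]:
    """Multiply counts (naive, precomputed) for one frame of n samples."""
    naive = n * rows * (d_state + d_cond)
    pre = rows * d_cond + n * rows * d_state
    return naive, pre


if __name__ == "__main__":
    rng = random.Random(0)
    rows, d_state, d_cond, n = 6, 4, 3, 160  # toy sizes (hypothetical)
    w_state = [[rng.randint(-3, 3) for _ in range(d_state)] for _ in range(rows)]
    w_cond = [[rng.randint(-3, 3) for _ in range(d_cond)] for _ in range(rows)]
    cond = [rng.randint(-3, 3) for _ in range(d_cond)]
    samples = [[rng.randint(-3, 3) for _ in range(d_state)] for _ in range(n)]
    # Integer arithmetic, so both paths must agree exactly.
    assert naive_inputs(w_state, w_cond, samples, cond) == \
        precomputed_inputs(w_state, w_cond, samples, cond)
    print(mults_per_frame(rows, d_state, d_cond, n))  # (6720, 3858)
```

The toy counts only show where the saving comes from: the conditioning term moves from once per sample to once per frame. The roughly 10% figure above refers to the real vocoder, where the rest of the sample-rate network still dominates runtime, not to this sketch.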

Contact

I am always eager to connect and exchange ideas with fellow researchers in large speech models and speech AI. Feel free to reach out if you'd like to discuss research collaborations or share insights!