News

In the world of natural language processing, foundation models have typically come in three flavors: Encoder-only (e.g. BERT), Encoder-Decoder (e.g. T5), and Decoder-only (e.g. GPT-*, LLaMA, PaLM ...
The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text ...