News

Then, to fine-tune it for speech recognition, they added a projection on top of wav2vec 2.0 representing vocabulary in the form of tokens for characters and word boundaries (e.g., word spaces of ...