News
Generating long-form videos conditioned on large story based text input is a new and relatively unexplored task. Current text-to-video models are designed to generate short video clips conditioned on ...
Text-based Visual Question Answering (TextVQA) aims to produce correct answers for given questions about the images with multiple scene texts. In most cases, the texts naturally attach to the surface ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results