Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?
