Multi-modal foundation model (image-text)
- Time: 2021–2022
- Developed state-of-the-art multilingual ALIGN models using a triple contrastive loss.
- Reproduced state-of-the-art large-scale vision-language models: OpenAI's CLIP and Google's ALIGN.
- Published the reproduced ALIGN on the Hugging Face Hub with a model card.
- Reproduced PixelBERT and ViLT for visual question answering research.
- Participated in a competition on extracting molecular structures from images embedded in documents.
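The exact triple contrastive loss used above is not detailed here, but the pairwise image-text term at the core of CLIP and ALIGN can be sketched as a symmetric InfoNCE objective. This is a minimal illustration, not the project's actual training code; the embedding dimension, batch size, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric image-text contrastive (InfoNCE) loss in the style of CLIP/ALIGN.

    Matched image-text pairs (same batch index) are positives; every other
    pairing in the batch acts as an in-batch negative.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature

    # The i-th image matches the i-th text, so targets are the diagonal indices.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings (hypothetical shapes).
torch.manual_seed(0)
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
loss = clip_contrastive_loss(images, texts)
```

With perfectly aligned embeddings (identical image and text vectors) the loss drops toward zero, which is a quick sanity check when implementing this objective.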