Sequence-to-Sequence Contrastive Learning for Text Recognition

Aberdam, Aviad and Litman, Ron and Tsiper, Shahar and Anschel, Oron and Slossberg, Ron and Mazor, Shai and Manmatha, R. and Perona, Pietro (2021) Sequence-to-Sequence Contrastive Learning for Text Recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Piscataway, NJ, pp. 15297-15307. ISBN 978-1-6654-4509-2.

PDF (Submitted Version). See Usage Policy.

We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, extracting several positive pairs and multiple negative examples from each image. To yield effective visual representations for text recognition, we further propose novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
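The abstract describes the instance-level contrastive loss only at a high level. The block below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the encoder output for each image is a sequence of frame vectors; that a hypothetical "window" mapping turns the frames into a fixed number of instances by averaging contiguous windows (chosen the way adaptive average pooling chooses them); and that the loss is an NT-Xent-style (SimCLR-like) objective in which the positive for each instance is the instance at the same position in the other augmented view, while every other instance in the batch serves as a negative. The names `window_instances`, `cosine` and `seq_contrastive_loss`, along with the default temperature, are illustrative. Augmentations and projection heads are omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def window_instances(frames: Sequence[Vector], num_instances: int) -> List[Vector]:
    """Map a sequence of frame features to `num_instances` instance vectors.

    Hypothetical window mapping: windows are chosen like adaptive average
    pooling (they may overlap when the division is uneven) and each is averaged.
    """
    t = len(frames)
    if not 1 <= num_instances <= t:
        raise ValueError("num_instances must be in [1, number of frames]")
    dim = len(frames[0])
    instances = []
    for k in range(num_instances):
        start = (k * t) // num_instances
        end = -(-((k + 1) * t) // num_instances)  # ceiling division
        window = frames[start:end]
        instances.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return instances


def cosine(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)


def seq_contrastive_loss(view_a, view_b, num_instances: int, temperature: float = 0.1) -> float:
    """NT-Xent-style loss computed over instances (hypothetical sketch).

    view_a[i] and view_b[i] are frame sequences for two augmentations of image i.
    Instance k of view_a[i] is paired with instance k of view_b[i]; every other
    instance from either view of any image in the batch acts as a negative.
    """
    za = [z for frames in view_a for z in window_instances(frames, num_instances)]
    zb = [z for frames in view_b for z in window_instances(frames, num_instances)]
    z = za + zb
    n = len(za)
    total = 0.0
    for i in range(2 * n):
        pos = i + n if i < n else i - n  # matching instance in the other view
        sims = {j: cosine(z[i], z[j]) / temperature for j in range(2 * n) if j != i}
        m = max(sims.values())  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[pos]  # -log softmax probability of the positive
    return total / (2 * n)
```

With a batch of B images and K instances per view, each instance in this sketch has one positive and 2BK - 2 negatives, so a single image contributes K positive pairs instead of the single pair used by non-sequential contrastive methods.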

Item Type: Book Section
ORCID (Manmatha, R.): 0000-0003-2315-8583
ORCID (Perona, Pietro): 0000-0002-7583-5809
Additional Information: © 2021 IEEE.
Record Number: CaltechAUTHORS:20210119-161639508
Persistent URL:
Official Citation: A. Aberdam et al., "Sequence-to-Sequence Contrastive Learning for Text Recognition," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15297-15307, doi: 10.1109/CVPR46437.2021.01505
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 107570
Deposited By: George Porter
Deposited On: 20 Jan 2021 15:23
Last Modified: 10 Jan 2022 23:00