MLT_S

Multilingual Scene Text Segmentation (MLT_S) Dataset


The MLT_S dataset (MLT_S_labels.zip, 102.0 MB) contains 6896 label maps for the MLT dataset: 5540 for the training set and 1356 for the validation set. The original images must be downloaded separately from the MLT dataset webpage. The supervision was generated from the bounding boxes available in the MLT dataset using a weakly supervised algorithm. See Implementation for more details.
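The labels and the images are distributed separately, so they have to be matched up locally. The sketch below pairs each label map with its original image by shared base name. It assumes a hypothetical local layout with one folder of extracted label maps and one folder of MLT images. It also assumes that MLT images use one of the extensions listed in `IMAGE_EXTS`. Both `pair_labels_with_images` and that extension set are illustrative and not part of the dataset release.

```python
from pathlib import Path

# Assumed set of extensions for the original MLT images (hypothetical; adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif"}


def pair_labels_with_images(label_dir, image_dir):
    """Match each label map (<name>.png) to the original image with the same base name.

    Returns (pairs, missing):
      pairs   -- sorted list of (image_path, label_path) tuples
      missing -- sorted list of label base names with no matching image
    """
    # Index original images by base name (file name without extension).
    images = {}
    for p in Path(image_dir).iterdir():
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
            images[p.stem] = p

    pairs, missing = [], []
    for label in sorted(Path(label_dir).glob("*.png")):
        image = images.get(label.stem)
        if image is None:
            missing.append(label.stem)
        else:
            pairs.append((image, label))
    return pairs, missing
```

With the full download, a non-empty `missing` list would mean that some label maps have no image in the chosen folder. Keeping the training (5540) and validation (1356) label maps in separate folders makes the counts easy to check.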

Annotations Description

The semantic label maps are saved as .png images with the same names as the corresponding original MLT images. Pixel values in the annotations are defined as follows:

  • 0 - Background
  • 1 - Foreground (Text)
  • 255 - Uncertain
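The sketch below shows how the three pixel values could be used in practice. It splits a label map into boolean masks and computes text-class IoU against a prediction. A common convention in segmentation, assumed here and not necessarily the authors' evaluation protocol, is to exclude Uncertain (255) pixels from the metric. The function names `decode_label_map` and `text_iou` are hypothetical.

```python
import numpy as np

# Pixel values defined for MLT_S label maps.
BACKGROUND, TEXT, UNCERTAIN = 0, 1, 255


def decode_label_map(label):
    """Split an MLT_S label map (2-D integer array) into boolean masks."""
    label = np.asarray(label)
    # Reject values outside {0, 1, 255} so corrupted files are caught early.
    unknown = ~np.isin(label, (BACKGROUND, TEXT, UNCERTAIN))
    if unknown.any():
        raise ValueError(f"unexpected pixel values: {np.unique(label[unknown]).tolist()}")
    return {
        "background": label == BACKGROUND,
        "text": label == TEXT,
        "uncertain": label == UNCERTAIN,
    }


def text_iou(pred_text, label):
    """IoU of the text class, ignoring pixels marked Uncertain (255).

    pred_text: boolean array of the same shape, True where text is predicted.
    """
    masks = decode_label_map(label)
    valid = ~masks["uncertain"]
    # Predictions on Uncertain pixels are neither rewarded nor penalized.
    pred = np.asarray(pred_text, dtype=bool) & valid
    gt = masks["text"]
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # no text in either prediction or ground truth
    return float(np.logical_and(pred, gt).sum() / union)
```

A label map could be loaded with Pillow, for example `np.array(Image.open(path))`. This assumes the PNG stores the class values directly in a single 8-bit channel.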

Terms of use

The annotations in this dataset are licensed under a Creative Commons Attribution 4.0 License. By downloading the annotations, you confirm that you agree to these terms of use.

If you use this dataset for your research, please cite the following papers:

  • Bonechi, S., Andreini, P., Bianchini, M., & Scarselli, F. (2019, September). COCO_TS Dataset: Pixel-Level Annotations Based on Weak Supervision for Scene Text Segmentation. In International Conference on Artificial Neural Networks (pp. 238-250). Springer, Cham.
  • Bonechi, S., Andreini, P., Bianchini, M., & Scarselli, F. (2020). Weak Supervision for Generating Pixel-Level Annotations in Scene Text Segmentation. Pattern Recognition Letters. ISSN 0167-8655. https://doi.org/10.1016/j.patrec.2020.06.023