Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically diverse dataset of 61,940 images covering 40 classes and 6 world regions, collected through crowd-sourcing and containing no personally identifiable information. We analyze GeoDE to understand how images collected in this manner differ from web-scraped ones. Despite its smaller size, we demonstrate its use as both an evaluation and a training dataset: we highlight shortcomings in current models and show improved performance when even small amounts of GeoDE (1,000 - 2,000 images per region) are added to a training set. GeoDE is released under a CC BY license.
Citation
@inproceedings{ramaswamy2022geode,
    author    = {Vikram V. Ramaswamy and Sing Yu Lin and Dora Zhao and Aaron B. Adcock and Laurens van der Maaten and Deepti Ghadiyaram and Olga Russakovsky},
    title     = {GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition},
    booktitle = {NeurIPS Datasets and Benchmarks},
    year      = {2023}
}
Crowdsourcing a geodiverse dataset
We ask participants from 6 different regions of the world to send us images of 40 different objects. This results in a dataset of 61,940 images, roughly balanced across both objects and regions. More details about the crowd-sourcing process, along with the specific objects and regions chosen, are here.
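As a rough illustration of the dataset's structure, the snippet below tabulates per-region and per-object image counts from a metadata file. The file name ("index.csv") and column names ("region", "object") are assumptions made for this sketch; check the released metadata for the actual schema.

```python
# Sketch: check how GeoDE images are distributed across regions and objects.
# Assumes a metadata file "index.csv" with "region" and "object" columns
# (hypothetical names; see the released dataset for the actual schema).
import pandas as pd

meta = pd.read_csv("index.csv")
print(f"total images: {len(meta)}")      # expected: 61,940
print(meta["region"].value_counts())     # images per region
counts = meta.groupby(["object", "region"]).size().unstack(fill_value=0)
print(counts)                            # per-object, per-region counts
```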
Comparison to ImageNet
Shown are sample images of two object classes in different regions within GeoDE (and ImageNet in the bottom row, for comparison). Product labels on images have been blurred.
We compare GeoDE to two other datasets collected to be geographically diverse: GeoYFCC and DollarStreet. We see that GeoDE is more geographically diverse than GeoYFCC, and is much larger than DollarStreet.
Dataset | Size; distribution | Collection process; annotation process | Geographic coverage | Personally Identifiable Info (PII)
GeoDE | 61,940 images; balanced across 40 classes and 6 regions | Crowd-sourced collection using paid workers; manual annotation | Even distribution over six geographical regions (West Asia, Africa, East Asia, Southeast Asia, Americas and Europe) | No
DollarStreet | - | Images by professional and volunteer photographers; manual labels including household income | 63 countries in four regions (Africa, America, Asia and Europe); not balanced | Yes, with permission
We attempt to quantify the difference in object appearance between images from GeoYFCC and GeoDE. Using features extracted from a ResNet-50 model trained on the PASS dataset, we visualize both datasets with t-SNE. Even after conditioning on both region and object, images from the two datasets have very different features.
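The snippet below sketches this kind of analysis: it extracts pooled ResNet-50 features and projects features from both datasets for a matched object/region pair into 2D with t-SNE. The checkpoint path ("pass_resnet50.pth") and helper names are assumptions for illustration, not part of the released code.

```python
# Sketch: embed images with a ResNet-50 backbone and compare GeoDE and GeoYFCC
# features via t-SNE. The paper uses a ResNet-50 trained on PASS;
# "pass_resnet50.pth" is a hypothetical path to such a checkpoint.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.manifold import TSNE

model = models.resnet50()
# strict=False since a self-supervised checkpoint may lack the classifier head
model.load_state_dict(torch.load("pass_resnet50.pth"), strict=False)
model.fc = torch.nn.Identity()   # expose the 2048-d pooled features
model.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(pil_images):
    """Return an (N, 2048) feature matrix for a list of PIL images."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    return model(batch).numpy()

def tsne_compare(feats_geode, feats_geoyfcc, seed=0):
    """Project features from both datasets (same object and region) into 2D."""
    feats = np.concatenate([feats_geode, feats_geoyfcc])
    xy = TSNE(n_components=2, random_state=seed).fit_transform(feats)
    return xy[: len(feats_geode)], xy[len(feats_geode):]
```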
Evaluating current models
We evaluate models trained on other datasets on GeoDE to identify issues within these models, and find a disparity in performance across regions: below, we show the accuracy (%) of CLIP and an ImageNet-trained model on GeoDE. For both models, Europe is the best-performing region and Africa the worst. A sketch of this per-region evaluation appears after the table.
Model | Africa | East Asia | Southeast Asia | Americas | West Asia | Europe
ImageNet | 62.7 | 63.3 | 67.3 | 68.6 | 69.4 | 69.9
CLIP | 78.7 | 79.9 | 81.9 | 84.4 | 84.0 | 85.8
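The snippet below is a minimal sketch of how such a per-region zero-shot evaluation could be run with OpenAI's clip package; the model variant ("ViT-B/32"), the prompt template, and the records format are illustrative assumptions rather than the exact setup used for the numbers above.

```python
# Minimal sketch of a per-region zero-shot evaluation with OpenAI's clip
# package (pip install git+https://github.com/openai/CLIP.git). The model
# variant, prompt template, and "records" format are illustrative assumptions.
from collections import defaultdict
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def per_region_accuracy(records, classes):
    """records: iterable of (image_path, class_name, region); classes: the 40 GeoDE class names."""
    text = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)
    text_feats = model.encode_text(text)
    text_feats /= text_feats.norm(dim=-1, keepdim=True)

    correct, total = defaultdict(int), defaultdict(int)
    for path, label, region in records:
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        img_feats = model.encode_image(image)
        img_feats /= img_feats.norm(dim=-1, keepdim=True)
        pred = classes[(img_feats @ text_feats.T).argmax(dim=-1).item()]
        correct[region] += int(pred == label)
        total[region] += 1
    return {r: 100.0 * correct[r] / total[r] for r in total}
```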
We also visualize misclassified images from the classes with the lowest accuracy under the CLIP model, along with the predicted label. Some objects are simply hard for the model to identify (for example, "medicine" is often predicted as "hand soap" or "toothpaste/toothpowder"), while other classes reveal stereotypes ("house" is often predicted to be "religious building").
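A small sketch of this error analysis, assuming per-image predictions in the form (image path, true class, predicted class) produced by an evaluation loop like the one above:

```python
# Sketch: find the lowest-accuracy classes and collect their misclassified
# images together with the predicted label. "preds" is assumed to be a list
# of (image_path, true_class, predicted_class) tuples.
from collections import Counter, defaultdict

def worst_classes(preds, k=5):
    """Return the k class names with the lowest per-class accuracy."""
    correct, total = Counter(), Counter()
    for _, true, pred in preds:
        total[true] += 1
        correct[true] += int(pred == true)
    accuracy = {c: correct[c] / total[c] for c in total}
    return sorted(accuracy, key=accuracy.get)[:k]

def misclassified_examples(preds, classes_of_interest):
    """Map each class of interest to its misclassified (image_path, predicted_class) pairs."""
    examples = defaultdict(list)
    for path, true, pred in preds:
        if true in classes_of_interest and pred != true:
            examples[true].append((path, pred))
    return examples
```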
Related Work
Below are some papers related to our work. We discuss them in more detail in the related work section of our paper.
Acknowledgements
This material is based upon work partially supported by the National Science Foundation under Grant No. 2145198. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also acknowledge support from Meta AI and the Princeton SEAS Howard B. Wentz, Jr. Junior Faculty Award to OR.
We thank Dhruv Mahajan for his valuable insights during the project development phase. We also thank Jihoon Chung, Nicole Meister, Angelina Wang and the Princeton Visual AI Lab for their helpful comments and feedback during the writing process.