24 Million Images Used For Facial Recognition Were Secretly Scraped From All Over The Internet

If you have ever wondered what facial recognition systems compare your photos against, the answer is probably a picture of you that governments and tech companies secretly gathered to develop facial recognition AI.

A database of millions of images exists and is now being used by facial recognition systems around the world. The images were secretly extracted by the US and Europe from people’s social media accounts, websites, photo-sharing and online dating platforms, and were also taken from digital cameras in public places and from unencrypted communications. These systems are used ubiquitously by police and state intelligence agencies, without you knowing that they have a copy of your face on their system.

Research published by Megapixels.cc, a cybersecurity research firm focused on facial recognition, found 24 million non-cooperative, non-consensual photos in 30 publicly available face recognition and face analysis datasets.

Out of these 24 million images, 15 million face images are from Internet search engines, over 5.8 million from Flickr.com, over 2.5 million from the Internet Movie Database (IMDb.com), and nearly 500,000 from CCTV footage.

“All 24 million images were collected without any explicit consent, a type of face image that researchers call ‘in the wild.’ Every image contains at least one face, and many photos contain multiple faces,” the study reads.

The researchers estimated that the millions of images they found in these datasets belong to about one million people. They also found that the majority of the images originated from the USA and China.

Embassy photo found in the dataset. Photo: MegaPixels.cc

However, they said that, across all the research papers they analyzed, only 25% of the datasets originated from the USA, and that most of the images were taken from Chinese IP addresses. They also noted that their study was limited to research papers written in English, which means that foreign use could be greater than the figures they found.

The images in the datasets did not come only from online databases and social media platforms. A considerable number of photos in the analyzed datasets were taken from government databases.

For example, among the 24 million images they analyzed, the researchers found at least 8,428 embassy images from at least 42 countries in face recognition and facial analysis datasets, with most originating from Chinese and US embassies, as mentioned earlier. Over 6,000 of the images were from US, British, Italian, and French embassies (mostly US embassies).

“These images were found by cross-referencing Flickr IDs and URLs between datasets to locate 5,667 images in the MegaFace dataset, 389 images in the IBM Diversity in Faces datasets, and 2,372 images in the Who Goes There dataset,” they added.
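The cross-referencing the researchers describe can be illustrated with a minimal sketch. It assumes the simplest form of the idea, which is checking which known photo IDs also appear in each dataset. The function name, the IDs, and the dataset contents below are hypothetical and are not taken from the study. Only the three per-dataset counts at the end come from the quote above, and they add up to the 8,428 embassy images reported.

```python
# Hypothetical sketch: every ID and dataset listing here is made up for illustration.

def cross_reference(known_ids, datasets):
    """Return, for each dataset, the IDs that also appear in known_ids."""
    known = set(known_ids)
    return {name: known & set(ids) for name, ids in datasets.items()}

# Made-up photo IDs standing in for photos from embassy accounts
embassy_photo_ids = {"1001", "1002", "1003", "1004"}

# Made-up ID listings standing in for three face datasets
datasets = {
    "dataset_a": ["1001", "1002", "9999"],
    "dataset_b": ["1003", "8888"],
    "dataset_c": ["7777"],
}

matches = cross_reference(embassy_photo_ids, datasets)
match_counts = {name: len(ids) for name, ids in matches.items()}

# Per-dataset counts as quoted from the study
reported_counts = {
    "MegaFace": 5667,
    "IBM Diversity in Faces": 389,
    "Who Goes There": 2372,
}
reported_total = sum(reported_counts.values())
```

The set intersection keeps the check simple: any ID present on both sides counts as a match, regardless of order or duplicates in a dataset's listing.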

As part of their findings, the researchers said that these images were used for commercial research by Google (US), Microsoft (US), SenseTime (China), Tencent (China), Mitsubishi (Japan), ExpertSystems (Italy), Siren Solution (Ireland), and Paradigma Digital (Spain), and for military research by the National University of Defense Technology (China).

The facial recognition phenomenon

Facial recognition technology has been at the center of public conversation, as well as legislative and regulatory dialogue, in the past few years. The conversation focuses on how law enforcement agencies, government offices, and private businesses use an unregulated technology.

Law enforcement agencies have been very defensive about their use of facial recognition technology in their operations. They argue that the technology helps them protect citizens from unlawful elements.

However, the opposing side asserts that facial recognition technology violates people’s privacy. Human rights and privacy advocates believe that the premise behind facial recognition systems is problematic in itself, and that law enforcement, big-brother governments, and even businesses that have access to the technology can easily track people’s movements without their consent.

They fear that facial recognition may grow to be a social enemy instead of a friend, because the regulations governing its use are not enough to protect people’s security and privacy.

“Unless we really rein in this technology, there’s a risk that what we enjoy every day — the ability to walk around anonymous, without fearing that you’re being tracked and identified — could be a thing of the past,” said Neema Singh Guliani, the American Civil Liberties Union’s senior legislative counsel.

About the Author

Al Restar
A consumer tech and cybersecurity journalist who does content marketing while daydreaming about having unlimited coffee for life and getting a pet llama. I also own a cybersecurity blog called Zero Day.
