The Slippery Slope of Big Data in Policing


In recent months, large tech companies have come under fire from figures such as Donald Trump Jr. and Ted Cruz, who compare them to the authoritarian Party of 1984. George Orwell’s novel, published in 1949, describes a dystopian society in which mass surveillance forces individuals into compliance. Sales of the novel have soared since January 2021 in light of its renewed relevance, as all sides of the political spectrum accuse each other of being authoritarian, controlling, or restrictive of free speech.

But is the oppressive surveillance of the novel a realistic possibility today? Although government surveillance has not reached the extent described in 1984, there are growing concerns about the use of controversial technologies by law enforcement and private companies which, despite good intentions, invade personal privacy. Clearview AI, a tech company founded in 2017, exemplifies these concerns and illustrates the need for regulation of software. Clearview’s product relies on artificial intelligence (AI), which seeks out patterns in data and allows computers to interpolate or extrapolate from them. Specifically, its AI powers a facial recognition search engine: authorized users can take and upload a photo of an individual, match it against a massive dataset, and identify the photographed person.
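To make the "match it against a massive dataset" step concrete, here is a minimal sketch of how a face search is commonly framed: each photo is reduced to a numeric vector, and the query is compared against every stored vector. This is not Clearview's actual method, which is not public. The three-number vectors, the names `person_a` to `person_c`, the function `best_match`, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means they point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, gallery, threshold=0.9):
    """Return (name, score) for the closest gallery entry.

    If even the closest entry scores below the threshold, the name is None,
    meaning "no confident match". The threshold value is an assumption.
    """
    best_name, best_score = None, -1.0
    for name, vector in gallery.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical stand-ins for face vectors; real systems use far longer ones.
gallery = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.2, 0.8, 0.5],
    "person_c": [0.4, 0.4, 0.8],
}

# A photo that closely resembles person_a.
name, score = best_match([0.88, 0.12, 0.31], gallery)

# A photo of someone who is not in the gallery at all.
stranger = [0.0, 0.0, 1.0]
strict_name, _ = best_match(stranger, gallery, threshold=0.9)
loose_name, _ = best_match(stranger, gallery, threshold=0.8)
```

With the strict threshold the stranger is reported as unmatched, while the looser threshold returns `person_c`, someone entirely unrelated. The threshold is a design choice that trades missed matches against false identifications, and the larger the gallery, the more chances there are for an unrelated face to score highly.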

By the company’s account, Clearview is a massive asset for law enforcement and global safety. It claims to have “helped law enforcement track down hundreds of at-large criminals, including pedophiles, terrorists and sex traffickers,” and to have helped identify victims. It also asserts that it is fully compliant with the law. Clearview software is used by numerous law enforcement agencies around the world, including in Canada, Australia, Brazil, Serbia, and the United Kingdom, as well as by private companies that wish to maintain corporate security. At times, these relationships are not officially established between agencies and Clearview; rather, individual law enforcement officers can request the software online. However, it is not currently readily accessible to the general public. In these contexts, it is largely used to identify anonymous individuals: for example, where there is footage of a crime but no identification of who is involved, Clearview AI can expedite police searches.

Many reviews of the software have been positive. For example, police in Australia were encouraged to use the software and found it highly effective. One officer was able to use Clearview on a suspect’s mugshot to identify them from their Instagram account. Similarly, a detective in a sex crimes unit called it “the best thing that has happened to victim identification in the last 10 years”. These accounts suggest that technology can be a massive asset to law enforcement and can make officers’ jobs easier to perform.

Photo by mikemacmarketing / Wikimedia Commons

Although some reviews have been positive, Clearview AI is seriously flawed. Controversies surrounding Clearview have emerged from its erosion of privacy, concerns about security, and lack of transparency. Users can take and upload a photo of an individual and identify them from a database of nearly 3 billion photos (for reference, the FBI has 411 million). This could, in theory, allow for mass identification of any individual who is in the database, which would be a breakdown of personal privacy. The company’s database is built from scrapes of social media sites, such as Facebook, which goes against those sites’ terms of service.

The security of the database is also unproven: the company’s data was breached in 2020, and little is known about whether its image databases are effectively protected. Furthermore, the company’s lack of transparency means that no one knows for certain how accurate its algorithm is; rather, the company provides an unverified metric of 75% accuracy. Even by the company’s own account, then, there is a significant chance of error, which could lead to crucial mistakes in suspect identification by law enforcement. Lastly, concerns have been raised about Clearview’s alleged ties to white supremacists, which raise questions about racial profiling and bias in its algorithm.


For many, the main problem with Clearview AI is the size of its database and the lack of consent given by the individuals in it. Especially when paired with use by police departments, there seems to be a chilling implication of guilty until proven innocent, as everyone is essentially automatically included in police databases and lineups. At the very least, there is a presumed need to collect data on all innocent people to catch a few guilty ones. While some may argue that innocent people have nothing to hide, and thus have no reason to fight data collection, individuals still have a reasonable right to privacy. This means that, without reasonable motive, the government should not be able to access personal data.

Because images are accessed in violation of social media terms of service, most if not all images in the Clearview database were collected without consent. This led Canadian privacy commissioners to conclude in a recent report that Clearview is acting illegally in the country. While the long-term repercussions of this finding for the future of facial recognition technologies in Canada and globally are still unknown, one of the privacy commissioners stated that Clearview is “an affront to individuals' privacy rights and inflicts broad-based harm on all members of society who find themselves continually in a police lineup.” The investigation led to Clearview’s withdrawal from Canadian law enforcement; however, it also underscores an increased need for regulation of data privacy and of what private companies are allowed to do. Similar investigations underway in Australia and the United Kingdom could further entrench the idea that Clearview AI has strayed too far from an acceptable path.


The personal implications of AI in policing can be devastating. First, facial recognition software is never perfect and could identify individuals who are completely unrelated to an individual in a source image or source footage. A rise in unregulated data in policing could lead to more misidentifications. Furthermore, drawing back to the comparison to 1984, a decrease in personal privacy can be tied to a restriction in freedom of expression and freedom of thought, as people are often less willing to say what is on their mind if they are fearful of repercussions. While Clearview does not match the book’s absolute lack of personal freedom, it is large enough in scope to create paranoia over being in the wrong place at the wrong time or being seen on camera in the case that it is turned into incriminating evidence.

Databases have long been a part of law enforcement efforts to catch and track criminals who pose a threat to public safety. Some DNA databases, such as the FBI’s Combined DNA Index System (CODIS), have proven to be effective deterrents to repeated crimes. Specifically, DNA databases store the DNA of convicted criminals or DNA found at a crime scene, which makes repeat crime less likely by increasing the chance of getting caught. Others, like Interpol’s database, track data on a variety of crimes. While both databases are large in scope, there is little controversy surrounding them because they track crimes that have already been committed, or else contain only information that was given consensually. As a result, these databases are more crime-specific and minimize privacy concerns.

Conversely, Clearview’s database is a broad, all-inclusive database of civilians from around the world, resulting in much more criticism. Unfortunately, even though Clearview has been taken up by agencies around the planet, some countries, such as Canada, have very little modern, national legislation surrounding data use in policing, while other countries like the United Kingdom are only recently dictating the bounds of facial recognition. Without clear national regulation, international regulation may develop slowly. Following Canada’s report that Clearview’s actions are illegal, it would make sense moving forward for nations to begin implementing modern privacy laws that dictate the bounds of legal surveillance to be more in line with Interpol or CODIS.


Ann Cavoukian, a former privacy commissioner of Ontario, has also pointed out another point of contention with Clearview AI: its digital security. Because Clearview has been the victim of multiple data breaches, including one that exposed its source code and another that exposed its client list, there is little confidence that its database of images is secure. A breach of personal data, such as the biometric data contained in an individual’s face, could lead to identity theft, which in turn could lead to financial losses through credit and bank accounts. Clearview’s data protection has not been quantitatively tested by an outside source, which essentially means that no independent party has deliberately attacked the server to determine whether and where it is vulnerable to hackers. Consequently, no one is certain whether the database is secure or whether unauthorized users could break into it. As a result, Clearview lacks accountability, and individuals lack security for their personal data. After all, if no one sets standards for a company like Clearview and no one tells it to improve its system, it becomes substantially easier for the company to scapegoat others if its data is breached. This will only become more problematic as the dataset grows, since more and more people stand to lose in the event of a breach. Thus, there is a clear need for security standards, worked into legislation globally, for databases used in law enforcement that contain private, personal data.

Photo by freestocks / Unsplash


Finally, the fact that Clearview has not been properly tested is indicative of its lack of transparency as a whole, specifically with regard to how its data is used in policing. It is one thing to use Clearview solely as a victim or suspect recognition tool for crimes that have already been committed, as is currently done; it is another entirely to leverage datasets for predictive policing by tracking individuals who match the characteristics or behaviors of convicted criminals. Predictive policing is already being conducted to some degree in countries such as Australia, where activists are wary of the slippery slope it presents; companies with large datasets such as Clearview could exacerbate its consequences.

Predictive policing is flawed because historical data about who has committed crimes in the past carry inherent biases stemming from who and what police prioritize. For example, in Australia, most recorded offensive language crimes are attributed to Indigenous peoples, not because they are the only ones who swear, but because the police care less about others using offensive language. As a result, crime databases list mostly Indigenous people as those who commit language crimes, which could then make it seem as though Indigenous peoples are the only ones who swear. This phenomenon could be replicated anywhere for any crime: to give another example, drug arrests in the USA disproportionately target Black individuals, which creates biased historical data. The use of this type of data in conjunction with software like Clearview’s could allow law enforcement to predictively track individuals based on historical trends. Such software would then disproportionately flag Indigenous peoples, and thus only continue a history of bias in law enforcement.
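The arithmetic behind this argument can be sketched in a few lines. The example below assumes two hypothetical groups of equal size with the same underlying offense rate, differing only in how often police act on an offense. Every number and name here is made up for illustration and is not drawn from real crime statistics.

```python
def recorded_arrests(population, offense_rate, enforcement_rate):
    """Arrests that end up in the database: offenses that police acted on."""
    return population * offense_rate * enforcement_rate


# Hypothetical groups: identical behavior, different police attention.
groups = {
    "group_a": {"population": 1000, "offense_rate": 0.10, "enforcement_rate": 0.50},
    "group_b": {"population": 1000, "offense_rate": 0.10, "enforcement_rate": 0.05},
}

arrests = {name: recorded_arrests(**params) for name, params in groups.items()}
total = sum(arrests.values())

# Share of the historical record attributed to each group. A naive predictive
# tool trained on this record would read these shares as relative risk.
shares = {name: count / total for name, count in arrests.items()}

for name in groups:
    print(f"{name}: {arrests[name]:.0f} recorded arrests, {shares[name]:.0%} of the record")
```

Although both groups offend at the same rate in this sketch, group_a makes up about 91% of the recorded arrests. A tool that learns from the record alone would direct still more attention to group_a, which would in turn add more group_a arrests to the record.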

By and large, big data has come under fire for perpetuating biases globally, whether through the use of AI in hiring practices or because some facial recognition software appears to be ineffective for some people of color. In the case of hiring, companies could store data regarding an applicant’s resume, experience, and other background information, as well as whether that applicant was hired. Then, by feeding this information to an AI network, they can find patterns in which candidates were most likely to be hired.

In facial recognition, by contrast, AI learns to recognize humans by looking for similar qualities in huge datasets of images of faces. However, both kinds of dataset have human biases ingrained within them. In hiring, men have historically been more likely to be hired by humans; when AI learned to make decisions based on a dataset of past hires, it perpetuated these biases. Similarly, facial recognition technologies are sometimes given significantly fewer images of people of color, which in turn makes it more difficult for them to recognize minorities. In order to balance both ethics and algorithmic efficacy at the intersection of law enforcement and technology, standards for accuracy and bias must also be worked into laws surrounding data in policing.

While the case of Clearview AI clearly demonstrates the need for regulation of data in policing, national legislation is not enough. Historically, international corporations have been regulated differently based on national laws, and users have had varied experiences depending on their geographic location. For example, China’s Great Firewall prevents access to Google, Facebook, and other sites from within China’s borders. Furthermore, international corporations use tax haven countries to minimize taxation and maximize shareholder profits at the cost of welfare for the lower class. These complications have led many, including Google’s top policy chief, to call for international standards for the regulation of technology. This is also vital for software like Clearview AI; even with Canada’s recent proclamation that Clearview’s actions were clearly illegal, Canadians are restricted in their ability to limit the company. For example, while Canada has requested that the images of all Canadians be removed from Clearview’s database, the company has refused to comply, and Canada has no way to force it to.

If Clearview AI in its current form is barred in some countries but remains usable in others, crossing a border may mean that your photographs and biometric data are immediately usable by local law enforcement. But individuals should not be put into a police lineup every time they cross a border. The right to privacy, which is enshrined in numerous national and international documents, should be reasonably maintained across borders. If countries continue to allow vastly different standards of regulation for the use of data in law enforcement, especially in the context of software as secretive as Clearview AI, these rights to privacy become inconsistent at best and shattered at worst. It therefore becomes necessary for an international body to be formed around the topic. A rising global desire for tech regulation likely increases the feasibility of institutionalizing international regulation of data use and privacy.

So where does this leave us? Users’ experiences highlight the efficacy of Clearview AI, at least for tracking individuals and finding suspects, but it’s also difficult to quantitatively prove its successes. We live in a world of data: as modern technology continues to spread, it will continue to gather and store more and more information about the individuals who use it. Consequently, it seems reasonable to assume that algorithmic and software tools will have a vital role to play in the future of policing and public safety. However, we must also recognize their clear and pressing dangers, especially if they are left completely unrestricted as Clearview AI has been. These apprehensions are especially relevant to law enforcement, their use of technology, and the scope of their operations. Furthermore, for such an ethical standard in policing to be maintained, there must exist a form of international infrastructure that regulates data collection and usage. Thankfully, we are not yet in the world of 1984, although the technologies to make it possible are developing before us. We must, then, approach the frontier of big data in policing with care and caution.