Aggregation, Emergence, and Sherlock Holmes

The UN Special Rapporteur Ben Emmerson recently released his report on the impact of mass internet surveillance to the UN General Assembly. The document, titled “Promotion and protection of human rights and fundamental freedoms while countering terrorism,” is clearly targeted at US intelligence agencies in response to the Snowden disclosures. And while the document is really a discussion of international law, it struck me as an excuse to discuss a significant issue in American criminal law that I find particularly interesting: is information different in the aggregate?

Allow me to explain. The 4th Amendment requires that government searches and seizures be reasonable, and this reasonableness requirement is satisfied by the issuance of a warrant from an impartial magistrate. This is the hallmark of our criminal justice system. However, there are two ways for police to get around the warrant requirement: the first is to fall into a warrant exception (special cases where the police don’t need a warrant), and the second is to say that the search at issue wasn’t a “search” under the 4th Amendment. We will be focusing on the latter.

Your immediate reaction is probably a mix of amusement and bewilderment: a search that isn’t a search? Thoughts of former President Clinton and the definition of “is” come to mind. However, closer inspection reveals this to be perfectly logical. A police officer walking down the street who hears a gunshot isn’t really searching: information was exposed to them. When we think of searches, we are thinking of intrusive police behavior like looking through our luggage or patting us down.

So what is the test for a search? Two parts: you need to have an expectation of privacy in the searched area, and that expectation of privacy must be reasonable (acknowledged by society). Sounds simple enough. In practice, this is murky and difficult to apply. After all, if the government tells you it monitors your phone calls, you no longer expect your phone calls to be private and their monitoring your phone calls isn’t a search, right? The circularity is challenging, but beyond the scope of this article. For our purposes, just remember – no expectation of privacy: not a search.

Returning to aggregation, the difficulty arises when something that has no expectation of privacy (not a search) can be performed ad nauseam, generating vast amounts of information about an individual. You may not have an expectation of privacy for the single search, but in the aggregate you might think differently. I think this concept is best understood with examples, and as luck would have it there are abundant relevant examples.

Example #1: Location Data

Location data is probably the easiest to grasp. The Supreme Court says that you have no expectation of privacy (read: not a search) in your public movements. If you drive around town, the police can tail your car without a warrant. In fact we expect it; TV and movies have programmed us to think of undercover cops following unsuspecting criminals in unmarked cars. But nowadays, tailing criminals is inefficient: we have GPS. All modern cars transmit location data, as do all cell phones. Even if you turn off location services, you can still be tracked by proximity to cell towers. And then there are satellites, drones, ubiquitous cameras, you name it. Tracking our location 24/7 has never been easier. I’ll pause while you think of your favorite 1984 reference. Got it? Good. Moving on.

Example #2: License Plates

This follows from location data. You have no expectation of privacy (read: not a search) in your car’s license plate. This should be fairly intuitive: why buy vanity plates if not for other people to read them? And besides, the official purpose of license plates is identification. The opposite would be like having a nametag and claiming that your name is a secret. This means a police officer can look at your license plate, run it through their system, and pull you over for an outstanding charge, all consistent with the 4th Amendment. Does this change when computer scanners on police cars automatically scan every license plate on every car around them?

Example #3: Communication

This is perhaps the most hotly debated, and requires more background: the third party doctrine. The third party doctrine says that you have no expectation of privacy (say it with me now) in information you share with a third party. (This is ignoring special privileges: lawyer, spouse, etc.) The idea is if you share information with someone else, that person can do whatever they want with the information, including giving it to the government. You cannot complain (or at least not to the government: complain to the person who spilled the beans). This is important because companies qualify for the third party doctrine too: banks, telephone companies, the postal service, etc.

So communications. This includes mail, email, phone calls, and so forth. As you’ll notice, almost all forms of communication require an intermediary, and therefore the third party doctrine applies. For the most part, this refers to addressing information – information you give to the intermediary to facilitate them transmitting your message. UPS can look at the outside of the envelope, the phone company can look at the metadata of your call (to whom, when, how long, etc.) and the government can ask them for that information. For isolated cases, this probably doesn’t seem so bad. There isn’t much information in addressing information. But in the aggregate, such information could reveal intimate personal information about associations, relationships, and so forth. If you’ll recall, the initial NSA revelations were about ubiquitous tracking of telephone metadata; hopefully you now better understand the legal difficulty.
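To make the aggregation point concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the names, the call records, and the `contact_profile` helper are my own hypothetical inventions, not any real carrier's or agency's method. It only shows how addressing information alone, once tallied in bulk, starts to sketch a picture.

```python
from collections import Counter

# Hypothetical call metadata: (caller, callee, hour_of_day).
# Note there is no call content here, only addressing information.
calls = [
    ("alice", "oncology_clinic", 9),
    ("alice", "oncology_clinic", 14),
    ("alice", "pharmacy", 15),
    ("alice", "oncology_clinic", 10),
    ("alice", "bob", 23),
    ("alice", "bob", 23),
    ("alice", "pizza_place", 19),
]

def contact_profile(records, caller):
    """Count how often `caller` dialed each number (hypothetical helper)."""
    return Counter(callee for who, callee, _hour in records if who == caller)

profile = contact_profile(calls, "alice")
print(profile.most_common(2))  # [('oncology_clinic', 3), ('bob', 2)]
```

On this invented data, the two most frequent contacts are a clinic and a late-night friend. No single record says much; the tally does.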

(I’ll pause here just to clarify that these examples are specific, and that the discussion of aggregation would be different, legally, if the information surveilled was different: such as the content of the letter as opposed to the addressing information. If you have an expectation of privacy in the individual case, the problem of aggregation doesn’t apply.)


This is the problem of aggregation. If a single piece of data has no expectation of privacy, then a million should still have no expectation of privacy. 1,000,000 x 0 = 0. When the DC Circuit faced this problem, they created a new legal doctrine called the “mosaic theory” to get around the problem of aggregation. But the Supreme Court rejected that logic. The amount of data generated was troubling, but didn’t decide the case. So to clarify, as it stands, the government can likely engage in ubiquitous surveillance in each of the above examples, without running afoul of the Constitution.

If I were to speculate as to public opinion regarding this fact, I suspect it would not be particularly supportive. There is something unsettling about ubiquity. Yet where the distinction arises is unclear. We acknowledge that the police can tail anyone they want, for as long as they want, and that is fine. Now they can actually do it, and do it for everyone. This is just taking the argument to its logical conclusion.

Naysayers argue that practical realities prevented ubiquitous surveillance in the past. There simply weren’t enough police to follow everyone, so they were forced to be judicious with their resources. This is both true and unpersuasive. To begin, this isn’t articulating a fundamental right. And if it were, what would the right be to? Inefficiency? The problem is the 4th Amendment creates an individualized right, and you as an individual never had a right to avoid this type of police surveillance. Rather fittingly, it is only when the police activity is aggregated across the entire population that it seems unlawful.

This argument really wants to say that we actually do have an expectation of privacy in these individual data points, so that police tracking has always been a search but only required a very low level of suspicion to be reasonable. Remember, police tracking doesn’t require any level of suspicion when it isn’t a search. If these minor intrusions were 4th Amendment searches, they would need to be reasonable, and therefore require some level of suspicion. This would provide an easy solution to ubiquitous surveillance, as the police wouldn’t have any suspicion to track ordinary citizens, making them unreasonable searches, and therefore unconstitutional. But that’s not how our laws work.

And it’s worth noting that such a change would also likely have unintended consequences. The current paradigm allows police to engage in minimally intrusive investigative work without any 4th Amendment hurdles; altering this would suggest that all investigative activities are “searches,” and must satisfy an appropriate level of suspicion to make them reasonable. While this may sound desirable from a purely civil liberties perspective, it would make prosecutions even more litigious than they currently are, as every single police action could be challenged as unreasonable and would need to be evaluated retrospectively by a judge. And this paradigm would paradoxically imply that a purely fortuitous discovery of criminal activity would be barred by the 4th Amendment, as it was by definition a search that lacked any justifying suspicion.

Or perhaps the argument is that a single tailing isn’t a search, but ubiquitous tracking is. While more amenable to current law, the problem then becomes determining the line where it becomes a search. The premise is familiar: the straw that breaks the camel’s back. But privacy violations are rarely as definitive as back-breaking, so any rule we promulgate would be somewhat arbitrary. The better analogy would be blood alcohol content: is there any substantive difference between .079 BAC and .08? Surely not, but the alternative would make BAC meaningless, as incremental increases could be argued ad infinitum. Arbitrary bright line rules are often frustrating, but nonetheless useful for dealing with gradations that exist on a continuum. Yet privacy differs still in that we don’t have quantitative values upon which to generate hard rules. There is no Blood Privacy Content. And perhaps more fundamentally, bright line rules are ill suited for 4th Amendment jurisprudence because the governing language is “reasonableness.” The words arbitrary and reasonable rarely coalesce.

There are much more in-depth discussions of mosaic theory elsewhere online, and for the present it appears unlikely to be adopted by the Supreme Court, meaning it is of little practical significance. Which is not to say that the Supreme Court was blind to the difficulties of ubiquitous surveillance: a majority of the Court expressed concern about the possibility of GPS tracking constantly monitoring all US citizens. But this is also the kind of issue the Court may prefer to leave to the other branches of government.

Creepiness Factor

One of my initial reactions is that this may be the natural response to new technology. Technological advances are known to produce the “creepiness effect,” which is exactly what it sounds like. We’ve all experienced it: that first time you were browsing for flights and you saw an American Airlines ad appear on unrelated sites; when you first realized that your web browser knew which city you were in; when Facebook altered your friends list based on the profiles you’ve been snooping on. Something you didn’t understand was impacting your life in a novel way, and it’s creepy. But soon the creepiness subsides, and we expect to type “weather” into Google and have it know exactly what city’s weather we mean. This progression from creepy to normal to expected is almost quaint; how quickly we forget the oddity of newness. It is only when the leap is too great, the creepiness outright scary, that the technology is abandoned. But I doubt this reflects a hard line: a natural maximum creepiness. Rather, creepiness creeps. It makes incremental steps, and it is only when the change is too pronounced that we reject it. Aggregation represents a substantial jump in creepiness. So the question becomes, are we ready for it?

I view creepiness to be almost universally detrimental: a relic from the ancient parts of the brain that instinctively distrusts the unknown because it is unknown. If creepiness is just a matter of time and acclimation, then it acts purely to hamper innovation. Instinctive fears absent valid substantiation should not be the foundation for national policies. Which is not to say that I think that this style of thinking will prevail; it’s hard to beat biology. Rather, the policies we craft should recognize when we are halting technology purely because “we aren’t ready for it,” and when we are halting technology because it presents a genuine problem.


Which brings me to the alternative: that aggregation is different. Consider emergence theory. Emergence is best understood as “the whole is greater than the sum of its parts.” And while mathematically frustrating, it’s easy to relate to. The mosaic is the perfect example: each individual tile is insignificant, but the combination allows something greater to emerge. And the world is replete with systems that seem to add complexity as they grow. Under a strong emergence framework, these systems generate complexities that cannot be understood by purely studying the components. (By contrast, a weak emergence framework can understand the whole by studying only the components.) In the context of privacy, strong emergence would manifest as an expectation of privacy that emerges from the collection of purely non-private information.

I mention this because the implications of emergence theory support the idea that aggregation is fundamentally different. If something is demonstrably different when considered in the aggregate, this would provide a principled reason for treating the aggregate differently. Unfortunately, emergence doesn’t quite go this far. Strong emergence has yet to be proven, and doing so would call into question our fundamental understanding of the universe. Weak emergence, by contrast, is just a discussion of the features our brains perceive when presented with more information than we can meaningfully comprehend. Rather than attempting to understand every interaction, we focus on patterns and superstructures. But the whole does not become truly greater than the sum of its parts; we simply failed to appreciate the complexity of the parts.

And even if aggregated data had emergent qualities that made it different, is that difference meaningful? I like to pretend that computer aggregation is really just detective work being done by Sherlock Holmes. We may be suspicious of technology, but it’s hard to complain if our secret was deduced by another human. If the intimate details of a person’s life are discovered by purely human observation, we’d call that good detective work. But somehow this changes when a computer does it. I’ve jokingly suggested that the governing principle in this area is that the police’s job cannot be too easy. Computers make detective work too easy; we want them to do things the hard way.

This post is already probably too long, so I’ll finish with one overarching point. Our 4th Amendment law is developed almost exclusively in the context of known criminals attempting to suppress evidence or otherwise overturn a conviction on a technicality. Our expectation of privacy was shaped under this framework; an expectation of privacy for criminals. Law abiding citizens like ourselves wouldn’t be affected. But computers have leveled the playing field, applying the standards we set in a uniform manner. When the expectation of privacy we set for criminals is also applied to us, it suddenly has less appeal.

As is probably clear from my characterization of the argument, I don’t find aggregation to be as problematic as most. Ubiquity is unsettling, but its benefits cannot be denied, and its potential is enormous. And while I am acutely aware of the dangers that it presents, I am of the opinion that those dangers are better controlled by other mechanisms. This is a topic I will delve into in a subsequent post.

