HTTP with Accountability: It Even Rhymes

I thought I’d use this week to discuss a new internet protocol: HTTPA. HTTPA refers to hypertext transfer protocol (think the beginning of every website address) with accountability. HTTPA is a newly proposed web protocol that would place greater emphasis on data transparency and use restrictions for personal data. It was developed by researchers at MIT and is one of a suite of proposed protocols intended to improve internet interoperability, security, and efficiency. First things first: this protocol would not reflect a categorical shift in how websites operate. One of the great (and terrible) things about internet governance is that standards and norms are voluntarily adopted, so even when a new protocol offers distinct advantages, websites are not required to adopt it. Rather, HTTPA provides a voluntary platform for websites to increase data transparency and to facilitate architectural restrictions on the use of sensitive data, all to better ensure that their users’ privacy is protected.

HTTPA works by assigning identifiers, called uniform resource identifiers (URIs), to individual data points, which allows the user to track the data’s movement. The researchers who developed the protocol also implemented a system for selecting use restrictions on individual pieces of data, primarily by choosing with whom the data could be shared. I like to think of HTTPA as turning individual pieces of data into data structures, which are uniquely identifiable and articulate how they should be used. This allows the data subject to track the data’s use history through storage logs, and potentially to determine what kinds of use restrictions they would like to impose on the data. Initial tech coverage said it was like turning a website from a text file into a searchable database, and this is a decent analogy.
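To make the mechanics concrete, here is a minimal Python sketch of that idea: each datum gets its own URI and an append-only usage log. The URI scheme, class name, and log format are my own illustrative assumptions, not the researchers’ actual implementation.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class TrackedDatum:
    """A single data item tagged with a URI, in the spirit of HTTPA.

    The "urn:httpa:" scheme and the log format are invented here
    for illustration; they are not the protocol's wire format.
    """
    value: str
    uri: str = field(default_factory=lambda: f"urn:httpa:{uuid.uuid4()}")
    usage_log: list = field(default_factory=list)

    def record_use(self, actor: str, purpose: str) -> None:
        # Every access is appended to the log, so the data subject
        # can later audit who touched the datum and why.
        self.usage_log.append((actor, purpose))

datum = TrackedDatum("jane.doe@example.com")
datum.record_use("ads-service", "targeting")
datum.record_use("analytics", "aggregation")
```

The key move is that the identifier and the history travel with the datum, which is what lets a storage log reconstruct the data’s movement after the fact.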

The logic of HTTPA is simple. People fear that companies that aggregate data might be invading their privacy with little or no accountability for their actions. HTTPA attempts to combat this problem by empowering users to better understand when and how their data is used, and thereby hold companies accountable for undesirable data usages. The more you know . . . and with HTTPA, you would know what companies do with your data. In general I think it’s a good idea. However, I have my quibbles.


Data Ownership

I’ll start with the concept of data ownership. Most discussions of data imply, if not outright assert, ownership by the data subject (the person the data is about) without delving into the details of when and why data is owned. The popular logic seems to be akin to “I own the data because the data is about me.” Naturally the data subject would want ownership of the data: ownership gives you power. And privacy law lends this some support, as privacy is often defined as “control over information about oneself.” Yet control is not ownership, and that control is rarely as extensive as many would like.

Without delving too deeply into theories of property, I think it’s worth considering why anyone ever owns intangible property. Intellectual property rights typically arise from the work you put into the information over which you assert rights. For patents, you put work into researching and developing the invention; for copyright, you put creative energy into the expressive work; for trademarks, you generate goodwill associated with your business. And while work is not a hard requirement for acquiring rights, giving someone property rights over information in its absence is an unusual proposition. American law typically favors the free flow of information, and granting intellectual property rights in data simply because of its association with the data subject would serve as a powerful restraint on information freedom and the freedom of speech. Remember, data are ultimately just facts, and facts are the backbone of speech.

And even if one’s status as data subject were sufficient to generate property rights, which data subjects get those rights? For instance, if we are discussing internet browsing history data, why is that data solely the property of the person browsing? The data are just as much about the websites visited, the ISPs that made the connection, and the devices used. If there is evidence of NSA snooping in the data, does the NSA get rights? The data are ostensibly about it too. Perhaps the argument is that only natural persons (non-corporate people) have this automatic right to data ownership, but then what about when multiple people interact online? Do they share data ownership? Must they vote before the data may be used?

The great irony underlying these seemingly absurd questions is that individual data points are arguably worthless. The value of data comes from its aggregation, so quibbling over the ownership of individual points isn’t worth the effort. Indeed, the cost of determining ownership and acceptable use of the data would almost certainly exceed any value the data might yield. The notable exceptions would be things like financial data, but these have alternate safeguards that do not rely on explicit property rights in the data. And one could easily argue that even if we have rights in the data, we have bargained them away in exchange for free internet services: you let Google target ads based on your data, and in return you get free web services.

I don’t think that property rights are what most people want or need. Rather, people simply don’t want information about them used in a manner that is harmful to them. But too often the dialogue isn’t framed this way, with a focus on harms. Data is frequently discussed as a possession: your data, our data, and so on. Data policies built on this framing are often inefficient, arbitrary, or outright harmful. Data regulations should instead focus on the harmful uses of data, and how to prevent them.

Harms and Transparency

And when viewed through this lens of a harms-based approach to privacy, I think HTTPA has some interesting potential. The greatest direct benefit is transparency by design. HTTPA allows for the direct tracking of the use of data, including transfers to other parties. This emphasis on transparency means that data practices can be reliably and accurately monitored to ensure compliance with privacy regulations and best practices. It also allows for the data practices of corporations to be made known to the user, so they may select among corporations for the one that best reflects their desired data uses.

However, I’m not convinced that giving this transparency directly to the user is necessarily beneficial. For one, I doubt most internet users actually care enough to monitor the mundanities of data transfers within a corporation, or even between corporations. Most companies already provide this information through privacy policies, and even if not, the vast majority of users probably lack the technical proficiency to effectively utilize the technology, assuming they care enough to devote the time. Furthermore, transparency does not ensure understanding, and it’s easy to imagine data practices being misconstrued or misunderstood due to inappropriate context or mischaracterization. Ultimately, I’d probably still err on the side of greater information flow to consumers, as the risk of misunderstanding information is rarely a good reason for restricting it.

I bring up the potential shortcomings not to discourage transparency for consumers, but rather to emphasize transparency for federal and non-profit agencies. Although consumers are occasionally effective industry police, far more often it is these public interest groups that ensure compliance with privacy protections and inform the public of undesirable data practices. Consumers may raise awareness on a grass-roots level, but the FTC can bring enforcement actions to punish deceptive practices. These groups are also in a better position to engage with companies in meaningful discussions on the propriety, benefits, and drawbacks of particular data uses. Often the intricacies of how data is used, when it is used, and the impact of that use are difficult to fully appreciate without in-depth analysis and experience in the field. Whereas an average user might assume that any unauthorized use of data is inherently violative of their privacy, such absolute views are rarely useful or true. It’s for this and many other reasons that regulations in tech-heavy fields must be drafted and enforced by those who specialize in the area.

But transparency is only one of the benefits HTTPA boasts. Perhaps the more interesting feature is its treatment of use restrictions. Initial news coverage gave the impression that this was an extremely powerful feature: the user could articulate restrictions for sensitive data, and the system would prevent any use that failed to comport with those restrictions. In practice, the system is not quite this robust. The researchers let users flag certain information as sensitive and then select their desired information-sharing practices, but the users ultimately needed to self-monitor compliance with those restrictions using HTTPA’s transparency provisions. The use restrictions were not architectural, which is to say they were not enforced through requirements in the code.
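A small sketch of what this means in practice: the user’s restriction is just a declaration, and compliance is checked only after the fact by auditing the usage log; nothing in the code path actually blocks a violating use. All of the party names here are invented for illustration.

```python
# The user's declared restriction: only these parties may see the data.
allowed_parties = {"my-doctor", "my-pharmacy"}

# What the transparency log later shows actually happened.
usage_log = [
    ("my-doctor", "read diagnosis"),
    ("insurer-x", "claim processing"),    # not in the allowed set
    ("my-pharmacy", "fill prescription"),
]

def audit(log, allowed):
    """Return the log entries that violate the declared restriction.

    Note this is purely retrospective: the violating access has
    already occurred by the time the audit can flag it.
    """
    return [(party, purpose) for party, purpose in log
            if party not in allowed]

violations = audit(usage_log, allowed_parties)
```

The restriction lives in one data structure and the actual uses in another; the burden of comparing the two falls on whoever runs the audit, which is exactly the self-monitoring problem described above.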

This lack of architectural protections suggests to me that the greatest shortcoming of this technology will be simple information overload. The buzzword for this is “Big Data,” which refers to the current practice of aggregating data and the immense data processing required to discern any meaning from it. And while making this data transparent is nice, it doesn’t do much good without the analytics to comprehend it. This point should be immediately apparent to anyone who has attempted to re-locate a website by looking through their browsing history. It feels like digging through your trash can – there’s no way I actually go through this much stuff in one day! And this is just a list of websites you visited. Now multiply that by every data point conceivably collected by those websites, and then multiply that by each of the various uses they are putting it through.

But perhaps this argument isn’t fair. After all, most people aren’t looking to protect all of their data; they are only concerned with particularly sensitive data. This would clearly reduce the amount of data that needed to be tracked. And while I agree with this to a certain degree, I’m unconvinced it will make the data manageable. Data overload will ultimately be commensurate with one’s personal degree of privacy protectiveness: if you only care about a single piece of data on one website, it will be reasonable to keep track of. But as you increase the volume of data and the number of places monitored, the system rapidly becomes unworkable. And the data that most people would consider private is still entirely too much to effectively monitor. Health data, financial data, and location data are all easy examples of things most people value, but each of these entails unmanageable quantities of data.

And perhaps even more fundamentally, data is complicated. Consider the researchers’ test case: they created a medical records database in which the patient could identify health data as sensitive (the example they used was an HIV/AIDS diagnosis). But saying that the data is sensitive doesn’t fully explain how it should be handled. Surely the doctor can still see it. And all of the nurses. And probably the hospital pharmacists. Then there are insurance companies, credit card companies, and possibly family members. Restricting disclosure is a complicated process, as even private information is necessarily seen by a large number of people. And what data is included? Presumably there was a medical record that stated the diagnosis explicitly, but what about prescriptions for antiretroviral drugs? Or treatment by an HIV/AIDS specialist? Or even just admission to the hospital? This increasing data granularity already makes privacy a herculean task for covered entities under HIPAA; the idea that individuals will be able to meaningfully track this information is dubious.
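A toy sketch of the granularity problem: flagging the explicit diagnosis record as sensitive says nothing about the related records that imply it. The record names and the set of implying records are hypothetical, chosen to mirror the examples above.

```python
# A patient flags only the explicit diagnosis as sensitive,
# leaving the records that merely imply it unprotected.
records = {
    "diagnosis:hiv":         {"sensitive": True},
    "rx:antiretroviral":     {"sensitive": False},  # implies the diagnosis
    "visit:hiv-specialist":  {"sensitive": False},  # so does this
}

# Records that reveal the diagnosis indirectly (hypothetical mapping;
# building this inference set is itself the hard, unsolved part).
IMPLIES_DIAGNOSIS = {"rx:antiretroviral", "visit:hiv-specialist"}

def leaks(records):
    """Records that reveal the diagnosis but escaped the sensitive flag."""
    return sorted(name for name in IMPLIES_DIAGNOSIS
                  if not records[name]["sensitive"])

leaked = leaks(records)
```

Even this three-record example leaks through two side channels; a real medical history multiplies both the records and the inference paths between them.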

Not only are consumers poorly equipped to monitor the complexities of data, they often aren’t the appropriate party to determine allowable uses. Notions of privacy vary substantially between individuals, and requiring companies to comply with the idiosyncrasies of each person’s personal privacy policy is unworkable. And notwithstanding this interindividual variation, individuals often would impose policies that are frankly unreasonable. Suppose a person didn’t want the hospital to share data with their insurance company, which needs that data to process their claim. Do we let the individual’s preferences prevail? The clear answer is no. Although we want to respect individuals’ privacy choices when possible, ultimately we need objective policies that address the practical realities these companies must manage.

Despite my criticisms, I actually like the idea of HTTPA. Especially with regard to health records, there is growing demand among consumers for access to protected data, and expanding that access is hard to argue with. And although data is likely expanding beyond the bounds of consumer oversight, transparency by design is a useful tool for more sophisticated actors to audit the data practices of large companies. Yet I am unconvinced that the benefit it provides is truly meaningful compared to the system we currently have. While HTTPA would certainly make it more difficult to obfuscate deceptive practices, as a voluntary protocol it is only likely to be adopted by those who are already striving to ensure privacy protections. As much as I like privacy by design, I doubt this technology will have any substantial impact. But I always look forward to being proven wrong!


Until next time

