A few weeks ago I wrote about what appears to be the changing nature of obscurity.  Under the now nearly ubiquitous collection model, actively thwarting data collection no longer obscures you from surveillance; it highlights you.  Extending the concept of the security of the majority to data collection suggests that the majority one wants to join is no longer the mass of the unsurveilled.  Ubiquitous collection and the wild-west data-use market strongly suggest that we’re well over the hill in terms of big data, and we need to update our doctrine to reflect that reality.  I imagine there are many clever ways one might mitigate this issue. I propose weaponizing the means of data production through curation.

Recently it’s become vogue to remark that the so-called digital age is one in which data has replaced the famous ‘means of production’ as the most valuable global asset.  I like this idea, and one feature of it in particular: its proximity to the great tradition of collective labor movements over the course of the industrial age.  Where collective labor movements have traditionally sought to manage their relationships through the actions of their members’ bodies (i.e. “going slow”, marching, arming in the Sierra Maestra), in today’s data game, agents must instead take action to carefully manage, or curate, their production of data.  Where during pre-digital struggle you needed to actively “put your bod[y] upon the gears and upon the wheels”, you now need to take responsibility for the data you produce.  The task differs only in its mechanism of action. Both situations require widespread personal agency, involve the same basic evils of short-termism and greed, and require dedicated organizational effort.  Widespread, well-organized, weaponized physical agency is a force to be reckoned with; let’s digitize it.

First, it should be noted that the ‘other’ doesn’t matter beyond its existence.  Whether you’re worried about interference in a democratic process, the canonical basement-dwelling password thief, or creeping socialism, the benefits of data curation remain.  Anecdotal evidence for this can be found in Tor. As an anonymity tool originally designed for intelligence agency use, Tor was made public after it was determined that using the same identifiable mask for only a single segment of a population was in fact highly deanonymizing.  Today, Tor is used by criminals, dissidents, freedom fighters, intelligence operatives and kids trying to circumvent content blocking. Despite the very different threats faced by each, all of those groups took agency with regard to their data by installing and using Tor.

Second, while Tor is great, it’s mostly used to protect active processes.  A user decides she’ll do something, decides she’d rather do it anonymously, and therefore fires up Tor to do it.  One of the things that makes the modern security landscape so caustic, however, is the prevalence of passive data collection.  It’s not a single action that must be protected, but general existence. Protecting active processes is good, but not enough. Further, you can’t throw out all of your data-leaking devices: even if doing so didn’t significantly disrupt your life, you’d succeed only in drawing attention to yourself.

The only remaining solution, then, is to constantly curate your data, and because you can’t simply start and stop this process without compromising yourself, this requires a change in doctrine.  We must, en masse, develop an attitude toward security which assumes abuse will occur. Institutionalizing this mindset will be difficult, but it is crucial if we want to play any kind of meaningful role in shaping our future.

Let’s zoom in for an example.  Over the last 10 years, it’s become very popular in Computer Science (CS) to apply machine learning (ML) of some kind to each and every thing imaginable.  User fingerprinting (i.e. differentiating users within a pool) and segmenting (i.e. assigning users to feature-specific groups) are no exceptions. Works like this, this and this have demonstrated the discriminatory power of data sets like which apps you have installed on your smartphone and how often you use them.  This work appears underdeveloped relative to other big-data areas at the moment, mostly due to the difficulty of acquiring data sets, and that’s worrisome.  It’s not that the datasets don’t exist; instead of being in the hands of researchers, they seem to be in the hands of entities like Cambridge Analytica and the Chinese government.  Scary. But what to do? Curate your app usage!
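To make the segmentation idea concrete, here is a minimal sketch of how an app-install list can act as a demographic fingerprint. Everything in it is invented for illustration: the app names, the demographic labels, and the crude overlap-voting classifier stand in for the far more sophisticated models the linked research describes.

```python
# Illustrative sketch: segmenting users by which apps they have installed.
# All app names and demographic labels below are hypothetical.

from collections import Counter

# Hypothetical "training" users: (set of installed apps, demographic label).
TRAINING = [
    ({"fitness_tracker", "meditation", "podcasts"}, "wellness"),
    ({"fitness_tracker", "running_log"}, "wellness"),
    ({"day_trader", "crypto_wallet", "news"}, "finance"),
    ({"crypto_wallet", "stock_screener"}, "finance"),
    ({"battle_royale", "twitch_viewer", "discord"}, "gamer"),
    ({"battle_royale", "speedrun_timer"}, "gamer"),
]

def classify(installed_apps):
    """Assign the demographic whose users share the most apps with this
    user -- a crude nearest-neighbour vote over app-set overlap."""
    votes = Counter()
    for apps, label in TRAINING:
        votes[label] += len(installed_apps & apps)
    return votes.most_common(1)[0][0]

print(classify({"crypto_wallet", "news", "discord"}))  # -> finance
```

Even this toy works: two shared finance apps outvote one shared gamer app. The point is that the mapping from installed apps to demographic is simple enough that, once you know the features a segmenter relies on, you can manipulate them.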

If you want to appear to be a member of a particular demographic, you might simply use the demographics research - like that linked to above - to determine how you’re being targeted, and modify your usage such that the schemes used to segment you place you where you want to be placed.  We largely control the inputs, since we produce them. What remains to be done is curation. I’ve mocked up a tiny proof-of-concept tool which, based on a single piece of research, produces a list of applications you should install on your smartphone and use once a month in order to appear to be a member of a specific demographic.  Mathematically, it doesn’t hold water, but for the purposes of demonstration, it suffices. It’s classically scientific for a field to begin with specifics and then generalize. In the case of fuzzing, for instance, early research was highly specific, but over time a sophisticated field evolved which now produces tools that can treat their targets as black boxes (which an ISP’s, or Gestapo’s, internal data manipulation might very well be) and deliver results comparable to those with knowledge of the system’s inner workings.
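A tool along those lines might be sketched as follows. This is not the author’s actual proof-of-concept; the demographic profiles and app names are made up, where a real tool would derive its profiles from the published fingerprinting research. The core operation is just the inverse of classification: compute which profile apps you are missing.

```python
# Hypothetical sketch of a curation tool: given a target demographic,
# suggest apps to install (and use periodically) so that a simple
# app-based segmenter would place you in that group.
# The profiles below are invented for demonstration.

PROFILES = {
    "wellness": {"fitness_tracker", "meditation", "running_log"},
    "finance": {"day_trader", "crypto_wallet", "stock_screener"},
    "gamer": {"battle_royale", "twitch_viewer", "discord"},
}

def curation_plan(target, installed):
    """Return the apps still needed for the installed set to cover the
    target demographic's profile."""
    if target not in PROFILES:
        raise ValueError(f"unknown demographic: {target}")
    return sorted(PROFILES[target] - installed)

# Example: appear to be a 'finance' user.
print(curation_plan("finance", {"crypto_wallet", "discord"}))
# -> ['day_trader', 'stock_screener']
```

The set difference is the whole trick: it tells you exactly which signals to start emitting. A generalized, black-box version would have to infer the profiles from the segmenter’s observed behavior rather than read them from a table.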

Finally, one could be forgiven for imagining that whoever wrote this must wear a tin-foil hat several layers thick.  I counter that Mao’s protracted people’s war suggested to western militaries, and to many of his own commanders, a very similar thing.  Giving up their capital seemed like lunacy, but Mao was about doctrinal change, and history proved him quite correct. While I don’t believe the future is all doom and gloom - many big-data-enabled technologies may serve incredible humanitarian, as well as dystopian, purposes - it’s imperative that we begin considering the potentially horrific outcomes of unregulated, unmitigated (ab)use of massive data collection as soon as possible.

You can find the curation tool here.