Traditionally, data gathered for electronic recordkeeping was in the same paradigm as files in a filing cabinet. Data was recorded by a human at some point, filed away, and retrieved (and perhaps updated) as needed. Data that was no longer relevant would be discarded to make room for new data.

Early digital systems were similar: data was input by human beings, created by computer systems, or sensed within an environment, and then more or less filed away to be retrieved later, when needed.

Modern data management can be mapped to three key stages:

1. Disclosing / Sensing—humans or machines that gather and record data.

2. Manipulating / Processing—aggregation, transformation, and/ or analysis that turns data into useful information.

3. Consuming / Applying—a person or machine uses information to derive insights that can then be used to affect change.


Data Disclosure

Data at Rest

Data may be sourced from archives or other backups


Guideline: Ensure the context of original consent is known and respected; data security practices should be revisited on a regular basis to minimize risk of accidental disclosure. Aggregation of data from multiple sources often represents a new context for disclosure; have the responsible parties made a meaningful effort to renew informed consent agreements for this new context?

Data in Motion

Data is collected in real-time from machine sensors, automated processes, or human input; while in motion, data may or may not be retained, reshaped, corrupted, disclosed, etc.


Guideline: Be respectful of data disclosers and the individuals behind the data. Protect the integrity and security of data throughout networks and supply chains. Only collect the minimum amount of data needed for a specific application. Avoid collecting personally identifiable information, or any associated meta-data whenever possible. Maximize preservation of provenance.

 

Data Manipulation

Data at Rest

Data is stored locally without widespread distribution channels; all transformations happen locally


Guideline: Set up a secure environment for handling static data so the risk of security breaches is minimized and data is not mistakenly shared with external networks. Data movement and transformation should be fully auditable.

Data in Motion

Data is actively being moved or aggregated; data transformations use multiple datasets or API calls which might be from multiple parties; the Internet may be used


Guideline: Ensure that data moving between networks and cloud service providers is encrypted; shared datasets should strive to minimize the amount of data shared and anonymize as much as possible. Be sure to destroy any temporary databases that contain aggregated data. Are research outcomes consistent with the discloser’s original intentions?

 

Data Consumption

Data at Rest

Data analytics processes do not rely on live or real-time updates


Guideline: Consider how comfortable data disclosers would be with how the derived insights are being applied. Gain consent, preferably informed consent, from data disclosers for application-specific uses of data.

Data in Motion

Data insights could be context-aware, informed by sensors, or might benefit from streamed data or API calls


Guideline: The data at rest guidelines for data consumption are equally important here. In addition, adhere to any license agreements associated with the APIs being used. Encrypt data. Be conscious of the lack of control over streamed data once it is broadcast. Streaming data also has a unique range of potential harms—the ability to track individuals, deciphering network vulnerabilities, etc.


Historically, digital systems were not as interoperable and networked as they are now, and so data could be thought of as being “at rest”—stored statically, just like the files in filing cabinets of the past. But today, some data is in near-constant motion. When we access social media sites, we’re not just pulling static data from some digital filing cabinet— we are accessing data which is in constant transformation. For example, algorithms shift which news stories are displayed to us based on an ever-evolving model around our tastes and the tastes of other users. An action taken in an app, like an online retailer, connected to a user’s social media account could change the content delivered to a user. Managing the complexity of consent and potential harms in this environment is much harder than the connection between traditional market research and mass-purchase, broadcast advertising.

This data in motion is much harder to comprehend at scale. The chaotic effects of multiple, interoperable systems and their data playing off each other makes it difficult for design and development stakeholders to see the big picture of how data might affect their users—much less communicate to those users for the purposes of informed consent or doing no harm. Data in motion can be relatively straightforward in the context of the flow of interactions through each stage of disclosing, manipulating, and consuming data. However, although it can be tempting to think of data as a file moving from one filing cabinet to another, it is, in fact, something more dynamic, which is being manipulated in many different ways in many different locations, more or less simultaneously. It becomes even more ambiguous when a lack of interaction with a piece of data could still be used to draw conclusions about a user that they might otherwise keep private.

This data in motion is much harder to comprehend at scale. The chaotic effects of multiple, interoperable systems and their data playing off each other makes it difficult for design and development stakeholders to see the big picture of how data might affect their users.

For example, ride-sharing apps need to collect location information about drivers and passengers to ensure the service is being delivered. This makes sense in “the moment” of using the app. However, if the app’s consent agreement allows location data to be collected regardless of whether or not the driver or rider is actually using the app, a user may be passively providing their location information without being actively aware of that fact. In such cases, the application may be inferring things about that passenger’s interest in various goods or services based on the locations they travel to, even when they’re not using the app.

Given that location data may be moving through mapping APIs, or used by the app provider in numerous ways, a user has little insight into the real-time use of their data and the different parties with whom that data may be shared. For users of the ride-sharing app, this may cause concern that their location data is being used to profile their time spent outside the app—information that could be significant if, for example, an algorithm determines that a driver who has installed the app is also driving with a competing ridesharing provider.¹ Without clear consent agreements, interpretation of where data moves and how it is used becomes challenging for the user and can erode the trust that their best interests are being served.