By Kaiser Fung
In data science, a pattern of scandals has emerged. Volkswagen’s gaming of emissions data is the latest example.
In July, the CEO of Whole Foods Market issued a mea culpa after the supermarket was found to have manipulated product data, overstating the weight of prepackaged produce and meats.
Over the summer, controversy engulfed Ashley Madison, the social network for married people seeking other partners, as hackers managed to extract a huge amount of private data from the company’s servers. General Motors was also reported to have hidden information about a faulty ignition switch linked to over 100 deaths.
While top managers take the fall for these scandals, none of the dubious activities could have happened without the active participation of technical teams. Alongside engineers, software developers and product managers, the burgeoning community of data scientists is complicit in developing the concepts, algorithms and software that enable the deception.
This narrative keeps recurring because the industry treats it as a technological problem requiring a technological solution. Business managers are missing the real issue: The people who collect, store, manage and process our data are not being held to any ethical standards. The emerging data science discipline is expanding so fast that few workers are thinking about the ethical implications of their everyday actions.
In these scandals, the cause of the problem is more human than technical. For example, credit-card data do not show up uninvited in an enterprise data warehouse. After the data are stored, software is written to establish the link between a user’s pseudonym and his or her name and address. The technical staff is involved in both designing the linkage algorithms and developing the code for implementation.
I have been part of the conversations behind such decisions at various companies. Business and technical managers debate topics such as product innovation, user experience, resource requirements, competitive strategies and return on investment. Except in rare cases, the ethics of these decisions are never broached.
This neglect is typically due to a lack of attention, awareness or sensitivity. Sometimes, ethical concerns are arrogantly dismissed: “If customers don’t like what we’re doing, they don’t have to use our service!”
The recent scandals should prompt a serious conversation in the business community about the ethics of data. People can hold different ethical standards, but ignoring the issue altogether is no longer viable.
So what can be done? To start, every technical and data team should have training that covers the ethics of using data. Exposing engineers and data scientists to the legal obligations set forth in various terms and conditions is a good first step, but ethical practices need to go beyond that. A culture needs to be developed in which team members feel comfortable raising questions about ethics.
Kaiser Fung directs the master’s program in applied analytics at Columbia University and is the author, most recently, of Number Sense: How to Use Big Data to Your Advantage.