## Example files for the title:
# Ethics and Data Science, by Mike Loukides
[![Ethics and Data Science, by Mike Loukides](http://akamaicovers.oreilly.com/images/9781492043874/cat.gif)](https://www.safaribooksonline.com/library/view/title/9781492043898//)
The following applies to example files from material published by O’Reilly Media, Inc. Content from other publishers may include different rules of usage. Please refer to any additional usage rights explained in the actual example files or refer to the publisher’s website.
O'Reilly books are here to help you get your job done. In general, you may use the code in O'Reilly books in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from our books does not require permission. Answering a question by citing our books and quoting example code does not require permission. On the other hand, selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Incorporating a significant amount of example code from our books into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN.
If you think your use of code examples falls outside fair use or the permission given here, feel free to contact us at <permissions@oreilly.com>.
Please note that the examples are not production code and have not been carefully tested. They are provided "as-is" and come with no warranty of any kind.
Case Studies
============
To help us think seriously about data ethics, we need case studies that
we can discuss, argue about, and come to terms with as we engage with
the real world. Good case studies give us the opportunity to think
through problems before facing them in real life. And case studies show
us that ethical problems aren't simple. They are multifaceted, and
frequently there's no single right answer. And they help us to
recognize there are few situations that don't raise ethical questions.
Princeton's [Center for Information Technology
Policy](https://citp.princeton.edu/) and [Center for Human
Values](https://uchv.princeton.edu/) have created four anonymized [case
studies](https://aiethics.princeton.edu/case-studies/) to promote the
discussion of ethics. (More are in the pipeline, and may be available by
the time you read this.) The first of these studies, [Automated
Healthcare App](http://bit.ly/2LaqfUJ), discusses a smartphone app
designed to help adult-onset diabetes patients. It raises issues like
paternalism, consent, and even language choices. Is it OK to "nudge"
patients toward healthier behaviors? What about automatically
moderating the users' discussion groups to emphasize scientifically
accurate information? And how do you deal with minority groups that
don't respond as well to treatment? Could the problem be the language
itself that is used to discuss treatment?
The next case study, [Dynamic Sound
Identification](http://bit.ly/2mv6x7q), covers an application that can
identify voices, raising issues about privacy, language, and even
gender. How far should developers go in identifying potential harm that
can be caused by an application? What are acceptable error rates for an
application that can potentially do harm? How can a voice application
handle people with different accents or dialects? And what
responsibility do developers have when a small experimental tool is
bought by a large corporation that wants to commercialize it?
The [Optimizing Schools](http://bit.ly/2uCZQVn) case study deals with
the problem of finding at-risk children in school systems. Privacy and
language are again at issue; the study also raises the question of how
decisions to use data are made. Who makes those decisions, and who needs to be
informed about them? What are the consequences when people find out how
their data has been used? And how do you interpret the results of an
experiment? Under what conditions can you say that a data experiment has
really yielded improved educational results?
The final case study, [Law Enforcement Chatbots](http://bit.ly/2JFm3a4),
raises issues about the trade-off between liberty and security,
entrapment, openness and accountability, and compliance with
international law.
None of these issues are simple, and there are few (if any) "right
answers." For example, it's easy to react against perceived paternalism
in a medical application, but the purpose of such an application is to
encourage patients to comply with their treatment program. It's easy to
object to monitoring students in a public school, but students are
minors, and schools by nature handle a lot of private personal data.
Where is the boundary between what is, and isn't, acceptable? What's
important isn't getting to the correct answer on any issue, but making
sure the issue is discussed and understood, and that we know what
trade-offs we are making. What is important is that we get practice in
discussing ethical issues and put that practice to work in our jobs.
That's what these case studies give us.
Of Oaths and Checklists
=======================
> "Oaths? We don't need no stinkin' oaths." (With apologies to
> Humphrey Bogart in *Treasure of the Sierra Madre*.)
Over the past year, there has been a great discussion of data ethics,
motivated in part by discomfort over "fake news," targeted advertising,
algorithmic bias, and the effect that data products have on individuals
and on society. Concern about data ethics is hardly new; the
[ACM](http://ethics.acm.org/code-of-ethics),
[IEEE](https://www.ieee.org/about/compliance.html), and the [American
Statistical Association](http://bit.ly/2zWQAk0) all have ethical codes
that address data. But the intensity with which we've discussed ethics
shows that something significant is happening: data science is coming of
age and realizing its responsibilities. A better world won't come about
simply because we use data; data has its dark underside.
The recent discussion frequently veers into a discussion of [data
oaths](http://bit.ly/2zRvEe8), looking back to the ancient [Hippocratic
Oath](http://bit.ly/2A4QiYA) for doctors. Much as we appreciate the work
and the thought that goes into oaths, we are skeptical about their
value. Oaths have several problems:
- They're one-shots. You take the oath once (if at all), and that's
it. There's no reason to keep it in the front of your
consciousness. You don't recite it each morning. Or evaluate
regularly whether you're living up to the ideals.
- Oaths are a set of very general and broad principles. Discussions of
the Hippocratic Oath begin with the phrase "First, do no harm,"
words that don't actually appear in the oath. But what does "do no
harm" mean? For centuries doctors did very little but harm (many
people died because doctors didn't believe they needed to wash their
hands). The doctors just didn't know they were doing harm. Nice
idea, but short on the execution. And data science (like medicine)
is all about execution.
- Oaths can actually give cover to people and organizations who are
doing unethical work. It's easy to think "we can't be unethical,
because we endorsed this oath." It's not enough to say "don't be
evil." You have to not be evil.
- Oaths do very little to connect theories and principles to practice.
It is one thing to say "researchers must obtain informed consent";
it's an entirely different thing to get informed consent at
internet scale. Or to teach users what "informed consent" means.
We are not suggesting that the principles embodied in oaths aren't
important, just that they don't get us to the endpoint we want. They
don't connect our ideas about what's good or just to the practices
that create goodness and justice. We can talk a lot about the importance
of being fair and unbiased without knowing how to be fair and
unbiased. At this point, the oath actually becomes dangerous: it becomes
a tool to convince yourself that you're one of the good guys, that
you're doing the right thing, when you really don't know.
Oaths are good at creating discussion---and, in the past year, they have
created quite a lot of discussion. The discussion has been tremendously
helpful in making people aware of issues like algorithmic fairness. The
discussion has helped software developers and data scientists to
understand that their work isn't value-neutral, that their work has real
impact, both good and bad, on real people. And there has been a vigorous
debate about what self-government means for data scientists, and what
guiding principles would last longer than a few years. But we need to
take the next step, and connect these ideas to practice. How will we do
that?
In 2009, Atul Gawande wrote *The Checklist Manifesto* (Macmillan), a
short book on how not to make big mistakes. He writes a lot about his
practice as a surgeon. In a hospital, everyone knows what to do.
Everyone knows that you're supposed to scrub down before the surgery.
Everyone knows that you're not supposed to amputate the wrong leg.
Everyone knows that you're not supposed to leave sponges and other
equipment in patients when you close the incision.
But mistakes are made, particularly when people are in stressful
environments. The surgeon operates on the wrong leg; the sponge is left
behind; and so on. Gawande found that, simply by creating checklists for
basic things you shouldn't forget, these mistakes could be eliminated
almost completely. Yes, there were some doctors who found the idea of
checklists insultingly simple; they were the ones who continued making
mistakes.
Unlike oaths, checklists connect principle to practice. Everyone knows
to scrub down before the operation. That's the principle. But if you
have to check a box on a form after you've done it, you're not likely
to forget. That's the practice. And checklists aren't one-shots. A
checklist isn't something you read once at some initiation ceremony; a
checklist is something you work through with every procedure.
What would a checklist for data science and machine learning look like?
The [UK Government's Data Ethics Framework](http://bit.ly/2NvJ0ik) and
[Data Ethics Workbook](http://bit.ly/2O69wjh) are one approach. They
isolate seven principles, and link to detailed discussions of each. The
workbook asks a number of open-ended questions to probe your compliance
with these principles. Our criticism is that their process imposes a lot
of overhead. While anyone going through their entire process will
certainly have thought carefully about ethical issues, in practice,
asking developers to fill out a workbook with substantive answers to 46
questions is an effective way to ensure that ethical thought doesn't
happen.
We believe that checklists are built around simple, "have we done this?"
questions---and they are effective because they are simple. They don't
leave much room to wiggle. Either you've analyzed how a project can be
abused, or you haven't. You've built a mechanism for gathering consent,
or you haven't. Granted, it's still possible to take shortcuts: your
analysis might be inadequate and your consent mechanism might be flawed,
but you've at least gone on record as saying that you've done it.
Feel free to use and modify this checklist in your projects. It covers
most of the bases that we've seen discussed in various data oaths. Go
over the checklist when starting a project so the developers know what's
needed and aren't surprised by a new set of requirements at the last
minute. Then work through it whenever you release software. Go through
it, and actually check off all the boxes before your product hits the
public.
Here's a checklist for people who are working on data projects:
❏ Have we listed how this technology can be attacked or abused?
❏ Have we tested our training data to ensure it is fair and
representative?
❏ Have we studied and understood possible sources of bias in our data?
❏ Does our team reflect diversity of opinions, backgrounds, and kinds of
thought?
❏ What kind of user consent do we need to collect to use the data?
❏ Do we have a mechanism for gathering consent from users?
❏ Have we explained clearly what users are consenting to?
❏ Do we have a mechanism for redress if people are harmed by the
results?
❏ Can we shut down this software in production if it is behaving badly?
❏ Have we tested for fairness with respect to different user groups?
❏ Have we tested for disparate error rates among different user groups?
❏ Do we test and monitor for model drift to ensure our software remains
fair over time?
❏ Do we have a plan to protect and secure user data?
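One way to keep this checklist from gathering dust is to wire it into the
release process itself. The following is a minimal sketch rather than a
standard tool: it assumes a hypothetical `ethics_checklist.json` file,
filled out for each release, and fails the build if any item is still
unchecked.

```python
#!/usr/bin/env python3
"""Hypothetical release gate for the data-ethics checklist.

Assumes a per-release ethics_checklist.json file that maps each checklist
question to true (done) or false (not done). The file name and format are
illustrative, not part of any standard tooling.
"""
import json
import sys


def unchecked_items(path):
    """Return the checklist questions that haven't been checked off."""
    with open(path) as f:
        checklist = json.load(f)
    return [question for question, done in checklist.items() if not done]


if __name__ == "__main__":
    missing = unchecked_items("ethics_checklist.json")
    if missing:
        print("Release blocked. Unchecked ethics items:")
        for question in missing:
            print(f"  - {question}")
        sys.exit(1)
    print("Ethics checklist complete; OK to release.")
```

Run from a continuous integration pipeline, a gate like this makes skipping
the checklist as hard as skipping the unit tests.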
Oaths and codes of conduct have their value. The value of an oath isn't
the pledge itself, but the process you go through in developing the
oath. People who work with data are now having discussions that would
never have taken place a decade ago. But discussions don't get the hard
work done, and we need to get down to the hard work. We don't want to
talk about how to use data ethically; we want to use data ethically.
It's hypocritical to talk about ethics, but never do anything about it.
We want to put our principles into practice. And that's what checklists
will help us do.
Doing Good Data Science
=======================
The hard thing about being an ethical data scientist isn't understanding
ethics. It's the junction between ethical ideas and practice. It's doing
good data science.
There has been a lot of healthy discussion about data ethics lately. We
want to be clear: that discussion is good, and necessary. But it's also
not the biggest problem we face. We already have good standards for data
ethics. The [ACM's code of ethics](http://bit.ly/2zUY4E7), which dates
back to 1993 and is currently being updated, is clear, concise, and
surprisingly forward-thinking; 25 years later, it's a great start for
anyone thinking about ethics. The [American Statistical
Association](http://bit.ly/2mzaMPw) has a good set of ethical guidelines
for working with data. So, we're not working in a vacuum.
And we believe that most people want to be fair. Data scientists and
software developers don't want to harm the people using their products.
There are exceptions, of course; we call them criminals and con artists.
[Defining "fairness" is
difficult](http://bit.ly/problem-build-fair-sys), and perhaps
impossible, given the many crosscutting layers of "fairness" that we
might be concerned with. But we don't have to solve that problem in
advance, and it's not going to be solved in a simple statement of
ethical principles, anyway.
The problem we face is different: how do we put ethical principles into
practice? We're not talking about an abstract commitment to being fair.
Ethical principles are worse than useless if we don't allow them to
change our practice, if they don't have any effect on what we do
day-to-day. For data scientists, whether you're doing classical data
analysis or leading-edge AI, that's a big challenge. We need to
understand how to build the software systems that implement fairness.
That's what we mean by doing good data science.
Any code of data ethics will tell you that you shouldn't collect data
from experimental subjects without informed consent. But that code won't
tell you how to implement "informed consent." Informed consent is easy
when you're interviewing a few dozen people in person for a psychology
experiment. Informed consent means something different when someone
clicks an item in an online catalog (hello, Amazon), and ads for that
item start following them around *ad infinitum*. Do you use a pop-up to
ask for permission to use their choice in targeted advertising? How many
customers would you lose if you did so? Informed consent means something
yet again when you're asking someone to fill out a profile for a social
site, and you might (or might not) use that data for any number of
experimental purposes. Do you pop up a consent form in impenetrable
legalese that basically says "we will use your data, but we don't know
for what"? Do you phrase this agreement as an opt-out, and hide it
somewhere on the site where nobody will find it?
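Whatever answers a team settles on, they eventually have to show up in
code. As one purely illustrative sketch (the field names and purposes here
are assumptions, not drawn from any particular product or regulation), a
consent record might capture not just a yes/no flag but the specific
purpose, the version of the explanation the user actually saw, and when
consent was given, so that consent can be re-requested when the purpose or
the explanation changes:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """One user's consent for one specific purpose (illustrative fields)."""
    user_id: str
    purpose: str          # e.g., "targeted_advertising" or "research_experiment"
    terms_version: str    # which explanation the user was actually shown
    granted: bool
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def may_use_data(records, user_id, purpose, current_terms_version):
    """Consent counts only if it was granted for this purpose, under the
    explanation the user actually saw; new terms mean asking again."""
    return any(
        r.user_id == user_id
        and r.purpose == purpose
        and r.terms_version == current_terms_version
        and r.granted
        for r in records
    )
```

Even a structure this small forces some of the hard questions into the
open: what counts as a distinct purpose, and what happens to existing
consent when the explanation changes.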
That's the sort of question we need to answer. And we need to find ways
to share best practices. After the ethical principle, we have to think
about the implementation of the ethical principle. That isn't easy; it
encompasses everything from user experience design to data management.
How do we design the user experience so that our concern for fairness
and ethics doesn't make an application unusable? Bad as it might be to
show users a pop-up with thousands of words of legalese, laboriously
guiding users through careful and lengthy explanations isn't likely to
meet with approval, either. How do we manage any sensitive data that we
acquire? It's easy to say that applications shouldn't collect data about
race, gender, disabilities, or other protected classes. But if you don't
gather that data, you will have trouble testing whether your
applications are fair to minorities. Machine learning has proven to be
very good at figuring out its own proxies for race and other classes. Your
application wouldn't be the first system that was unfair despite the
best intentions of its developers. Do you keep the data you need to test
for fairness in a separate database, with separate access controls?
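As a small illustration of what "testing for fairness" might look like in
code, here is a sketch under assumptions rather than a complete audit:
protected attributes live behind a separate, access-controlled lookup, are
joined to predictions only inside the test, and error rates are compared
across groups.

```python
from collections import defaultdict


def error_rates_by_group(predictions, labels, group_lookup):
    """Compare error rates across user groups.

    predictions, labels: dicts mapping user_id -> predicted / true outcome.
    group_lookup: a callable mapping user_id -> group label, ideally backed
    by a separately access-controlled store rather than the main database.
    Illustrative only: a real audit would also compare false positive and
    false negative rates, report uncertainty, and track drift over time.
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for user_id, predicted in predictions.items():
        group = group_lookup(user_id)
        counts[group] += 1
        errors[group] += int(predicted != labels[user_id])
    return {group: errors[group] / counts[group] for group in counts}
```

If the gap between groups exceeds whatever threshold the team agreed on in
advance, that is exactly the kind of result that should be able to stop a
release.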
To put ethical principles into practice, we need space to be ethical. We
need the ability to have conversations about what ethics means, what it
will cost, and what solutions to implement. As technologists, we
frequently share best practices at conferences, write blog posts, and
develop open source technologies---but we rarely discuss problems such
as how to obtain informed consent.
There are several facets to this space that we need to think about.
Foremost, we need corporate cultures in which discussions about
fairness, about the proper use of data, and about the harm that can be
done by inappropriate use of data can be considered. In turn, this means
that we can't rush products out the door without thinking about how
they're used. We can't allow "internet time" to mean ignoring the
consequences. Computer security has shown us the consequences of
ignoring the consequences: many companies that have never taken the time
to implement good security practices and safeguards are now paying with
damage to their reputations and their finances. We need to do the same
when thinking about issues like fairness, accountability, and unintended
consequences.
We particularly need to think about the unintended consequences of our
use of data. It will never be possible to predict all the unintended
consequences; we're only human, and our ability to foresee the future is
limited. But plenty of unintended consequences could easily have been
foreseen: for example, Facebook's "Year in Review" that [reminded people
of deaths and other painful events](http://bit.ly/2JJBaPI). Moving fast
and breaking things is unacceptable if we don't think about the things
we are likely to break. And we need the space to do that thinking: space
in project schedules, and space to tell management that a product needs
to be rethought.
We also need space to stop the production line when something goes
wrong. This idea goes back to Toyota's
[Kanban](https://en.wikipedia.org/wiki/Kanban): any assembly line worker
can [stop the line](https://en.wikipedia.org/wiki/Autonomation) if they
see something going wrong. The line doesn't restart until the problem is
fixed. Workers don't have to fear consequences from management for
stopping the line; they are trusted, and expected to behave responsibly.
What would it mean if we could do this with product features? If anyone
at Facebook could have said "wait, we're getting complaints about Year
in Review" and pulled it out of production until someone could
investigate what was happening?
It's easy to imagine the screams from management. But it's not hard to
imagine a Toyota-style "stop button" working. After all, Facebook is the
poster child for continuous deployment, and they've often talked about
how new employees push changes to production on their first day. Why not
let employees pull features out of production? Where are the tools for
instantaneous undeployment? They certainly exist; continuous deployment
doesn't make sense if you can't roll back changes that didn't work. Yes,
Facebook is a big, complicated company, with a big complicated product.
So is Toyota. It worked for them.
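The tooling for a "stop button" doesn't have to be exotic, either. Here is
a deliberately simplified sketch (the flag file, names, and functions are
hypothetical, not anyone's actual deployment system): every risky feature
is served behind a flag, the safe default is off, and anyone on the team
can pull a feature and record why.

```python
import json
import time

FLAG_FILE = "feature_flags.json"  # stand-in for a shared feature-flag service


def feature_enabled(name):
    """Return False, the safe default, unless the flag is explicitly on."""
    try:
        with open(FLAG_FILE) as f:
            flags = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    return bool(flags.get(name, {}).get("enabled", False))


def pull_feature(name, who, reason):
    """'Stop the line': anyone can turn a feature off, with a recorded reason."""
    try:
        with open(FLAG_FILE) as f:
            flags = json.load(f)
    except (OSError, json.JSONDecodeError):
        flags = {}
    flags[name] = {
        "enabled": False,
        "pulled_by": who,
        "reason": reason,
        "pulled_at": time.time(),
    }
    with open(FLAG_FILE, "w") as f:
        json.dump(flags, f, indent=2)
```

In a real system the flags would live in a shared service and every serving
path would check them, but the principle is Toyota's: stopping the line has
to be cheaper than shipping the harm.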
The issue lurking behind all of these concerns is, of course, corporate
culture. Corporate environments can be hostile to anything other than
short-term profitability. That's a consequence of poor court decisions
and economic doctrine, particularly in the US. But that inevitably leads
us to the biggest issue: how to move the needle on corporate culture.
Susan Etlinger has suggested that, in a time when public distrust and
disenchantment are running high, [ethics is a good
investment](http://bit.ly/2O4Iuc1). Upper-level management is only
starting to see this; changes to corporate culture won't happen quickly.
Users want to engage with companies and organizations they can trust not
to take unfair advantage of them. Users want to deal with companies that
will treat them and their data responsibly, not just as potential profit
or engagement to be maximized. Those companies will be the ones that
create space for ethics within their organizations. We, the data
scientists, data engineers, AI and ML developers, and other data
professionals, have to demand change. We can't leave it to people who
"do" ethics. We can't expect management to hire trained ethicists and
assign them to our teams. We need to live ethical values, not just talk
about them. We need to think carefully about the consequences of our
work. We must create space for ethics within our organizations. Cultural
change may take time, but it will happen---if we are that change. That's
what it means to do good data science.
Preface
=======
We've seen an explosion of interest in data ethics. Why now? Well, we
know: fake news, data, and all that. But concern about data ethics
started well before the 2016 election. It started well before Google's
automatic photo tagging misidentified some black people as gorillas.
Concern for data ethics has been growing ever since we first started
talking about data science, and possibly before.
Why indeed? Because data has been integrated into every aspect of our
lives: the friends and business connections we're asked to make, the
shopping circulars we receive in the mail, the news we see, and the
songs we've played. Data is collected from us at every turn: every trace
of our online presence, and sometimes even traces of our physical
presence. We've gained some advantages from data, but we've also seen
the damage that the misuse of data has caused. Many of these concerns
were highlighted in multiple White House reports on data and AI,
including the call that all US training programs for data science and
technology include ethics and security.
It's been great to see people gathering to discuss ethics at events
like [D4G](https://www.bloomberg.com/company/d4gx/) and
[FAT\*](https://fatconference.org/). It's been great to watch the
lively discussions of ethical principles on the [Data For
Democracy](https://datafordemocracy.slack.com/) Slack. And it's been
great to read the many bloggers and commentators writing about ethics.
But what we're still missing is an understanding of how to put ethics
into practice in data work, as well as in the overall product development
process. Ethics really isn't about agreeing to a set of principles.
It's about changing the way you act. To take one very simple example:
it's one thing to say that you should get permission from users before
using their data in an experiment. It's quite another thing to get
permission at web scale. And it's yet another thing to get permission
in a way that explains clearly how the data will be used, and what the
expected consequences are. That's what we need to explore.
It's also important to realize that ethics isn't about a fixed list of
do's and don'ts. It's primarily about having a discussion about how
what you're doing will affect other people, and whether those effects
are acceptable.
That's what this book is all about: putting ethics into practice. That
means making room for discussion, making room for dissent, making sure
that you think through the consequences at every stage of a project, and
much more. What does ethics mean for hiring? How do you teach ethics in
an academic setting? These are all big questions, and not questions that
can be answered in a short book like this, but they're questions that
we need to talk about.
Data science is a team sport, and we need you on the team. Given the pace
of technology and the evolution of thinking that we expect around data, we
consider this work an iterative project. Just as software moves from a 0.1
release to a 1.0 and on to a 3.1, our hope is that others will contribute
new sections and that existing sections will be modified. To enable that,
we're making this free for download and
also on [GitHub](https://github.com/oreillymedia/ethics-datascience/) so
it can be a community effort.
Thanks to our reviewers: Ed Felten, Natalie Evans-Harris, François
Chollet, and Casey Lynn Fiesler.