Syllabus for a course on Data Science Ethics


Originally published at:


There are readings on Nuremberg and Tuskegee towards the end of the semester, but I’m not seeing the context. The context is that medical and epidemiologic research is done within a strict ethical framework overseen by the IRB (Institutional Review Board), also called the CPHS (Committee for the Protection of Human Subjects). This framework, while often very frustrating to work within, is necessary and useful for keeping all, or at least 99.999%, of research in the health sciences on the side of what is good. And THEN, after that, we still have the annoying, ever-present issues of publication bias and other things that fuck up the validity of the work. I mention this because I think it’s vague to just talk about ethics without also talking about the existing legal and cultural structures that have worked in our favor and could serve as models for how to behave in other realms of big data. Nobody has mentioned an IRB for scraped data, or an IRB for marketing research. Why not? Discussing the pros, cons and practicalities would be perfect for a course like this, and it seems to be lacking.


I think the most important thing to teach young people entering scientific and engineering careers today is “if they won’t put it in writing and sign it, don’t do it.”

That’s harder than it sounds. Lots of people can’t assert themselves to the necessary degree without significant training and practice. The majority of humans do what they are told; getting people to get up on their hind legs and say “boss, I can’t do that without signed, explicit direction” isn’t simple.



Write it up and have your boss sign it.
And if he doesn’t want to sign it, suggest talking it over with his boss.
Not an easy thing to do, I know. But it might well be the very thing that makes sure you’ve got a chair when the music stops.

