Introducing Python
This chapter covers the material for about the first month of the course, which is a basic introduction to Python. Our focus is on how to use Python as a tool for working with data. As such, we will concentrate on:
learning the fundamentals of Python
learning the fundamentals of programming logic
using Python for data science, including:
reading data
manipulating/processing data (e.g., extracting specific data, splitting data according to variables, applying functions, combining data)
exploratory data analysis
basic statistical analyses of data sets
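To give a rough sense of what these steps look like in practice, here is a minimal sketch using only Python's standard library. The values and the column names (`country`, `year`, `gdp`) are invented placeholders for illustration, not real Gapminder figures, and the lessons themselves may use different tools and file layouts.

```python
import csv
import io
import statistics

# Invented placeholder values for illustration only (NOT real Gapminder data).
RAW_CSV = """country,year,gdp
A,1952,100.0
A,2007,300.0
B,1952,200.0
B,2007,250.0
"""

# 1. Read the data: parse the CSV text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 2. Manipulate the data: convert text fields to numbers, then extract one year.
for row in rows:
    row["year"] = int(row["year"])
    row["gdp"] = float(row["gdp"])
rows_2007 = [row for row in rows if row["year"] == 2007]

# 3. Split the data according to a variable: collect GDP values per country.
by_country = {}
for row in rows:
    by_country.setdefault(row["country"], []).append(row["gdp"])

# 4. Apply a function to each group to get a basic statistical summary.
mean_by_country = {name: statistics.mean(values) for name, values in by_country.items()}
mean_2007 = statistics.mean(row["gdp"] for row in rows_2007)

print(mean_by_country)
print(mean_2007)
```

Each numbered comment corresponds to one of the items in the list above: reading, extracting, splitting, and summarizing.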
Each section of this chapter is a lesson, which will probably take you about 1 to 1.5 hours to complete. Each lesson is a Jupyter notebook, which will also be available to you on JupyterHub. The best way to work through this material is to type in the code yourself as you follow along with the instructor. The copies of the lessons in this textbook serve more as a reference that you can check back with later.
After working through the lessons, you can get more practice with DataCamp. Refer to the course schedule to see which DataCamp lessons go with which course lessons. This chapter relates to the material taught in the first two DataCamp lessons, Introduction to Python and Intermediate Python.
Origins of this material
The lessons introducing Python are adapted from a workshop created by the Software Carpentry foundation. The workshop uses freely available, open source data from Gapminder, an independent Swedish foundation whose mission is to “fight devastating ignorance with a fact-based worldview everyone can understand.” Gapminder is perhaps most famous for the TED talks given by its co-founder, the late Dr. Hans Rosling (the other founders were Ola Rosling and Anna Rosling Rönnlund). Dr. Rosling’s TED talk, The best statistics you’ve ever seen, became one of the most watched TED talks ever (nearly 15.6 million views as of August 23, 2023). The Gapminder data we will be working with here are a subset of those used in Dr. Rosling’s talk. Specifically, the data are the gross domestic product (GDP) of countries from around the world, over the period from 1952 to 2007.
While we could introduce Python with virtually any data (including neuromechanics-specific data), the Gapminder data have several advantages. They are relatively easy to understand without deep technical knowledge of a domain: GDP is a measure of a country’s wealth, and the data reflect how this changes over time. The data files are open source, and so support open sharing and transparency in science. Finally, Dr. Rosling’s talks provide a colourful and interesting introduction to the data. We have adapted the Software Carpentry (SC) version of the workshop based on our experience teaching it. We found that some parts of the SC workshop were unclear or assumed background knowledge (particularly in mathematics) that not everyone in our target audience had, and that some concepts were presented in a sequence that was not intuitive to us. We have great respect for the SC organization and its aims, and adapting and customizing the workshop is in the spirit of the open source movement and open educational resources.
This section was adapted from Aaron J. Newman’s Data Science for Psychology and Neuroscience - in Python.