Python

This class will teach you to understand and use the Python programming language, along with a set of libraries commonly used in data science. I do not expect that you’ve ever written a line of code in any programming language before, so in learning to use Python, you will also be learning to program. Using Python for data science and programming are not exactly the same thing: programming describes a broader range of skills than using a programming language to do data science. As well as learning the “words” (commands) and “grammar” (how to define and combine commands) of a programming language, programming encompasses particular ways of thinking. One important programming skill is operationalization: analyzing and breaking down problems, and identifying the sequence of steps needed to solve them. Another is paying close attention to the details of how you write and format your code. All of a sudden, not indenting a line is not just a violation of that annoying APA Style guide; it can cause your code to function in a totally different way, or not at all!
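To make the point about indentation concrete, here is a minimal sketch. The variable names and numbers are made up for illustration. The two versions differ only in whether the last line is indented, and that alone changes what the code does.

```python
scores = [1, 2, 3]  # hypothetical data, just for illustration

# Version A: the counting line is indented, so it is part of the loop
# and runs once for every score.
count_a = 0
for s in scores:
    doubled = s * 2
    count_a = count_a + 1

# Version B: the same line is not indented, so it sits outside the loop
# and runs only once, after the loop has finished.
count_b = 0
for s in scores:
    doubled = s * 2
count_b = count_b + 1

print(count_a)  # 3
print(count_b)  # 1
```

Neither version is an error as far as Python is concerned; they are simply two different programs. This is why careful formatting matters so much.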

Python was originally written by Guido van Rossum and first released in 1991. Its name has nothing to do with snakes, but rather was derived from the famous comedy sketch troupe Monty Python. Python developed as an open-source project. This means several things. Firstly, it is made available for free, with anyone being granted permission to use, examine, modify, and share the source code (the code that runs when you run a Python command). Secondly, many people contributed to the development of the language, typically without receiving any payment (though some developers may have contributed to Python in the context of working for a company that relied on the language, or simply embraced values of supporting the open-source community). Van Rossum was the lead developer for the project until 2018, and the development of the language is now guided by a five-person steering council (Van Rossum served on the first council, in 2019). Like virtually every active programming language, Python is under continual development, to fix bugs, improve its efficiency, and extend its abilities. Python has gone through three major versions, each with many minor releases. Development is guided by officially reviewed and approved Python Enhancement Proposals (PEPs). Some PEPs also serve as official guidelines. For example, PEP 20 is The Zen of Python, which espouses core values of the language, while PEP 8 is the Style Guide for Python Code, which we will return to later and often. PEP 8 sets out conventions (e.g., how many spaces to use for each indent) that make code consistent and easy to read. The requirement that code be indented at all, mentioned above, comes from the language itself rather than from the style guide.

In a very general and nontechnical sense, programming languages can be characterized as falling on a continuum from “higher level” to “lower level” (or, perhaps more simply, from easier to use and learn to harder to use and learn). Python falls closer to the “high level” end of this spectrum, relative to languages like C or Java. This often means it takes less code to perform a particular function, because more things are baked in “for free” in Python that one might have to explicitly write code for in C. As a result, Python code is often simpler to read and write. Indeed, PEP 20 enshrines certain core values of the language, such as:

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Readability counts.

These values contribute to making Python relatively easy to learn and use, compared to other programming languages. At the same time, programs written in Python (if written properly) can run quickly enough for most data science work, in part because many scientific libraries do their heavy computation in compiled, lower-level code. The “overhead” relative to writing in a lower-level language yourself is therefore often small in practice. Python has been widely adopted by communities in many areas of science, and in data science, because of this combination of ease of use and adequate speed (and the fact that it’s free). Many add-on packages (libraries) have been written to extend Python’s functionality in various ways, including a large number of libraries specifically for scientific applications.
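As a small illustration of what “baked in for free” can look like, the sketch below uses only built-in functions and the standard library that ships with Python. The data values and variable names are invented for this example; in a lower-level language, each of these steps would typically need a hand-written loop.

```python
from collections import Counter

# Hypothetical reaction times in seconds, made up for illustration
reaction_times = [0.52, 0.48, 0.61, 0.45]

# Built-in functions do the arithmetic; no explicit loop is needed
mean_rt = sum(reaction_times) / len(reaction_times)
fastest = min(reaction_times)

# Counter (from the standard library) tallies how often each item occurs
responses = ["yes", "no", "yes", "yes"]
tally = Counter(responses)

print(mean_rt)
print(fastest)       # 0.45
print(tally["yes"])  # 3
```

Add-on libraries for scientific work build on the same idea, packaging common operations so that you can call them by name rather than writing them from scratch.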


This section was adapted from Aaron J. Newman’s Data Science for Psychology and Neuroscience - in Python.