Scientific Programming Languages

According to Wikipedia, a programming language is “a formal language, which comprises a set of instructions that produce various kinds of output”. “Formal languages” are characterized by hierarchical organization: letters are combined to form words, which are in turn combined into larger units according to rules called a syntax (or grammar). In general, a programming language is a way of writing instructions for a computer to perform. There are thousands of programming languages in existence, which Wikipedia attempts to catalogue.

There is nothing special about “scientific” programming languages to distinguish them from other programming languages, except that they are used for scientific purposes. Some languages, however, have become particularly widespread in scientific applications. Below is a discussion of different languages, but first, to address the cliffhanger left at the end of the preceding section: programming languages provide a way to standardize and automate data analysis so that it is reproducible. Since programs are written sets of instructions stored in a file, the same set of instructions can be applied to every data file in a study. If the programs used to analyze the data are shared with others, then others should be able to reproduce the original results. There is, of course, no guarantee that the original program was free of errors (“bugs”), but because the instructions are written and saved, they can be audited by others. This makes finding errors much easier (or possible at all) compared with a manual task performed by humans.

Computer programs can also be written to “batch” work, meaning that they scale easily. For example, once a program has been written to process one data file in a desired way (e.g., compute an individual’s mean RTs for each condition, as in our example above), that program can be placed in a “loop” that applies the same process to every data file in a study. Running the program on each data file takes a certain amount of time, so more data will take longer to analyze. Even so, computer programs typically perform these kinds of routine tasks far faster than humans (often literally in the blink of an eye), and far more reliably. Where humans might make random errors, computer programs do not: if the program contains an error, it will systematically make the same error on every data file it processes. While errors are obviously not desirable, a systematic error is typically easier to detect and correct than random errors.
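
To make the idea of a “loop” concrete, here is a minimal sketch in Python, using only the standard library. It is an illustration rather than the analysis used in this book: it assumes each participant's data is stored in its own CSV file, and the column names `condition` and `rt`, as well as the function names, are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean


def mean_rt_by_condition(csv_path):
    """Return {condition: mean RT} for one participant's data file."""
    rts = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed (hypothetical) column names: "condition" and "rt"
            rts[row["condition"]].append(float(row["rt"]))
    return {condition: mean(values) for condition, values in rts.items()}


def process_study(data_dir):
    """Apply the same analysis to every CSV file in data_dir."""
    results = {}
    # The loop: one pass per data file, identical instructions each time
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        results[csv_path.stem] = mean_rt_by_condition(csv_path)
    return results
```

Calling `process_study` on a folder of such files would return one set of condition means per file, keyed by file name. Because the same function runs on every file, any mistake in `mean_rt_by_condition` would show up in every participant's results, which is the kind of systematic error described above.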


This section was adapted from Aaron J. Newman’s Data Science for Psychology and Neuroscience - in Python.