Libraries and packages


It is worth clarifying at this point the difference between a programming language and a “package” (usually called a library in Python). A language, as defined above, provides a set of powerful but generic tools for performing tasks using a computer. A library is a collection of programs or tools written in a particular language for more specific purposes. Typically a library provides an additional set of commands that extend the programming language and make it easy to perform particular tasks. Thus using a Python library still requires that you write Python code, but saves you from “reinventing the wheel” to perform specific functions. This saves users considerable time compared with writing all the code themselves, and often empowers them to perform tasks that they might not otherwise have been able to do due to a lack of technical knowledge. For example, understanding conceptually what filtering an EEG signal means does not give someone the mathematical and engineering background to write a filter from scratch that is free of errors and performs as desired.
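As a small illustration of the point about “reinventing the wheel”, the sketch below computes a sample standard deviation twice: once with a function written from scratch, and once with `statistics`, a library that ships with Python. The reaction-time values and the function name `sample_std_by_hand` are made up for this example.

```python
import math
import statistics

# Hypothetical reaction times in milliseconds (made-up values).
reaction_times = [312.0, 298.5, 335.2, 301.7, 320.4]


def sample_std_by_hand(values):
    """Sample standard deviation written from scratch."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    # Divide by n - 1 (not n) for the sample standard deviation.
    return math.sqrt(sum(squared_deviations) / (n - 1))


by_hand = sample_std_by_hand(reaction_times)

# The library version is a single call to a function someone else wrote.
with_library = statistics.stdev(reaction_times)

print(f"By hand:      {by_hand:.3f} ms")
print(f"With library: {with_library:.3f} ms")
```

Both approaches should give the same answer, but the library version is a single line of code that many other users have already tested. The gap widens for harder tasks such as filtering a signal, where a from-scratch version is much easier to get subtly wrong.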

In many areas of movement science research, there are mature software packages written by scientists (often grad students) for specific types of data or analysis. Usually a scientist, or a group of them, starts writing a package because they need to solve a problem for which no solution currently exists (or not in their preferred language). These packages often develop a large base of users, which brings several economies of scale. More users mean more people who can develop and share applications of the software, identify bugs, and even contribute to developing the software and fixing bugs. The software may also gain more features, as the original developers are encouraged by its popularity and more developers start contributing. Larger user bases also tend to lead to better documentation, both through demand placed on the developers and through the creation of crowd-sourced documentation. As the popularity of a software package grows, more scientists become familiar with it, which can lead to rapid growth of the user base, more extensive validation of the algorithms, and even the development of “gold standards” for use in particular fields.

It is also worth distinguishing between packages that require some knowledge of a programming language to use, and those that rely on a graphical user interface (GUI). GUI-based packages (which could be written in MATLAB, Python, R, JavaScript, etc.) are initially easier and simpler to use. However, in gaining ease of use, they typically sacrifice advantages we’ve already talked about, like automation and reproducibility. If a human user has to sit and click through a series of operations, this is not an automated or scalable process. Reproducibility also suffers, because it’s easy for the user to make errors while clicking through the GUI, but hard to detect and diagnose those errors after the fact. That said, in some cases GUIs can be advantageous, such as when teaching people about a data analysis process, or when trying out a tool. Many scientific packages also have “hybrid” options. For example, one could perform a set of steps using the GUI, which are then recorded and stored as a program that can be re-run, or the GUI might provide the option to select all the data files from a particular study and apply the same steps to all of them.
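To make the contrast with clicking through a GUI concrete, here is a minimal sketch of applying the same step to every data file in a study folder. The file names, the `speed` column, and the functions `mean_of_column` and `process_study` are all hypothetical. The sketch builds its own tiny made-up data set in a temporary folder so that it can run anywhere.

```python
import csv
import tempfile
from pathlib import Path


def mean_of_column(csv_path, column):
    """Return the mean of one numeric column in a CSV file."""
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


def process_study(folder, column):
    """Apply the same step to every CSV file in a folder."""
    results = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        results[csv_path.name] = mean_of_column(csv_path, column)
    return results


# Build a tiny made-up "study" in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    fake_data = {
        "participant_01.csv": [1.0, 2.0, 3.0],
        "participant_02.csv": [4.0, 5.0, 6.0],
    }
    for name, speeds in fake_data.items():
        with open(Path(folder) / name, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["speed"])  # header row
            writer.writerows([s] for s in speeds)  # one value per row

    # One call processes every file, whether there are 2 or 2000.
    study_means = process_study(folder, "speed")

print(study_means)
```

Because the steps are written down as code, re-running the analysis on new participants means re-running the script rather than repeating a series of clicks, and anyone can read the script to see exactly what was done.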

Ultimately, the scientist who is comfortable working in a programming language is far better prepared, and more flexible and adaptable, than one who is only comfortable working with GUIs or using manual workflows. Importantly, although there are many programming languages, many of the core concepts you learn when you learn to program translate widely across languages. These include how to break down a desired goal into a set of small task units, how to perform tasks using specific instructions, how to figure out what those instructions are in a particular language, how to automate, loop, and scale tasks, how to debug errors, and so on. The details may vary with the programming language, but the fundamental concepts are typically the same. Thus, once you have a comfortable working knowledge of one programming language, it is much easier to start learning and using a new language.


This section was adapted from Aaron J. Newman’s Data Science for Psychology and Neuroscience - in Python.