Which language to use?
Although there are thousands of programming languages, a relatively small number of them are widely used. Which languages are the most popular or commonly used depends on the discipline and area of use. Different languages are designed for very different purposes: some are well-suited to building interactive web sites, others to building apps for mobile devices, and others to writing code embedded in hardware devices. In addition, new work often builds on older work, so once a language is used in a particular setting, it tends to propagate within that setting.

One of the most long-standing and representative indices of programming language popularity is the TIOBE web site, whose ratings are based on “the number of skilled engineers world-wide, courses and third party vendors. Popular search engines such as Google, Bing, Yahoo!, Wikipedia, Amazon, YouTube and Baidu are used to calculate the ratings.” As of April 2020, the 10 most popular languages are (in order): Java, C, Python, C++, C#, Visual Basic, JavaScript, PHP, SQL, and R. (Don’t worry if you haven’t heard of all of these; there won’t be a test! This is merely to give you a sense of the “lay of the land” and to expose you to the names of languages you’re likely to come across in the future.)
In data science, a few languages are particularly widely used. The internet is rife with clickbait-y pages such as “Top languages every data scientist should know”. These pages may rely on questionable methodologies, but taken together they name a fairly consistent set of languages, including Python, R, MATLAB, C, Java, SQL, Julia, and Scala. An informal survey of faculty in the Department of Psychology & Neuroscience at Dalhousie University (April 2020; n = 11) supports this claim, as shown in the plot below.
It is important to know that there is no single “best” programming language, either for programming in general or for kinesiology in particular. Indeed, many scientists have workflows that include multiple languages. For example, some of my own lab’s research involves a robotic device that is controlled through MATLAB/Simulink programs. For these studies, we collect the data in MATLAB but rely on Python to process the data and perform the statistics. Other neuromechanics labs may use different workflows, such as MATLAB for processing and SPSS for statistics.
So the punch line is: use the language that is best suited to the task at hand. In my lab’s workflow, we use Python for data processing because I prefer it as a working language, for some of the reasons described below. We also perform the statistics in Python for two reasons. First, I would like to keep the language consistent across as many stages of our data analysis pipeline as possible. Second, I have written an open, publicly available Python library for running the sorts of statistical models we prefer. (A library is simply a set of tools written to extend a language’s functionality and perform particular tasks.)
This section was adapted from Aaron J. Newman’s Data Science for Psychology and Neuroscience - in Python.