
2 Creating a Productive Programming Environment
Just as a chef depends on sharp knives and a well-organized kitchen to prepare a great meal, a data scientist relies on a finely tuned programming environment to produce clear, reliable visualizations and analyses. In this chapter, we will set up R and RStudio—your “knife and cutting board”—so that you can focus on creativity rather than technicalities. In tandem, R and RStudio offer many useful features that support a data scientist’s workflow, from handling raw data to producing final products. With a large community of users and developers, the combination of R and RStudio has become a recommended choice for statistical analysis and data visualization (Fransham, 2024). Let’s join this community and take our first steps toward mastering these powerful tools.
Section 2.1 and Section 2.2 will guide you through the installation of R and RStudio, respectively. Section 2.3 will suggest a beginner-friendly customization of RStudio. Finally, Section 2.4 will introduce you to RStudio projects, a feature that will help you organize your work.
2.1 Installing R
R is a programming language created in 1996 by Ross Ihaka and Robert Gentleman, two statistics professors from the University of Auckland in New Zealand (Vance, 2009). Originally intended only as a replacement for the statistics software used for teaching at their university department, R has become popular among statisticians and data scientists worldwide because it is free, open-source, and runs on all common operating systems. Additionally, R has an active community of developers who contribute packages that extend the language’s functionality. Consequently, R offers efficient solutions to a wide array of complex problems in statistics, data analysis, visualization, and report writing.
There are admittedly some downsides to R. Some legacy functions in R’s base installation are slow and memory-inefficient, relying on dated programming interfaces. However, these problems have been solved by modern add-on packages that offer efficient and user-friendly alternatives. This book will make extensive use of the tidyverse suite of packages, which are optimized for performance and possess consistent user interfaces. Because the tidyverse is actively developed, its packages are continually improved and expanded.
The programming language R is extensively used in statistics and data science. The tidyverse suite’s add-on R packages provide efficient and user-friendly solutions for complex tasks, such as data visualization.
The installation method for R varies depending on your operating system. The description below is up-to-date as of May 2025 (R version 4.5.0).
2.1.1 Installing R on Windows
Please download the R binaries for the base distribution from https://cran.r-project.org/bin/windows/base/. Then double-click on the downloaded file and follow the on-screen instructions to complete the installation. The default settings offered by the installer are suitable for most users.
2.1.2 Installing R on MacOS
It is advisable to install R using an installer package file tailored to your hardware. If you have a recent Mac, it is likely that it runs on a processor based on the ARM64 architecture. You can find out your processor architecture by clicking on the Apple icon in the top left corner; then select “About This Mac” from the dropdown menu. If your chip name starts with “Apple,” your processor is based on ARM64. In this case, download the most recent ARM64 (“Apple Silicon”) installer for R from https://cran.r-project.org/bin/macosx/. If you have an older Mac, you may need the R version for an Intel processor, which can be downloaded from the same website. In both cases, double-click on the downloaded file to start the installation process. Then follow the on-screen instructions to complete the installation. For most users, the default settings provided by the installer are acceptable without modification.
2.1.3 Installing R on Ubuntu
Please right-click anywhere on the Ubuntu desktop and select “Open in Terminal” from the context menu. Then type the following commands in the Terminal window and press the Return key after each command. You may be prompted to enter your Ubuntu password; note that you will not see any characters while typing:
sudo apt update
sudo apt install -y r-base
2.1.4 How to Use R?
There are various ways to write and run R programs. For instance, you can write the code using any conventional text editor, such as the pre-installed applications Notepad (on Windows) or TextEdit (on MacOS). After saving the R code, you can execute it from the Command Prompt on Windows or from a Terminal on Mac and Linux. For certain applications, running code from the Command Prompt or Terminal might be the only available option (e.g., when running an R program on a remote server).
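As a minimal sketch of this terminal-based workflow, the following R code writes a one-line script to a temporary file and runs it with Rscript, the command-line front end that ships with R. The script's content and the temporary file are placeholders chosen purely for illustration.

```r
# Write a tiny R script to a temporary file (placeholder content).
script_path <- tempfile(fileext = ".R")
writeLines('cat("Hello from", R.version.string, "\\n")', script_path)

# Locate the Rscript executable that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Run the script non-interactively and capture what it prints.
output <- system2(rscript, shQuote(script_path), stdout = TRUE)
print(output)
```

In a Command Prompt or Terminal, the equivalent step is to type Rscript followed by the path to the saved script.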
However, most of the time, it is more advantageous to use an integrated development environment (IDE). An IDE combines writing and running code into a single user interface. The next section will guide you through the installation of RStudio, which is currently the most popular IDE for R.
2.2 Installing RStudio
RStudio is a free and open-source IDE for statistical computing and graphics. While RStudio primarily caters to users of the R programming language, it also supports other programming languages, including SQL and Python. Additionally, RStudio has native support for the Quarto authoring framework, including a visual editor and a built-in document viewer. Furthermore, RStudio recently integrated GitHub Copilot, an artificial intelligence code completion tool that suggests code snippets based on your code’s context. Thanks to these features, RStudio is a versatile tool for data scientists and statisticians, and, hence, the IDE of choice for this book.
Like the R installation, the RStudio installation also requires different methods for various operating systems. As before, the description below is up-to-date as of May 2025 (RStudio version 2024.12.1+563).
RStudio is an IDE for statistical computing and graphics.
2.2.1 Installing RStudio Desktop on Windows
Using a web browser, go to the RStudio download page. Please click on the link for your Windows version to download the most recent available installer of RStudio Desktop. Then double-click on the downloaded file and follow the on-screen instructions to complete the installation. The default settings offered by the installer are suitable for most users. Afterward, you can open RStudio by searching for “RStudio” using the Start menu.
2.2.2 Installing RStudio Desktop on MacOS
If you are using a Mac with an ARM64 processor, you need to install Rosetta 2 before you can work with RStudio. Rosetta 2 is a translation layer that enables users to run software compiled for older processors. To install Rosetta 2, open a Terminal window, for example, by searching for “Terminal” using Spotlight, which can be activated by pressing Command-Space. In the Terminal window, type the following command and press the Return key:
/usr/sbin/softwareupdate --install-rosetta
From this point onward, the installation process is the same for any processor architecture.
Please visit the RStudio download page and click on the link for the most recent installer for the MacOS version of RStudio Desktop. Next, double-click on the downloaded file to open a Finder window. Inside this window, drag the RStudio icon to the Applications folder. Then double-click on the RStudio icon to start the application.
Because this is the first time RStudio is opened, a dialog window will appear, asking if you want to install the command line tools, needed by the Git version control system. Because Git is highly useful, I recommend clicking “Install.” Once the Git installation is complete, you are ready to work with RStudio.
2.2.3 Installing RStudio Desktop on Ubuntu
Please use a web browser to navigate to the RStudio download page and click on the link for the most recent RStudio Desktop installer compatible with your Ubuntu version. After the download is complete, type the following command in a Terminal window and press Enter. Please note that you may need to replace the file name with the name of the file you actually downloaded:
sudo apt install -y ~/Downloads/rstudio-2024.12.1-563-amd64.deb
Now you can start RStudio, for example, by searching for it using the “Show Apps” tool, accessible through the icon at the bottom left of the desktop.
2.2.4 Initial Peek at RStudio
When RStudio is opened, the user interface is divided into panes, which are areas of the window that can contain tabs. At this stage, the window is typically divided into three panes with the following active tabs (Figure 2.1):
The RStudio window is divided into panes, which are further divided into tabs, each for a specific purpose, such as executing R commands, displaying variables, or listing files.
- Console Tab: Located in the left half of the window, this tab is where you can enter and execute R commands.
- Environment Tab: Positioned in the top right, this tab displays variables, data sets, and other objects defined during your R session.
- Files Tab: This tab shows the files in your current project directory.
As you progress through this book, the functions of these panes and their respective tabs will become clearer and more intuitive.
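To get a first feel for how the Console and Environment tabs interact, you could type a few commands like the ones below into the Console. This is only an illustrative sketch; the variable names and numbers are invented.

```r
# Assign a small numeric vector; it would then be listed in the Environment tab.
heights_cm <- c(172, 165, 180)

# Compute the mean and print it in the Console.
mean_height <- mean(heights_cm)
print(mean_height)

# List the names of the objects currently defined in the session.
ls()
```

After running these lines, both heights_cm and mean_height should be listed in the Environment tab.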

If any of the panes is invisible, it may be hidden under another pane. To reveal a hidden pane, click on the “expand pane” symbol in the top right corner of the pane. Alternatively, you can adjust the size of the panes by dragging the separator between them. The mouse pointer will change to a double-sided arrow when positioned over a separator, indicating that you can click and drag to resize the panes.
2.3 Customizing RStudio
Before coding with RStudio, it is advisable to make its user interface as beginner-friendly as possible. First, I recommend changes to the general settings (Section 2.3.1). Then I suggest activating GitHub Copilot integration (Section 2.3.2).
2.3.1 Adjusting General Settings
The default general settings can be confusing when learning RStudio because they restore the state of the application from the previous session. To make RStudio’s behavior easier to understand, I recommend changing these settings so that RStudio always starts with a clean workspace.
To adjust these settings, go to the “Tools” menu and select “Global Options…” This action opens a dialog window with a menu in the left sidebar. In the General menu, uncheck all boxes whose labels begin with the word “Restore.” Then look for the drop-down menu labeled “Save workspace to .RData on exit” and choose “Never” from this menu. The screenshot in Figure 2.2 shows the dialog window with the recommended settings. After making these changes, click “OK” to close the dialog window.

.RData is never saved on exit.
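To see why a restored workspace can be confusing, the following sketch imitates what happens when R saves a workspace and reloads it later. It makes no assumptions about your setup: a temporary file stands in for the workspace file, two environments stand in for two separate sessions, and the object name old_result is made up for illustration.

```r
# Temporary stand-in for the workspace file.
workspace_file <- tempfile(fileext = ".RData")

# "Session 1": an object is created and the workspace is saved on exit.
session_one <- new.env()
assign("old_result", 42, envir = session_one)
save(list = ls(session_one), envir = session_one, file = workspace_file)

# "Session 2": restoring the workspace brings the old object back,
# even though no code in the new session created it.
session_two <- new.env()
restored_names <- load(workspace_file, envir = session_two)
print(restored_names)
```

With the recommended settings, no such leftover objects appear at startup, so every object in the Environment tab can be traced back to code you ran in the current session.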
2.3.2 Activating GitHub Copilot Integration
GitHub Copilot brands itself as “an AI pair programmer as you code” (GitHub, Inc, 2024). Although you should not blindly trust Copilot’s suggestions, they can often accelerate code development by guessing your intended code completion. Students can sign up for the free GitHub Student Developer Pack, which includes Copilot, at https://education.github.com/pack. After you receive the sign-in credentials, follow the steps outlined in the RStudio User Guide to set up Copilot for your RStudio installation.
By default, Copilot only uses the currently active file to infer code suggestions. To enable Copilot to include other files in the current project for its inference, I recommend ticking the box “Index project files with GitHub Copilot” in the Copilot menu of the Global Options dialog window, as shown in Figure 2.3.

2.3.3 Section Summary: Customizing RStudio
After implementing the adjustments suggested in this section, you now have a beginner-friendly installation of RStudio and are ready to proceed. In the following section, you will learn how to organize your work using RStudio projects.
2.4 RStudio Projects
In this book, we will use RStudio to store data, write computer code for data analysis, and visualize the results. For a productive workflow, it is important to keep track of all input files and organize them in a structured manner. RStudio offers a convenient way to organize your work in the form of “projects.” A project is a directory that contains all the files associated with a particular topic. RStudio projects are also useful for reproducibility. For these reasons, Section 2.4.1 will explain how to set up an RStudio project using the version-control system Git and the Git-based online service GitHub. In Section 2.4.2, you will learn how to create an RStudio project by cloning a GitHub repository. Finally, in Section 2.4.3 you will learn how to record changes to the RStudio project by pushing to and pulling from the GitHub repository.
2.4.1 Git and GitHub
Git is a version-control system that allows users to track changes in their files. It uses the concept of a repository, which is a directory that contains all the files associated with a particular software project. GitHub is a web-based hosting service for Git repositories. You can use Git and GitHub to keep a track record of updates to your project and share your work with others.
Git is a version-control system that allows users to track changes in their files. GitHub is a web-based hosting service for Git repositories.
Although RStudio projects do not need to be Git repositories, I strongly recommend that you integrate Git into your workflow. In this book, you will not need advanced features of Git and GitHub, such as branching, merging, and pull requests; therefore, the details of Git and GitHub will not be introduced here. However, even if used in a basic manner, Git and GitHub are useful for backing up your data and sharing files with others. The recommended workflow follows this pattern:
1. Create a repository on GitHub.
2. Start an RStudio project which contains a clone of the repository.
3. Work on the project using RStudio.
4. After completing a portion of work, push the changes from RStudio to GitHub to back up your work and alert collaborators about the updates. Then return to step 3.
2.4.1.1 Installing Git
If you do not already have Git installed on your computer, please download and install it now. MacOS users who followed the instructions in Section 2.2.2 already have Git installed. For Windows, please download Git from https://gitforwindows.org. On Ubuntu, you can install Git by executing the command sudo apt install -y git in RStudio’s Terminal tab.
After installing Git, restart RStudio to ensure that it is aware of Git’s presence on your machine.
2.4.1.2 Creating a GitHub Repository
Please open a web browser and navigate to the GitHub website. If you do not already have a GitHub account, please create one now. Next, log in and click on the “+” sign in the upper right corner of the screen. From the context menu, select “New repository.” The browser will now take you to a website where you can create the new repository as follows (Figure 2.4):
- Enter a name for the repository (e.g., “investments”).
- Add a description (e.g., “Data and scripts related to fictitious traded companies and shareholders.”)
- Select “Private” as the visibility of the repository.
- Check the box “Add a README file.”
- Click on the button “Create repository.”

The browser will now take you to a website where you can see the repository you just created. The repository is currently empty, apart from an automatically generated README.md file. However, most files you will create will be generated by RStudio instead of GitHub. To set up the RStudio environment required in the following section, copy the URL of the repository from the browser’s address bar. The URL should have the following pattern: https://github.com/your-user-name/your-repo-name.
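If you want to double-check that the copied URL follows this pattern, a rough sketch in R might look as follows. The user name is a placeholder, and the regular expression is a simple assumption about the URL's shape rather than an official GitHub rule.

```r
# Placeholder values; replace them with your own user name and repository.
user_name <- "your-user-name"
repo_name <- "investments"

# Assemble the URL from its two components.
repo_url <- sprintf("https://github.com/%s/%s", user_name, repo_name)

# Rough sanity check: exactly two path components after the host name.
url_pattern <- "^https://github\\.com/[^/]+/[^/]+$"
looks_valid <- grepl(url_pattern, repo_url)

print(repo_url)
print(looks_valid)
```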
2.4.1.3 Setting Up GitHub Authentication for RStudio
To exchange data with GitHub, RStudio must be able to authenticate with your GitHub credentials. For setting up credentials, switch to RStudio’s Console tab and run the following commands:
install.packages("usethis")
usethis::create_github_token()
A browser window will open, prompting you to log in to your GitHub account and enter a brief description of the use case for the access token, such as “RStudio authentication.” Copy the access token from the browser window. Then return to RStudio and run the following command in the Console tab. When prompted, paste the access token into the console:
gitcreds::gitcreds_set()
The access token is now stored in your Git credential store, allowing RStudio to authenticate with GitHub.
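If you would like to confirm that the token was stored, one possible check is sketched below. It relies on gitcreds::gitcreds_get(), which looks up stored credentials for GitHub; the helper name check_github_token is hypothetical, and the messages it returns are my own wording.

```r
# Hypothetical helper: report whether Git credentials for GitHub can be found.
check_github_token <- function() {
  if (!requireNamespace("gitcreds", quietly = TRUE)) {
    return("The gitcreds package is not installed.")
  }
  tryCatch(
    {
      creds <- gitcreds::gitcreds_get()
      paste("Found stored credentials for host", creds$host)
    },
    # gitcreds_get() signals an error if Git or the credentials are missing.
    error = function(e) "No stored GitHub credentials were found."
  )
}

message(check_github_token())
```

For a broader report on your Git and GitHub setup, you could also run usethis::git_sitrep() in the Console.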
2.4.2 Creating an RStudio Project by Cloning a GitHub Repository
To create an RStudio project, please navigate to RStudio’s File menu, and select “New Project.” A dialog window will appear where you can choose between three options for creating a new project: “New Directory,” “Existing Directory,” and “Version Control” (Figure 2.5). To synchronize the files in the RStudio project with those in the investments GitHub repository, choose “Version Control.”
An RStudio project is a directory that contains all the files associated with a particular topic.

From the next dialog window, select “Git” as version-control system, which starts a new dialog titled “Clone Git Repository” (Figure 2.6). Then paste the URL of the repository, which you copied at the end of Section 2.4.1.2, into the first text box. Next, press the tab key to insert the repository name (e.g., investments) automatically into the second text box. Please use the Browse button to select a directory on your computer where you want to store the project. Then click on “Create Project.”
You can clone a GitHub repository to your local computer as an RStudio project.

When you create the project, RStudio will automatically change the working directory to the directory of the new project without you having to change directories manually. RStudio also places a file with the extension .Rproj in the project directory. When you open this file (e.g., by searching for it using the Start menu on Windows and similar tools on Mac or Linux), RStudio automatically starts a new session, setting the project directory as your working directory. If RStudio is already open but you are in another directory, you can resume a previous project using the menu item “File” followed by “Recent Projects.” If RStudio does not consider the project to be recent any longer, you can use the more general option “Open Project.”
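To confirm which directory R currently treats as its working directory, you could run a quick check such as the following sketch in the Console. The output depends on your machine, so no expected result is shown.

```r
# The working directory; in a freshly opened project this is the project folder.
project_dir <- getwd()
print(basename(project_dir))

# Look for project files with the .Rproj extension in that directory.
rproj_files <- list.files(project_dir, pattern = "\\.Rproj$")
print(rproj_files)
```

If the second command prints an empty result, R is probably not running inside an RStudio project directory.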
2.4.3 Synchronizing Changes between the RStudio Project and GitHub
The next goal is to add a file to the RStudio project just created. For this purpose, please download the sample data file shareholders.csv from the book website. Then move the downloaded file to the RStudio project directory, for example, by highlighting the RStudio Files tab, clicking on the cogwheel icon, selecting “Show Folder in New Window” from the context menu, and dragging the file icon from the Downloads directory to the new window.
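As an alternative to dragging the file, the copy step could be scripted in R. The sketch below is only an illustration: the helper copy_into_project is hypothetical, temporary directories stand in for your Downloads and project folders, and the file content is invented rather than taken from the book's data file.

```r
# Hypothetical helper: copy a downloaded file into a project directory.
# Returns TRUE if the copy succeeded.
copy_into_project <- function(downloaded_file, project_dir = getwd()) {
  target <- file.path(project_dir, basename(downloaded_file))
  file.copy(downloaded_file, target, overwrite = FALSE)
}

# Temporary stand-ins for the Downloads folder and the project folder.
downloads_dir <- tempfile("downloads-")
project_dir <- tempfile("project-")
dir.create(downloads_dir)
dir.create(project_dir)

# Invented placeholder content, not the real sample data.
demo_file <- file.path(downloads_dir, "shareholders.csv")
writeLines(c("name,shares", "Example Holder,100"), demo_file)

copied <- copy_into_project(demo_file, project_dir)
print(list.files(project_dir))
```

In practice, you would pass the path of the real downloaded file and leave project_dir at its default, which is the current working directory.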
While working on a Git-enabled RStudio project, Git monitors all changes to the files in the project directory and all subdirectories. You can see the affected files in the Git tab in the top right pane of the RStudio window (Figure 2.7). Note that shareholders.csv is among the listed files.

Let us suppose that all new files should be added to the Git repository. The process of establishing a permanent record of new and changed files requires two steps, known as “staging” and “committing.” To stage, tick the relevant files in the Git tab. It is possible to tick the files individually by hand. While this approach is adequate when only a few files are to be committed, a more efficient method is to activate the Git tab (e.g., by clicking on any file name inside the tab) and select all files with the keyboard shortcut Ctrl-A (Windows and Linux) or Command-A (Mac). Clicking on just one of the tick boxes will then tick all the remaining ones as well.
After all relevant files have been staged by placing a tick next to their name, click the menu item “Commit.” In the following dialog window, add a message that describes the changes to the committed files (Figure 2.8). Then click on the “Commit” button to complete the commit process.

The changes are now permanently recorded in the Git repository on your machine, but you still need to upload the changes to the online repository hosted by GitHub. This process is called “pushing” and can be initiated by clicking on the “Push” button in the commit dialog window (at the top of Figure 2.8) or in the Git tab. You might be asked to enter your GitHub credentials (e.g., in the form of a two-factor authentication). After the push has been completed, the changes will be visible on the GitHub website (Figure 2.9).
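For readers curious about what the buttons do behind the scenes, the staging and committing steps correspond to the Git commands add and commit. The sketch below runs them from R in a throwaway repository, assuming Git is installed and on your search path. The directory, file content, user name, and email address are all placeholders.

```r
git <- unname(Sys.which("git"))
log_lines <- character(0)

if (nzchar(git)) {
  # Temporary stand-in for a project directory (not your real project).
  repo <- tempfile("demo-repo-")
  dir.create(repo)
  writeLines("placeholder", file.path(repo, "shareholders.csv"))

  # Helper that runs one Git command inside the demo repository.
  run_git <- function(...) {
    system2(git, c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
  }

  run_git("init", "--quiet")
  run_git("add", "shareholders.csv")  # stage the file
  run_git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
          "commit", "--quiet", "-m", shQuote("Add shareholder data"))  # commit
  log_lines <- run_git("log", "--oneline")
  print(log_lines)
}
```

Pushing corresponds to the command git push. It is left out here because the throwaway repository has no GitHub remote to push to.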
“Pushing” is the process of uploading changes from your local machine to GitHub.

You can add collaborators to your GitHub repository by first clicking on the “Settings” tab of the GitHub website and then selecting “Collaborators” from the left navigation menu. Collaborators will receive an email notification and will be able to clone the repository to their local machine. When a collaborator changes files in the GitHub repository, you will be able to download the changes to your local machine. This process is called “pulling” and can be initiated by clicking on the “Pull” button at the top of the Git tab.
“Pulling” is the process of downloading changes from GitHub to your local machine.
There are many more features of Git and GitHub that are beyond the scope of this introduction. For example, you can create branches to work on different versions of the same project in parallel. Additionally, you can create a pull request to suggest changes to a repository that you do not own. For more information, refer to the GitHub documentation.
2.4.4 Section Summary: RStudio Projects
This section laid the foundation for a productive RStudio workflow. You learned how to set up a GitHub repository and a Git-enabled RStudio project, enabling you to keep a track record of your project. Additionally, you discovered how to synchronize changes between your local machine and GitHub, a feature that facilitates project migration across different computers and collaboration with colleagues.
2.4.5 Exercise: Push a File to GitHub
Download traded_companies.csv from the book website and add it to the RStudio project created in this chapter. Then commit the file to the Git repository and push the changes to GitHub. Verify that the file is visible on the GitHub website.
2.5 Conclusion
Through this preparatory chapter, you have equipped yourself with the essential tools for data analysis and visualization by installing and configuring R and RStudio. You have also learned how to manage your work using RStudio projects, ensuring that your files and data stay organized.
The next chapter will apply the knowledge gained in this chapter to generate Quarto documents using RStudio.