'Relative paths in R: how to avoid my computer being set on fire?
A while back I was reading an article about improving project workflow. The advice was not to use setwd
or my computer would burn:
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
I started using the here
package and it worked great until I started to schedule scripts using cronR
. After asking this question my laptop was again threatened with arson:
If the first line of your #rstats script is wd <- here(), I will come into your lab and SET YOUR COMPUTER ON FIRE.
Fearing for my laptop's safety I started using the method suggested in the answer to get relative file paths:
wd <- Sys.getenv("HOME")
wd <- file.path(wd, "projects", "my_proj")
Which worked for me but not people I was working with who didn't have the same projects
directory. So now I'm confused. What is the safest / best way get relative file paths so that a project can be portable?
There are quite a few options: 1, 2. My requirements are to source functions/scripts and read/write csv files. Perhaps the rprojroot
package is the best bet?
Solution 1:[1]
There are many ways to organize code and data for use with R. Given that the "arsonist" described in the OP has rejected at least two approaches for locating the project files in an R script, the best next step is to ask the arsonist how s/he performs this function, and adjust your code and file structures accordingly.
UPDATE: Since the "arsonists" appear to be someone who writes on Tidyverse.org (see Tidyverse article in OP) and an answer on SO (see additional links in OP), your computer appears to be relatively safe.
If you are sharing code or executing it with batch processes where the "user" is someone other than you, a useful approach is to place the code, data, and configuration under version control, and develop a runbook to explain how others can retrieve the components and execute them on another computer.
As noted in the comments to the OP, there's nothing wrong with here::here()
if its use can be made reliable through documentation in a runbook.
I structure all of my R code into Projects within RStudio, which are organized into a gitrepositories
directory. All of the projects can be accessed as subdirectories from the gitrepositories
directory. If I need to share a project, I make the project accessible to other users on GitHub.
In my R code I reference external files as subdirectories from the project root directory, such as ./data/gen01.csv
.
Solution 2:[2]
Create an RStudio project and then reference all files with relative paths from the project's root folder. That way, all users will open the project and automatically have the correct working directory.
Solution 3:[3]
There are two parts to this question:
- how to load data from a relative path, and
- how to load code from a relative path
For most use cases (including when invoking tools from a CRON job or similar) the location of the data should either be specified by the user (via command line arguments, standard input or environment variables) or should be relative to the current working directory (getwd()
in R).
… Unless the data is a fixed part of the project itself — more on this below.
Loading code from a path that’s relative to other code is simply not supported by base R. For example, source('xyz.r')
won’t source an xyz.r
file from the project. It will always try to load it from the current working directory, whatever that happens to be. Which is pretty much never what you want. And as you’ve noticed, the ‘here’ package also doesn’t always work.
R basically only works when code is only loaded from packages. But packages aren’t suitable for all types of projects. R has no built-in solution for those other cases. I recommend using ‘box’ modules to solve this. ‘box’ provides a modern module system for R, which means that you can have R projects consisting of multiple code files (and nested sub-projects), without having to wrap them in packages. Loading code inside the same relative path in a module is as simple as
box::use(./xyz)
This always works, as you’d expect from a modern module system, and doesn’t require ‘here’ or similar hacks.
OK, back to the point about data that’s bundled with a project itself. If your project is an R package, you’d use system.file()
to load that data. However, this once again doesn’t work for non-package projects. But if you use ‘box’ modules to structure your project, you can use box::file()
to load data that’s associated with a module.
Packages such as ‘here’ or ‘rprojroot’, while well-intended, are essentially hacks to work around limitations in R’s handling of non-package code. The proper solution is to make non-package code into a first-class citizen of the R world, and ‘box’ does that.
Solution 4:[4]
You can check docs of RSuite package (https://RSuite.io). It is working with script_path that points to currently run R script. I use it to make relative paths using 'file.path' command
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Konrad Rudolph |
Solution 4 | Wit Jakuczun |