3 Creating Functions and Datasets
In this chapter, we’ll create a couple of simple functions and an example dataframe.
Below is a screenshot of writing these functions in RStudio. Further down on this page, you will find the code in text format, which you can copy and paste into your editor.
3.1 Example Function 1
We’ll now create a simple function that randomly recommends a movie from the movies dataset in the ggplot2movies package.
# Not to be included in the package but run here on this tutorial page
if (!require(ggplot2movies)) {
  install.packages("ggplot2movies")
  library(ggplot2movies)
}
## Loading required package: ggplot2movies
# Function to randomly recommend a movie
random_movie_recommendation <- function() {
  # Load the movies dataset
  data(movies, package = "ggplot2movies")
  # Pick one movie title at random
  recommended_movie <- sample(movies$title, 1)
  # Return the recommended movie
  return(recommended_movie)
}
# Example usage
random_movie_recommendation()
## [1] "Dung che sai duk"
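Because the function samples at random, the title printed above will differ from run to run. The following is a minimal sketch of how a seed makes the pick reproducible. It assumes a hypothetical helper, recommend_from(), and a made-up vector of titles, so that it runs without the ggplot2movies package; neither is part of the package we are building.

```r
# Hypothetical helper (not part of the tutorial's package):
# pick one element at random from any character vector
recommend_from <- function(titles) {
  stopifnot(is.character(titles), length(titles) > 0)
  sample(titles, 1)
}

# Made-up titles, used only for illustration
toy_titles <- c("Movie A", "Movie B", "Movie C", "Movie D")

# Setting the same seed before each call gives the same pick
set.seed(123)
first_pick <- recommend_from(toy_titles)

set.seed(123)
second_pick <- recommend_from(toy_titles)

identical(first_pick, second_pick)
# TRUE: the same seed reproduces the same pick
```

Calling set.seed() before random_movie_recommendation() should work the same way, since that function also relies on sample().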
3.2 Example Data Set
In addition to functions, it’s often helpful to include example datasets in your package to demonstrate how your functions can be used. We’ll now create a small example dataset called example_data and add it to our package.
# Create example dataset
example_data <- data.frame(
  ID = c("01", "02", "03", "04", "05"),
  Age = c(25, 30, 35, 40, 45),
  Likes_Coffee = as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))
)
# Display the dataset
example_data
## ID Age Likes_Coffee
## 1 01 25 TRUE
## 2 02 30 FALSE
## 3 03 35 TRUE
## 4 04 40 TRUE
## 5 05 45 FALSE
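Before using the dataset, it can help to check how R stores each column. The sketch below recreates example_data and inspects it with base R functions. Note that Likes_Coffee is a factor rather than a logical vector, because it was wrapped in as.factor().

```r
# Recreate the example dataset so this block is self-contained
example_data <- data.frame(
  ID = c("01", "02", "03", "04", "05"),
  Age = c(25, 30, 35, 40, 45),
  Likes_Coffee = as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))
)

# Compact overview of column types and the first values
str(example_data)

# Class of each column (ID is character in R 4.0 or later)
sapply(example_data, class)

# as.factor() on a logical vector yields the levels "FALSE" and "TRUE"
levels(example_data$Likes_Coffee)

# Number of rows in each level
table(example_data$Likes_Coffee)
```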
3.3 Example Function 2
This function calculates the mean of a numeric variable (like age) for rows where another variable (like Likes_Coffee) has the value TRUE. In our example dataset, it can be used to find the mean age of people who like coffee.
calculate_group_mean <- function(df, numeric_var, factor_var) {
  # Filter the dataframe for rows where the factor variable is TRUE
  filtered_df <- df[df[[factor_var]] == TRUE, ]
  # Calculate the mean of the numeric variable for the filtered dataframe
  mean_val <- mean(filtered_df[[numeric_var]])
  # Return the mean value
  return(mean_val)
}
# Example usage
calculate_group_mean(example_data, "Age", "Likes_Coffee")
## [1] 33.33333
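The function above is deliberately simple: it returns NA if any selected value is missing, and a misspelled column name does not raise an error. As an optional illustration, here is a sketch of a more defensive variant. The name calculate_group_mean_checked(), its na.rm argument, and the toy data frame are assumptions made for this example, not part of the tutorial's package.

```r
# Hypothetical, more defensive variant of calculate_group_mean()
calculate_group_mean_checked <- function(df, numeric_var, factor_var, na.rm = TRUE) {
  # Fail early if the inputs are not what we expect
  stopifnot(
    is.data.frame(df),
    numeric_var %in% names(df),
    factor_var %in% names(df),
    is.numeric(df[[numeric_var]])
  )
  # Compare as text so logical and factor columns behave the same;
  # which() ignores NA comparisons instead of selecting NA rows
  keep <- which(as.character(df[[factor_var]]) == "TRUE")
  mean(df[[numeric_var]][keep], na.rm = na.rm)
}

# Made-up data with one missing age, used only for illustration
toy <- data.frame(
  Age = c(25, 30, NA, 40),
  Likes_Coffee = as.factor(c(TRUE, FALSE, TRUE, TRUE))
)

result <- calculate_group_mean_checked(toy, "Age", "Likes_Coffee")
result
# (25 + 40) / 2 = 32.5, because the missing age is dropped
```

Whether missing values should be dropped silently is a design choice; setting na.rm = FALSE restores the stricter behaviour of returning NA.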
3.4 Saving Your Code
After writing these functions and testing them in an R environment like RStudio, make sure to save them or copy and paste them somewhere. This way, you can add them to your R package later in this tutorial.
In this example, I copied the functions to Visual Studio Code (as shown in the screenshot below), but you can use any text editor, such as TextEdit on macOS or Notepad on Windows, or simply save the code somewhere on your computer.
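If you prefer to save the code from within R rather than copying it by hand, base R's dump() and source() offer one way to do it. The sketch below is only an illustration: the file name my_functions.R is hypothetical, and a temporary directory is used so that nothing is written to your working directory.

```r
# The function from section 3.3, repeated so this block is self-contained
calculate_group_mean <- function(df, numeric_var, factor_var) {
  filtered_df <- df[df[[factor_var]] == TRUE, ]
  mean(filtered_df[[numeric_var]])
}

# Hypothetical file name inside a temporary directory
out_file <- file.path(tempdir(), "my_functions.R")

# dump() writes R code that recreates the named objects
dump("calculate_group_mean", file = out_file)

# Inspect what was written to the file
cat(readLines(out_file), sep = "\n")

# Later, source() reads the definitions back into the session
source(out_file)
```

Note that dump() may write a deparsed version of the function, so comments and formatting are not guaranteed to survive. Saving the script from your editor keeps the code exactly as you wrote it.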
In the next chapter, we’ll discuss how to set up Git and GitHub for version control and sharing your package.
Creating R Packages: A Step-by-Step Guide by Ville Langén is licensed under CC BY-SA 4.0