3 Creating Functions and Datasets

In this chapter, we’ll create two simple functions and an example data frame.

Below is a screenshot of these functions being written in RStudio. Further down on this page, you will find the code in text form, which you can copy and paste into your editor.




3.1 Example Function 1

We’ll now create a simple function that randomly recommends a movie from the ggplot2movies dataset.

# Not to be included in the package but run here on this tutorial page
if (!require(ggplot2movies)) {
  install.packages("ggplot2movies")
  library(ggplot2movies)
}
## Loading required package: ggplot2movies
# Function to randomly recommend a movie
random_movie_recommendation <- function() {
  # Load the movies dataset
  data(movies, package = "ggplot2movies")
  # Pick one movie title at random
  recommended_movie <- sample(movies$title, 1)
  # Return the recommended movie
  return(recommended_movie)
}

# Example usage
random_movie_recommendation()
## [1] "Dung che sai duk"
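
The heart of this function is a single call to sample(). As a minimal sketch of the same idea without the ggplot2movies dependency, the variant below draws a random index first and then looks up the title stored there. The function name recommend_from and the titles vector are hypothetical, made up purely for illustration; set.seed() is used only so that repeated runs give the same pick.

```r
# Hypothetical stand-in for movies$title (made-up titles, not from ggplot2movies)
titles <- c("Movie A", "Movie B", "Movie C", "Movie D")

# Hypothetical helper: pick one element of a character vector at random
recommend_from <- function(titles) {
  stopifnot(is.character(titles), length(titles) > 0)
  # Draw a random position, then return the title stored there
  random_index <- sample.int(length(titles), 1)
  titles[random_index]
}

# Fixing the seed makes the "random" pick repeatable
set.seed(123)
first_pick <- recommend_from(titles)
set.seed(123)
second_pick <- recommend_from(titles)

# Same seed, same pick
identical(first_pick, second_pick)
```

Passing the titles in as an argument, rather than loading the dataset inside the function, is a design choice that makes the function easy to try out on any character vector.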

3.2 Example Dataset

In addition to functions, it’s often helpful to include example datasets in your package to demonstrate how your functions can be used. We’ll now create a small example dataset called example_data and add it to our package.

# Create example dataset
example_data <- data.frame(
  ID = c("01", "02", "03", "04", "05"),
  Age = c(25, 30, 35, 40, 45),
  Likes_Coffee = as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))
)

# Display the dataset
example_data
##   ID Age Likes_Coffee
## 1 01  25         TRUE
## 2 02  30        FALSE
## 3 03  35         TRUE
## 4 04  40         TRUE
## 5 05  45        FALSE
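
Because Likes_Coffee was wrapped in as.factor(), the column stores the labels "FALSE" and "TRUE" as factor levels rather than as logical values. A quick check along these lines, a sketch using only base R, shows the column types before the data is passed to a function:

```r
# Recreate the example dataset so this snippet runs on its own
example_data <- data.frame(
  ID = c("01", "02", "03", "04", "05"),
  Age = c(25, 30, 35, 40, 45),
  Likes_Coffee = as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))
)

# Compact overview of column names and types
str(example_data)

# The coffee column is a factor, not a logical vector
is.factor(example_data$Likes_Coffee)
levels(example_data$Likes_Coffee)

# Count how many rows fall into each level
table(example_data$Likes_Coffee)
```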

3.3 Example Function 2

This function calculates the mean of a numeric variable (like age) for rows where another variable (like Likes_Coffee) has the value TRUE. In our example dataset, it can be used to find the mean age of people who like coffee.

calculate_group_mean <- function(df, numeric_var, factor_var) {
  # Filter the dataframe for rows where the factor variable is TRUE
  filtered_df <- df[df[[factor_var]] == TRUE, ]
  # Calculate the mean of the numeric variable for the filtered dataframe
  mean_val <- mean(filtered_df[[numeric_var]])
  # Return the mean value
  return(mean_val)
}

# Example usage
calculate_group_mean(example_data, "Age", "Likes_Coffee")
## [1] 33.33333
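
The result is the mean of the three ages in rows where Likes_Coffee is TRUE: (25 + 35 + 40) / 3. The function above assumes that both column names exist and that the numeric column really is numeric. As a hedged sketch, not part of the original tutorial code, a variant with basic input checks could look like the following. The name calculate_group_mean_checked is hypothetical, and comparing against the text "TRUE" is one possible way to handle both logical and factor columns.

```r
# Recreate the example dataset so this snippet runs on its own
example_data <- data.frame(
  ID = c("01", "02", "03", "04", "05"),
  Age = c(25, 30, 35, 40, 45),
  Likes_Coffee = as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))
)

# Hypothetical variant of calculate_group_mean() with input checks
calculate_group_mean_checked <- function(df, numeric_var, factor_var) {
  # Stop early with an error if the inputs are not what we expect
  stopifnot(
    is.data.frame(df),
    numeric_var %in% names(df),
    factor_var %in% names(df),
    is.numeric(df[[numeric_var]])
  )
  # as.character() maps both logical TRUE and the factor level "TRUE" to "TRUE"
  keep <- as.character(df[[factor_var]]) == "TRUE"
  # which() keeps only the positions that are TRUE and drops missing values
  mean(df[[numeric_var]][which(keep)])
}

# Example usage; this should match the value from calculate_group_mean()
result <- calculate_group_mean_checked(example_data, "Age", "Likes_Coffee")
print(result)
```

With these checks, passing a non-numeric column such as "ID" as numeric_var stops with an error instead of returning NA with a warning.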

3.4 Saving Your Code

After writing these functions and testing them in an R environment like RStudio, make sure to save or copy-paste them somewhere. This way, you can introduce them to your R package once we progress further in this tutorial.

In this example, I copied the functions to Visual Studio Code (as shown in the screenshot below), but you can use any text editor, such as TextEdit on macOS or Notepad on Windows, or simply save the code somewhere on your computer.
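
If you prefer to stay inside R, base R's dump() can write the source of a function to a plain text file instead of copy-pasting. This is only a sketch of one option: the file name package_functions.R and the temporary folder are hypothetical placeholders, so substitute a location you will find again later.

```r
# The function from section 3.3, repeated so this snippet runs on its own
calculate_group_mean <- function(df, numeric_var, factor_var) {
  filtered_df <- df[df[[factor_var]] == TRUE, ]
  mean(filtered_df[[numeric_var]])
}

# Hypothetical location; tempdir() is used here only for demonstration
out_file <- file.path(tempdir(), "package_functions.R")

# dump() writes the R source of the named objects to a text file
dump("calculate_group_mean", file = out_file)

# Check: read the file back into a fresh environment
restored <- new.env()
source(out_file, local = restored)
exists("calculate_group_mean", envir = restored)
```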





In the next chapter, we’ll discuss how to set up Git and GitHub for version control and sharing your package.




Creating R Packages: A Step-by-Step Guide by Ville Langén is licensed under CC BY-SA 4.0