# Calculate difference between dataframe rows by group in R

In this article, we will see how to calculate the difference between consecutive rows within each group of a data frame in the R programming language.

**Method 1: Using dplyr package**

The group_by() function splits the data into groups based on the values in one or more columns. The grouping columns are passed as arguments, and more than one column name may be given.

**Syntax:**

group_by(col1, col2, …)

Grouping is followed by mutate(), which adds new columns or modifies existing ones while keeping every row. The new column is named on the left-hand side of the assignment inside mutate(). The difference from the previous row is calculated with the lag() function of this library, which returns the previous values of a vector.

**Syntax:**

lag(x, n = 1L, default = NA)

**Parameters:**

- x – a vector of values
- n – number of positions to lag by
- default – the value used for positions that have no previous row (NA unless specified)
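To see what lag() does on its own, the short sketch below applies it to a plain vector. The vector `x` and the result names are made up for illustration, and the calls use the `dplyr::` prefix so they cannot be confused with `stats::lag()`.

```r
library(dplyr)

# Hypothetical vector used only for illustration
x <- c(1, 4, 5, 9)

lag_1     <- dplyr::lag(x)               # shift by one position, pad with NA
lag_2     <- dplyr::lag(x, n = 2)        # shift by two positions
lag_zero  <- dplyr::lag(x, default = 0)  # pad with 0 instead of NA
row_diffs <- x - dplyr::lag(x)           # difference from the previous element

print(lag_1)      # NA  1  4  5
print(lag_2)      # NA NA  1  4
print(lag_zero)   #  0  1  4  5
print(row_diffs)  # NA  3  1  4
```

Subtracting the lagged vector from the original gives the row-to-row difference, which is the operation the grouped version performs inside each group.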

Inside mutate(), the new column is computed as the column's value in the current row minus its lagged value. Passing default = first(col-name) makes lag() return the group's own first value for the first row of each group, so the difference for that row is 0 instead of NA.

**Example:**

## R

```r
# loading the required library
library("dplyr")

# creating a data frame
# col1 is drawn at random, so the groups differ between runs
data_frame <- data.frame(col1 = sample(6:9, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")

# computing the difference within each group
data_frame %>%
  group_by(col1) %>%
  mutate(diff = col3 - lag(col3, default = first(col3)))
```

**Output**

```
[1] "Original DataFrame"
  col1 col2 col3
1    6    a    1
2    9    b    4
3    7    c    5
4    6    a    1
5    6    b   NA
6    9    c   NA
7    6    a    2
8    8    b   NA
9    7    c    2
[1] "Modified DataFrame"
# A tibble: 9 x 4
# Groups:   col1 [4]
   col1 col2   col3  diff
  <int> <chr> <dbl> <dbl>
1     6 a         1     0
2     9 b         4     0
3     7 c         5     0
4     6 a         1     0
5     6 b        NA    NA
6     9 c        NA    NA
7     6 a         2    NA
8     8 b        NA    NA
9     7 c         2    -3
```
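Because col1 in the example above is drawn with sample(), its output differs between runs. The sketch below uses fixed, made-up group labels (`grp`, `val` and the two result columns are illustrative names, not part of the original example) so the result is reproducible. It also contrasts the two choices for the first row of each group: NA with the default lag(), and 0 with default = first().

```r
library(dplyr)

# Hypothetical data with fixed group labels so the result is reproducible
df <- data.frame(
  grp = c("a", "a", "b", "a", "b"),
  val = c(1, 4, 5, 2, 9)
)

res <- df %>%
  group_by(grp) %>%
  mutate(
    # first row of each group has no previous row, so the result is NA
    diff_na   = val - dplyr::lag(val),
    # first row of each group is compared with itself, so the result is 0
    diff_zero = val - dplyr::lag(val, default = dplyr::first(val))
  ) %>%
  ungroup()

print(res)
```

Group "a" holds the values 1, 4, 2 and group "b" holds 5, 9, so diff_na is NA, 3, NA, -2, 4 in the original row order, and diff_zero is the same with 0 in place of NA.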

**Method 2: Using data.table package**

The data.table package extends the data frame `[ ]` indexing syntax so that a column can be computed by group. The by argument specifies the column to group the data by, and the := operator adds the new column to the table, so all existing rows and columns are retained. The difference is calculated by taking the value of the specified column in the current row and subtracting the previous value in the same group, which is obtained with shift(). The shift() function lags (or leads) a vector or list.

**Syntax:**

data_frame[ , new-col-name := reqd-col - shift(reqd-col), by = grouping-col]

Because shift() fills the first position with NA by default, the difference for the first row of each group is NA in the new column.
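The behaviour of shift() is easiest to see on a plain vector. The sketch below is illustrative only; the vector `x` and the result names are made up, while `n`, `fill` and `type` are arguments of data.table's shift().

```r
library(data.table)

# Hypothetical vector used only for illustration
x <- c(1, 4, 11, 2)

shift_1    <- shift(x)                 # lag by one, pad with NA
shift_2    <- shift(x, n = 2)          # lag by two positions
shift_fill <- shift(x, fill = 0)       # pad with 0 instead of NA
shift_lead <- shift(x, type = "lead")  # look at the next value instead

print(shift_1)     # NA  1  4 11
print(shift_2)     # NA NA  1  4
print(shift_fill)  #  0  1  4 11
print(shift_lead)  #  4 11  2 NA
```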

**Example:**

## R

```r
# loading the required library
library("data.table")

# creating a data table
# col1 is drawn at random, so the groups differ between runs
data_frame <- data.table(col1 = sample(6:9, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, 9, 11, 2, 7, 2))

print("Original DataFrame")
print(data_frame)

# computing the difference within each group
data_frame[, diff := col3 - shift(col3), by = col1]
print("Modified DataFrame")
print(data_frame)
```

**Output**

```
[1] "Original DataFrame"
   col1 col2 col3
1:    8    a    1
2:    8    b    4
3:    7    c    5
4:    6    a    1
5:    6    b    9
6:    8    c   11
7:    8    a    2
8:    9    b    7
9:    7    c    2
[1] "Modified DataFrame"
   col1 col2 col3 diff
1:    8    a    1   NA
2:    8    b    4    3
3:    7    c    5   NA
4:    6    a    1   NA
5:    6    b    9    8
6:    8    c   11    7
7:    8    a    2   -9
8:    9    b    7   NA
9:    7    c    2   -3
```
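If a 0 is preferred over NA for the first row of each group, as in Method 1, one possible approach is to pass the group's first value as `fill`. This is a sketch on made-up data with fixed group labels (`grp`, `val` and the result column names are illustrative), not part of the original example.

```r
library(data.table)

# Hypothetical data with fixed group labels so the result is reproducible
dt <- data.table(
  grp = c("a", "a", "b", "a", "b"),
  val = c(1, 4, 5, 2, 9)
)

# Default fill: the first row of each group is NA
dt[, diff_na := val - shift(val), by = grp]

# Fill with the group's own first value: the first row of each group is 0
dt[, diff_zero := val - shift(val, fill = val[1]), by = grp]

print(dt)
```

Inside the grouped call, `val` refers only to the rows of the current group, so `val[1]` is that group's first value.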

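**Method 3: Using ave() function**

The ave() function in base R applies a function to the subsets of a vector defined by the level combinations of one or more grouping factors, and returns a vector of the same length as the input. By default it computes group averages.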

**Syntax:**

ave(x, ..., FUN = mean)

**Parameters:**

- x – the required data frame column (a numeric vector)
- ... – the grouping variables
- FUN – the function to apply for each factor level combination

Here, FUN computes for each group the difference between each row's value and the previous row's value. Since diff(x) returns one element fewer than x, an NA is prepended with c(NA, diff(x)), so the first row of each group is NA in the new column.
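The two building blocks can be checked separately in base R. The sketch below uses made-up vectors (`x`, `grp`, `val` and the result names are illustrative) to show why the NA has to be prepended and how ave() places each group's result back in the original row order.

```r
# Hypothetical vector used only for illustration
x <- c(1, 4, 11, 2)

raw_diff    <- diff(x)         # one element shorter than x
padded_diff <- c(NA, diff(x))  # same length as x

print(raw_diff)     # 3  7 -9
print(padded_diff)  # NA  3  7 -9

# Hypothetical grouped data with fixed labels
grp <- c("a", "a", "b", "a", "b")
val <- c(1, 4, 5, 2, 9)

# ave() applies FUN to each group and returns the results in row order
group_diff <- ave(val, grp, FUN = function(v) c(NA, diff(v)))
print(group_diff)   # NA  3 NA -2  4
```

A group with a single row also works: diff() then returns an empty vector, and c(NA, diff(v)) yields a single NA.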

**Example:**

## R

```r
# creating a data frame
# col1 is drawn at random, so the groups differ between runs
data_frame <- data.frame(col1 = sample(6:9, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, 9, 11, 2, 7, 2))

print("Original DataFrame")
print(data_frame)

# computing the difference within each group
data_frame$diff <- ave(data_frame$col3, factor(data_frame$col1),
                       FUN = function(x) c(NA, diff(x)))

print("Modified DataFrame")
print(data_frame)
```

**Output**

```
[1] "Original DataFrame"
  col1 col2 col3
1    9    a    1
2    9    b    4
3    6    c    5
4    7    a    1
5    6    b    9
6    7    c   11
7    9    a    2
8    9    b    7
9    9    c    2
[1] "Modified DataFrame"
  col1 col2 col3 diff
1    9    a    1   NA
2    9    b    4    3
3    6    c    5   NA
4    7    a    1   NA
5    6    b    9    4
6    7    c   11   10
7    9    a    2   -2
8    9    b    7    5
9    9    c    2   -5
```