How to Find the Mode of a Categorical Variable in an R Data Frame

AKRAM HUSSAIN KHAN
2 min read · Nov 7, 2022

Sometimes a variable in a data frame has many categories, and we want to find its mode, that is, the category that occurs most often.

Base R makes the mean and median easy to compute with mean() and median(), but it has no built-in function for the statistical mode (the base mode() function returns an object's storage mode instead), so we have to write our own.

Here is a small, simple function that calculates the mode of a variable.

mod <- function(x) {
  t <- table(x)                    # count how often each value occurs
  return(labels(t[t == max(t)]))   # keep the label(s) with the highest count
}

Here, x can be any vector or any column of a data frame, and the function returns its mode.

For example, suppose we have the following vector.

vec<-c("A", "D", "G", "A", "F", "I", "C", "B", "H", "K", "A")

Using our mod function, we get:

mod(vec)
[1] "A"

This is correct: A appears three times, more than any other value. The function also works when there is more than one mode. For example:

vec1<-c("A", "D", "G", "D", "F", "I", "C", "B", "H", "K", "A")

If we calculate the mode of this vector, we get:

mod(vec1)
$x
[1] "A" "D"

Here, A and D each appear twice in the vector.

Now suppose I have a data frame with a column named country that has more than 100 categories, and I want to find the mode of that column.
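
The $x in the output above appears because labels() returns a list when several values tie, while a single mode comes back as a plain character vector. As a minimal sketch, and not part of the original function, the variant below uses names() so that the result is always a character vector. The name mode_values is a hypothetical choice made for illustration.

```r
# Hypothetical variant of mod(): always returns a plain character vector
mode_values <- function(x) {
  counts <- table(x)                      # frequency of each distinct value
  names(counts)[counts == max(counts)]    # label(s) with the highest count
}

vec1 <- c("A", "D", "G", "D", "F", "I", "C", "B", "H", "K", "A")
modes <- mode_values(vec1)
print(modes)
# [1] "A" "D"
```

Because table() orders its labels, tied modes come back in sorted order rather than in order of first appearance.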

length(unique(train$country))
[1] 163

Here, the country column has 163 categories. We can easily find the mode with our function.

mod(train$country)
[1] "Portugal"
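
Since the train data frame is not shown here, the following sketch builds a tiny stand-in to illustrate the same steps. The data frame train_demo and its values are invented for illustration only. It also sorts the frequency table, which can be a handy way to see how far ahead the most frequent category is.

```r
# Hypothetical stand-in for the article's `train` data frame
train_demo <- data.frame(
  country = c("Portugal", "Spain", "Portugal", "France", "Spain", "Portugal"),
  stringsAsFactors = FALSE
)

# Number of distinct categories in the column
n_categories <- length(unique(train_demo$country))

# Frequency table sorted from most to least frequent
country_counts <- sort(table(train_demo$country), decreasing = TRUE)
print(head(country_counts, 3))

# Label of the first entry; with ties this picks only one of the modes
top_country <- names(country_counts)[1]
print(top_country)
```

Taking only the first name is a shortcut: if two categories tie for first place, it reports just one of them, whereas the mod function reports all of them.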

Portugal is the mode, that is, the most frequent country in the country column of the train data frame. The function also works for numeric values. For example:

x2 <- c(7, 1, 3, 8, 5, 7, 6)
mod(x2)
[1] "7"
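
Note that the result above is the character string "7", not the number 7, because the function returns table labels. If the original type matters, one possible approach is sketched below with a hypothetical helper named mode_keep_type, which counts occurrences with unique(), match(), and tabulate() instead of table().

```r
# Hypothetical helper: returns the mode(s) in the same type as the input
mode_keep_type <- function(x) {
  ux <- unique(x)                       # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))      # how often each distinct value occurs
  ux[counts == max(counts)]             # value(s) with the highest count
}

x2 <- c(7, 1, 3, 8, 5, 7, 6)
num_mode <- mode_keep_type(x2)
print(num_mode)
# [1] 7

vec1 <- c("A", "D", "G", "D", "F", "I", "C", "B", "H", "K", "A")
chr_mode <- mode_keep_type(vec1)
print(chr_mode)
# [1] "A" "D"
```

Unlike the table-based version, this sketch returns tied modes in order of first appearance and keeps numeric input numeric.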

I learned this from my boss, Naimul Hasan.

I hope you find it easy to understand and apply.
