Filter R Dataframe with at least N non-NAs
In this tutorial, we will learn how to filter the rows of a dataframe so that only rows with at least N non-NA column values remain.
To filter the rows of a dataframe that have at least N non-NAs, use dataframe subsetting as shown below:
resultDF = mydataframe[rowSums(is.na(mydataframe)) <= (ncol(mydataframe) - N), ]
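where
mydataframe is the dataframe containing rows with one or more NAs
N is the minimum number of non-NA values a row must have to be kept
resultDF is the resulting dataframe, containing only the rows with at least N non-NA values
A row with at least N non-NAs has at most ncol(mydataframe) - N NAs, which is the condition the expression tests.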
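The same condition can also be written by counting non-NA values directly, which some readers may find easier to follow. The sketch below wraps that form in a helper; the name filterMinNonNA and the small dataframe df are hypothetical and used only for illustration.

```r
# Hypothetical helper (not part of base R):
# keep the rows of df that have at least n non-NA values.
filterMinNonNA <- function(df, n) {
  # !is.na(df) marks the non-NA cells; rowSums() counts them per row.
  # drop = FALSE keeps the result a dataframe even when df has one column.
  df[rowSums(!is.na(df)) >= n, , drop = FALSE]
}

# Illustrative data: row 2 has no non-NA values.
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
result <- filterMinNonNA(df, 1)
print(result)
```

Counting non-NAs with `>= n` and counting NAs with `<= ncol(df) - n` select the same rows, so either form can be used.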
Example 1 – Filter R Dataframe with minimum N non-NAs
In this example, we will create a dataframe containing rows with different numbers of NAs.
> mydataframe = data.frame(x = c(9, NA, 7, 4), y = c(4, NA, NA, 21), z = c(9, 8, NA, 74), p = c(NA, 63, NA, 2))
> mydataframe
   x  y  z  p
1  9  4  9 NA
2 NA NA  8 63
3  7 NA NA NA
4  4 21 74  2
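Before filtering, it can help to see how many non-NA values each row holds. The following is a minimal sketch using the base R functions rowSums and is.na; the variable name nonNACount is an assumption introduced here for illustration.

```r
mydataframe <- data.frame(x = c(9, NA, 7, 4), y = c(4, NA, NA, 21),
                          z = c(9, 8, NA, 74), p = c(NA, 63, NA, 2))

# is.na() returns a logical matrix; negating it marks the non-NA cells,
# and rowSums() counts the TRUE values in each row.
nonNACount <- rowSums(!is.na(mydataframe))

# Rows 1 to 4 hold 3, 2, 1 and 4 non-NA values respectively.
print(nonNACount)
```

A row is kept by the filter whenever its count is greater than or equal to N.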
Now, we will filter this dataframe so that the output contains only rows with at least 2 non-NAs.
> N = 2
> resultDF = mydataframe[rowSums(is.na(mydataframe)) <= (ncol(mydataframe) - N), ]
> resultDF
   x  y  z  p
1  9  4  9 NA
2 NA NA  8 63
4  4 21 74  2
Let us try with N = 3.
> N=3
> resultDF = mydataframe[rowSums(is.na(mydataframe)) <= (ncol(mydataframe) - N), ]
> resultDF
  x  y  z  p
1 9  4  9 NA
4 4 21 74  2
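The same pattern works when the threshold is stated in terms of NAs rather than non-NAs. The sketch below assumes a hypothetical variable maxNA, the largest number of NAs a row may contain; it is not part of the example above.

```r
mydataframe <- data.frame(x = c(9, NA, 7, 4), y = c(4, NA, NA, 21),
                          z = c(9, 8, NA, 74), p = c(NA, 63, NA, 2))

# Hypothetical threshold: allow at most one NA per row.
maxNA <- 1

# rowSums(is.na(...)) counts the NAs in each row; keep rows within the limit.
resultDF <- mydataframe[rowSums(is.na(mydataframe)) <= maxNA, ]
print(resultDF)
```

With four columns, allowing at most 1 NA is the same as requiring at least 3 non-NAs, so this keeps the same rows (1 and 4) as the N = 3 case.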
Conclusion
In this R Tutorial, we have learned to filter a dataframe based on the number of non-NAs (or, of course, NAs) in a row.