
Estimated glomerular filtration rate (eGFR) calculation
Boris Bikbov, Scientific-Tools.Org
2025-03-31
gfr.Rmd
Estimated glomerular filtration rate (eGFR) calculation
kidney.epi R package includes functions for calculation of eGFR by different equations.
Data frames
- ckd.data - contains synthetic data for 1000 adults and 1000 children (see description in documentation).
library(kidney.epi)
#> The kidney.epi package is made with care by the research consultancy Scientific-Tools.Org.
#> Contact us at https://Scientific-Tools.Org or via 'maintainer("kidney.epi")' for data analysis or software development.
head(ckd.data)
#> cr cys age sex ethnicity height category
#> 1 108.10 1.29 73.8 Male Caucasian NA adults
#> 2 101.12 1.26 66.8 Male Caucasian NA adults
#> 3 139.99 1.63 75.9 Male Caucasian NA adults
#> 4 145.26 1.75 68.1 Female Caucasian NA adults
#> 5 148.21 1.74 51.7 Female Caucasian NA adults
#> 6 179.43 2.14 41.8 Male Caucasian NA adults
Functions to calculate eGFR by different equations
kidney.epi contains a set of functions to calculate eGFR by different equations either for a single patient or for a dataset.
The following eGFR equations are supported:
- CKD-EPI 2021 creatinine-based: function
egfr.ckdepi.cr.2021()
and its aliasegfr.ckdepi.cr()
- CKD-EPI 2021 cystatin-based: function
egfr.ckdepi.cr_cys.2021()
- CKD-EPI 2009 creatinine-based: function
egfr.ckdepi.cr.2009()
- European Kidney Function Consortium (EKFC) creatinine-based:
function
egfr.ekfc.cr()
- EKFC cystatin-based: function
egfr.ekfc.cys()
- Full age spectrum (FAS) creatinine-based: function
egfr.fas.cr()
- FAS cystatin-based: function
egfr.fas.cys()
- FAS creatinine-cystatin-based: function
egfr.fas.cr_cys()
- Revised Lund-Malmö creatinine-based: function
egfr.lm.cr()
- MDRD: function
egfr.mdrd4()
- Lund-Malmö: function
egfr.lm.cr()
- Schwartz: function
egfr.schwartz()
- Chronic Kidney Disease in Children (CKiD) U25 creatinine-based
equation
egfr.ckid_u25.cr()
- CKiD U25 cystatin-based equation
egfr.ckid_u25.cys()
- more is underway. Scientific-Tools.Org provides research and consulting services for policy makers, industry, scientists, patient organizations, and citizens. This package is developed in our free time or with your support.
If you use these functions from kidney.epi package for the data analysis and manuscript preparation, please cite the package: “Bikbov B. kidney.epi: Kidney-Related Functions for Clinical and Epidemiological Research. Scientific-Tools.Org, https://Scientific-Tools.Org. doi:10.32614/CRAN.package.kidney.epi”.
Contact us for data analysis or software development at Scientific-Tools.Org or via ‘maintainer(“kidney.epi”)’, connect with the author on LinkedIn.
Examples
The vignette demonstrates the usage of eGFR calculation by the CKD-EPI 2009 equation, but race-free CKD-EPI 2021 and other equations work in the same way.
Example for a single patient
To calculate for a single patient, use the following syntax:
# call egfr.ckdepi.cr.2009 function, and directly set parameters values
egfr.ckdepi.cr.2009(
creatinine = 1.4,
age = 60,
sex = "Male",
ethnicity = "White",
creatinine_units = "mg/dl",
label_afroamerican = c("Afroamerican"),
label_sex_male = c("Male"),
label_sex_female = c("Female")
)
#> [1] 54.22
# Definitions of the labels for sex and race are optional if you use the same labels defined as default in the function. The following also works well:
egfr.ckdepi.cr.2009(
creatinine = 1.4,
age = 60,
sex = "Male",
ethnicity = "White",
creatinine_units = "mg/dl"
)
#> [1] 54.22
# If you measure creatinine in micromol/l, it is possible to omit also 'creatinine_units' since the default value is "micromol/l":
egfr.ckdepi.cr.2009(
creatinine = 103, # creatinine is in micromol/l
age = 60,
sex = "Male",
ethnicity = "White"
)
#> [1] 67.7
Example for a cohort of patients
To calculate eGFR for a cohort of patients in a dataset, use the following syntax:
# copy as an example the internal dataframe ckd.data from R package to your dataframe
mydata <- ckd.data
# calculate eGFR by CKD-EPI equation
mydata$ckdepi <- egfr.ckdepi.cr.2009(
creatinine = mydata$cr, age = mydata$age,
sex = mydata$sex, ethnicity = mydata$ethnicity,
creatinine_units = "micromol/L",
# customize all labels for those used in the data frame if necessary
label_afroamerican = c("Black"),
label_sex_male = c("Male"), label_sex_female = c("Female")
)
#> Warning in service.check_plausibility.age(age, max_age): There is 1 patient with negative values for age. This value was substituted to NA.
#> Warning in service.check_plausibility.age(age, max_age): There is 1 patient with age >100 years. This value was substituted to NA.
#> There are 840 patients with age <18 years. These values were substituted to NA.
# show descriptive stat for the calculated values
# note that synthetic data set ckd.data contains input parameters for both adults and children, and since the CKD-EPI equation was developed and validated for adults only, the resulting eGFR values for children will be NA. Use children-specific eGFR equations when necessary.
summary(mydata$ckdepi)
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 17.78 35.44 45.19 51.80 59.85 179.11 841
Advantages of the kidney.epi package functions
There are several advantages of the kidney.epi package functions for calculating eGFR values:-
Data workflow is reproducible and based on verified
algorithms:
The kidney.epi package offers a reproducible, open-source workflow built on verified methods, reducing duplication and enabling teams to focus on what truly matters: their data and insights. Thus, every research group should not rewrite the same computational code from scratch. -
Control for input values:
If some input values are not plausible (negative values for age or creatinine, age exceeding logical limits, etc) or not suitable for a given eGFR equation (applicable only to children or only to adults) - they will be omitted, and thus in the output there will be only robust results. -
Possibility to use different measurement units for
creatinine:
There is no need to decode creatinine values in your data set. Just define in the ‘creatinine_units’ parameter whether your data contain values in micromol/L, mmol/L or mg/dL - and the rest will be processed by the function. -
Flexible label handling for enhanced usability:
The function offers a high degree of flexibility by allowing you to define custom labels that match the labels used in your data frame. This ensures consistent interpretation of data without needing to modify the original dataset. Thus, you don’t need to decode labels in your data frame, just define which of your labels correspond to males, females, and other parameters.
Take into account the following examples:- If the data frame has only label “Male” for males, you can skip the definition because this is already assumed by the function, or define for clarity label_sex_male = “Male”.
- Consider that labels are case-sensitive, and thus be attentive to “Male” and “male” or similar definitions.
- If your data frame uses different labeling conventions, you can easily adjust the labels in the function parameters to align with your data. For example, if you data frame contains labels “F” for females and “M” for males, you have to indicate the labeling in parameters of the function as label_sex_male = “M”, label_sex_female = “F”.
-
The functions support also multiple labels in non-standard or mixed
data. If you’re working with data that hasn’t been fully standardized —
where the same category might have different labels — the function
allows you to define multiple values as valid labels for the same
category.
For example, if male sex is represented by both “male” and “hombre” labels and female sex by both “female” and “mujer” labels, you can define: label_sex_male = c(“male”, “hombre”), label_sex_female = c(“female”, “mujer”).
If male sex is represented by both “male” and 1, you can define: label_sex_male = c(“male”, 1).
If male sex is represented by both “male” and “Male” (case-sensitive), you can define: label_sex_male = c(“male”, “Male”).
- As the result, there is no need to modify the original dataset — just adjust the function parameters instead. This saves time and reduces data preprocessing efforts, as well as improve the code readability.