“Flow” in bllflow refers to the process of using the Model Specification Worksheet (MSW) to perform routine data cleaning and transformation, performance reporting, and model deployment. Go to Workflow to see bllflow’s seven steps for analysing observational data. You can pick and choose whichever steps fit your own workflow.
Workflow vignettes use the pbc data available in the survival package to replicate a survival model for people with primary biliary cirrhosis. What is the pbc data? The name, description and other information are included in the metadata file!
A typical first step when starting a new study is to apply inclusion and exclusion criteria to the study data. In our PBC survival model, we will include only participants aged 40 to 70 years.
# load libraries and pbc data (from survival)
library(survival)
data(pbc)
library(bllflow)

# read the MSW
# MSW includes columns 'min' and 'max' with rows for 'age' values 40 and 70.
variables <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variables.csv'))
variableDetails <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variableDetails.csv'))

# perform all data cleaning steps
pbcModel <- BLLFlow(pbc, variables, variableDetails)
cleanPbc <- clean.Min(pbcModel, print = TRUE)
##  "clean.min.BLLFlow: 418 rows were checked and 69 rows were set to delete. Reason: Rule age min at 40 "
##  "clean.max.BLLFlow: 349 rows were checked and 13 rows were set to delete. Reason: Rule age max at 70 "
In the PBC-variables.csv file there are ‘min’ and ‘max’ columns and one row for each variable. The ‘age’ variable has the values 40 and 70 in the ‘min’ and ‘max’ columns. This example is shown in more detail in the data cleaning and transformation vignette.
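To make the idea of worksheet-driven min and max rules concrete, here is a minimal base R sketch. It does not use bllflow and is not how bllflow implements its cleaning steps. The data values, the `rules` table layout, and the helper `apply_range_rules()` are all hypothetical, and keeping rows with missing values is an assumption made only for this illustration.

```r
# Toy study data (hypothetical values, not the pbc data set)
study <- data.frame(
  id   = 1:6,
  age  = c(35, 40, 55, 70, 72, 64),
  bili = c(1.2, 0.8, 3.4, 2.1, 0.9, 1.5)
)

# Toy worksheet: one row per variable; NA means "no rule for this variable"
rules <- data.frame(
  variable = c("age", "bili"),
  min      = c(40, NA),
  max      = c(70, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a bllflow function): applies every min and max
# rule in the worksheet and reports how many rows each rule removed.
apply_range_rules <- function(data, rules) {
  for (i in seq_len(nrow(rules))) {
    v  <- rules$variable[i]
    lo <- rules$min[i]
    hi <- rules$max[i]
    if (!is.na(lo)) {
      # Assumption: rows with a missing value are kept, not deleted
      keep <- is.na(data[[v]]) | data[[v]] >= lo
      message(sprintf("min rule on %s at %s: %d of %d rows dropped",
                      v, lo, sum(!keep), nrow(data)))
      data <- data[keep, , drop = FALSE]
    }
    if (!is.na(hi)) {
      keep <- is.na(data[[v]]) | data[[v]] <= hi
      message(sprintf("max rule on %s at %s: %d of %d rows dropped",
                      v, hi, sum(!keep), nrow(data)))
      data <- data[keep, , drop = FALSE]
    }
  }
  data
}

cleaned <- apply_range_rules(study, rules)
```

Because the rules live in a table rather than in the code, changing the inclusion criteria means editing the worksheet, not the cleaning script.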
Note that executing
max criteria for all variables in the pbcModel.