In bllflow, data cleaning is loosely defined as modifying rows (observations), and variable transformation as modifying columns (variables). Data preparation refers to both of these procedures. Data preparation functions come in two forms:
sjmisc package and for all bllflow objects.
The effects of all data preparation functions can be logged and printed. Transformations can be saved as a Predictive Model Markup Language (PMML) file to facilitate predictive model deployment.
Three data cleaning functions are supported.
For continuous variables:
clean.Min() – deletes observations below a defined value.
clean.Max() – deletes observations above a defined value.
For categorical variables:
    # load libraries and pbc data (from survival)
    library(survival)
    data(pbc)
    library(bllflow)
    #> Loading required package: tableone
    #> Loading required package: DDIwR
    #> Loading required package: xml2
    #> Loading required package: sjlabelled
    #> Loading required package: haven
    #>
    #> Attaching package: 'haven'
    #> The following objects are masked from 'package:sjlabelled':
    #>
    #>     as_factor, read_sas, read_spss, read_stata, write_sas,
    #>     zap_labels
    #>
    #> Attaching package: 'bllflow'
    #> The following object is masked from 'package:tableone':
    #>
    #>     CreateTableOne

    # read the Model Specification Workbook
    variables <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variables.csv'))
    variableDetails <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variableDetails.csv'))

    # initialize BLLFlow - create the BLLFlow object
    pbcModel <- BLLFlow(pbc, variables, variableDetails)

    # apply the maximum rule, then the minimum rule, printing each step
    cleanedPbcModel <- clean.Max(pbcModel, print = TRUE)
    #>  "clean.max.BLLFlow: 418 rows were checked and 13 rows were set to delete. Reason: Rule age max at 70 "
    cleanedPbcModel <- clean.Min(cleanedPbcModel, print = TRUE)
    #>  "clean.min.BLLFlow: 405 rows were checked and 69 rows were set to delete. Reason: Rule age min at 40 "

    cleanedPbcModel$metaData$log # to print the entire log.
    #>  "Data cleaning and trandformation log"
    #>  "2 steps performed"
    #> Step          Function Variable       Label Value Rows   Type
    #>    1 clean.max.BLLFlow      age Age (years)    70   13 delete
    #>    2 clean.min.BLLFlow      age Age (years)    40   69 delete
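To make the effect of the minimum and maximum rules concrete, the following is a minimal base R sketch of the same idea on a toy data frame. It is not bllflow code: the helper `apply_rule()`, the toy data, and the log columns are hypothetical names used only for illustration, and keeping rows with missing values is an assumption of the sketch rather than documented bllflow behaviour.

```r
# Base R sketch (not bllflow code): apply min/max rules to a toy data frame
# and record each step in a small log.
toy <- data.frame(id = 1:6, age = c(35, 42, 55, 68, 71, 80))

# Hypothetical helper: drop rows outside a bound, return the data and a log row.
apply_rule <- function(data, variable, value, type = c("min", "max")) {
  type <- match.arg(type)
  x <- data[[variable]]
  # Rows with missing values are kept; this is a choice made for the sketch.
  drop <- !is.na(x) & (if (type == "min") x < value else x > value)
  list(
    data = data[!drop, , drop = FALSE],
    log = data.frame(rule = type, variable = variable, value = value,
                     checked = nrow(data), deleted = sum(drop))
  )
}

# Maximum rule first, then minimum rule, mirroring the order used above.
step1 <- apply_rule(toy, "age", 70, "max")
step2 <- apply_rule(step1$data, "age", 40, "min")

cleaned <- step2$data
cleaning_log <- rbind(step1$log, step2$log)
print(cleaning_log)
```

Because the second rule runs on the output of the first, the number of rows checked in the second step is smaller than in the first, which is the same pattern seen in the bllflow log (418 rows checked, then 405).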
Supported functions:
TODO: describe these functions.
- centring
- normalization and standardization
- restricted cubic splines
- interaction terms
- dummy variables
- recoding
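Since the transformation functions are not yet described, the following base R sketch shows what each listed transformation typically means. None of these calls are bllflow functions. The toy data and column names are hypothetical, "normalization" is assumed here to mean min-max rescaling, and the spline basis uses `splines::ns()`, which produces a natural cubic spline basis (another name for a restricted cubic spline).

```r
# Base R sketch of the listed transformations on a toy data frame.
# None of these calls are bllflow functions.
toy <- data.frame(
  age  = c(42, 50, 55, 61, 68),
  bili = c(0.8, 1.4, 3.2, 0.6, 2.1),
  sex  = factor(c("f", "m", "f", "f", "m"))
)

# Centring: subtract the mean.
toy$age_centred <- toy$age - mean(toy$age)

# Standardization: centre, then divide by the standard deviation.
toy$age_std <- (toy$age - mean(toy$age)) / sd(toy$age)

# Normalization (assumed min-max): rescale to the interval [0, 1].
toy$bili_norm <- (toy$bili - min(toy$bili)) / (max(toy$bili) - min(toy$bili))

# Dummy variable: 0/1 indicator for one level of a factor.
toy$sex_m <- as.integer(toy$sex == "m")

# Interaction term: product of two variables.
toy$age_x_sex_m <- toy$age_centred * toy$sex_m

# Recoding: collapse a continuous variable into categories.
toy$age_group <- cut(toy$age, breaks = c(40, 50, 60, 70),
                     labels = c("40-49", "50-59", "60-69"), right = FALSE)

# Restricted (natural) cubic spline basis with 2 degrees of freedom.
age_spline <- splines::ns(toy$age, df = 2)
```

Each transformation adds a new column rather than overwriting the original, which keeps the raw variable available for checking and makes the steps easier to log.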