Introduction

In bllflow, data cleaning is loosely defined as modifying rows (observations), and variable transformation as modifying columns (variables). Data preparation refers to both of these procedures. Data preparation functions come in two forms:

  1. Functions where the data and variables are specified as arguments to the function. bllflow adds to the functions that already exist in the sjmisc package.
  2. bllflow object functions, where the Model Specification Workbook specifies which data are cleaned and which variables are transformed. These bllflow ‘wrapper’ functions are available for selected functions in the sjmisc package and for all bllflow objects.

The effects of all data preparation functions can be logged and printed. Transformations can be saved as a Predictive Model Markup Language (PMML) file to facilitate predictive model deployment.

Data cleaning

Three data cleaning functions are supported.

For continuous variables:

  1. clean.Min() – deletes observations below a defined value.
  2. clean.Max() – deletes observations above a defined value.
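
As a rough illustration of what row-wise cleaning means here, the sketch below drops observations outside a permitted range using base R only. The helpers drop_below() and drop_above() and the patients data frame are hypothetical names made up for this example. They are not the bllflow clean.Min() and clean.Max() implementations, whose arguments and handling of missing values may differ. The sketch assumes that observations with a missing value are kept.

```r
# Hypothetical helpers for illustration only; NOT the bllflow implementation.
# Assumption: rows where the variable is NA are kept rather than deleted.
drop_below <- function(data, variable, min_value) {
  keep <- is.na(data[[variable]]) | data[[variable]] >= min_value
  data[keep, , drop = FALSE]
}

drop_above <- function(data, variable, max_value) {
  keep <- is.na(data[[variable]]) | data[[variable]] <= max_value
  data[keep, , drop = FALSE]
}

# Made-up example data
patients <- data.frame(id = 1:5, age = c(17, 25, 40, 95, 130))

# Delete observations with age below 18, then those with age above 100
cleaned <- drop_below(patients, "age", 18)
cleaned <- drop_above(cleaned, "age", 100)

# Count the deleted rows, the kind of effect a cleaning log could report
removed <- nrow(patients) - nrow(cleaned)
```

In this sketch the observations with ages 17 and 130 are deleted and the other three are kept.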

For categorical variables:

  1. recodeWithTable() TODO.

Data transformation

Supported functions (TODO: describe these functions):

  - centring
  - normalization and standardization
  - restricted cubic splines
  - interaction terms
  - dummy variables
  - recoding
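
Until these functions are described, the following base R sketch may help show what several of the listed transformations do to the columns of a data frame. It does not use bllflow or sjmisc functions. The study data frame and all new column names are invented for this example, and min-max scaling is assumed as one common reading of "normalization". Restricted cubic splines are left out because they usually rely on an additional package.

```r
# Made-up example data; all column names are hypothetical.
study <- data.frame(
  age = c(30, 40, 50, 60),
  sex = factor(c("F", "M", "F", "M")),
  bmi = c(22, 27, 31, 25)
)

# Centring: subtract the mean so the new variable has mean 0
study$age_centred <- study$age - mean(study$age)

# Standardization: z-score (mean 0, standard deviation 1)
study$bmi_z <- (study$bmi - mean(study$bmi)) / sd(study$bmi)

# Normalization (assumed here to mean min-max scaling to [0, 1])
study$bmi_norm <- (study$bmi - min(study$bmi)) /
  (max(study$bmi) - min(study$bmi))

# Dummy variable: 1 when sex is "M", 0 otherwise
study$sex_M <- as.integer(study$sex == "M")

# Interaction term: product of centred age and the sex dummy
study$age_centred_x_sex_M <- study$age_centred * study$sex_M

# Recoding: collapse a continuous variable into two categories
study$age_group <- cut(
  study$age,
  breaks = c(-Inf, 44, Inf),
  labels = c("under 45", "45 and over")
)
```

Each step adds a new column and leaves the original variable in place, so the transformed and untransformed values can be compared side by side.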

Logging

TODO

Transformations to PMML

TODO