The Model Specification Workbook (MSW) is used to specify your model. The MSW is a series of four worksheets (CSV files) that describe different model components. You can use bllflow without an MSW, but we recommend using the
modelDescription: the name of the model, the date created and other information about the study.
variables: all model variables, including data cleaning and transformations. variables is the most important sheet and is helpful even if you don't use other parts of bllflow.
variableDetails: information on factors (categories) and on how final variables are derived from their starting variables.
summaryVariables: identifies variables that are used in model reporting, such as for
You have your study data. Great. The first step is to specify which variables you need in your model, as well as the variables needed for study cohort creation, data cleaning and variable transformation.
bllflow includes an example Model Specification Workbook for the pbc data from the survival package. Its sheets describe how to recreate a survival model for primary biliary cirrhosis (PBC).
Pre-specified analyses are emphasized, but variables can be added as you perform the study. Additional variables and transformations are added to the Model Specification Workbook to ensure reproducibility and transparency. The Model Specification Workbook is a record of how you created your model and which analyses you performed. In addition, bllflow uses the workbook metadata throughout the workflow, including when reporting the results of your model.
The model that we will develop has six variables: age, sex, bili, albumin, protime and edema.
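As a quick orientation, the six model variables can be pulled out of the pbc data with base R. This is a minimal sketch, not a bllflow step. It assumes you also want to keep the survival outcome columns, `time` (days) and `status` (0 = censored, 1 = transplant, 2 = dead).

```r
library(survival)  # provides the pbc data set
data(pbc, package = "survival")

# the six predictors used in the example model
model_vars <- c("age", "sex", "bili", "albumin", "protime", "edema")

# keep the patient id and survival outcome alongside the predictors
pbc_model <- pbc[, c("id", "time", "status", model_vars)]
str(pbc_model)
```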
variables.csv contains one row per variable. The sheet includes additional information, such as variable labels, as well as instructions for data cleaning that are discussed in step 3. For example, the model is restricted to ages 40 to 70 years. So, for age there are
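The effect of the age restriction can be sketched in base R. bllflow applies its own cleaning rules from variables.csv. This snippet only shows the intended result and assumes that both bounds are inclusive.

```r
library(survival)
data(pbc, package = "survival")

# keep participants aged 40 to 70 years (bounds assumed inclusive)
pbc_40_70 <- subset(pbc, age >= 40 & age <= 70)

range(pbc_40_70$age)          # should fall within 40 to 70
nrow(pbc) - nrow(pbc_40_70)   # number of rows excluded by the restriction
```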
There are two approaches to creating the Model Specification Workbook. We usually start the Model Specification Workbook as a CSV file to facilitate collaboration among study collaborators. Alternatively, you can create the MSW as an R data frame.
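The two approaches are interchangeable, because a sheet built as a data frame can be written to CSV for collaborators and read back. Here is a minimal sketch. The column names `variable`, `label` and `type` are hypothetical and are not the actual columns of bllflow's variables sheet. The labels follow the survival package's description of pbc.

```r
# hypothetical variables sheet built directly as an R data frame
variables <- data.frame(
  variable = c("age", "sex", "bili", "albumin", "protime", "edema"),
  label = c("Age (years)", "Sex", "Serum bilirubin (mg/dl)",
            "Serum albumin (g/dl)", "Standardised blood clotting time", "Edema"),
  type = c("continuous", "categorical", "continuous",
           "continuous", "continuous", "categorical"),
  stringsAsFactors = FALSE
)

# share it as a CSV file, then read it back as collaborators would
csv_path <- file.path(tempdir(), "variables.csv")
write.csv(variables, csv_path, row.names = FALSE)
variables_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
```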
bllflow supports importing metadata into the workbook from:
variable label attributes (set using Hmisc, sjlabelled or similar packages); or,
a DDI (Data Documentation Initiative) file.
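When labels are already in your data, Hmisc and sjlabelled both store them as a `"label"` attribute on each column. The following base-R sketch assumes that convention and shows how such labels are set and collected. It is not bllflow's import helper.

```r
library(survival)
data(pbc, package = "survival")

# attach labels as "label" attributes
attr(pbc$bili, "label") <- "Serum bilirubin (mg/dl)"
attr(pbc$albumin, "label") <- "Serum albumin (g/dl)"

# collect the label of every column (NA where no label is set)
labels <- vapply(pbc, function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

labels[c("bili", "albumin")]
```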
variableDetails.csv contains additional variable details. For categorical variables, there is one row for each category (factor level). Each row includes the factor level and its label. Again, this information can be added through helper functions if there is a DDI file or if the labels are already in your data.
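The one-row-per-category layout maps directly onto R factors. In the sketch below, the columns `variable`, `level` and `label` are hypothetical stand-ins for the real variableDetails columns. The example applies the level/label pairs for sex to the pbc data, where sex is stored as `"m"` and `"f"`.

```r
library(survival)
data(pbc, package = "survival")

# hypothetical variableDetails-style rows: one row per category of sex
sex_details <- data.frame(
  variable = "sex",
  level = c("m", "f"),          # codes as stored in pbc$sex
  label = c("Male", "Female"),  # labels used in reporting
  stringsAsFactors = FALSE
)

# apply the level/label pairs from the sheet to the data
pbc$sex_labelled <- factor(pbc$sex,
                           levels = sex_details$level,
                           labels = sex_details$label)
table(pbc$sex_labelled)
```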
The variableDetails sheet also includes the transformed variables used throughout the study. In our example model, we use age as a non-linear predictor (a restricted cubic spline with 3 knots). However, Table 1 and other tables report age categories, so we added the transformed variable age_cat4 to the variableDetails.csv file, along with labels and information on the age range for each category.
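The two uses of age can be sketched with base R and the survival and splines packages. This is an illustration, not the bllflow workflow. `splines::ns()` fits a natural (restricted) cubic spline. For the three knots, we assume the placement commonly recommended for restricted cubic splines: boundary knots at the 10th and 90th percentiles and an interior knot at the median. The age_cat4 cut points are hypothetical. The real ranges are in PBC-variableDetails.csv.

```r
library(survival)
library(splines)  # ns(): natural (restricted) cubic spline basis
data(pbc, package = "survival")
pbc_40_70 <- subset(pbc, age >= 40 & age <= 70)

# model: age as a restricted cubic spline with 3 knots (placement assumed)
age_knots <- unname(quantile(pbc_40_70$age, c(0.10, 0.50, 0.90)))
fit <- coxph(
  Surv(time, status == 2) ~
    ns(age, knots = age_knots[2], Boundary.knots = age_knots[c(1, 3)]) +
    sex + bili + albumin + protime + edema,
  data = pbc_40_70
)

# tables: age in four categories (hypothetical cut points)
pbc_40_70$age_cat4 <- cut(
  pbc_40_70$age,
  breaks = c(40, 50, 55, 60, 70),
  right = FALSE, include.lowest = TRUE,
  labels = c("40 to 49", "50 to 54", "55 to 59", "60 to 70")
)
table(pbc_40_70$age_cat4)
```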
There are 16 rows in variableDetails. We included age as the first example. The remaining rows represent only the newly transformed variables, that is, variables that do not exist in our original pbc data. The information for the variables in the original pbc data is in the pbcDDI.xml file. That metadata can be added with the DDI utility functions described later in Helper and utility functions.
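To show the kind of metadata a DDI file carries, here is a minimal xml2 sketch that extracts variable labels from a hypothetical DDI-style fragment. Real DDI codebooks, including pbcDDI.xml, declare an XML namespace and contain much more detail. This is not bllflow's DDI utility code.

```r
library(xml2)

# hypothetical, namespace-free DDI-style fragment
ddi <- read_xml(paste0(
  "<codeBook><dataDscr>",
  "<var name=\"bili\"><labl>Serum bilirubin (mg/dl)</labl></var>",
  "<var name=\"albumin\"><labl>Serum albumin (g/dl)</labl></var>",
  "</dataDscr></codeBook>"
))

# one <var> element per variable; its <labl> child holds the label
vars <- xml_find_all(ddi, ".//var")
labels <- setNames(xml_text(xml_find_first(vars, "./labl")),
                   xml_attr(vars, "name"))
labels
```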
The Model Specification Workbook is imported and added to a bllflow object. Once read into the bllflow object, the workbook sheets become objects that are accessed to provide instructions for cleaning data and transforming variables.
In the following example, the MSW variables and variableDetails sheets are read and then added, together with the pbc data, into our bllflow object.
```r
library(survival)
data(pbc)

# read the MSW variables and variableDetails sheets for the PBC model
# (system.file() finds the copies installed with bllflow from inst/extdata)
variables <- read.csv(system.file("extdata", "PBC-variables.csv", package = "bllflow"))
variableDetails <- read.csv(system.file("extdata", "PBC-variableDetails.csv", package = "bllflow"))

library(bllflow)
#> Loading required package: tableone
#> Loading required package: DDIwR
#> Loading required package: xml2
#>
#> Attaching package: 'bllflow'
#> The following object is masked from 'package:tableone':
#>
#>     CreateTableOne

# create a bllflow object and add labels
pbcModel <- BLLFlow(pbc, variables, variableDetails)
```
The resulting bllflow object, pbcModel, contains variables and variableDetails along with three additional objects used to support model building.