The GitHub repository has the full codebase for CRAML:

CRAML software enables non-technical domain experts in any field to build training datasets and train machine learning models for niche text classification tasks.

Replicate and classify unstructured text of franchise no-poach clauses

The data analyzed in the paper "Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)", together with the intermediate files needed to replicate the analysis, include proprietary data (job advertisement text) and public data (franchise disclosure documents). By extracting only the context surrounding keywords and producing ML models, CRAML allows transparent, replicable construction and shaping of training data for machine learning, even in cases where the full underlying text corpus is unavailable.
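To illustrate what extracting only the context surrounding keywords can look like, here is a minimal Python sketch. It is not CRAML's actual code or API: the function name `extract_contexts`, the `window` parameter, the character-based window, and the sample sentence are all assumptions made for this example.

```python
import re


def extract_contexts(text, keywords, window=50):
    """Return (keyword, snippet) pairs for every keyword match in `text`.

    Hypothetical helper, not part of CRAML. Each snippet is the matched
    keyword plus up to `window` characters of text on either side.
    """
    if not keywords:
        return []
    # One case-insensitive pattern that matches any of the keywords literally.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    contexts = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        contexts.append((match.group(0).lower(), text[start:end]))
    return contexts


# Illustrative sentence written for this example, not taken from the dataset.
sample = (
    "Franchisee shall not solicit or hire any employee of another "
    "franchisee during the term of this agreement."
)
for keyword, snippet in extract_contexts(sample, ["solicit", "hire"], window=20):
    print(keyword, "->", snippet)
```

Only the short snippets around each match would need to be stored or shared in such a scheme, which is the property that makes replication possible when the full corpus cannot be released.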

Unstructured Text Data for no-poach project
The original PDF files will be made available in text-searchable format upon publication. A corpus of cleaned text files will also be published to an academic repository.

Methods to replicate no-poach analysis
To replicate the analysis, download and install the CRAML software, obtain the intermediate replication files, and then configure and run the analysis in CRAML:

  1. Install CRAML

  1. Replication materials

  1. Configure and Run Analysis in CRAML