Software

The GitHub repository contains the full code base for CRAML: https://github.com/sjmeis/CRAML_Beta

CRAML enables domain experts without a technical background to build training datasets and train machine learning models for niche text classification tasks.

Replicate and classify unstructured text of franchise no-poach clauses

The data analyzed in the paper "Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)", together with the intermediate files needed to replicate the analysis, include proprietary data (job advertisement text) and public data (franchise disclosure documents). By extracting only the context surrounding keywords and producing ML models, CRAML allows transparent, replicable construction and shaping of training data for machine learning, even when the full underlying text corpus is unavailable.
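To make the idea of "context surrounding keywords" concrete, here is a minimal Python sketch of keyword-in-context extraction. It is an illustration only, not CRAML's actual implementation: the function name `extract_contexts`, the `window` parameter, the single-word keyword matching, and the sample sentence are all assumptions made for this example.

```python
import re


def extract_contexts(text, keywords, window=5):
    """Return (keyword, snippet) pairs for every keyword hit in `text`.

    Each snippet holds the matched word plus up to `window` words on
    either side. Only single-word keywords are handled in this sketch.
    """
    tokens = text.split()
    # Normalize tokens for matching: strip surrounding punctuation, lowercase.
    normalized = [re.sub(r"^\W+|\W+$", "", t).lower() for t in tokens]
    targets = {k.lower() for k in keywords}

    contexts = []
    for i, tok in enumerate(normalized):
        if tok in targets:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            contexts.append((tok, " ".join(tokens[start:end])))
    return contexts


# Made-up sentence for illustration; not taken from the replication data.
sample = ("Franchisee shall not solicit or hire any employee "
          "of another franchisee during the term.")
print(extract_contexts(sample, ["hire"], window=3))
# [('hire', 'not solicit or hire any employee of')]
```

Only the short snippets, not the full documents, would need to be stored and shared in a workflow like this, which is why the approach can remain replicable when the underlying corpus is proprietary.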

Unstructured Text Data for no-poach project
The original PDF files will be made available in text-searchable format upon publication. A corpus of cleaned text files will also be published to an academic repository.

Methods to replicate no-poach analysis
To replicate the analysis, download and install the CRAML software, then download the intermediate files.

Install CRAML

Download CRAML from GitHub.
CRAML is written in Python and designed for Linux. It is cross-platform, but some functions may encounter unexpected bugs on Windows or macOS.
Install the dependencies listed in requirements.txt.
Execute the file CRAML_Tool.py, and CRAML will appear in a browser window.
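
The installation steps above can be sketched as a small Python helper. This is a hedged sketch, not an official installer: it assumes git and pip are available, and that requirements.txt and CRAML_Tool.py sit at the top level of the cloned repository. The helper names `install_commands` and `run_install` and the default folder name are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/sjmeis/CRAML_Beta"


def install_commands(target_dir="CRAML_Beta"):
    """Build the commands for the steps above as (argv, working_dir) pairs.

    Nothing is executed here, so the commands can be inspected first.
    """
    repo = Path(target_dir)
    return [
        # Step 1: download CRAML from GitHub.
        (["git", "clone", REPO_URL, str(repo)], None),
        # Step 2: install dependencies (assumes requirements.txt is at the repo root).
        ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], repo),
        # Step 3: launch the tool (assumes CRAML_Tool.py is at the repo root).
        ([sys.executable, "CRAML_Tool.py"], repo),
    ]


def run_install(target_dir="CRAML_Beta"):
    """Run each command in order, stopping at the first failure."""
    for argv, cwd in install_commands(target_dir):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_install()` from a Python prompt would run the three steps in sequence. Installing into a virtual environment first is a common precaution, though the instructions here do not require it.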

Replication materials
- See https://zenodo.org/record/7454758

Configure and Run Analysis in CRAML

The setup settings must be adjusted to point to the location of the files on each user’s machine.
Additional step-by-step instructions are provided in a PDF.

Peter D. Norlander, PhD
