The GitHub repository has the full codebase for CRAML: https://github.com/sjmeis/CRAML_Beta
CRAML software enables non-technical experts in any domain to build training datasets and train machine learning models for niche text classification tasks.
Replicate the classification of franchise no-poach clauses in unstructured text
The data analyzed in the paper "Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)", together with the intermediate files needed to replicate the analysis, include both proprietary data (job advertisement text) and public data (franchise disclosure documents). Because CRAML extracts only the context surrounding keywords and produces ML models from it, it allows transparent, replicable construction and shaping of training data for machine learning even when the full underlying text corpus is unavailable.
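
To illustrate the idea of keeping only the context around keywords, here is a minimal sketch in Python. It is not CRAML's actual implementation: the function name `extract_contexts`, the `window` parameter, the simple word tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re


def extract_contexts(text, keyword, window=5):
    """Return the word windows surrounding each occurrence of `keyword`.

    Hypothetical helper for illustration only; CRAML's own extraction
    logic may tokenize and window text differently.
    """
    # Lowercase and split into word tokens, keeping hyphens and apostrophes
    # that sit inside a word (e.g. "no-poach").
    tokens = re.findall(r"\w+(?:[-']\w+)*", text.lower())
    key_tokens = keyword.lower().split()
    n = len(key_tokens)

    contexts = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == key_tokens:
            # Keep up to `window` tokens on each side of the keyword.
            start = max(0, i - window)
            end = min(len(tokens), i + n + window)
            contexts.append(" ".join(tokens[start:end]))
    return contexts


if __name__ == "__main__":
    # Made-up sentence, not taken from the project data.
    sample = (
        "The franchisee shall not solicit or hire any employee "
        "of another franchisee during the term."
    )
    for context in extract_contexts(sample, "solicit", window=3):
        print(context)
```

Only the extracted windows, not the full documents, would need to be shared for someone else to inspect or relabel the training data, which is the property the paragraph above describes.
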
Unstructured Text Data for the no-poach project
The original PDF files will be made available in text-searchable format upon publication. A corpus of cleaned text files will also be published to an academic repository.
Methods to replicate the no-poach analysis
A user wishing to replicate the analysis should download and install the CRAML software and download the intermediate files.
- Install CRAML
  - Download CRAML from GitHub.
  - CRAML is written in Python and was designed for Linux. It is cross-platform compatible, but some functions may encounter unexpected bugs in Windows or Mac environments.
  - Install the dependencies listed in requirements.txt.
  - Execute the file CRAML_Tool.py, and CRAML will appear in a browser window.
- Replication materials
- Configure and Run Analysis in CRAML
  - The setup settings will need to be adjusted to point to the location of the files on each user's machine.
  - Additional step-by-step instructions are provided in a PDF.
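
The download, install, and launch steps listed above can be sketched as a small Python script. This is only a sketch under assumptions: it assumes `git` is on the PATH and that `requirements.txt` and `CRAML_Tool.py` sit at the root of the cloned repository. The names `setup_commands` and `run_setup` are hypothetical helpers, not part of CRAML.

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/sjmeis/CRAML_Beta"


def setup_commands(dest="CRAML_Beta", python=sys.executable):
    """Build (command, working_directory) pairs for the install steps.

    Assumes requirements.txt and CRAML_Tool.py are at the repository root.
    """
    dest = Path(dest)
    return [
        # 1. Download CRAML from GitHub.
        (["git", "clone", REPO_URL, str(dest)], None),
        # 2. Install the dependencies listed in requirements.txt.
        ([python, "-m", "pip", "install", "-r", "requirements.txt"], dest),
        # 3. Launch the tool; CRAML should then appear in a browser window.
        ([python, "CRAML_Tool.py"], dest),
    ]


def run_setup(dest="CRAML_Beta", dry_run=True):
    """Print each step, and execute it unless dry_run is True."""
    for command, cwd in setup_commands(dest):
        print(" ".join(command), f"(in {cwd})" if cwd else "")
        if not dry_run:
            subprocess.run(command, check=True, cwd=cwd)


if __name__ == "__main__":
    # Dry run by default: shows the commands without executing them.
    run_setup(dry_run=True)
```

Calling `run_setup(dry_run=False)` would execute the three steps in order. Installing into a virtual environment first is a common precaution, though the instructions above do not require it.
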