JsonLogParser

The JsonLogParser.py script parses job information from a JSON file containing an array of dicts, one per job. This allows you to run the Job Analyzer on jobs from an unsupported scheduler as long as you can export the job information in the required JSON format. The parser also requires a schema map file that maps the field names in your JSON file to the field names used by the Job Analyzer.

JSON Jobs File Format

The expected JSON format looks like the following example.

[
    {
        "Job_Id": "1",
        "ncpus": "1",
        "mem_bytes": 13843545600,
        "node_count": 1,
        "ctime": "2024-06-28T01:00:00",
        "stime": "2024-06-28T01:00:00",
        "walltime": "2024-06-28T01:00:00",
        "etime": "2024-06-28T01:00:00",
        "queue": "regress",
        "project": "project1",
        "Exit_status": "0",
        "ru_mem_bytes": 7334404096
    },
    {
        "Job_Id": "2",
        "ncpus": "1",
        "mem_bytes": 13843545600,
        "node_count": 1,
        "ctime": "2024-06-28T01:00:00",
        "stime": "2024-06-28T01:00:00",
        "walltime": "2024-06-28T01:00:00",
        "etime": "2024-06-28T01:00:00",
        "queue": "regress",
        "project": "project1",
        "Exit_status": "0",
        "ru_mem_bytes": 7334404096
    }
]

The field names aren't fixed, but must be the same for each job record.
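
If you are exporting from an unsupported scheduler, a script along these lines can produce the jobs file. This is a minimal sketch: the raw_jobs records and their field names are placeholders for whatever your scheduler exports, and the consistency check mirrors the requirement above.

#!/usr/bin/env python3
import json

# Placeholder records: replace with the job information exported from your scheduler.
raw_jobs = [
    {
        "Job_Id": "1",
        "ncpus": "1",
        "mem_bytes": 13843545600,
        "node_count": 1,
        "ctime": "2024-06-28T01:00:00",
        "stime": "2024-06-28T01:00:00",
        "walltime": "2024-06-28T01:00:00",
        "etime": "2024-06-28T01:00:00",
        "queue": "regress",
        "project": "project1",
        "Exit_status": "0",
        "ru_mem_bytes": 7334404096,
    },
]

# Every record must use the same field names.
expected_keys = set(raw_jobs[0])
for job in raw_jobs:
    assert set(job) == expected_keys, f"inconsistent field names: {set(job) ^ expected_keys}"

with open("jobs.json", "w") as f:
    json.dump(raw_jobs, f, indent=4)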

JSON Schema Map File Format

The schema map is a JSON file that maps the standard field names used by the parser to the field names in your JSON file. It must contain the required fields shown in the example below. For numeric fields like max_mem_gb, you can specify the units of your source field and the number will be scaled accordingly. For example, if your log records the memory request in bytes, you can specify that and the value will be converted to GB.

    "job_id": "Job_Id",
    "num_cores": "ncpus",
    "max_mem_gb": {"mem_bytes": {"units": "b"}},
    "num_hosts": "node_count",
    "submit_time": "ctime",
    "start_time": "stime",

    "run_time": "walltime",

    "eligible_time": "etime",
    "queue": "queue",
    "project": "project",

    "exit_status": "Exit_status",

    "ru_maxrss": "ru_mem_bytes"

Parsing the accounting JSON log files

First you must source the setup script to make sure that all required packages are installed.

source setup.sh

This creates a Python virtual environment and activates it.

The script parses the JSON file that contains the job information and writes the parsed job data in CSV format. The output directory will be created if it does not exist.

./JsonLogParser.py \
    --input-json <json-jobs-file> \
    --json-schema-map <json-schema-file> \
    --output-csv jobs.csv
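
After the run, a few lines of Python can sanity-check the output (a minimal sketch; the column names in jobs.csv depend on the parser's output, not on this example):

import csv

with open("jobs.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(f"parsed {len(rows)} jobs")
print("columns:", ", ".join(rows[0].keys()))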

Full Syntax

usage: JsonLogParser.py [-h] --input-json INPUT_JSON --json-schema-map
                        JSON_SCHEMA_MAP [--output-csv OUTPUT_CSV]
                        [--starttime STARTTIME] [--endtime ENDTIME]
                        [--disable-version-check] [--debug]

Parse JSON file with job results.

optional arguments:
  -h, --help            show this help message and exit
  --input-json INPUT_JSON
                        Json file with parsed job info. (default: None)
  --json-schema-map JSON_SCHEMA_MAP
                        Json file that maps input json field names to
                        SchedulerJobInfo field names. (default: None)
  --output-csv OUTPUT_CSV
                        CSV file where parsed jobs will be written. (default:
                        None)
  --starttime STARTTIME
                        Select jobs after the specified time. Format YYYY-MM-
                        DDTHH:MM:SS (default: None)
  --endtime ENDTIME     Select jobs before the specified time. Format YYYY-MM-
                        DDTHH:MM:SS (default: None)
  --disable-version-check
                        Disable git version check (default: False)
  --debug, -d           Enable debug mode (default: False)
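
The --starttime and --endtime values use the same YYYY-MM-DDTHH:MM:SS layout as the timestamps in the jobs file. In Python terms the format corresponds to the following (a sketch of the expected layout, not the parser's actual code):

from datetime import datetime

# YYYY-MM-DDTHH:MM:SS, e.g. the ctime values in the jobs file above
t = datetime.strptime("2024-06-28T01:00:00", "%Y-%m-%dT%H:%M:%S")
print(t)  # 2024-06-28 01:00:00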

What's next?

Once the parse completes, you can run step #3 (cost simulation) by following the instructions for the JobAnalyzer. Since you already generated a CSV file, you will use JobAnalyzer with the csv option.