Understanding the CSV format
The CSV files generated by parsing LSF, Slurm, or Accelerator job completion records all have the same structure. This simplifies the cost simulation, because only one input format has to be processed.
You can also generate this CSV file yourself from any other scheduler and have JobAnalyzer.py parse it, but the file must adhere to the structure below. The order of the fields in the CSV file does not matter, but the names of the fields do.
CSV structure
There is no standard for CSV files, but JobAnalyzer.py expects the files to be written in the "Microsoft Excel" dialect, and the field names must be in the first row.
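
As a minimal sketch of what this layout means in practice, the snippet below reads such a file with Python's standard `csv` module, whose built-in `excel` dialect follows the Microsoft Excel conventions. Rows are keyed by the names in the header row, so column order is irrelevant. The helper name `read_jobs` and the sample data are hypothetical and are not taken from JobAnalyzer.py.

```python
import csv
import io


def read_jobs(csv_file):
    """Return the rows of a job CSV as dicts keyed by the header row's field names."""
    # "excel" is the csv module's default dialect; it is named here for clarity.
    reader = csv.DictReader(csv_file, dialect="excel")
    return list(reader)


# Hypothetical sample: header in the first row, columns in arbitrary order.
sample = io.StringIO(
    "num_cores,job_id,max_mem_gb\r\n"
    "4,1001,8.0\r\n",
    newline="",
)
jobs = read_jobs(sample)
print(jobs[0]["job_id"], jobs[0]["num_cores"])
```

All values come back as strings, so any conversion to int, float, or datetime is left to the caller.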
Required Fields
Field | Type | Description |
---|---|---|
job_id | string | Job id |
num_cores | int | Total number of cores allocated to the job |
max_mem_gb | float | Total amount of memory allocated to the job in GB |
num_hosts | int | Number of hosts. In Slurm this is the number of nodes. The number of cores should be evenly divisible by the number of hosts. For LSF this is typically 1. |
submit_time | datetime string | Time that the job was submitted. All times are expected to be in the format YYYY-MM-DDThh:mm:ss |
start_time | datetime string | Time that the job started. |
finish_time | datetime string | Time that the job finished. |
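
If you export records from another scheduler, a file with the required fields could be produced along the following lines. This is a sketch under stated assumptions: `write_jobs`, `REQUIRED_FIELDS`, and the sample job are hypothetical names and values chosen for illustration, and only the field names and the time format come from the table above.

```python
import csv
import io
from datetime import datetime

# Field names from the "Required Fields" table; the order is arbitrary.
REQUIRED_FIELDS = [
    "job_id", "num_cores", "max_mem_gb", "num_hosts",
    "submit_time", "start_time", "finish_time",
]
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDThh:mm:ss


def write_jobs(csv_file, jobs):
    """Write job dicts as CSV in the excel dialect, header in the first row."""
    writer = csv.DictWriter(csv_file, fieldnames=REQUIRED_FIELDS, dialect="excel")
    writer.writeheader()
    for job in jobs:
        row = dict(job)
        # Convert datetime objects to the expected datetime string format.
        for name in ("submit_time", "start_time", "finish_time"):
            row[name] = job[name].strftime(TIME_FORMAT)
        writer.writerow(row)


# Hypothetical job record used only to demonstrate the output.
buffer = io.StringIO(newline="")
write_jobs(buffer, [{
    "job_id": "1001",
    "num_cores": 4,
    "max_mem_gb": 8.0,
    "num_hosts": 1,
    "submit_time": datetime(2023, 1, 15, 8, 0, 0),
    "start_time": datetime(2023, 1, 15, 8, 5, 0),
    "finish_time": datetime(2023, 1, 15, 9, 5, 0),
}])
print(buffer.getvalue())
```

When writing to a real file instead of an in-memory buffer, open it with `newline=""` so that the `csv` module controls the line endings.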
Optional Fields
Field | Type | Description |
---|---|---|
resource_request | string | Resources requested by the job. For LSF this is the effective resource request. For Slurm it is the job constraints. |
ineligible_pend_time | timedelta string | Duration that the job was ineligible to run. This is what LSF writes. Defaults to max(0, eligible_time - submit_time). Format for timedelta strings is h:mm:ss. |
eligible_time | datetime string | Time that the job became eligible to run. Defaults to submit_time + ineligible_pend_time. |
requeue_time | datetime string | Time that the job was requeued. Currently not used. |
wait_time | timedelta string | Time that the job was pending after it became eligible to run. Default: start_time - eligible_time |
run_time | timedelta string | Time that the job ran. Default: finish_time - start_time |
exit_status | int | Effective return code of the job. Default: 0 |
ru_majflt | float | Number of page faults |
ru_maxrss | float | Maximum resident set size |
ru_minflt | float | Number of page reclaims |
ru_msgsnd | float | Number of System V IPC messages sent |
ru_msgrcv | float | Number of messages received |
ru_nswap | float | Number of times the process was swapped out |
ru_inblock | float | Number of block input operations |
ru_oublock | float | Number of block output operations |
ru_stime | float | System time used |
ru_utime | float | User time used |
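
The documented defaults for run_time and wait_time can be illustrated with a short sketch. It assumes that timedelta strings are non-negative and that the hour part may exceed 23; the source only states the h:mm:ss format, so both points are assumptions. The helper names `parse_timedelta`, `format_timedelta`, and `fill_time_defaults` are hypothetical and do not describe how JobAnalyzer.py is implemented.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDThh:mm:ss


def parse_timedelta(text):
    """Parse an 'h:mm:ss' string (assumption: hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_timedelta(delta):
    """Format a non-negative timedelta as 'h:mm:ss'."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def fill_time_defaults(row):
    """Return a copy of row with run_time and wait_time filled in when missing."""
    filled = dict(row)
    start = datetime.strptime(row["start_time"], TIME_FORMAT)
    finish = datetime.strptime(row["finish_time"], TIME_FORMAT)
    if not filled.get("run_time"):
        # Default: finish_time - start_time
        filled["run_time"] = format_timedelta(finish - start)
    if not filled.get("wait_time") and filled.get("eligible_time"):
        # Default: start_time - eligible_time
        eligible = datetime.strptime(filled["eligible_time"], TIME_FORMAT)
        filled["wait_time"] = format_timedelta(start - eligible)
    return filled


# Hypothetical row in which both optional durations are absent.
example = fill_time_defaults({
    "start_time": "2023-01-15T08:05:00",
    "finish_time": "2023-01-15T09:35:30",
    "eligible_time": "2023-01-15T08:01:00",
})
print(example["run_time"], example["wait_time"])
```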