Andreas T. Ernst

This download contains the AP (Australia Post) data set for p-hub median and capacitated hub location problems. The files contain the following:


  1. APdata200  - Data file for a full 200 node problem with 8 hubs
  2. generate.c - C program for generating smaller data sets
    USAGE: generate n p < APdata200 > newdata
    This creates a new problem with n nodes and p hubs
  3. 20.3         - A sample data set with 20 nodes produced by   
    generate 20 3 < APdata200 > 20  
  4. FcostX.NN  - Fixed cost file for NN nodes where X = 'T' or 'L'
    (the T problems tend to be more difficult). Fcost files are not relevant to the p-hub location problems.
  5. CapY.NN    - Node capacity file for NN nodes where Y = 'T' or 'L'
    (the T problems are more tightly constrained). Capacity files are not relevant to the p-hub location problems.
  6. Solutions-*.txt  - Optimal solutions for all combinations of capacitated (CSAHLP), uncapacitated single allocation (USApHMP) and uncapacitated multiple allocation (UMApHMP) hub median problems, with n in {10,20,25,40,50} and multiple options for hubs or capacities.
    In the result files, the nomenclature "NNXY" refers to the problem generated from data files NN, FcostX.NN and CapY.NN. The objective is given as well as the allocation of hubs. E.g. an allocation vector 2,2,4,4,4 means that nodes 2 and 4 are hubs, node 1 is allocated to hub node 2, and nodes 3 and 5 are allocated to hub node 4.
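
The allocation vector convention can be decoded mechanically. The following Python sketch is only an illustration (the function name decode_allocation is made up here and is not part of the download); it assumes a node is a hub exactly when it is allocated to itself, as in the example above.

```python
def decode_allocation(alloc):
    """Split a 1-based allocation vector into hubs and spoke assignments."""
    # A node is a hub exactly when it is allocated to itself.
    hubs = sorted(node for node, hub in enumerate(alloc, start=1) if node == hub)
    # Every other node is a spoke served by the hub listed at its position.
    spokes = {node: hub for node, hub in enumerate(alloc, start=1) if node != hub}
    return hubs, spokes


hubs, spokes = decode_allocation([2, 2, 4, 4, 4])
print(hubs)    # [2, 4]
print(spokes)  # {1: 2, 3: 4, 5: 4}
```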


Data file format for nodes file:

<n>                                     Number of nodes
<x[1]> <y[1]>                           x & y coordinates of node 1
 :       :
<x[n]> <y[n]>                           x & y coordinates of node n
<w[1][1]> <w[1][2]> ... <w[1][n]>       flow from node 1 to all others
 :         :             :
<w[n][1]> <w[n][2]> ... <w[n][n]>       flow from node n to all others
<p>                                     Number of hubs (for p-hub median problems)
<c>                                     Collection cost
<t>                                     Transfer cost
<d>                                     Distribution cost


All of the costs are per unit (Euclidean) distance, per unit flow volume.
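
As an illustration of the layout above, the following Python sketch reads a nodes file and evaluates a single-allocation solution. It rests on two assumptions that should be checked against the actual files and papers: that the file holds only whitespace-separated numbers (the descriptions on the right being annotations), and that a flow from i to j routed via hubs k and m costs w[i][j] * (c*dist(i,k) + t*dist(k,m) + d*dist(m,j)). The function names and the tiny 2-node sample are made up for illustration and are not part of the download.

```python
import math


def parse_nodes_file(text):
    """Parse the nodes file layout, assuming whitespace-separated numbers only."""
    tokens = iter(text.split())
    n = int(next(tokens))
    coords = [(float(next(tokens)), float(next(tokens))) for _ in range(n)]
    flows = [[float(next(tokens)) for _ in range(n)] for _ in range(n)]
    p = int(next(tokens))
    c, t, d = (float(next(tokens)) for _ in range(3))
    return {"n": n, "coords": coords, "flows": flows, "p": p, "c": c, "t": t, "d": d}


def single_allocation_cost(data, alloc):
    """Cost of a 1-based allocation vector under the assumed objective."""
    xy, w = data["coords"], data["flows"]
    total = 0.0
    for i in range(data["n"]):
        k = alloc[i] - 1  # hub serving the origin node i
        for j in range(data["n"]):
            m = alloc[j] - 1  # hub serving the destination node j
            collect = data["c"] * math.dist(xy[i], xy[k])
            transfer = data["t"] * math.dist(xy[k], xy[m])
            distribute = data["d"] * math.dist(xy[m], xy[j])
            total += w[i][j] * (collect + transfer + distribute)
    return total


# Made-up 2-node instance: nodes at (0,0) and (3,4), so they are 5 apart.
sample = """2
0 0
3 4
0 1
2 0
1
3
0.75
2
"""
data = parse_nodes_file(sample)
print(single_allocation_cost(data, [1, 1]))  # 40.0
```

With node 1 as the only hub, the flow 1 -> 2 pays only distribution (1 * 2 * 5 = 10) and the flow 2 -> 1 pays only collection (2 * 3 * 5 = 30), giving 40.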


The fixed cost and capacity files contain one number for each node (in the same order as in the nodes file). For FcostX.NN, each number is the cost of making that node a hub. For CapY.NN, each number is the capacity on incoming commodities (including those from the node itself) if that node is made a hub.
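
The per-node files can be combined with an allocation vector as in the Python sketch below. It is a sketch under a stated assumption: the load on a hub is taken to be the total flow originating at the nodes allocated to it (the hub included), which is one reading of "capacity on incoming commodities". The function names and all numbers are made up for illustration.

```python
def read_per_node_values(text):
    """Read one number per node, in nodes-file order."""
    return [float(token) for token in text.split()]


def total_fixed_cost(alloc, fixed_cost):
    """Sum the fixed costs of the distinct hubs used by a 1-based allocation."""
    return sum(fixed_cost[hub - 1] for hub in set(alloc))


def capacity_violations(flows, alloc, capacity):
    """Return {hub: load} for hubs whose assumed load exceeds their capacity."""
    load = {}
    for i, hub in enumerate(alloc):
        # Assumption: everything originating at node i is collected at its hub.
        load[hub] = load.get(hub, 0.0) + sum(flows[i])
    return {hub: amount for hub, amount in load.items() if amount > capacity[hub - 1]}


# Made-up 3-node example: nodes 1 and 2 use hub 2, node 3 is its own hub.
flows = [[0, 1, 2], [3, 0, 1], [1, 1, 0]]
alloc = [2, 2, 3]
fixed_cost = read_per_node_values("100 50 70")
capacity = read_per_node_values("10 6 5")

print(total_fixed_cost(alloc, fixed_cost))           # 120.0
print(capacity_violations(flows, alloc, capacity))   # {2: 7.0}
```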



There are currently 137 data files, with the number of workers and jobs in the ranges 22 <= |W| <= 420 and 40 <= |J| <= 2105, and with a tightness of 90%.


These data files are from M. Krishnamoorthy, A.T. Ernst and D. Baatar, "Algorithms for Large Scale Shift Minimisation Personnel Task Scheduling Problems", European Journal of Operational Research, Volume 219, Issue 1, 2012, Pages 34–48.


The format of all of these data files is:


   multi-skilling level
   type = 1 (minimizing the number of shifts/workers)
   number of jobs/tasks
   job start and end times
   number of shifts (qualifications)
   number of qualified jobs for each shift and the qualified job indices


The name of each data file encodes, in order:

   the file number
   number of workers
   number of jobs
   the multi-skilling level


For example, Data_10_51_111_66.dat is the 10th data file, which describes a problem instance with 51 workers, 111 jobs and a multi-skilling level of 66%.
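
The naming scheme above can be unpacked with a short Python sketch. The function name and dictionary keys are made up for illustration; the pattern assumes every file follows the Data_<number>_<workers>_<jobs>_<level>.dat form of the example.

```python
import re

# Assumed pattern: Data_<file number>_<workers>_<jobs>_<multi-skilling %>.dat
NAME_PATTERN = re.compile(r"Data_(\d+)_(\d+)_(\d+)_(\d+)\.dat")


def parse_instance_name(filename):
    """Extract the four fields encoded in a data file name."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected data file name: {filename!r}")
    number, workers, jobs, level = map(int, match.groups())
    return {
        "file_number": number,
        "workers": workers,
        "jobs": jobs,
        "multi_skilling_pct": level,
    }


print(parse_instance_name("Data_10_51_111_66.dat"))
# {'file_number': 10, 'workers': 51, 'jobs': 111, 'multi_skilling_pct': 66}
```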


Data generation


A program has been provided for generating additional data sets in the same format. This has been used, for example, by Pieter Smet to generate more difficult instances (though in some cases the jobs were generated with a mean length of 100 and the options -j 0 -J 200).



==                   datagen usage                                         ==


USAGE: datagen <tightness> <type> <#people> <job length> <% overtime>
Optional arguments (after mandatory ones):
     [-q qualified] [-r|-R seed] [-j <Minimum job time>] [-J <max job time>]
     [-s <min shift time>] [-S <max shift time>] [-p]
tightness  = % of shift time taken up by jobs
type       = 1 - Start & end times for jobs + qualifications
             2 - Start & end times for jobs & people
             3 - Start & end times for both + qualifications
             4 - Start & end times, qualifications & costs
# people   = Number of staff available
job length = Median job length
% overtime = Percentage of overtime staff
-q %       = % of jobs (on average) that each person can do
             default = <tightness>, 0% ==> only 1 person/job
-r|-R seed = Seed generator using time-based or given seed
-j|-J      = min/max job length defaults = 15/150
-s|-S      = min/max shift length defaults = 240/600
-p         = produce gnu plot files of shifts & feasible soln
-o  FN.txt = output file name is FN.txt (default data.txt)
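
When many instances are generated in a batch, the argument order in the usage text above can be assembled programmatically. The Python sketch below only builds the argument list; it assumes the compiled generator is available as an executable named datagen, and the helper name and all parameter values are illustrative rather than the settings used for the published instances.

```python
def datagen_command(tightness, problem_type, people, job_length, overtime,
                    qualified=None, min_job=None, max_job=None, output=None):
    """Build a datagen argument list: mandatory arguments first, then options."""
    cmd = ["datagen", str(tightness), str(problem_type), str(people),
           str(job_length), str(overtime)]
    optional = [("-q", qualified), ("-j", min_job), ("-J", max_job), ("-o", output)]
    for flag, value in optional:
        # Compare against None so that a legitimate value of 0 is still passed.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = datagen_command(90, 1, 51, 100, 0, min_job=0, max_job=200, output="new.txt")
print(" ".join(cmd))
# datagen 90 1 51 100 0 -j 0 -J 200 -o new.txt
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the path to the executable has been confirmed.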



Test data for the paper "Mathematical Models for the Berth Allocation Problem in Dry Bulk Terminals"

by Andreas T. Ernst, Ceyda Oguz, Gaurav Singh and Gita Taherkhani  

For details of the problem and benchmark solution information, please refer to the above paper.


Problems are defined in terms of:

A - Arrival time of each vessel (hours since start of planning period)
P - Processing time for the vessel (hours)
L - Length of the vessel
B - Beginning of each tidal window
E - End of each tidal window


The folder of data sets, inputData/, contains:

- .xlsx files: each parameter is in a separate sheet, with 10 instances in 10 columns
- .dat files: for use with OPL studio files for running the integer programming solution method
- .csv files: simpler text format files with each vector of data on one line


In addition, the OPL studio source code is provided in .mod files.