Overview
One of the most important problems in combinatorial optimization is the assignment problem, in which a group of workers has to perform a set of tasks. For each worker and task, there is a fixed cost for that worker to perform the task. The problem is to assign each worker to a distinct task so as to minimize the total cost.
Here's an example. Suppose that a taxi company has four customers waiting for rides, and four cab drivers (the workers) who can pick them up. The tasks are for each driver to pick up one customer. The cost of a task is the time it would take the driver to pick up a specific customer. The problem: assign drivers to customers so as to minimize the total wait time.
You can visualize this problem in the bipartite graph below. The nodes on the left represent workers (drivers). The nodes on the right represent tasks (customers). The edges represent all possible ways to assign a worker to a task.
Cost matrix
The costs (wait times) for drivers to pick up customers are given in the table below, called the cost matrix.
0 | 1 | 2 | 3 | |
0 | 90 | 76 | 75 | 70 |
1 | 35 | 85 | 55 | 65 |
2 | 125 | 95 | 90 | 105 |
3 | 45 | 110 | 95 | 115 |
The following sections present a Python program that solves the assignment problem.
Create the data
The data for the problem is shown below.
def create_data_array(): cost = [[90, 76, 75, 70], [35, 85, 55, 65], [125, 95, 90, 105], [45, 110, 95, 115]] return cost
The array is the cost matrix, whose i, j entry is the cost for worker i to perform task j. There are four workers, corresponding to the rows of the matrix, and four tasks, corresponding to the columns.
Create the solver
The program uses the linear assignment solver, a specialized solver for the assignment problem. The following code creates the solver.
from ortools.graph import pywrapgraph import time def main(): cost = create_data_array() rows = len(cost) cols = len(cost[0]) assignment = pywrapgraph.LinearSumAssignment()The program imports the Python wrapper
pywrapgraph
, and then uses
the method LinearSumAssignment
to create the solver.
Note: The linear assignment solver only accepts integer values for the weights and values. The section Using a solver with non-integer data shows how to use the solver if your data contains non-integer values.
To compare how long different solvers take to solve the same problem,
all the programs in this section have a
timer (which uses the time
package). The following code creates the timer.
if __name__ == "__main__": start_time = time.clock() main() print() print("Time =", time.clock() - start_time, "seconds")
Add the costs to the solver
The following code adds the costs to the solver by looping over workers and tasks.
for worker in range(rows): for task in range(cols): if cost[worker][task]: assignment.AddArcWithCost(worker, task, cost[worker][task])
Invoke the solver
The following code invokes the solver and displays the solution.
solve_status = assignment.Solve() if solve_status == assignment.OPTIMAL: print('Total cost = ', assignment.OptimalCost()) print() for i in range(0, assignment.NumNodes()): print('Worker %d assigned to task %d. Cost = %d' % ( i, assignment.RightMate(i), assignment.AssignmentCost(i))) elif solve_status == assignment.INFEASIBLE: print('No assignment is possible.') elif solve_status == assignment.POSSIBLE_OVERFLOW: print('Some input costs are too large and may cause an integer overflow.')The output below shows the optimal assignment of workers to tasks.
$ python my_projects/linear_sum_assignment.py Total cost = 265 Worker 0 assigned to task 3. Cost = 70 Worker 1 assigned to task 2. Cost = 55 Worker 2 assigned to task 1. Cost = 95 Worker 3 assigned to task 0. Cost = 45 Time = 0.000147 secondsThe following graph shows the solution as the dashed edges in the graph. The numbers next to the dashed edges are their costs. The total wait time of this assignment is the sum of the costs for the dashed edges, which is 265.
In graph theory, a set of edges in a bipartite graph that matches every node on the left with exactly one node on the right is called a perfect matching.
The entire program
Here is the entire program.
from __future__ import print_function from ortools.graph import pywrapgraph import time def main(): cost = create_data_array() rows = len(cost) cols = len(cost[0]) assignment = pywrapgraph.LinearSumAssignment() for worker in range(rows): for task in range(cols): if cost[worker][task]: assignment.AddArcWithCost(worker, task, cost[worker][task]) solve_status = assignment.Solve() if solve_status == assignment.OPTIMAL: print('Total cost = ', assignment.OptimalCost()) print() for i in range(0, assignment.NumNodes()): print('Worker %d assigned to task %d. Cost = %d' % ( i, assignment.RightMate(i), assignment.AssignmentCost(i))) elif solve_status == assignment.INFEASIBLE: print('No assignment is possible.') elif solve_status == assignment.POSSIBLE_OVERFLOW: print('Some input costs are too large and may cause an integer overflow.') def create_data_array(): cost = [[90, 76, 75, 70], [35, 85, 55, 65], [125, 95, 90, 105], [45, 110, 95, 115]] return cost if __name__ == "__main__": start_time = time.clock() main() print() print("Time =", time.clock() - start_time, "seconds")
Solution when workers can't perform all tasks
In the previous example, we assumed that all workers can perform all tasks. But this not always the case — an worker might be unable to perform one or more tasks for various reasons. However, it is easy to modify the program above to handle this.
As an example, suppose that worker 0 is unable to perform task 3. To modify the program to take this into account, make the following changes:
- Change the 0, 3 entry of the cost matrix to the string
'NA'
. (Any string will work.)cost = [[90, 76, 75, 'NA'], [35, 85, 55, 65], [125, 95, 90, 105], [45, 110, 95, 115]]
- In the section of the code that assigns costs to the solver, add the line
if cost[worker][task] != 'NA':
, as shown below.for worker in range(0, rows): for task in range(0, cols): if cost[worker][task] != 'NA': assignment.AddArcWithCost(worker, task, cost[worker][task])
The added line prevents any edge whose entry in the cost matrix is'NA'
from being added to the solver.
After making these changes and running the modified code, you see the following output:
Total cost = 276 Worker 0 assigned to task 1. Cost = 76 Worker 1 assigned to task 3. Cost = 65 Worker 2 assigned to task 2. Cost = 90 Worker 3 assigned to task 0. Cost = 45
Notice that the total cost is higher now than it was for the original problem. This is not surprising, since in the original problem the optimal solution assigned worker 0 to task 3, while in the modified problem that assignment is not allowed.
To see what happens if more workers are unable to perform tasks, you can replace
more entries of
the cost matrix with 'NA'
, to denote additional workers who can't perform
certain tasks:
cost = [[90, 76, 'NA', 'NA'], [35, 85, 'NA', 'NA'], [125, 95, 'NA','NA'], [45, 110, 95, 115]]When you run the program this time, you get a negative result:
No assignment is possible.
This means
there is no way to assign workers to tasks so that each worker performs
a different task. You can see why this is so
by looking at the graph for the problem
(in which there are no edges corresponding to values of 'NA'
in the cost
matrix).
Since the nodes for the three workers 0, 1, and 2 are only connected to the two nodes for tasks 0 and 1, it not possible to assign distinct tasks to these workers.
The Marriage Theorem
There is a well-known result in graph theory, called The Marriage Theorem, which tells us exactly when you can assign each node on the left to a distinct node on the right, in a bipartite graph like the one above. Such an assignment is called a perfect matching. In a nutshell, the theorem says this is possible if there is no subset of nodes on the left (like the one in the previous example) whose edges lead to a smaller set of nodes on the right.
More precisely, the theorem says that a bipartite graph has a perfect matching if and only if for any subset S of the nodes on the left side of the graph, the set of nodes on the right side of the graph that are connected by an edge to a node in S is at least as large as S.