# Multi Agent Pickup and Delivery: A short overview

This is a sort of a review of the current state of the art on Multi Agent Pickup and Delivery. I recently stumbled upon some papers in this area, and this is my way of trying to get a grasp on the topic and its difficulties and aproaches to solve them.

# Problem setting

Multi agent pickup and delivery (MAPD) is closely related to Multi Agent Path Finding (MAPF). In some settings, MAPD could be seen as en extension to MAPF.

MAPF deals with path finding for multiple agents, typically it is the same agents, in a shared workspace. MAPD extends this problem to the assignment of tasks to agents, and finding paths for it. MAPD is also typically interested in settings with more tasks than agents, such that a planner needs to decide how to assign tasks to agents and in which order. Often, it is assumed that tasks are not know a priori, but have a ‘release time’, i.e., they only become known at a certain time during execution of other tasks. This is a realistic setting in warehouse scenarios.

Summarizing, the assumptions in most MAPD papers are:

- same agents, shared workspace
- not all tasks known initially

Before discussing some MAPD solvers, we will have a brief look at MAPF solvers, since many MAPD approaches use a MAPF solver for planning.

#### MAPF

As done in MAPD, MAPF assumes that there is only one type of robot, and they all act in a shared workspace. It is also assumed that there are no kinodynamic constraints (although that could be incorporated in most solvers).

the most common approaches for solving the MAPF problem are

- Priority Based Search: The plans are copmuted one-by-one for each agent, and are fixed once they are planned, and act as constraints for the following agents. In the generic case, this is incomplete, and suboptimal.
- Conflict Based Search: In conflict based search, one starts out by planing paths for each agent, without caring about the paths of other agents. Collisions are ignored! Then, the paths are analyzed for collisions with each other and constraints are introduced that lead to avoiding the collisions. That is, we can see the initial solution (where all collisions are ignored) as root node of a tree. If a collision in a path occurs with another path, two child-nodes are introduced that contain the constraints ‘agent i should not be at this location at time x’ and ‘agent j should not be here at time x’. Then one computes new plans with these new constraints, and repeats the process until no collisions occur. This approach is complete, and can find the optimal solution.

#### General approach to solving MAPD problems

For reasons that become clear later, I am much more interested in the planning part of the MAPD problem, and less interested in the assignment (for now).

As for **planning**:
As said above, MAPF methods play a crucial role in the MAPD setting.
The main difference to the single-goal case is that there are more goals coming after the forst motion planning problem is solved.
There are some relatively simple examples where an agent should deliberately go slow to reach the overall optimal makespan.

With the assumption of equal agents, and a shared workspace, one can build a graph and plan on this graph, while adding the constraints that no vertex can be doubly assigned, and edges can not be crossed by two agents at the same time.

# Papers

Some of the papers that I found are (ordered chronologically):

- Lifelong Multi-Agent Path Finding for Online Pickup and Delivery Tasks: Introduces the MAPD problem as such, and proposes two decoupled approaches based on MAPF solvers to solve ‘well formed’ MAPD instances. The core idea for both algorithms is a prioritized search that avoids previously planned paths. There is also a comparison to a centralized algorithm.
- Task and Path Planning for Multi-Agent Pickup and Delivery: Considers the setting where all tasks are known a priori, i.e., the ‘offline MAPD’ problem. Two solvers are proposed, a prioritized planning approach, and a hybrid path planning approach.
- A Multi-Label A* Algorithm for Multi-Agent Pathfinding: Tackles the online MAPD setting. An A* based search is intoduced (MLA*) that allows for path planning with multiple ordered goals. This is then used to replace the sequential A* calls in some of the previous works. This reduced the makespan on the benchmarks substantially.
- MS*: A New Exact Algorithm for Multi-agent Simultaneous Multi-goal Sequencing and Path Finding: M* for multi agent multi goal settings. Assumes a shared workspace, and equal agents, is able to compute the optimal solution.
- Lifelong Multi-Agent Path Finding in Large-Scale Warehouses*: Rolling horizon planning for MAPD, proposes two different approaches. The core idea is replanning regularly, and resolving collisions only for some (other) horizon length. MLA* is generalized to multi-goal sequences (compared to the pickup-delivery-only setting from above). The bounded horizon approach is implemented in a bunch of search algorithms (PBS/CBS/ECBS), and tested.
- Multi-Goal Multi-Agent Pickup and Delivery*: Uses priority based search.

Of course, there are many more papers around tackling the same problems, or slight variations thereof. All settings in the papers discussed are limited to the case of a 2d environment with equal agents, which is an assumption that is relatively limiting, but allows for efficient solvers in many cases.

# Connection to Multi Agent Task and Motion Planning

Similar to MAPD, in multi agent task and motion planning, tasks have to be assigned to robots as well, and paths have to be planned. There should be an obvious overlap betweeen the two areas of research.

I am first going to do a bit of self promotion here: In the work here, we proposed a greedy multi agent TAMP solver, and in the follow up paper here, we showed a route towards optimizing the resulting plans.

The approach is based on decomposing the task sequence, planning them separately (i.e. per robot, in the order they should be done), and take previously planned paths into account. While that works well, it is clearly suboptimal. I am interested in is how suboptimal this is, and to figure that out, I want to have an optimal planner (no matter how slow, or if it only works for a specific subset of TAMP settings). It turns out that the MAPD people do not have an optimal planner either.

The other very interesting thing would be the application of findings from the MAPD/MAPF community to the MA-TAMP case, and speed up the solution time. One of the difficulties here is that in the TAMP case, it might occur that something is infeasible if it is done in the wrong order, which does not happen in the MAPD setting.

# Outlook

I am interested in developing an optimal planner for a given task assignment. One possible approach would be using sampling based planners, similar to TMIT*. However, the approach taken there does not straightforwardly work for the MA-TAMP case for various reasons.