This post is for me to organize some of my thoughts on kinodynamic motion planning, and might serve as an entry to ‘parallels of motion planning to RL’.


Kinodynamic motion planning has the goal to find a trajectory that steers a (possibly nonlinear) system to a specified goal, ideally not colliding with the environment. Along the way, we might want to minimize some objective - common examples are minimizing time, or minimizing energy.

The problem can be written as:

\[\begin{aligned} \tau = &{\arg\min}_{x, u} J(x, u) \\ \text{with} \ \ & x_0 \ \text{given}\\ \text{s.t.} \ \ & x_{t+1} = f(x, u)\\ & x_t \in C_\text{free}\\ & u_t \in \mathcal{U}\\ & x_t \in \mathcal{X} \end{aligned}\]

where \(\tau\) is the trajectory, consisting of the states of the system \(x\) and the input \(u\).

There are various approaches to solve this problem: The ones that I know of, and am most familiar are RL, optimization, sampling based planning, and search.

Which method work best generally depends on the assumptions we make (i.e. what do we assume to know?), and what we really want as a result, as the output of the methods above is not exactly the sameRL gives us a policy (possibly a goal conditioned policy) for the whole space, Planning and Search give us trajectories, Optimization approaches can give us both policies (if we run it as a controller (e.g. MPC)), but also simply trajectories. . In this set of notes, I am focussing on sampling based methods.

Sampling based approaches

A simple approach to tackle the kinodynamic problem above, taking lots of inspiration from geometric RRTs is the one described in this paper:

In essence, a tree is grown by randomly sampling a state \(x_\text{rand}\) in the state space, and then extending the tree towards it. This ‘extend’ step consists of finding the ‘closest’ vertex \(x_\text{near}\) that is part of the tree, and applying a random input \(u\) for some timestep \(\Delta t\) that hopefully gives us a state \(x_\text{new}\) that is closer to the originally sampled state than the vertex we started with.

A typical approach for this is sampling multiple control inputs \(u\), and times \(\Delta t\), apply all of them, and take the one that gives the minimum distance to the state we sampled.

A crucial detail here is that we do only assume access to a function \(f\) which tells us how the system evolves for a given start state, input, and timestep, and not much elseThis is very similar to the assumptions that RL makes. .

Some work that followed this initial paper:

Most other methods are modifications and extensions of this simple approach.

As it turns out, the typically used euclidean distance metric is not a great match for RRT in kinodynamic settings (there are exceptions, of course). In the work here, a different way to measure the distance, and thus find the nearest neighbour(s) was proposed.

The works in here and here linearize the systems around the operating point to provide a better ‘extend’-functionality, and provide better optimality properties. Particularly, the distance function was improved, i.e., some principled way of computing the distance between two points in a space with differential constraints was proposed.

This work shows how to build a version of the algorithm above that works well for constrained systems, e.g. a parallel manipulator. In this case, it is not straightforward how to find valid control inputs.

The works here deals with moving obstacles, the one here with a dynamic environment, and the specific case of planning for a quadrotor.

The work in here works on the specific case of planning for a car, and is thus able to work with b-splines, instead of the ‘black-box’ function \(f\) for state-propagation. With this simplification, we can also extend the single tree-approach from the beginning to a bidirectional tree search. In the other methods, we were not able to this, since we only assume to have access to the forward propagationThere are systems where the reverse function can be computed from the forward propagation, but e.g. a system that involves pushing objects around, getting this function is close to impossible. .

Short side note: Control

Going a bit more to the control side, we can also build tree structures for planning with kinodynamic systems. LQR-trees give us a policy to steer an agent towards a goal by constructing a series of funnels that are linked together, and eventually steer the system towards the goal (and stabilize it there if so desired). This work is not only giving us a path, but a control policy.

The previous work is extended to graphs instead of trees here. This allows for reuse of previous controllers and computations.


The work presents a possible approach to extend this forward-propagation-of-states to graphs instead of only trees.

This would allow a multiquery planning approach. However, as done, since the edges need to be re-created for every start-state, and it was not evaluated on true multi-query problems, it is unclear how much of a benefit this brings. Further, there was no bidirectional search.

Allowing discontinuities

All the approaches above assume that we do not have a dicontinuity in the path itself. This limits us to applying an input to a specific start state (i.e. growing directly from the tree), or we would need to use an optimizer to solve a Two-Value-Boundary-Problem, which is generally computationally expensive.

db-A* allows for discontinuities in the solution, and runs an optimizer over the resulting trajectory to fix it. The implicit assumption is that if the discontinuity that we allow is small enough, we can fix this (‘steer a bit to the left’, ‘go a little faster’) to actually go to the starting position of the next state, respectively to move the next state to the point that we reach.


This here seems to be the first kinodynamic sampling based planner that runs in realtime.