Based on the OpenCabinet project by Holden Grissett, this project is a variation of the default final project that attempts to use the PandaOmron robot in the RoboCasa environment to open cabinet doors.
The project tries to incrementally improve the imitation-learning policy, a neural network trained on demonstrations, adding one change at a time and measuring the effect of each.
Opening cabinet doors is a difficult task for a few reasons:
★ The state representation is small, even in the augmented dataset
★ There are over 2500 possible kitchen layouts the robot could encounter
★ The reward is sparse (either 0 or 1, no partial credit)
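The sparsity point can be made concrete. A binary-success reward looks roughly like the sketch below; the joint-angle threshold and function name are illustrative, not RoboCasa's actual success check:

```python
def cabinet_reward(door_joint_angle: float, open_threshold: float = 0.9) -> float:
    """Binary task reward: 1.0 only once the door is past the threshold.
    (Illustrative sketch -- not RoboCasa's real success criterion.)"""
    return 1.0 if door_joint_angle >= open_threshold else 0.0

print(cabinet_reward(0.2))   # 0.0 -- no partial credit for getting close
print(cabinet_reward(0.95))  # 1.0
```

Because the robot gets identical feedback for "almost opened the door" and "never touched it", the policy has to learn the whole behavior from demonstrations alone.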
For this reason, the success criterion was not actually opening the cabinet door, but seeing how far a low-dimensional state representation could be pushed by tweaking the policy alone, and how much improvement each step contributed.
The baseline MLP has a success rate of 0%, so I wanted to improve it.
The improvement plan consisted of:
★ Augmenting the dataset with additional state information
★ Adding temporal context, so the model sees previous states during training (6a)
★ Adding action chunking, so the model predicts a sequence of future actions instead of one action (6b)
★ Adding diffusion to the MLP policy (6c)
★ Switching the policy from MLP to a 1-D convolutional U-Net (6d)
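Steps 6a and 6b can be sketched together as an MLP that consumes a window of past states and emits a chunk of future actions. The dimensions and layer sizes below are assumptions for illustration, not the project's actual hyperparameters:

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- the project's real state/action dims are not shown here.
STATE_DIM = 32   # low-dimensional state (assumption)
ACTION_DIM = 7   # arm + gripper action (assumption)
CONTEXT = 4      # past states fed to the policy (temporal context, 6a)
CHUNK = 8        # future actions predicted at once (action chunking, 6b)

class ChunkedMLPPolicy(nn.Module):
    """MLP mapping a window of past states to a chunk of future actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM * CONTEXT, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, ACTION_DIM * CHUNK),
        )

    def forward(self, state_window):
        # state_window: (batch, CONTEXT, STATE_DIM) -> flatten the temporal axis
        flat = state_window.flatten(start_dim=1)
        return self.net(flat).view(-1, CHUNK, ACTION_DIM)

policy = ChunkedMLPPolicy()
actions = policy(torch.randn(2, CONTEXT, STATE_DIM))
print(actions.shape)  # torch.Size([2, 8, 7])
```

At execution time a chunked policy typically runs the first few actions of each predicted chunk before re-querying the network, which smooths the behavior compared to predicting one action per step.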
The results were not good: even the U-Net policy couldn't pass a single test case.
Training a policy takes around THREE HOURS, so making small tweaks and retrying is impractical.
A low-dimensional state representation is simply not well suited to this task; richer inputs, such as visual observations, would likely have produced much better results.
But we at least learned which model works best: the U-Net with diffusion (6d), though the MLP with action chunking (6b) had comparable results.
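For reference, the 1-D convolutional U-Net in a diffusion policy denoises an action chunk, with actions as channels and the chunk length as the sequence axis. The toy network below shows only that shape convention; the project's actual architecture, depth, and conditioning are not shown and all sizes are assumptions:

```python
import torch
import torch.nn as nn

class TinyUNet1D(nn.Module):
    """Minimal 1-D conv U-Net over an action chunk (channels = action dims).
    A sketch only -- not the project's real architecture or conditioning."""
    def __init__(self, action_dim=7, hidden=32):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv1d(action_dim, hidden, 3, padding=1), nn.ReLU(),
            # stride-2 conv halves the temporal (chunk) axis
            nn.Conv1d(hidden, hidden * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(
            # transposed conv restores the original chunk length
            nn.ConvTranspose1d(hidden * 2, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, action_dim, 3, padding=1),
        )

    def forward(self, noisy_actions):
        # noisy_actions: (batch, action_dim, chunk_len); output predicts the
        # noise to subtract at each denoising step
        return self.up(self.down(noisy_actions))

net = TinyUNet1D()
noise_pred = net(torch.randn(2, 7, 8))
print(noise_pred.shape)  # torch.Size([2, 7, 8])
```

During inference, the policy starts from pure noise and applies this network repeatedly to recover an action chunk, which is what makes diffusion policies slower but often smoother than a single-shot MLP.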
U-Net with Diffusion moves towards the handle at first, but then goes off task entirely.
There is not much we can do to help without completely overhauling the input state structure!
Note that policy 6a has a bug that makes it move backwards. When temporal context was first added, the prior states at the start of an episode were clones of the first state rather than zero-padded. Later policies zero-pad them.
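The fix amounts to left-padding a short history with zeros so the window length is constant. A minimal sketch (function name and shapes are illustrative, not the project's actual buffering code):

```python
import torch

def pad_context(states, context_len):
    """Left-pad a short state history with zeros so the window is always
    context_len long. (Sketch -- not the project's actual buffer code.)"""
    n, state_dim = states.shape
    if n >= context_len:
        return states[-context_len:]
    # zeros, NOT clones of states[0] -- cloning was the 6a bug
    pad = torch.zeros(context_len - n, state_dim)
    return torch.cat([pad, states], dim=0)

window = pad_context(torch.ones(1, 4), context_len=3)
print(window)  # first two rows are zeros, last row is the real state
```

With clone-padding, the network sees an artificial "standing still" history at episode start, which can bias its first predictions; zero-padding makes the missing history unambiguous.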
These are the results of the project. Policies 6d and 6b show the most improvement, even though none of them managed to open the door.
★ Random Actions
★ Baseline MLP
★ MLP with Temporal Context (6a)
★ MLP Temporal + Action Chunking (6b)
★ MLP with Diffusion (6c)
★ 1D U-Net with Diffusion (6d)