Optimising MiniGPT with Supervised Learning and RL Constraints