Towards Flexible Multi-modal Document Models