DM2S2: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention