In this quarter project, we are interested in a paper that has not been accepted by a conference but is available on arXiv: VINet: Visual-inertial odometry as a sequence-to-sequence learning problem. Its goal is to achieve better visual-inertial odometry from a sequence of camera frames together with inertial measurements. In terms of implementation, we need to embed some TensorFlow code in Keras in order to combine the inputs from consecutive camera frames with the inertial data. More specifically, the paper builds on FlowNet to combine two consecutive camera frames, and uses an LSTM to process the inertial data. We follow the paper's description as closely as we can; unfortunately, our performance is still worse than what the original authors report.
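To make the architecture concrete, the following is a minimal sketch (in plain tf.keras, not the authors' code) of how a FlowNet-style encoder over stacked consecutive frames can be fused with an LSTM over IMU readings. The sequence length, image size, number of IMU samples per frame, layer widths, and the raw 6-vector pose output are all assumptions made for illustration only.

```python
# Sketch only: fuse a FlowNet-style visual branch with an IMU LSTM in tf.keras.
# All sizes below (SEQ_LEN, IMG_H/IMG_W, IMU_PER_FRAME, layer widths) are
# illustrative assumptions, not values from the VINet paper or our code.
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN = 10            # frame pairs per training sequence (assumed)
IMG_H, IMG_W = 192, 256  # input resolution (assumed)
IMU_PER_FRAME = 10       # IMU samples between two consecutive frames (assumed)

# Visual branch: a small FlowNet-like encoder over a stacked image pair.
def build_flow_encoder():
    inp = layers.Input(shape=(IMG_H, IMG_W, 6))  # two RGB frames stacked channel-wise
    x = layers.Conv2D(64, 7, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(256, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(512, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)       # collapse to a feature vector
    return Model(inp, x, name="flow_encoder")

flow_encoder = build_flow_encoder()

# Inputs for a whole sequence.
image_pairs = layers.Input(shape=(SEQ_LEN, IMG_H, IMG_W, 6), name="image_pairs")
imu = layers.Input(shape=(SEQ_LEN, IMU_PER_FRAME, 6), name="imu")  # gyro + accel

# Apply the visual encoder to every frame pair in the sequence.
visual_feats = layers.TimeDistributed(flow_encoder)(image_pairs)   # (batch, SEQ_LEN, 512)

# Summarize the IMU samples between each pair of frames with a small LSTM.
imu_feats = layers.TimeDistributed(layers.LSTM(128))(imu)          # (batch, SEQ_LEN, 128)

# Concatenate visual and inertial features and feed them to the core LSTM.
fused = layers.Concatenate(axis=-1)([visual_feats, imu_feats])
core = layers.LSTM(256, return_sequences=True)(fused)

# Predict a 6-DoF relative pose (3 translation + 3 rotation) per time step.
pose = layers.TimeDistributed(layers.Dense(6), name="pose")(core)

model = Model(inputs=[image_pairs, imu], outputs=pose)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

Note that this sketch only illustrates how the two input streams can be fused per time step before the core LSTM: in the paper, the visual features come from a pretrained FlowNet and the relative pose is represented on se(3) rather than as a raw 6-vector, which is part of why our reimplementation required embedding custom TensorFlow code inside Keras.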