Datasets
Sentiment Analysis Dataset
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for …
Stanford Question Answering Dataset
A reading comprehension dataset, consisting of questions posed on a set of Wikipedia articles, where the answer to every question is a span of text.
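Because every answer is a span of the source passage, an answer can be stored as a start offset plus the answer text and recovered by slicing. The snippet below is a minimal sketch of that idea, not the dataset's official loader: the sample passage, the helper name `extract_answer`, and the variable names are hypothetical and chosen only for illustration.

```python
def extract_answer(context: str, answer_start: int, answer_text: str) -> str:
    """Recover an answer span from a passage by character offset.

    Raises ValueError if the text at the offset does not match the
    expected answer, which would indicate a misaligned annotation.
    """
    end = answer_start + len(answer_text)
    span = context[answer_start:end]
    if span != answer_text:
        raise ValueError(
            f"span {span!r} at offset {answer_start} does not match {answer_text!r}"
        )
    return span


# Hypothetical passage and answer, made up for this example.
context = "The Eiffel Tower is located in Paris."
answer_text = "Paris"
answer_start = context.index(answer_text)  # character offset of the answer

span = extract_answer(context, answer_start, answer_text)
print(span)
```

Checking the slice against the expected text is a cheap way to catch offsets that have drifted after preprocessing such as whitespace normalization.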
OpenSubtitles
Contains translated movie subtitles for over 60 languages. 62 languages, 1,782 bitexts; total number of files: 3,735,070; total number of tokens: 22.10G; total number of sentence fragments: 3.35G.
MURA (musculoskeletal radiographs) bone X-rays
MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal. Musculoskeletal conditions affect more than 1.7 billion people …
Open Images V4
Open Images is a dataset of ~9M images that have been annotated with image-level labels and object bounding boxes. The training set of V4 contains 14.6M bounding boxes for 600 object …
Stanford Cars Dataset
The Cars dataset contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images, where each class has been split roughly …