Jul 23, 2021
Top 350 Open Datasets for Data Science and Machine Learning
Below is a list of 350 open datasets and data visualizations for data science, AI, and ML.
- Latest complete Netflix movie dataset
- Common Crawl
- Dataset on protein prices
- CPOST dataset on suicide attacks over four decades
- Credit Card Dataset — Survey of Consumer Finances (SCF) Combined Extract Data 1989–2019
- Drone imagery with annotations for small object detection and tracking dataset
- NOAA High-Resolution Rapid Refresh (HRRR) Model
- Registry of Open Data on AWS
- Textbook Question Answering (TQA)
- Harmonized Cancer Datasets: Genomic Data Commons Data Portal
- The Cancer Genome Atlas
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
- Genome Aggregation Database (gnomAD)
- SQuAD (Stanford Question Answering Dataset)
- PubMed Diabetes Dataset
- Drug-Target Interaction Dataset
- Pharmacogenomics Datasets
- Pancreatic Cancer Organoid Profiling
- Africa Soil Information Service (AfSIS) Soil Chemistry
- Dataset for Affective States in E-Environments
- NatureServe Explorer Dataset
- Flight Records in the US
- Worldwide flight data
- 2019 Crime statistics in the USA
- Yahoo Answers Datasets
- History of America 1400–2021
- Persian words phonetics dataset
- Historical Air Quality Dataset
- Stack Exchange Dataset
- Awesome Public Datasets
- Agriculture Dataset
- Biology Dataset
- Climate and Weather Dataset
- Complex Network Dataset
- Computer Network Dataset
- CyberSecurity Dataset
- Data Challenges Dataset
- Earth Science Dataset
- Economics Dataset
- Education Dataset
- Energy Dataset
- Entertainment Dataset
- Finance Dataset
- GIS Dataset
- Government Dataset
- Healthcare Dataset
- Image Processing Dataset
- Machine Learning Dataset
- Museums Dataset
- Natural Language Dataset
- Neuroscience Dataset
- Physics Dataset
- Prostate Cancer Dataset
- Psychology and Cognition Dataset
- Public Domains Dataset
- Search Engines Dataset
- Social Networks Dataset
- Social Sciences Dataset
- Software Dataset
- Sports Dataset
- Time Series Dataset
- Transportation Dataset
- eSports Dataset
- Complementary Collections
- Categorized list of public datasets: Sindre Sorhus's awesome list
- Platforms
- Programming Languages
- Front-End Development
- Back-End Development
- Computer Science
- Big Data
- Theory
- Books
- Editors
- Gaming
- Development Environment
- Entertainment
- Databases
- Media
- Learn
- Security
- Content Management Systems
- Hardware
- Business
- Work
- Networking
- Decentralized Systems
- Higher Education
- Events
- Testing
- Miscellaneous
- Related
- US Department of Education CRDC Dataset
- NASA Dataset: sequencing data from bacteria before and after being taken to space
- All of Trump’s Twitter insults from 2015 to 2021 in CSV
- Data is plural
- Global terrorism database
- The dolphin social network
- Dataset of 200,000 jokes
- The Million Song Dataset
- Cornell University’s eBird dataset
- UFO Report Dataset
- CDC’s Trend Drug Data
- Health and Retirement study: Public Survey data
- RAND HRS Data
- Gateway Harmonized Data
- Contributed and Replication Data
- Restricted/Sensitive Data
- The Quick Draw Dataset
- Air Quality Dataset
- UK Water Industry Chemical Investigations dataset
- M3 and M4 Dataset Time Series Data
- Protein Data Bank (PDB)
- Dataset of Games
- DonorsChoose.org Application Screening Dataset
- Dataset of all the squirrels in Central Park
- Google BigQuery Public Datasets
- IMDb Dataset
- PHOnA: A Public Dataset of Measured Headphone Transfer Functions
- Sports Data Set
- Kaggle Datasets
- Coronavirus Datasets
- Natural History Museum in London
- TSA Throughput Dataset (alternate source)
- Data Planet
- Chess datasets
- ML Dataset to practice methods of regression
- ManyTypes4Py: A benchmark Python Dataset for Machine Learning-Based Type Inference
- Quadrature magnetoresistance in overdoped cuprates
- The UMA-SAR Dataset: Multimodal data collection from a ground vehicle during outdoor disaster response training exercises
- Child Mortality from Malaria
- Quora Question Pairs at Data.world
- MIMIC Critical Care Database
- Data.Gov: The home of the U.S. Government’s open data
- Tidy Tuesday Dataset
- US Census Bureau: QuickFacts Dataset
- Classical Abstract Art Dataset
- Interactive map of indigenous people around the world
- DataOhio
- National Household Travel Survey (US)
- National Travel Survey (UK)
- National Travel Survey (NTS) [Canada]
- ENTUR: NeTEx or GTFS datasets [Norway]
- The Swedish National Forest Inventory
- Large data sets from finance and economics applicable in related fields studying the human condition
- Our world in Data: International Trade
- International Historical Statistics (by Brian Mitchell)
- World Input-Output Database
- Correlates of War Bilateral Trade
- World Bank Open Data — World Development Indicators
- World Trade Organization — WTO
- SMOKA Science Archive
- Graph Datasets
- Multi-Domain Sentiment Dataset
- A Global Database of Society
- The Yahoo News Feed: Ratings and Classification Data
- Other Datasets
- Power and Energy Consumption Open Datasets
- The Million Playlist Dataset (Spotify)
- Regression Analysis Cheat Sheet
- Hotel Reviews Dataset from Yelp
- Motorcycle Crash data
- Natural Disasters — Free News Intelligence Dataset
- World Population Data by Country and Age Group
- Investment-Related Dataset with both Qualitative and Quantitative Variables
- National Obesity Monitor
- The World’s Nations by Fertility Rate 2021
- Total number of deaths due to COVID-19 vis-à-vis population in millions
- Google searches for different emotions during each hour of the day and night
- Where do the world’s CO2 emissions come from? This map shows emissions during 2019. Darker areas indicate areas with higher emissions
- Global Linguistic Diversity
- Where in the world are the densest forests? Darker areas represent higher density of trees.
- Likes and Dislikes per movie genre
- Global Historical Climatology Network-Monthly (GHCN-M) temperature dataset
- Python Cheat Sheet
- Data Science Cheat Sheet
- Pandas Cheat Sheet
- Electric power consumption (kWh per capita)
- Alcohol-Impaired Driving Deaths by State & County [US]
- % change in life expectancy from 2020 to 2021 across the globe
- How Many Years Till the World’s Reserves Run Out of Oil?
- Which energy source has the fewest disadvantages?
- Human development index (HDI) by world subdivisions
- US Streaming Services Market Share, 2020 vs 2021
- Number of tweets deleted by month
- Football/Soccer Leagues with the fairest distributions of money have seen the most growth in long-term global interest.
- How Much Does Your Favorite Fast Food Brand Spend on Ads?
- Historical population count of Western Europe
- Results from survey on how to best reduce your personal carbon footprint
- Where does the world’s non-renewable energy come from?
- Recorded Music Industry Revenues from 1997 to 2020
- US Trade Surpluses and Deficits by Country (2020)
- Facebook Monthly Active Users
- Heat map of the past 50,000 earthquakes pulled from USGS sorted by magnitude
- Where do the world’s methane (CH4) emissions come from?
- Earth Surface Albedo (1950 to 2020)
- Wealth of Forbes’ Top 100 Billionaires vs All Households in Africa
- 20 years of Apple sales in a minute
- The Price of Dogecoin $DOGE Since 2020 in 34 seconds.
- Metro areas in the EU by GDP (Post Brexit)
- Racial Diversity of Each State (Based on US Census 2019 Estimates)
- A curated, daily feed of newly published datasets in machine learning
- Machine Learning: CIFAR-10 Dataset
- Machine Learning: ImageNet
- Machine Learning: The MNIST Database of Handwritten Digits
- The Massively Multilingual Image Dataset (MMID)
- Capitol insurrection arrests per million people by state
- How have cryptocurrencies done during the Pandemic?
- Share of US Wealth by Generation
- Top 100 Cryptocurrencies by Market Cap
- Crypto race: DOGE vs BTC, last 365 days
- 12,000 years of human population dynamics
- Countries with a higher Human Development Index (HDI) than the European Union (EU)
- Countries with a higher Human Development Index (HDI) than the United States (US)
- Child marriage by country, by gender
- Wars with greater than 25,000 deaths by year
- Population Projection for China and India till 2050
- Relative cumulative and per capita CO2 emissions 1751–2017
- Formula 1 Cumulative Wins by Team (1950–2021)
- Countries with the most nuclear warheads
- Using machine learning methods to group NFL quarterbacks into archetypes
- 2M rows of 1-min S&P bars (12 years of stock data) — 2008–2021
- A global database of COVID-19 vaccinations
- A list of available datasets for machine learning in manufacturing
- Predictive Maintenance and Condition Monitoring
- Process Monitoring
- Predictive Quality and Quality Inspection
- Process Parameter Optimization
- Data Analytics Certification Questions and Answers Dumps
- Datasets needed for Crop Disease Identification using image processing
- Survival Analysis datasets for machines
- English alphabet organized by each letter’s note in ABC
- Create, maintain, and contribute to a long-living dataset that will update itself automatically across projects.
- Human Rights Measurement Initiative Datasets
- World Wide Energy Production by Source 1860–2019
- Project Sunroof — Solar Electricity Generation Potential by Census Tract/Postal Code
- Carbon emission arithmetic + hard v. soft science
- What Does 1GB of Mobile Data Cost in Every Country?
- Key Concepts of Data Science
- Project CodeNet is a large dataset aimed at teaching AI to code.
- NSRDB: National Solar Radiation Database
- Cheat Sheet for Machine Learning, Data Science.
- Emigrants from the UK by Destination
- US Rivers and Streams Dataset
- Bubble Chart that compares the GDP of the G20 Countries
- Desktop OS Market Share 2003–2021
- National Parks of North America
- Inflation of Bitcoin and DogeCoin vs. Federal Reserve target
- Percentage of women who experienced physical or sexual violence since the age of 15 in the EU
- Canadian Interprovincial Migration
- Covid-19 Vaccination Doses Administered per 100 in the G20
- Import/Export of Conventional Arms by Different Countries over past 2 decades
- Aggregated disease comparison dataset
- Trending Google Searches by State Between 2018 and 2020
- Market capitalization in billions of dollars of the Top 20 Cryptocurrencies on 2021-05-20
- Top Chess Players From 2000–2020
- Comparing Emissions Sources — How to Shrink your Carbon Footprint More Effectively
- Oil and gas-fired power plants in the world
- Top 100 Reddit posts of all time
- Fastest routes on land (and sometimes, boat) between all 990 pairs of European capitals
- Pokemon Dataset
- 30×30 m Worldwide High-Resolution Population and Demographics Data
- Gridded global datasets for Gross Domestic Product and Human Development Index over 1990–2015
- Decrease in worldwide infant mortality from 1950 to 2020
- Countries of the world sorted by those that have warmed the most in the last 10 years, showing temperatures from 1890 to 2020
- Climate change concern vs personal spend to reduce climate change
- The Illusion of Choice in Consumer Brands
- Yearly Software Sales on PlayStation Consoles since 1994
- Yearly Hardware Sales of PlayStation Consoles since 1994
- Mass Transit Use in America
- Cybertruck vs F150 Lightning pre-orders, by time since debut
- Top 100 Most Populous City Proper in the world
- Tax data for different countries
- What do Europeans feel most attached to — their region, their country, or Europe?
- Cost of 1 GB of mobile data in every country
- Frequency of all digrams in 18 languages, diacritics included
- Mapped: The World’s Nuclear Reactor Landscape
- Database of 999 chemicals based on liver-specific carcinogenicity
- SMS Spam Collection Data Set
- Open Datasets for Autonomous Driving
- Open Dataset people are looking for [Help if you can]
- Cars for sale in Germany from 2011 to 2021
- Percentage of female students in higher education by subject area
- All the passes: A visualization of ~1 million passes from 890 matches played in major football/soccer leagues/cups
- Global “Urbanity” Dataset (using population mosaics, nighttime lights, & road networks)
- Percentage of students with disabilities in higher education by subject area
- Arrests for Hate Crimes in NYC by Category, 2017–2020
- The Most Successful U.S. Sports Franchises
- Adult cognitive skills (PIAAC literacy and numeracy) by Percentile and by country
- G7 Corporate Tax rate 1980–2020
- Euro 2020 (played in 2021) Group Stage Predictions Based on a Bayesian Linear Item Response Model
- Animated demographic pyramid of Italy 1982–2021
- The 15 most shared musicians on Reddit
- Spam vs. Legitimate Email, Average Global Emails per Day
- Falling Fertility, 1800–2016
- Europe Covid-19 waves
- Population Density of Canada 2020
- The portion of a country’s population that is fully vaccinated for COVID (as of June 2021) scales with GDP per capita.
- Dataset of Chemical reaction equations
- Maths datasets
- SQL Queries Dataset
- Countries of the world, ranked by population, with the 100 largest cities in the world marked
- Top 10 World’s most livable cities in 2021
- What businesses in different countries search for when they look for a marketing agency — “creative” or “SEO”?
- Is the economic gap between new and old EU countries closing?
- Reddit r/wallstreetbets posts and comments in real-time
- Global NO2 pollution data visualization June 2021
- Shopify App Store Report: 2021
- The Chrome Webstore Report: 2021
- Percentage of Adults with HIV/AIDS in Africa
- Recorded CDC deaths (2014 — June 16, 2021) from Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)
- What are the long term gains on cryptocurrencies?
- Life Expectancy and Death Probability by Age and Gender
- Daily Coronavirus cases in Canada vs % of Population Vaccinated
- Google Play Store apps: data on 2.3 million apps on Kaggle
- African languages dataset
- Daily Temperature of Major Cities Dataset
- Do stricter gun laws reduce firearms homicides?
- Relative frequency of words in economics textbooks vs their frequency in mainstream English (the Google Books corpus)
- Hours per day spent on mobile devices by US adults
- Environmental Impact of Coffee Brewing Methods
- Murders in major U.S. Cities: 2019 vs. 2020
- New Harvard Data (Accidentally) Reveal How Lockdowns Crushed the Working Class While Leaving Elites Unscathed
- Support for same-sex marriage by religious group
- Daily chance of dying for Americans
- Mapping Global Carbon Emission Intensity (Dec 2020)
- IPO Returns 2000–2020
- Number of Miss Americas by U.S. State
- The World’s Nuclear Warheads
- The population of Las Vegas over time
- The Alpha to Omega of Wikipedia
- Glacial-interglacial cycles over the past 450,000 years
- Top Companies Contributing to Open Source — 2011/2021
- Crime Rates in the US: 1960–2021
- A network visualization of privacy research (83k nodes, 462k edges)
- GDP (at purchasing power parity) per capita in international dollars
- Phone Call Anxiety dataset for Millennials and Gen Z
- Hate Crime Statistics in New York State 2019–2021
- Net FDI (Foreign Direct Investment) Flows of different countries (1970–2019)
- The “Face Image Meta-Database” (fIMDb)
- Dataset on (historical) books by year of writing
- Trash Dataset for waste detection
- Social Security Administration (SSA) Open Government Select Datasets
- Reddit Self-reported Depression Diagnosis (RSDD) dataset
- BMI (Body Mass Index) dataset
- China’s CO2 emissions almost surpass the G7
Source: enoumen.com