Serious Data Science datasets
--
Excellent datasets for Data Science. Run away from flowers and sunken boats!
· Intro..
· Papers with code
· Google Cloud Public Datasets
· In a nutshell..
Intro..
In this post I want to explore 2 sources of public datasets, which can be used for many purposes. For example, we can use them to prepare examples and proofs of concept for data consulting projects, to prepare trainings, and to promote data literacy.
This is very useful, as most of the datasets you can find there have actual business meaning and interest. Also, if you’re a beginner in the Data world, I’d recommend using that sort of dataset for your self-training, rather than the far more popular sets about flowers and sunken boats that we see in most Data Science online tutorials and examples.
After this caustic comment, let’s go straight to the point.
Papers with code
Here is the link: Papers with code
We can find research papers together with their related datasets. The topics are often really complex, so we can use the papers to learn the methodologies and then apply them to the datasets. The data are nicely classified by type (Images, Tabular, Text,..), so pretty much all sorts of Data Science applications can be built on them.
For example, to get the tabular “Adult Data Set”:
Just select the paper you want the dataset from, and in the Code section you will find the data repository links (usually GitHub), where you can download the data completely free.
Note that you can also download the paper directly.
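Once a tabular file is downloaded, loading it takes only a few lines. Here is a minimal sketch, assuming the file uses the classic Adult layout: comma-separated values, no header row, and the 15 columns named below. The two sample rows are only illustrative and stand in for the downloaded file so that the snippet runs on its own; `read_adult` is a hypothetical helper name, not part of any library.

```python
import csv
import io
from collections import Counter

# Assumed column names for the Adult dataset; the raw file has no header row.
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def read_adult(lines):
    """Parse Adult-style rows (comma-separated, no header) into dicts."""
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.reader(lines, skipinitialspace=True)
    rows = []
    for fields in reader:
        if len(fields) != len(ADULT_COLUMNS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(ADULT_COLUMNS, fields)))
    return rows


# Illustrative rows in the assumed layout, so the sketch needs no download.
SAMPLE = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

rows = read_adult(SAMPLE)
# Count how many rows fall into each income class (the usual target column).
income_counts = Counter(row["income"] for row in rows)
print(len(rows), dict(income_counts))
```

With a real download, you would pass an open file instead of the sample, for example `read_adult(open("adult.data"))`, where the file name is whatever the repository you picked uses.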
Google Cloud Public Datasets
Here is the link: Google Cloud Public Datasets
Let’s see an example of how to use it to extract datasets.