Your first Data Analysis in 4 uncomplicated steps: datasets, software and resources here!

Andrea Leonel - Data Analyst
7 min read · Jun 11, 2022

I recently wrote an article about the 7 things really worth dedicating your time and energy to if you’re searching for a Data Analyst job. One of the key things I mentioned there is that you don’t need a formal job or a Coursera certification to be a Data Analyst; you just need to start analysing data!

When I was starting out, I really struggled to get my head around where to get datasets from, which software to use and even what to do with my analysis. With that in mind, this article will make it very easy and simple for you to find free datasets online, download the software you need, get some basic skills to start with and publish your stories online.

Step 1: Let’s download some data

The biggest challenge of a solo Data Analyst is access to interesting datasets. In fact, one of the reasons why I decided to go from freelancing to being an employed Data Analyst was the ability to work with larger datasets and answer big business questions.

But that doesn’t mean you can’t find interesting free datasets online. Below, I’ll list some cool sources of free datasets to get you started.

Data sources for you to create thought-provoking stories

Firstly, my favourite data source of all time: Information Is Beautiful

Information Is Beautiful: from random, fun topics to more serious stuff!

What I love about them is that they publish datasets on random topics that allow you to create fun data stories. For example, they have a dataset on Which Country Eats the Most? or the Top 500 Passwords. They also cover more serious topics like Diversity in Tech and Russian Gas and Oil.

Other equally interesting data sources:

  • Kaggle: this is actually a very popular source for datasets and it does feature a variety of topics. The only thing I’d be aware of is that some of their datasets can go a bit viral and then you get a million Medium articles / GitHub repositories about the same dataset. If you want to stand out for employers, maybe try one of the lesser-known data sources below. Otherwise, Kaggle still has a lot to offer to help you practice your data skills.
  • Our World in Data: this source focuses on datasets related to wider societal issues like health, politics, human rights, economy, etc. They have a vast selection of topics to choose from and their data is actually very robust and of high quality (something I find to be an issue with Kaggle at times, by the way). If you fancy looking into the world’s problems, you’ll have fun here!
  • Inside Airbnb: this organisation extracts and publishes fairly robust data from Airbnb by city. I actually wrote a data story analysing their Napoli dataset and I can tell you there’s a lot of interesting insight you can get from it. Plus, Airbnb is a big player in the travel industry, so if you’d like to wow employers in that sector, I’d say this is a must-do.
  • UK Gov: I included this source in the list just to remind you to also check your local government to see what they make available in terms of data. The UK, for example, offers a good amount of data on topics like Business, Environment, and Transport.
  • MusicBrainz: for the music lovers out there, I can definitely relate to how hard it is to find good quality, interesting music data for free online. This website actually built a relational database of music metadata. I know it’s not as exciting as Spotify data, but it’s a good start if you’re not ready to start fumbling with APIs.

Step 2: Download the right software for your analysis

Software for analysing data:

Let me debunk a myth here straight away: you can absolutely create amazing data stories using just Excel (yes, including amazing visualisations, check out this article I wrote on how to pimp up your charts).

So, if you have no SQL or Python skills, don’t wait until you do to start analysing data!

She’s like: I only work with Excel and I’m doing just fine with it! Source: Katerina Holmes on Pexels

Now, if your goal is to improve your SQL skills, the software I would personally recommend to start with is MySQL Workbench. When I was starting out, I faffed about with Azure and other tools that were highly recommended online, but they were too hard to set up because of all the functionality they had. All I really needed was a workspace to practice my SQL skills, so MySQL Workbench worked well for me.
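To give a feel for what practising SQL on a small table looks like, here is a minimal sketch. It rests on a few assumptions: it uses Python’s built-in sqlite3 module as a stand-in workspace rather than MySQL Workbench, and the listings table, its columns and its rows are all made up for illustration. The query itself (COUNT, AVG, GROUP BY, ORDER BY) is standard SQL that should carry over to MySQL with little or no change.

```python
import sqlite3

# In-memory SQLite database: a stand-in practice workspace (an assumption,
# not MySQL Workbench). The table and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (area TEXT, room_type TEXT, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("North", "Entire home", 100.0),
        ("North", "Entire home", 120.0),
        ("South", "Entire home", 80.0),
        ("South", "Private room", 40.0),
    ],
)

# A typical first question: how many listings per area, and at what average price?
query = """
    SELECT area, COUNT(*) AS listings, AVG(price) AS avg_price
    FROM listings
    GROUP BY area
    ORDER BY avg_price DESC
"""
rows = conn.execute(query).fetchall()
for area, n_listings, avg_price in rows:
    print(f"{area}: {n_listings} listings, average price {avg_price:.2f}")

conn.close()
```

Once a query like this works, swapping the made-up rows for a real dataset is mostly a matter of importing the file into your database of choice.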

For Python, I’m going to be honest and say I’m not great at it. I only use Python when I fumble with the Spotify API and for this sort of task, Jupyter notebooks serve me well. However, if there are any Python heads amongst my readers, please leave any tips for starter software in the Responses!
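For anyone curious what a first look at a dataset in a Jupyter notebook can involve, here is a minimal sketch using only Python’s standard library. The data is a made-up sample standing in for a downloaded CSV file, and the column names (country, year, value) are hypothetical. It is one simple way to check columns, row count and missing values, not the only one.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Made-up sample standing in for a downloaded CSV file; with a real file you
# would use open("your_file.csv", newline="") instead of io.StringIO.
sample_csv = io.StringIO(
    "country,year,value\n"
    "A,2020,10\n"
    "A,2021,14\n"
    "B,2020,7\n"
    "B,2021,\n"  # a missing value, common in real datasets
)

rows = list(csv.DictReader(sample_csv))

print("Columns:", list(rows[0].keys()))
print("Row count:", len(rows))

# Count empty cells per column to spot data-quality problems early.
missing = Counter(col for row in rows for col, val in row.items() if val == "")
print("Missing values:", dict(missing))

# Average of the numeric column, skipping missing entries.
values = [float(row["value"]) for row in rows if row["value"] != ""]
average_value = mean(values)
print("Average value:", round(average_value, 2))
```

Many analysts reach for a library such as pandas for this kind of work; the standard-library version above is just enough to get started without installing anything.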

Software for data visualisation:

We’re lucky enough to live in a world where Tableau offers a free version for you to play around with. And before you start typing the Coursera website into your browser to find a certification, I would recommend just downloading it straight away and learning on the go.

A beautiful Tableau dashboard using the famous Covid dataset: later on I’ll mention a guided analysis that uses this very same dataset for you to practice with. Source: Clay Banks on Unsplash.

PS: While I do believe beginners overestimate certifications, I do secretly dream of completing the official Tableau Certified Data Analyst certification. I just love how much you can create with Tableau: check out their Viz of the Day page for some examples of that.

Having said that, again, if you don’t feel like learning a new data visualisation tool, Excel is more than enough to get you started.

Step 3: Skills and inspiration

I can understand that it may be daunting to start using Excel or SQL with no previous knowledge of them. Here are some tips to help you mitigate that.

Certifications (with a huge caveat)

Ok, I’ll give certifications some credit for helping you get the basic skills you may need. But I would just advise you not to wait until you finish your course to start doing your own analysis. You can follow the steps above and put the things you learn into practice as you go through the modules. You see, the problem with certifications is that they often don’t teach you curiosity and how to investigate the data. They don’t make you think on your own; only practical work can do that.

Guided Analysis

I love the idea of guided analysis because you do have someone teaching you the technical part but, at the same time, you have the ability to explore the data on your own once you start feeling more confident. I’m a huge advocate of Alex the Analyst’s Portfolio Project series: he’s got one for Excel, SQL, Tableau and more.

Guided analysis is a great way to get you started even if you have little to no technical skills. Source: Alex the Analyst

Good old Google and Stack Overflow

I work with some very experienced Data Analysts and Scientists and let me tell you that they still google stuff when they run into issues. Usually, I just google something like “Power BI how to create an IF clause on DAX”. However, Stack Overflow is a great community to have your questions answered too. (PS: I love their new tagline “Every data scientist has a tab open to Stack Overflow”. It’s so true!)

Step 4: What to do with your analysis

Obviously, you may analyse a dataset just for fun (I can’t be the only one who does that, right?) and it may never see the light of day. That’s ok. But if you’re analysing data to build a portfolio or to create content around it, I have some tips for you.

What? You don’t hang out with your friends analysing datasets? Source: Koolshooters on Pexels.

Sharing your code

There are a million ways to make your code available online. I’m just going to share one that I found to be pretty straightforward: GitHub.

When I was building my portfolio, I would do my analysis in MySQL Workbench and upload the code to a GitHub repository. I would then make sure there was a link to my GitHub page on LinkedIn and on my CV too. This is enough to show your technical skills to employers.

Showcasing storytelling skills

I love to write, so my go-to method to make my data stories visible online was to write them on Medium, like the Airbnb in Napoli story. But this is where you can let your imagination go wild!

Let’s say you like creating visualisations. You could create an account on Instagram to showcase your skills. If you like telling data stories, maybe a YouTube channel or a podcast could be a good option. There are a lot of opportunities to not only showcase your storytelling skills but also to stand out from other Data Analysts out there and show off your soft skills.

Finally, a note on personal websites: look, I like them. I have been meaning to build one for myself so I could have a one-stop shop for all the different projects I have. Are they absolutely necessary for your portfolio? Absolutely not. They do make you look more professional, but I feel like there are other equally good ways to stand out, like the ones I mentioned above.

Final advice:

I genuinely hope this article motivates you to roll up your sleeves and start doing your first analysis. When I was starting out, I was so scared of feeling stupid or like I wasn’t ready yet. But honestly, you’ll learn so much from experimenting with data.

Just don’t put pressure on yourself to generate insight that will solve all of the world’s problems. Have fun with it!

Andrea Leonel - Data Analyst

A Data Analyst, a music lover and a full-time traveler walk into a bar.