Hands-On Data Analysis with Pandas (1st edition)

Efficiently perform data collection, wrangling, analysis, and visualization using Python

740 pages


Data analysis has become an essential skill in a variety of domains where knowing how to work with data and extract insights can generate significant value.

Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn. Using real-world datasets, you will learn how to use the powerful pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification, using scikit-learn, to make predictions based on past data.

By the end of this book, you will be equipped with the skills you need to use pandas to ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analysis across multiple domains.

See also: Hands-On Data Analysis with Pandas (2nd edition)



This is the book I would have wanted when I was learning. Many books use randomly generated data, which makes it hard for the reader to understand why the action shown is necessary and to recognize when to use it on their own data. Using real-world datasets, this book covers data collection, data wrangling, data analysis, and data visualization (with Matplotlib and seaborn), and introduces machine learning concepts (with scikit-learn), all while focusing on writing quality code. You will learn not only how to use the pandas library, but also how to use Python in both a functional and object-oriented capacity, collect data from APIs, document code, set up a virtual environment, use git for version control, build Python packages, and create scripts that accept command line arguments.

It took nearly a year. I mostly worked on the book during weekends. I started the outline in August 2018 and worked about 20 hours a week on average. As the publication date approached, this grew to about 40 hours a week. It was a labor of love.

Not at all. However, while I was reading a lot of books to learn Python data science, I often thought about how I would do things differently (e.g., not using random data, showing more realistic outcomes, and incorporating computer science concepts). One day Packt reached out to me about writing Hands-On Data Analysis with Pandas, and I signed on – it was a way to give back to the community. I wrote the book that I would have wanted when I was learning.

I think the hardest part is organizing yourself and creating a detailed outline of the book. Early on in the writing process for my first book, I learned that a list of 3-5 main concepts for a chapter was nowhere near enough to come up with 30+ pages of content. I needed to further break down each of those concepts to make a detailed outline for each chapter. I also set micro-deadlines for each of those subsections to make sure I would make the deadlines I had set with my publisher.

I learned to work efficiently and to quickly switch focus between my full-time job and working on my book. This was something that I also found really helpful while pursuing my master's degree.

I also learned when to ask for and accept help – I could not have finished this book by myself. A big part of this help was asking for feedback. I had to learn to accept criticism and not take it personally, which can be difficult, especially when you've spent so much time working on something.

Yes! I'm not ready to share what I'm working on, but I do want to write another book.