Hands-On Data Analysis with Pandas (2nd edition)

A Python data science handbook for data collection, wrangling, analysis, and visualization

788 pages


Data analysis has become an essential skill in a variety of domains where knowing how to work with data and extract insights can generate significant value. Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn.

Using real-world datasets, you will learn how to use the pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data.

This updated edition will equip you with the skills you need to use pandas 1.x to efficiently perform various data manipulation tasks, reliably reproduce analyses, and visualize your data for effective decision making—valuable knowledge that can be applied across multiple domains.




Hands-On Data Analysis with Pandas (2nd edition) has also been translated into Korean and Chinese.


This is the book I would have wanted when I was learning. Many books use randomly generated data, which makes it hard for the reader to understand why the action shown is necessary and to recognize when to use it on their own data. Using real-world datasets, this book covers data collection, data wrangling, data analysis, and data visualization (with matplotlib and seaborn), and introduces machine learning concepts (with scikit-learn), all while focusing on writing quality code. You will learn not only how to use the pandas library, but also how to use Python in both a functional and object-oriented capacity, collect data from APIs, document code, set up a virtual environment, use Git for version control, build Python packages, and create scripts that accept command line arguments.

The first edition took nearly a year, and the second edition took around nine months.

Not at all. However, while I was reading a lot of books to learn Python data science, I often thought about how I would do things differently (e.g., not using random data, showing more realistic outcomes, and incorporating computer science concepts). One day Packt reached out to me about writing Hands-On Data Analysis with Pandas, and I signed on – it was a way to give back to the community. I wrote the book that I would have wanted when I was learning.

I think the hardest part is organizing yourself and creating a detailed outline of the book. Early on in the writing process for my first book, I learned that a list of 3-5 main concepts for a chapter was nowhere near enough to come up with 30+ pages of content. I needed to break each of those concepts down further to build a detailed outline for each chapter. I also set micro-deadlines for each of those subsections to make sure I would meet the deadlines I had agreed on with my publisher.

I learned to work efficiently and to quickly switch focus between my full-time job and working on my book. This was something that I also found really helpful while pursuing my master's degree.

I also learned when to ask for and accept help – I could not have finished this book by myself. A big part of this help was asking for feedback. I had to learn to accept criticism and not take it personally, which can be difficult, especially when you've spent so much time working on something.

Yes! I'm not ready to share what I'm working on, but I do want to write another book.