Data science in my book Dancing with Python

In my blog entry “Quantum computing in my book Dancing with Python,” I described what the book covers related to quantum computing. I also published the entry “Availability of my book Dancing with Python and its table of contents.”

Today, I want to list specifically what I discuss in the book under what I term “an extended definition of data science.” The core chapters are in Part III. Here are their titles, introductions, and chapter tables of contents:

III Advanced Features and Libraries

12 Searching and Changing Text

We represent much of the world’s information as text. Think of all the words in all the digital newspapers, e-books, PDF files, blogs, emails, texts, and social media services such as Twitter and Facebook. Given a block of text, how do we search it to see if some desired information is present? How can we change the text to add formatting or corrections or extract information?

Chapter 4, Stringing You Along, covered Python’s string functions and methods. This chapter begins with regular expressions and then proceeds to natural language processing (NLP) basics: how to go from a string of text to some of the meaning contained therein.

12.1 Core string search and replace methods
12.2 Regular expressions
12.3 Introduction to Natural Language Processing
12.4 Summary
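
As a small taste of the searching and changing that this chapter's introduction asks about, here is a minimal sketch using the standard re module. The sample text and the pattern are hypothetical, made up for this illustration, and are not taken from the book.

```python
import re

# Hypothetical sample text, invented for this illustration
text = "Contact us at info@example.com or sales@example.org today."

# A deliberately simple pattern for email-like strings; real address
# validation is far more involved than this
pattern = re.compile(r"[\w.]+@[\w.]+\.\w+")

# Search: collect every match in the text
found = pattern.findall(text)

# Change: replace each match with a placeholder
redacted = pattern.sub("[email]", text)

print(found)
print(redacted)
```

Compiling the pattern once lets the same expression serve both the search and the replacement.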

13 Creating Plots and Charts

Among mathematicians and computer scientists, it’s said that a picture is worth 2¹⁰ words. Okay, that’s a bad joke, but it’s one thing to manipulate and compute with data and quite another to create stunning visualizations that convey useful information.

While there are many ways of building images and charts, Matplotlib is the most widely used Python library for doing so. [MAT] Matplotlib is very flexible and can produce high-quality output for print or digital media. It also has great support for a wide variety of backends that give you powerful mouse-driven interactivity. Generally speaking, if you have a coding project and you need to visualize numeric information, see if Matplotlib already does what you want. This chapter covers the core functionality of this essential library.

13.1 Function plots
13.2 Bar charts
13.3 Histograms
13.4 Pie charts
13.5 Scatter plots
13.6 Moving to three dimensions
13.7 Summary

14 Analyzing Data

While we can use fancy names like “data science,” “analytics,” and “artificial intelligence” to talk about working with data, sometimes you just want to read, write, and process files containing many rows and columns of information. People have been doing this interactively for years, typically using applications like Microsoft Excel® and online apps like Google Sheets™.

By “programmatically” manipulating data, I mean that we use Python functions and methods. This chapter uses the popular pandas library to create and manipulate these collections of rows and columns, called DataFrames. [PAN] [PCB] We will later introduce other methods in Chapter 15, Learning, Briefly. Before we discuss DataFrames, let’s review some core ideas from statistics.

14.1 Statistics
14.2 Cats and commas
14.3 pandas DataFrames
14.4 Data cleaning
14.5 Statistics with pandas
14.6 Converting categorical data
14.7 Cats by gender in each locality
14.8 Are all tortoiseshell cats female?
14.9 Cats in trees and circles
14.10 Summary
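
To give a flavor of reading rows and columns and computing basic statistics before reaching for pandas, here is a minimal sketch using only the standard csv and statistics modules. The cat names, weights, and column headings are hypothetical, invented for this sketch, and are not the book's dataset.

```python
import csv
import io
import statistics

# Hypothetical comma-separated data, invented for this sketch
raw = """name,weight_kg
Ada,4.0
Bo,5.0
Cy,6.0
"""

# Read the rows into dictionaries keyed by the header line
rows = list(csv.DictReader(io.StringIO(raw)))

# Pull one column out as numbers
weights = [float(row["weight_kg"]) for row in rows]

# Two of the core summary statistics
mean_weight = statistics.mean(weights)
median_weight = statistics.median(weights)

print(mean_weight, median_weight)
```

A pandas DataFrame wraps this read-then-summarize pattern in a single object, which is why the chapter moves on to it.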

15 Learning, Briefly

Machine learning is not new, but it and its sub-discipline, deep learning, are now being used extensively for many applications in artificial intelligence (AI). There are hundreds of academic and practical coding books about machine learning.

This final chapter introduces machine learning and neural networks primarily through the scikit-learn sklearn module. Consider this a jumping-off point where you can use the Python features you’ve learned in this book to go more deeply into these essential AI areas if they interest you.

15.1 What is machine learning?
15.2 Cats again
15.3 Feature scaling
15.4 Feature selection and reduction
15.5 Clustering
15.6 Classification
15.7 Linear regression
15.8 Concepts of neural networks
15.9 Quantum machine learning
15.10 Summary

This book is an introduction, so my goal is to get you started on a broad range of topics. For example, here are the Python modules and packages discussed or used in each of the four chapters in Part III:

12 Searching and Changing Text: re, flashtext, spacy
13 Creating Plots and Charts: matplotlib, numpy, mpl_toolkits.mplot3d
14 Analyzing Data: pandas, numpy, matplotlib, squarify, matplotlib-venn
15 Learning, Briefly: sklearn, pandas, numpy

In the book, I mention several other packages in passing, such as pytorch, as pointers for further exploration. The list above does not include standard modules such as math, random, and sys.