You control whether you walk straight or slouch,

Juan Morris
6 min read · Nov 22, 2020

I’ve purposefully chosen a categorical variable (i.e. make) to highlight that you need to do some extra work to convert it into a machine-readable format (a.k.a. numbers!). There are several ways to do that, such as LabelEncoder or OneHotEncoder, both available in scikit-learn. But I’m going with the classical “dummy variable” approach, which converts each categorical feature into dichotomous numeric variables (0s and 1s).
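As a minimal sketch of the dummy-variable idea, here is how it might look with pandas. The rows below are toy values I made up for illustration, not records from the real dataset, and `drop_first=True` is my own assumption (it drops one level so the dummies are not perfectly collinear); your notebook may do it differently.

```python
import pandas as pd

# Toy rows standing in for the automobile data; values are illustrative only.
cars = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "toyota"],
    "horsepower": [102, 101, 115, 62],
    "highway-mpg": [30, 29, 25, 35],
})

# Build one 0/1 column per make. drop_first=True (an assumption here) drops
# the first level, "audi", which then acts as the baseline category.
X = pd.get_dummies(cars, columns=["make"], drop_first=True, dtype=int)

print(X)
```

An audi row ends up with 0 in every `make_*` column, a bmw row gets a 1 in `make_bmw`, and so on.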
If you face a bad day with a good attitude, it can still be a meaningful one. Posture holds power. While you can overdo the “fake it till you make it” of how you carry yourself, making an effort to not collapse in the face of adversity — both mentally and onto the couch — is empowering.
You’ve now got the intuition from a simplified example of how multiple regression predicts the price of a used car from two features: horsepower and highway-mpg.

Now comes the moment of truth: how well does the model perform? There are many ways to evaluate model performance, but in classical statistics linear regression models are evaluated with R², which typically falls between 0 and 1; the higher the R², the better the fit.

If, however, you do what’s meaningful in every situation, even failure will have a purpose. Failing will still be painful, but your perspective will never feel “empty,” and you’ll always have reason to look forward to the future.

There are a few ways to choose variables for the model. Forward selection and backward elimination are two of them. As the names suggest, in this process you add or remove one variable at a time and check model performance.
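To make forward selection concrete, here is a rough sketch using plain NumPy. Everything in it is an assumption for illustration: the data is synthetic, the feature names `x0`, `x1` and `noise` are made up, and the stopping rule (stop when R² improves by less than `min_gain`) is just one simple choice among many.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R² of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def forward_selection(X, y, names, min_gain=0.01):
    """Greedily add the feature that raises R² the most.

    Stops when the best remaining candidate improves R² by less than
    min_gain (a hypothetical threshold chosen for this sketch).
    """
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate when added to the current selection.
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return [names[j] for j in selected], best

# Synthetic data: y depends on the first two columns; the third is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

chosen, score = forward_selection(X, y, ["x0", "x1", "noise"])
print(chosen, round(score, 3))
```

One caveat: in-sample R² never goes down when you add a variable, so in practice you would usually score candidates with adjusted R² or on held-out data instead.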

Now it’s time to show how multiple regression works in data scientists’ notebooks. First I’ll give an intuition with a quick fire drill, and then I’ll dive a bit deeper. If you want to follow along, you can download the automobile dataset from the UCI Machine Learning Repository.

We want to predict price, so the dependent variable is already set. Next comes the question of which features to use for prediction. Ideally I’d include all of the features in the initial model, but for this demo I’m choosing one categorical feature (i.e. make) and two numeric features (i.e. horsepower and highway-mpg), which I think most people would care about when choosing a car.

My ikigai is hard to define because it’s in all the small things. My normal day, before the coronavirus, was standing on a train with sweaty people who played their music too loudly. But it never got me down as I loved trying to work out the stories of the other people and what brought them joy. Now I miss my commute.

“He could’ve said that in a few paragraphs!” Well, he did. The book is based on a viral Quora answer Peterson wrote. But a post on a website does not hold the same power as a book full of stories. It’s true: Most self-help books are too long. But through their packaging, they can do a better job of spreading and delivering a message than any blog post ever can.

Like his book, Peterson is a controversial figure. I’m not here to discuss his politics, his logic, or his views on our culture. I’m here to learn. I only have “a few paragraphs,” but this is how I interpret his 12 lessons.

(I’m not a car expert, so I really don’t know what some of those columns represent. In real-world data science, though, that is not a good excuse: some domain knowledge is essential.)

In our demonstration we get an R² value of 0.81, meaning 81% of the variation in the dependent variable (i.e. used car price) can be explained by the three independent variables (i.e. make, horsepower and highway-mpg). Of course, this metric can likely be improved by including more variables, trying different combinations of them, and tuning model parameters, which is a topic for a separate discussion.
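For reference, this is the standard textbook definition of R² behind that interpretation, where the y_i are the observed prices, the ŷ_i are the model’s predictions, and ȳ is the mean observed price (the symbols are my notation, not something taken from the notebook):

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

An R² of 0.81 therefore means the residual sum of squares (the numerator) is 19% of the total sum of squares (the denominator).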

Still, I’m just bumbling my way through life and enjoying the present. My life could be completely different in five years, but that’s for older me to work out. Being mindful of my ikigai is not letting the good slip through my fingers because I’m too busy reaching for perfect.

But the world we live in is complex and messy. Each of the steps I’ve shown above will need to be broken down further. For example, in Step 2 we imported data and assigned features to the X and y variables without doing any further analysis. But who doesn’t know that data wrangling alone can take upwards of 80% of the work in any machine learning project?

I did a few other things as part of data exploration and preparation that I’m skipping here (such as checking and converting data types, removing a couple of rows that had symbols like ‘?’, etc.), but suffice it to reiterate my previous point: getting the data into the right format can be a real pain in the neck.
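Those skipped steps could look roughly like the sketch below. The rows are made up for illustration, and treating ‘?’ as a missing value and then dropping those rows is my assumption about one reasonable way to do it, not necessarily what the original notebook did.

```python
import pandas as pd

# Toy rows for illustration; numbers arrive as strings and "?" marks a gap.
raw = pd.DataFrame({
    "make": ["audi", "bmw", "renault", "toyota"],
    "horsepower": ["102", "101", "?", "62"],
    "price": ["13950", "?", "9295", "5348"],
})

numeric_cols = ["horsepower", "price"]
df = raw.copy()

# errors="coerce" turns anything that is not a number (such as "?") into NaN.
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Drop the rows that had a "?" in any numeric column.
clean = df.dropna(subset=numeric_cols).reset_index(drop=True)

print(clean.dtypes)
print(clean)
```

Dropping rows is the bluntest option; with more than a couple of affected rows you might prefer to impute the missing values instead.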

This is one of Jordan Peterson’s 12 Rules for Life, and, like all of them, it’s common sense that somehow still stabs you right in the heart. We’re great at ignoring common sense until someone hits us over the head with it. This is what Peterson does in his book, which many criticize for being too verbose.

Finally realizing that my brain was creating a flawed narrative was freeing. It meant I didn’t have to be loyal to past versions of myself that no longer exist. With more balance in my life, I can find joy everywhere. This has made me far more resilient to individual setbacks.

I felt like I could write forever on multiple regression; there are so many areas to cover, but I have to stop somewhere. Here are a few additional things to keep in mind while building a linear regression model for real-world application development:

In his book Homo Deus, the Israeli historian Yuval Noah Harari explains that we often tell ourselves a fixed story about who we are, when in fact our lives are not one continuous stream. I myself have had a problem with tying my identity and self-worth to a small part of my life: Years ago, my competitive karate career was ended by a string of serious injuries that I kept ignoring. That crushed me. I wasn’t sure who I was without karate. I undervalued everything else great in my life because I was so obsessed.
It’s always amazing to think that within the whole complex machine learning pipeline the easiest part is (in my opinion of course!) actually specifying the model. It’s just three easy steps of instantiate — fit — predict as in most ML algorithms. Of course, you have to parameterize the model and iterate it several times until you get the one that satisfies your criteria, but still, this step of the model building process gives me the least headache.
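Here is a minimal sketch of those three steps, assuming scikit-learn’s `LinearRegression`. The tiny arrays are synthetic stand-ins I made up (two numeric features and a target built from an exact linear rule), not the automobile data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: two features and a target that follows
# y = 10 + 2*x1 + 3*x2 exactly, so the fit should recover those numbers.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 10.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression()   # 1. instantiate
model.fit(X, y)              # 2. fit
y_pred = model.predict(X)    # 3. predict

# score() returns R² for regressors in scikit-learn.
r2 = model.score(X, y)
print(model.intercept_, model.coef_, r2)
```

In a real project you would fit on a training split and predict on held-out rows, rather than scoring on the same data the model was fitted to.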

