Compression Middleware ⏳, Package Managers 📦, Django 🧑🏻‍💻, ML 🤖 | Techletter #74

May 25, 2024

Why use compression middleware in Node.js?

Compression middleware can improve the performance of your application. Some of the reasons to use it are:

Reduces response size: Compression can dramatically reduce the size of the response sent from the server to the client, often by over 70% for text-based content such as HTML, CSS, JavaScript, and JSON.
Improves performance: By reducing the payload size, compression leads to faster download times for clients.
Saves bandwidth: Compressing responses saves bandwidth on both the server and client side. This can lead to cost savings.
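
To get a feel for the size reduction, here is a small sketch in Python using the standard-library gzip module. The payload is made-up data standing in for a typical JSON API response, so the exact saving you see will differ for your own responses.

```python
import gzip
import json

# Hypothetical API response: a list of similar JSON records (made-up data).
records = [{"id": i, "name": f"user-{i}", "active": True} for i in range(500)]
payload = json.dumps(records).encode("utf-8")

# Compress the payload the way a gzip-based middleware would.
compressed = gzip.compress(payload)
saving = 1 - len(compressed) / len(payload)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"saving:     {saving:.0%}")

# Decompressing gives back exactly the same bytes.
assert gzip.decompress(compressed) == payload
```

Repetitive, text-based payloads like this one compress very well. Already-compressed formats such as JPEG or ZIP gain little.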

By the way, what is middleware? If you are not familiar with it, you can check this article.
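
In short, middleware is code that sits between the incoming request and your application and can inspect or modify the response on its way out. Below is a minimal sketch of that idea in Python using the WSGI interface. It is an analogue written for illustration, not the Node.js compression package, and the names gzip_middleware, hello_app, and call are hypothetical.

```python
import gzip


def gzip_middleware(app):
    """Wrap a WSGI app; gzip the body if the client sent Accept-Encoding: gzip."""
    def wrapped(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold on to the inner app's status and headers so we can edit them.
            captured["status"] = status
            captured["headers"] = list(headers)

        # Run the inner app and collect its whole body.
        body = b"".join(app(environ, capture))
        # The length may change, so drop the inner app's Content-Length.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]

        if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
            body = gzip.compress(body)
            headers.append(("Content-Encoding", "gzip"))

        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]

    return wrapped


def hello_app(environ, start_response):
    # Hypothetical inner application used only for this demo.
    body = b"hello " * 200
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, accept_encoding=""):
    """Invoke a WSGI app directly and return (status, headers, body)."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/",
               "HTTP_ACCEPT_ENCODING": accept_encoding}
    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body


app = gzip_middleware(hello_app)
status, headers, body = call(app, accept_encoding="gzip")
print(status, headers["Content-Encoding"], len(body))
```

The inner application knows nothing about compression. The wrapper adds it for every response, which is the main appeal of doing this in middleware.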

Which is better: npm, yarn, or pnpm?

Recently I started using pnpm. In my experience it is noticeably faster than both npm and yarn. The lockfile it generates (pnpm-lock.yaml) is plain YAML, and I find it easier to read than the lockfiles produced by npm and yarn.

Now let’s look at some of the advantages of each:

Yarn excels at handling large projects with many dependencies, focuses on security and reproducibility, and works well with monorepos.

npm ships with Node.js and is a good choice for simple projects with few dependencies.

pnpm is known for fast installs and efficient disk usage (packages are kept in a shared store and linked into each project), plus strict dependency management, which makes it well suited to large-scale monorepos and projects with complex dependencies.

How to start a Django Project?

What is Django?

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.

It is a full-featured web framework that follows the Model-View-Template (MVT) pattern, Django’s variation of the Model-View-Controller (MVC) architectural pattern. It provides a set of tools and libraries for building web applications, including an ORM, a templating engine, and a built-in admin interface.

Before moving ahead, you need to understand what a virtual environment is. I wrote about it in Techletter #63.

In #63 I showed how to create a virtual environment using the traditional method. Now let’s see how to create one with uv, which is extremely fast.

# On Windows.
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# With pip.
pip install uv

# With Homebrew.
brew install uv

# to create a virtual environment
uv venv

# activation commands are the same as for a regular virtual environment

# On macOS and Linux.
source .venv/bin/activate

# On Windows.
.venv\Scripts\activate
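
If you want to confirm that the activation worked, here is a small check you can run with the activated interpreter. It relies on the standard sys.prefix and sys.base_prefix attributes. The helper name in_virtualenv is my own.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```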

Now install Django

# install Django itself
uv pip install django

# or, if the project already has a requirements file
uv pip install -r requirements.txt

Now let’s set up a Django project

django-admin startproject YOUR_PROJECT_NAME

cd YOUR_PROJECT_NAME

Once you are inside your project folder, start the development server with the command below:

python manage.py runserver

Machine Learning Section

What is a decision tree?

The decision tree is one of the most widely used classification techniques.

In simple terms, a decision tree is an algorithm that works based on if-else conditionals.
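
To make that concrete, here is a tiny hand-written "tree" for the fish example used later in this post. The feature names come from that sample dataset. The function name is_fish and the rule order are my own, chosen for illustration. A real decision tree algorithm learns these rules from data instead of having them typed in.

```python
def is_fish(no_surfacing: int, flippers: int) -> str:
    """Classify an animal with hand-written if/else rules (1 = yes, 0 = no)."""
    if no_surfacing == 1:
        # Can survive without surfacing: the answer depends on flippers.
        if flippers == 1:
            return "yes"
        return "no"
    # Needs to surface: not a fish in this toy example.
    return "no"


print(is_fish(1, 1))  # yes
print(is_fish(1, 0))  # no
print(is_fish(0, 1))  # no
```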

How does the tree construction work?

The mathematics used to decide how to split a dataset comes from information theory.

To build a decision tree, you need to make a first decision on the dataset to dictate which feature is used to split the data. To determine this, you try every feature and measure which split will give you the best results.

After that, you’ll split the dataset into subsets. The subsets will then traverse down the branches of the first decision node.

If the data on the branches is the same class, then you’ve properly classified it and don’t need to continue splitting it.

If the data isn’t the same, then you need to repeat the splitting process on this subset.

The decision on how to split this subset is made the same way as the original dataset, and you repeat this process until you’ve classified all the data.
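
The steps above can be sketched as a short recursive function. This is a minimal sketch, not a production implementation. It assumes each row is a list of feature values with the class label in the last position, it picks the split with the lowest remaining entropy (entropy is explained in the next section), and the helper names entropy, split, best_feature, and build_tree are my own.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy of the class labels (last column of each row)."""
    counts = Counter(row[-1] for row in rows)
    return -sum((n / len(rows)) * log2(n / len(rows)) for n in counts.values())


def split(rows, axis, value):
    """Rows where column `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]


def best_feature(rows):
    """Index of the feature whose split leaves the lowest weighted entropy."""
    def remaining(axis):
        values = {row[axis] for row in rows}
        return sum(len(s) / len(rows) * entropy(s)
                   for s in (split(rows, axis, v) for v in values))
    return min(range(len(rows[0]) - 1), key=remaining)


def build_tree(rows, names):
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:      # every row has the same class: stop
        return labels[0]
    if len(rows[0]) == 1:          # no features left: take a majority vote
        return Counter(labels).most_common(1)[0][0]
    axis = best_feature(rows)
    rest = names[:axis] + names[axis + 1:]
    # One branch per value of the chosen feature, built the same way.
    return {names[axis]: {value: build_tree(split(rows, axis, value), rest)
                          for value in sorted({row[axis] for row in rows})}}


data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
tree = build_tree(data, ['no surfacing', 'flippers'])
print(tree)
# {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```

The nested dictionary mirrors the tree: each key is a feature, each inner key is one of its values, and each leaf is a class label.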

Information gain

The change in the information before and after the split is known as the information gain.

When you know how to calculate the information gain, you can split your data across every feature to see which split gives you the highest information gain.

The measure of information of a set is known as the Shannon entropy or just entropy for short.
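
Written out, assuming the usual definition where p(x_i) is the fraction of rows in the set that carry class label x_i and n is the number of distinct labels, the entropy H looks like this. The second line works it through for a set with two "yes" rows and three "no" rows, which is the sample dataset used in the code below.

```latex
H = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)

% Worked example: 2 "yes" rows and 3 "no" rows out of 5
H = -\left(\tfrac{2}{5}\log_2\tfrac{2}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5}\right)
  \approx 0.529 + 0.442 = 0.971
```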

from math import log

def calcShannonEnt(dataset):
    """Return the Shannon entropy of the class labels in dataset."""
    numEntries = len(dataset)
    # Count how many times each class label (the last column) occurs.
    labelCounts = {}
    for featVect in dataset:
        currentLabel = featVect[-1]
        labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    # Sum -p * log2(p) over all labels.
    shannonEnt = 0.0
    for key in labelCounts:
        prob = labelCounts[key] / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt

def createDataSet():
    dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
    labels = ['no surfacing','flippers']
    return dataSet, labels

mydat, labels = createDataSet()
ent = calcShannonEnt(mydat)
print(ent) # prints approximately 0.9709505944

The higher the entropy, the more mixed up the data is.
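
A quick way to see this is to compare a pure set with mixed ones. The sketch below uses a small entropy helper of my own that takes a plain list of labels. The label lists are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0, only one class
print(entropy(["yes", "yes", "no", "no"]))    # 1.0, evenly mixed
print(entropy(["a", "b", "c", "d"]))          # 2.0, four equally likely classes
```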

Splitting the dataset

For the classifier algorithm to work, you need to measure the entropy, split the dataset, measure the entropy of the split sets, and see whether the split was worth making. You’ll do this for every feature to determine the best feature to split on.
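
Here is a minimal sketch of that procedure on the sample dataset from above. The helper names split_dataset and information_gain are my own, and the entropy function is a compact rewrite of calcShannonEnt, so treat this as one possible way to do it.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy of the class labels (last column of each row)."""
    counts = Counter(row[-1] for row in rows)
    return sum(-(n / len(rows)) * log2(n / len(rows)) for n in counts.values())


def split_dataset(rows, axis, value):
    """Keep rows whose feature `axis` equals `value`, dropping that feature."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]


def information_gain(rows, axis):
    """Entropy before the split minus the weighted entropy after it."""
    after = 0.0
    for value in {row[axis] for row in rows}:
        subset = split_dataset(rows, axis, value)
        after += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - after


data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
names = ['no surfacing', 'flippers']

gains = {name: information_gain(data, i) for i, name in enumerate(names)}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("best feature:", max(gains, key=gains.get))
# no surfacing: 0.420
# flippers: 0.171
# best feature: no surfacing
```

Splitting on 'no surfacing' removes more uncertainty than splitting on 'flippers', so it becomes the first decision node.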
