Compression Middleware ⏳, Package Managers 📦, Django 🧑🏻‍💻, ML 🤖 | Techletter #74
Why use compression middleware in Node.js?
Compression middleware compresses response bodies before the server sends them, which can improve performance. Some of the reasons to use it are:
Reduces size of response: Compression can dramatically reduce the size of text-based responses (HTML, CSS, JavaScript, JSON) sent from the server to the client, often by over 70%.
Improves performance: By reducing the payload size, compression leads to faster download times for clients.
Saves bandwidth: Compressing responses saves bandwidth on both the server and client side. This can lead to cost savings.
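To get a feel for the size reduction, here is a minimal sketch in Python (the language used for the code later in this issue, not Node.js). It gzips a made-up JSON payload with the standard library and compares sizes. The payload shape is an assumption for illustration, and the exact ratio depends on how repetitive the content is.

```python
import gzip
import json

# Hypothetical API response: a list of similar-looking records.
records = [{"id": i, "name": f"user{i}", "active": True} for i in range(500)]
payload = json.dumps(records).encode("utf-8")

# gzip is one of the encodings compression middleware commonly applies.
compressed = gzip.compress(payload)

saving = 1 - len(compressed) / len(payload)
print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"saved:      {saving:.0%}")
```

The client reverses the step transparently: `gzip.decompress(compressed)` returns the original bytes.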
By the way, what is middleware? If you are not familiar with the term, you can check this article.
Which is better: npm, yarn, or pnpm?
Recently I started using pnpm. In my experience it’s noticeably faster than both npm and yarn. The lockfile it generates (pnpm-lock.yaml) is readable, and I find it easier to work with than the ones from yarn and npm.
Now let’s look at some of the advantages of each:
Yarn excels at handling large projects with many dependencies, focuses on security and reproducibility, and works well with monorepos.
npm is a good choice for simple projects with few dependencies.
pnpm is known for fast installs and efficient disk usage (packages are kept in a single content-addressable store and linked into each project) and for strict dependency management, making it suitable for large-scale monorepos and projects with complex dependencies.
How to start a Django Project?
What is Django?
Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.
It is a full-featured web framework that follows a variant of the Model-View-Controller (MVC) architectural pattern, which the Django documentation describes as Model-Template-View (MTV). It provides a set of tools and libraries for building web applications, including an ORM, a templating engine, and a built-in admin interface.
Before moving ahead, you need to understand what a virtual environment is. I have written about it in Techletter #63.
In #63 I showed how to create a virtual environment using the traditional method. Now let’s see how to create one with uv (the uv package is insanely fast).
# On Windows.
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# With pip.
pip install uv
# With Homebrew.
brew install uv
# to create a virtual environment
uv venv
# activation commands are the same as for a regular virtual environment
# On macOS and Linux.
source .venv/bin/activate
# On Windows.
.venv\Scripts\activate
Now install Django:
uv pip install django
# or, if your project has a requirements file:
uv pip install -r requirements.txt
Now let’s set up a Django project:
django-admin startproject YOUR_PROJECT_NAME
cd YOUR_PROJECT_NAME
Once you are inside your project folder, you can start the development server with the command below:
python manage.py runserver
Machine Learning Section
What is a decision tree?
The decision tree is one of the most widely used classification techniques.
In simple terms, a decision tree is an algorithm that works based on if-else conditionals.
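As a rough illustration of the if-else idea, here is a hand-written tree for the small sample dataset used in the entropy example later in this issue (features "no surfacing" and "flippers", class "yes" or "no"). The function name `classify` is made up for this sketch. A real decision tree learns these conditions from the data instead of having them typed in.

```python
def classify(no_surfacing, flippers):
    """Hand-written decision tree: each `if` is one decision node."""
    if no_surfacing == 1:
        if flippers == 1:
            return 'yes'
        return 'no'
    return 'no'

# Same rows as the sample dataset: [no surfacing, flippers, class]
data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]

for no_surfacing, flippers, expected in data:
    print(no_surfacing, flippers, classify(no_surfacing, flippers), expected)
```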
How does the tree construction work?
The mathematics used to decide how to split a dataset comes from information theory.
To build a decision tree, you need to make a first decision on the dataset to dictate which feature is used to split the data. To determine this, you try every feature and measure which split will give you the best results.
After that, you’ll split the dataset into subsets. The subsets will then traverse down the branches of the first decision node.
If the data on the branches is the same class, then you’ve properly classified it and don’t need to continue splitting it.
If the data isn’t the same, then you need to repeat the splitting process on this subset.
The decision on how to split this subset is made the same way as the original dataset, and you repeat this process until you’ve classified all the data.
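The recursion described above can be sketched as follows. This is a minimal illustration, not a full implementation: `build_tree`, `majority_label`, and the `choose_feature` parameter are names invented here, and the default `choose_feature` simply picks the first remaining column as a placeholder. A real tree would try every feature and pick the one with the best split.

```python
from collections import Counter

def majority_label(rows):
    """Most common class label (last column) among the rows."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]

def build_tree(rows, feature_names, choose_feature=lambda rows: 0):
    labels = [row[-1] for row in rows]
    # Stop: every row on this branch has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: only the label column is left, so fall back to a majority vote.
    if len(rows[0]) == 1:
        return majority_label(rows)

    idx = choose_feature(rows)  # placeholder choice, see note above
    name = feature_names[idx]
    remaining = feature_names[:idx] + feature_names[idx + 1:]

    tree = {name: {}}
    for value in sorted({row[idx] for row in rows}):
        # Rows that take this branch, with the used feature column removed.
        subset = [row[:idx] + row[idx + 1:] for row in rows if row[idx] == value]
        tree[name][value] = build_tree(subset, remaining, choose_feature)
    return tree

data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
tree = build_tree(data, ['no surfacing', 'flippers'])
print(tree)
```

Leaves are class labels and inner nodes are nested dictionaries keyed by feature name and then by feature value.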
Information gain
The change in the information before and after the split is known as the information gain.
When you know how to calculate the information gain, you can split your data across every feature to see which split gives you the highest information gain.
The measure of information of a set is known as the Shannon entropy or just entropy for short.
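Written out, the two quantities look like this. The notation is an assumption of this sketch: S is a set of rows, p_i is the fraction of rows in S with class i, A is a feature, and S_v is the subset of S where A takes the value v. The worked line uses a set with two "yes" rows and three "no" rows.

```latex
H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

% Worked example: 2 of 5 rows are 'yes', 3 of 5 are 'no'
H(S) = -\left( \tfrac{2}{5} \log_2 \tfrac{2}{5} + \tfrac{3}{5} \log_2 \tfrac{3}{5} \right) \approx 0.971
```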
from math import log

def calcShannonEnt(dataset):
    # Count how many rows carry each class label (the last column)
    numEntries = len(dataset)
    labelCounts = {}
    for featVect in dataset:
        currentLabel = featVect[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    # Entropy is the sum of -p * log2(p) over all labels
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt

def createDataSet():
    # Each row: [no surfacing, flippers, class label]
    dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
    labels = ['no surfacing', 'flippers']
    return dataSet, labels

mydat, labels = createDataSet()
ent = calcShannonEnt(mydat)
print(ent)  # prints about 0.9709505944
The higher the entropy, the more mixed up the data is.
Splitting the dataset
For the classifier algorithm to work, you need to measure the entropy, split the dataset, measure the entropy on the split sets, and see if splitting it was the right thing to do. You’ll do this for all of the features to determine the best feature to split on.
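Those steps can be sketched as below. The function names (`entropy`, `split_dataset`, `information_gain`, `choose_best_feature`) are chosen for this example, and the code assumes each row is a list whose last element is the class label. The gain of a split is the entropy before the split minus the size-weighted average entropy of the subsets.

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Shannon entropy of the class labels (last column)."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split_dataset(rows, axis, value):
    """Rows where column `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]

def information_gain(rows, axis):
    """Entropy before the split minus weighted entropy after it."""
    weighted = 0.0
    for value in {row[axis] for row in rows}:
        subset = split_dataset(rows, axis, value)
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - weighted

def choose_best_feature(rows):
    """Index of the feature whose split gives the highest information gain."""
    n_features = len(rows[0]) - 1  # last column is the label
    return max(range(n_features), key=lambda axis: information_gain(rows, axis))

data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
gains = [information_gain(data, axis) for axis in range(2)]
best = choose_best_feature(data)
print(gains)
print(best)  # feature 0 ('no surfacing') has the larger gain
```

On this dataset, splitting on feature 0 leaves one pure subset and one small mixed subset, while splitting on feature 1 leaves four rows evenly mixed, which is why feature 0 wins.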