CodeAlchemy

Jotting one man's journey through software development, programming, and technology


Project maintained by pablogarciaprado Hosted on GitHub Pages — Theme by mattgraham


Random Concepts

AI

Generative AI vs LLMs

Generative artificial intelligence, which is commonly referred to as gen AI, is a subset of artificial intelligence that is capable of creating text, images, or other data using generative models, often in response to prompts.

Generative AI encompasses a broader range of models capable of generating various types of content beyond just text, while LLM specifically refers to a subset of generative AI models focusing on language tasks.

While both terms are used in many references to describe AI models capable of generating human-like responses to input prompts, it’s important to note that they’re not identical.

LLMs

Large language models are highly sophisticated computer programs trained on gigantic amounts of data, such as text or images. LLMs refer to large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes.

In this context, large refers to:

  1. The size of the training dataset, which can sometimes be at the petabyte scale.

  2. The number of parameters. Parameters are the memories and knowledge that the machine learned during model training.

Prompts

When you submit a prompt to an LLM, it calculates the probability of each possible continuation based on the patterns it learned during pre-training. In this way, the LLM works like a fancy autocomplete, suggesting the most likely response to the prompt.
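This “fancy autocomplete” behavior can be sketched with a toy next-word frequency model. Real LLMs use neural networks over tokens, so this is purely illustrative:

```python
from collections import Counter

# Toy "autocomplete": count which word most often follows each word
# in a tiny made-up corpus, then complete a prompt with the most
# probable next word. Not a real language model.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = {}
for word, nxt in zip(corpus, corpus[1:]):
    following.setdefault(word, Counter())[nxt] += 1

def complete(word):
    # Return the continuation seen most often during "training".
    return following[word].most_common(1)[0][0]

print(complete("the"))  # "cat": it follows "the" most often in this corpus
```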

Hallucinations

Hallucinations are words or phrases generated by the model that are nonsensical, grammatically incorrect, or factually wrong. They happen because LLMs only know the patterns in the data they were trained on, so they can confidently produce text that is not grounded in reality.

Algorithm

A step-by-step procedure for solving a problem or performing a computation, typically expressed as code.

API

An API (Application Programming Interface) is any interface that lets two pieces of software communicate: it gives developers access to computing resources and data by defining how clients request data and how servers respond. APIs can work over any protocol (HTTP, TCP, gRPC, WebSockets, etc.).

The ability to access data and computing resources greatly increases a developer’s efficiency. It is much easier to use an API than to build every single program, method, or dataset from scratch. APIs are built on the principle of abstraction—you don’t need to understand the inner workings or complexities of an API to use it in your own environment.

Examples:
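As a minimal illustration, a GET request to a hypothetical API (the domain and endpoint are made up) can be built with Python's standard library:

```python
import urllib.request

# Hypothetical endpoint; illustrative only, not a real service.
req = urllib.request.Request(
    "https://api.example.com/v1/weather?city=Madrid",
    headers={"Accept": "application/json"},
)

# The request is only constructed here, not sent; sending it would be
# urllib.request.urlopen(req) against a real API.
print(req.full_url)
print(req.get_method())  # GET is the default method
```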

How do APIs work?

Architecture

To communicate effectively, programs must adhere to a clear protocol that governs the transfer and interpretation of data. The internet is the standard communication channel that APIs use to transmit requests and responses between programs. Web-based APIs use the client-server model as the underlying architecture for exchanging information. The client is a computing device that makes a request for some computing resource or data, and the server stores data and/or computing resources and interprets and fulfills the client’s request.

HTTP protocol and methods

Since APIs use the web as a communication channel, many of them adhere to the HTTP protocol, which specifies rules and methods for data exchange between clients and servers over the internet. APIs that utilize the HTTP protocol use HTTP request methods (also known as “HTTP verbs”) for transmitting client requests to servers. The most commonly used HTTP request methods are GET, POST, PUT, and DELETE. GET is used by a client to fetch data from a server, PUT replaces existing data or creates data if it does not exist, POST is used primarily to create new resources, and DELETE removes data or resources specified by the client on a server.
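The verb-to-operation mapping above can be sketched with a plain dict standing in for server-side storage (an illustration of the semantics, not a real server):

```python
# A dict stands in for the server's storage; each HTTP verb maps to
# one CRUD operation on it.
store = {}

def handle(method, resource_id, data=None):
    if method == "GET":
        return store.get(resource_id)     # read
    if method == "POST":
        store[resource_id] = data         # create
    elif method == "PUT":
        store[resource_id] = data         # replace (or create if absent)
    elif method == "DELETE":
        store.pop(resource_id, None)      # remove

handle("POST", "user/1", {"name": "Ada"})
handle("PUT", "user/1", {"name": "Ada Lovelace"})
print(handle("GET", "user/1"))
handle("DELETE", "user/1")
print(handle("GET", "user/1"))  # None: the resource was deleted
```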

Endpoints

APIs use HTTP methods to interact with data or computing services hosted on a server. These methods are useless if there isn’t a way to access specific resources with consistency. APIs utilize communication channels called endpoints so that clients can access the resources they need without complication or irregularity. Endpoints are access points to data or computing resources hosted on a server and they take the form of an HTTP URI. Endpoints are added to an API’s base URL to create a path to a specific resource or container of resources. Additionally, query strings can be added to endpoints to pass in variables that may be needed to complete an API’s request.
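Assembling an endpoint URL from a base URL, a path, and a query string might look like this (the API shown is hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical API: base URL, endpoint path, and query string are
# assembled the same way for any real REST API.
base_url = "https://api.example.com"
endpoint = "/v2/books"
query = urlencode({"author": "Tolkien", "limit": 10})

url = f"{base_url}{endpoint}?{query}"
print(url)  # https://api.example.com/v2/books?author=Tolkien&limit=10
```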

REST and RESTful APIs

APIs that utilize the HTTP protocol, request methods, and endpoints are referred to as RESTful APIs. RESTful APIs live on the server, acting as an implementer for client requests. This model defines a framework of endpoints (nouns) that HTTP methods (verbs) act on, and APIs use this framework to fulfill requests. To summarize, RESTful APIs utilize the client-server model, adhere to the HTTP protocol, utilize HTTP request methods, and utilize endpoints to access specific resources.

A REST API (Representational State Transfer API) is a way for applications to communicate over the web using HTTP requests. It follows a set of conventions for creating, reading, updating, and deleting resources.

Key points:

  1. Resources are identified by URLs (endpoints).

  2. Standard HTTP methods (GET, POST, PUT, DELETE) express the operation.

  3. Requests are stateless: each request carries everything the server needs.

  4. Responses typically represent resources as JSON.

Authentication and authorization

Authentication and authorization are two terms that are often used interchangeably, but they are not the same thing.

Authentication identifies who you are, and authorization determines what you can do.

There are three types of authentication/authorization services that Google APIs use. These are “API Keys”, “Service accounts”, and “OAuth”. An API uses one of these authentication services depending on the resources it requests and where the API is called from.

API keys

API keys are secret tokens that usually come in the form of an encrypted string. API keys are quick to generate and use. Oftentimes, APIs that use public data or methods and want to get developers up and running use API keys to quickly authenticate users.

In Google Cloud terms, API keys identify the calling project making the call to an API. By identifying the calling project, API keys enable usage information to be associated with that project, and they can reject calls from projects that haven’t been granted access or enabled by the API.

An API key is less likely to be logged or saved in browser history if it is specified in a header.
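The difference can be illustrated by building, but not sending, two requests with Python's standard library (the key and endpoint are placeholders):

```python
import urllib.request

API_KEY = "secret-key-123"  # placeholder; never hardcode real keys

# Key in the query string: it becomes part of the URL, which servers
# and browsers may log or keep in history.
in_url = f"https://api.example.com/data?key={API_KEY}"

# Key in a header: it is not part of the URL, so it is less likely
# to end up in logs or browser history.
req = urllib.request.Request(
    "https://api.example.com/data",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(API_KEY in req.full_url)          # False: the URL stays clean
print(req.get_header("Authorization"))  # Bearer secret-key-123
```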

OAuth

OAuth tokens are similar to API keys in their format, but they are more secure and can be linked to user accounts or identities. These tokens are used primarily when APIs give a developer the means to access user data.

While API keys give developers access to all of an API’s functionality, OAuth access is based on scope: different privileges are granted to different identities.

Service Accounts

A service account is a special type of Google account that belongs to your application or a virtual machine (VM) instead of to an individual end user. Your application assumes the identity of the service account to call Google APIs, so that the users aren’t directly involved.

You can use a service account by providing its private key to your application, or by using the built-in service accounts available when running on Cloud Functions, Google App Engine, Compute Engine, or Google Kubernetes Engine.

Bootstrap

Refers to the initial process of setting up an application or system when it starts. Specifically, it refers to the time and resources required for the application to load and become fully operational after it is triggered or launched. This process can include tasks like loading dependencies, initializing configurations, connecting to databases, and preparing the application to handle requests.

To mitigate the impact of long startup times (especially in serverless or cloud environments), you can pre-warm instances, meaning that the application is kept alive and ready to handle traffic without having to go through the bootstrap process every time a new request is made. This helps reduce the delay that could occur when an instance is started from scratch (often referred to as cold start).
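A common complement to pre-warming is lazy one-time initialization, so each instance pays the bootstrap cost once rather than on every request. A minimal sketch, with a stand-in for the expensive setup work:

```python
import time

# _expensive_setup stands in for real bootstrap work: loading
# configuration, opening database connections, warming caches, etc.
_client = None

def _expensive_setup():
    time.sleep(0.01)  # simulate slow bootstrap work
    return {"connected": True}

def get_client():
    global _client
    if _client is None:          # only the first call pays the cost
        _client = _expensive_setup()
    return _client

first = get_client()    # slow: runs the bootstrap
second = get_client()   # fast: reuses the initialized client
print(first is second)  # True: both calls return the same object
```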

CI/CD

Continuous Integration and Continuous Deployment (or Delivery) is a set of practices and tools that automates and streamlines the development, testing, and deployment process, enabling faster and more reliable software delivery.

Continuous Integration (CI)

CI is the practice of frequently integrating code changes into a shared repository, followed by automated builds and tests to detect errors early.

Key Features:

  1. Developers push code changes frequently (e.g., daily).

  2. Automated systems build and test the code after each change.

  3. Ensures new code integrates well with the existing codebase.

  4. Catches bugs early, improving overall software quality.

Tools: Jenkins, GitHub Actions, GitLab CI, Travis CI.

Continuous Deployment (CD)

CD extends CI by automating the process of deploying code to production or other environments after it passes all tests.

Key Features:

  1. Deployment happens automatically (or semi-automatically) after successful tests.

  2. Ensures faster delivery of new features and bug fixes to users.

  3. Reduces human intervention, minimizing errors in the deployment process.

Two Variants:

  1. Continuous Delivery: every change is automatically built and tested, but a human approves the final release to production.

  2. Continuous Deployment: every change that passes the tests is released to production automatically, with no manual step.

Tools: AWS CodePipeline, Azure DevOps, GitHub Actions, CircleCI.

Benefits of CI/CD

  1. Faster Development Cycles: Code changes are quickly tested and deployed.

  2. Higher Code Quality: Automated tests ensure fewer bugs reach production.

  3. Reduced Risks: Small, incremental updates are easier to test and roll back if needed.

  4. Increased Collaboration: Teams integrate and share changes more frequently.

Cron vs Airflow

Core difference

Cron simply runs a command at a fixed time; it knows nothing about dependencies, retries, or whether previous runs succeeded. Airflow is a workflow orchestrator: it models pipelines as DAGs of tasks with dependencies, retries, backfills, and monitoring.

In other words: cron answers “run this script at 2 a.m.”, while Airflow answers “run these steps in order, retry the ones that fail, and show me what happened.”

Current assessment

Right now, the current setup is more than enough because the jobs are few, simple, and independent. Airflow gives you reliability, observability, and control as pipelines grow in complexity, so it makes sense to consider migrating when the pain is clear.

Computer Infrastructure

Memory (RAM) – “Short-term brain”

Fast storage used to hold data that programs are currently using. Measured in: GB (gigabytes) or GiB (gibibytes, a binary version).

More memory = more or bigger apps can run at once without slowing down.

CPU (Processor) – “Worker speed & count”

The part that actually executes instructions.

Measured in cores: More cores = more things it can do at the same time. Clock speed (GHz) tells you how fast each core is, but core count is often the first concern.

Use Case | Memory | CPU Cores
--- | --- | ---
Light web browsing | 4–8 GiB | 2
Coding + light tools | 8–16 GiB | 2–4
Video editing / VMs | 16–32 GiB | 4–8
Medium web server (API) | 8–32 GiB | 2–8
Big data / ML workloads | 64+ GiB | 16+

CORS (cross-origin resource sharing)

CORS is a protocol that uses HTTP headers to indicate to browsers whether it is safe to access restricted resources from a separate domain. By default, cross-domain requests are forbidden by the same-origin security policy.

The same-origin policy protects browser users from unknowingly sharing session information with bad actors. The same-origin policy means that a web page served from www.example.com could not, by default, make a call to APIs at api.example.com because the host name is different. CORS can be used to allow this kind of cross-origin access.

CORS also uses preflight requests.

A preflight request is a special kind of HTTP request that the browser automatically sends before the actual request, to check whether it’s safe to send the real request across origins.

The browser sends a preflight request using the OPTIONS verb to find out whether the next call will be allowed.
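A sketch of the headers involved in that exchange (the header names are the real CORS headers; the origin and allow-list are made up):

```python
# The browser automatically sends an OPTIONS request like this
# before the real cross-origin call:
preflight_request = {
    "method": "OPTIONS",
    "Origin": "https://www.example.com",
    "Access-Control-Request-Method": "PUT",
}

# A server that allows the cross-origin call answers with CORS headers:
def preflight_response(request):
    allowed = {"GET", "POST", "PUT"}
    if request["Access-Control-Request-Method"] in allowed:
        return {
            "Access-Control-Allow-Origin": request["Origin"],
            "Access-Control-Allow-Methods": ", ".join(sorted(allowed)),
        }
    return {}  # no CORS headers, so the browser blocks the real request

print(preflight_response(preflight_request)["Access-Control-Allow-Origin"])
```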

Data Storage Units

GB vs GiB

The difference between GB (gigabytes) and GiB (gibibytes) is in how the size is calculated—decimal vs binary.

Unit | Value (in bytes) | Based on
--- | --- | ---
1 GB | 1,000,000,000 bytes | Decimal (base 10)
1 GiB | 1,073,741,824 bytes | Binary (base 2)

Easy rule of thumb: GiB is slightly bigger than GB when referring to the same number.
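The arithmetic is easy to check:

```python
GB = 10**9   # decimal gigabyte
GiB = 2**30  # binary gibibyte

print(GB)                  # 1000000000
print(GiB)                 # 1073741824
print(round(GiB / GB, 3))  # 1.074: a GiB is about 7.4% larger than a GB
```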

DNS (Domain Name System)

DNS is like the phone book of the internet: it translates human-readable domain names into the numerical IP addresses computers use to find each other.

Example: You type www.youtube.com → DNS translates it → Your device connects to the actual IP address of YouTube’s servers.
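The lookup itself can be pictured as a dictionary from names to addresses (the records below use documentation-range IPs, not YouTube's real ones):

```python
# Toy illustration of the mapping DNS maintains; real resolution is
# performed by resolver servers, and these addresses are made up
# (203.0.113.0/24 is reserved for documentation).
dns_records = {
    "www.youtube.com": "203.0.113.10",
    "www.example.com": "203.0.113.20",
}

def resolve(hostname):
    return dns_records.get(hostname)  # "phone book" lookup: name -> IP

print(resolve("www.youtube.com"))
```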

Modern Web Communication and Networking Protocols

Network protocols (TCP, HTTP/2)

TCP

The TCP protocol (Transmission Control Protocol) is a core internet protocol that ensures reliable, ordered, and error-checked data delivery between devices.

HTTP/2

HTTP/2 is an improved version of the HTTP protocol that makes web communication faster and more efficient by:

  1. Multiplexing many requests and responses over a single connection.

  2. Compressing headers to reduce overhead.

  3. Using a binary framing layer instead of plain-text messages.

It speeds up loading websites and reduces latency.

Security (HTTPS, TLS certificates)

A TLS certificate is a digital file that enables secure, encrypted communication over HTTPS. It proves a website’s identity and ensures data is private and trusted between the user and the server. Without it, browsers show “Not Secure” warnings.

Communication patterns (HTTP requests, WebSockets, gRPC)

HTTP requests

HTTP requests are messages sent by a client (like a browser) to a server to ask for data or perform an action.

WebSockets

WebSockets are a communication protocol that allows a persistent, two-way connection between a client and server. Unlike regular HTTP (which is one request, one response), WebSockets let both sides send and receive data in real time — great for chats, games, or live updates.

gRPC

gRPC is a fast, open-source framework for remote procedure calls that lets different systems communicate efficiently using:

  1. HTTP/2 as its transport protocol.

  2. Protocol Buffers (protobuf) for compact, strongly typed message serialization.

It’s great for connecting microservices with low latency and strong typing.

Infrastructure concepts

Endpoints

A reliable HTTPS endpoint is a secure web address (URL) that serves traffic over TLS, presents a valid certificate, and is consistently available to receive requests.

Ports

A TCP port is a numbered endpoint used by a computer to identify specific services or applications when communicating over the internet using the TCP protocol.
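Binding a socket claims a port on the local machine; passing port 0 asks the OS to assign any free one:

```python
import socket

# Create a TCP socket and bind it to the loopback interface.
# Port 0 means "give me any free port"; the OS picks the number.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
host, port = sock.getsockname()
print(port)  # the OS-assigned port identifying this service locally
sock.close()
```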

VPC Network

A VPC (Virtual Private Cloud) network is a private, isolated virtual network within a cloud provider where you can securely run your resources (like virtual machines or containers). It lets you control IP addresses, subnets, routing, and firewall rules—just like a traditional private network but in the cloud.

Prompting

A prompt is a specific instruction, question, or cue given to a computer. In other words, it is the text that you feed to the model. Prompt engineering is a way of articulating your prompts to get the best response from the model. The better structured a prompt is, the better the output from the model will be.

Types

Prompts can take the form of a question or an instruction, and fall into four categories:

Elements

A prompt has two elements: the preamble (the context and instructions that frame the task) and the input (the question or content itself).

Best Practices

  1. Write detailed and explicit instructions. Be clear and concise in the prompts that you feed into the model.
  2. Be sure to define boundaries for the prompt. It’s better to instruct the model on what to do rather than what not to do.
  3. Adopt a persona for your input. Adding a persona for the model can provide meaningful context to help it focus on related questions, which can help improve accuracy.
  4. It’s a recommended practice to keep each sentence concise. Longer sentences can sometimes produce suboptimal results.

Proxy

A proxy is an intermediary that acts on behalf of something else. The meaning varies slightly by context:

Networking

A proxy server sits between a user and the internet. It forwards requests and responses between them.

Uses:

  1. Privacy: hides the client’s IP address from the destination.

  2. Caching: stores frequent responses to serve them faster.

  3. Filtering: blocks access to certain sites or content.

  4. Security: shields internal systems behind a single controlled gateway.

Think of it like giving your letter to a trusted friend, who then delivers it to the post office instead of you sending it directly.

General Use

A proxy is someone or something authorized to act for another.

Programming:

A proxy object is a placeholder or interface that controls access to another object.

Use cases:

  1. Lazy loading: defer creating an expensive object until it is needed.

  2. Access control: check permissions before forwarding a call.

  3. Logging: record how and when the underlying object is used.

AI or Data Science:

A proxy variable is an indirect measure used when the actual variable isn’t available or measurable.

Unit Tests

A unit test is a type of software testing that focuses on verifying the correctness of individual units or components of a program. A “unit” typically refers to the smallest piece of code that can be tested in isolation, such as a function, method, or class.

Key Characteristics

  1. Small Scope: each test targets a single function, method, or class.
  2. Automation: tests are run automatically by a test framework, often on every commit.
  3. Isolation: the unit is tested independently of databases, networks, and other parts of the system.
  4. Fast Execution: small, isolated tests run in milliseconds, so they can be run constantly.

Purpose

Unit tests catch bugs close to where they are introduced, document how each unit is expected to behave, and make refactoring safer by flagging regressions immediately.

Example

def add(a, b):
    return a + b

A unit test for the function above might look like this:

import unittest

# assumes the add() function above is in scope (same file or imported)

class TestMathFunctions(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
        self.assertEqual(add(-1, 1), 0)
        self.assertEqual(add(0, 0), 0)

if __name__ == "__main__":
    unittest.main()

Webhook

A webhook is a way for one system to automatically send real-time data to another system when a specific event happens. It’s like a reverse API call: instead of your app asking for data, the webhook pushes the data to your app.

  1. You provide a URL (your webhook endpoint).
  2. Another service is set up to send data (usually via an HTTP POST request) to that URL when a certain event occurs.
  3. Your server receives and processes the data instantly.
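The receiving side can be sketched as a small handler. In a real app it would be wired to the endpoint URL's POST route in a web framework; the event name and payload shape here are invented:

```python
import json

# Parse the body of an incoming webhook POST and react to the event.
def handle_webhook(body: bytes) -> str:
    event = json.loads(body)
    if event.get("type") == "drink.ready":
        return f"Notify customer: {event['order_id']} is ready"
    return "Ignored"

# Simulate the HTTP POST body a service might send:
payload = json.dumps({"type": "drink.ready", "order_id": "A42"}).encode()
print(handle_webhook(payload))  # Notify customer: A42 is ready
```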

Real-World Analogy: Imagine you place an order at a coffee shop and give them your phone number. Instead of waiting around, they text you when your drink is ready. That text is the webhook: it’s sent when the “drink is ready” event happens.