Recently I performed a large refactor in a Go project and wrote down the approach.

Originally, the project used a flat layout like this:

- cmd/api/main.go
- user.go
- user_db.go
- user_db_test.go
- user_handler.go
- user_handler_test.go
- server.go

This worked for a while, but after about a year the number of files grew and overall readability deteriorated.

Keeping everything in one package avoids cyclic imports, but we started to hit symbol name collisions that made certain names unusable.

After considering options, we adopted the layout that places the domain package at the repository root.

What this layout means

By “placing the domain package at the repository root,” I mean creating a domain package at the repo root and organizing implementations into subpackages according to their dependencies. The idea is inspired by Ben Johnson’s article “Standard Package Layout”.

Although the article calls it a “Standard Package Layout,” that name can invite bikeshedding. In this post I’ll simply call it the “root‑domain package layout.”

For example, for a backend API using MySQL, the root domain package might look like this:

- cmd/api/main.go
- mysql/
   - user.go
- http/
   - user_handler.go
   - server.go
- user.go

You don’t have to place Go files directly at the Git repository root. If you prefer not to, or if the repository name doesn’t match the domain name, create a directory for the root domain and put the code there.

If the product is named myapp, the domain package becomes myapp.

- cmd/api/main.go
- myapp/
  - mysql/
    - user.go
  - http/
    - user_handler.go
    - server.go
  - user.go

myapp/user.go defines models used across the codebase:

package myapp

type User struct {
    Name string
    Age int
}

type UserService interface {
    CreateUser(uid string) (*User, error)
    GetUser(uid string) (*User, error)
    UpdateUser(uid string) (*User, error)
    DeleteUser(uid string) error
}

In the mysql package, implement the myapp.UserService interface.

package mysql

import (
    "github.com/jmoiron/sqlx"

    "example.com/myapp" // replace with your module path
)

type UserService struct {
    db *sqlx.DB
}

func (s *UserService) CreateUser(uid string) (*myapp.User, error) {
    // Concrete implementation using MySQL
}

In the http package, depend on the interface from myapp instead of using mysql.UserService directly.

package http

type UserHandler struct {
    userService myapp.UserService
}

For testing, I generated mocks of the root package via gomock under the mocks directory and had subpackages depend on those mocks when they needed services from the root package.

package myapp

//go:generate mockgen -destination=./mocks/user_service_mock.go -package=mocks . UserService
type UserService interface {
    CreateUser(uid string) (*User, error)
    GetUser(uid string) (*User, error)
    UpdateUser(uid string) (*User, error)
    DeleteUser(uid string) error
}

This way, the mysql package focuses solely on MySQL‑related logic, while the http package avoids a hard dependency on MySQL and can write tests focused on the HTTP server behavior.

- cmd/api/main.go
- myapp/
  - mocks/
    - user_service_mock.go
  - mysql/
    - user.go
    - user_test.go
  - http/
    - user_handler.go
    - user_handler_test.go
  - user.go
  - user_test.go

In the actual project, the structure looked like this:

- cmd/api/main.go
- myapp/
  - http (HTTP server)
  - grpc (gRPC server)
  - rdb (MySQL)
  - s3 (Amazon S3)
  - dydb (DynamoDB)
  - iot (AWS IoT)
  - ssm (AWS SSM)
  - fcm (Firebase Cloud Messaging)
  - ... (omitted)

In these subpackages we run tests as close to reality as possible using dockertest and LocalStack when applicable.

Why not name the package domain

It’s perfectly fine to create a domain package instead of placing the domain model at the root:

- cmd/main.go
- domain/
  - user.go
  - user_test.go
- http/
  - user_handler.go
  - user_handler_test.go
- mysql/
  - user.go
  - user_test.go

In that case, code would refer to types as domain.XXX:


func main() {
    user := domain.User{}
}

If placed at the root as myapp, it becomes myapp.XXX:


func main() {
    user := myapp.User{}
}

Using the product name as the package makes it more obvious that these types are used across the whole product.

Comparison with other layouts

Some projects reference specific architectures like Clean Architecture and introduce directories named after layers:

- cmd/main.go
- domain/
- infrastructure/
- presenter/
- repository/
- usecase/

Such naming assumes familiarity with the architectural vocabulary and can be opaque to newcomers.

Similar to why I prefer a product name over domain, I find more direct naming makes code easier to navigate.

Grouping subpackages by dependency keeps placement obvious even without special background knowledge:

- cmd/main.go
- mysql/
- http/
- user.go

Takeaways

Refactoring to place the domain package at the repository root made the project easier to understand and to test. It encourages a design that’s approachable without prior context.

In the previous post we generated OpenAPI with TypeSpec and SWR hooks with Orval.

Here we’ll use Orval’s MSW (Mock Service Worker) generation to make Storybook request mocked APIs.

What is MSW?

MSW is a mocking library for web apps. It intercepts requests (via Service Worker in the browser) and returns mock responses.

Compared to alternatives:

  • Simpler than replacing calls in code; no extra mock plumbing in your code.
  • Simpler setup than running a separate JSON Server mock process.

MSW also works server‑side (Node.js) by intercepting the http module.

Storybook

Storybook lets you develop UI components in isolation.

With Next.js 13, MSW integration with a running app can be tricky; according to a GitHub issue, it didn't work with either the Pages Router or the App Router.

This project develops its components in Storybook, which is sufficient here; below is how to use MSW from Storybook.

Add to Storybook

Use the MSW addon:

MSW Storybook Addon

Install:

npm i msw msw-storybook-addon -D

Create the service worker

Generate the worker script under public/:

npx msw init public/

Update .storybook/preview.ts

Add the addon and pass handlers (created below):

import type { Preview } from "@storybook/react";
import { initialize, mswLoader } from "msw-storybook-addon";

import { handlers } from "../mocks/handlers";

// Initialize MSW
initialize();

const preview: Preview = {
  parameters: {
    msw: {
      handlers,
    },
  },
  loaders: [mswLoader],
};

export default preview;

Generate MSW mocks with Orval

Edit orval.config.ts

Add mock: true so Orval generates mocks along with SWR files:

module.exports = {
  "user-api": {
    input: {
      target: "./openapi.yml",
    },
    output: {
      mode: "split",
      target: "./api/endpoints",
      schemas: "./api/models",
      client: "swr",
      mock: true,
    },
    hooks: {
      afterAllFilesWrite: "prettier --write",
    },
  },
};

Generated userAPI.msw.ts

After enabling mocks, run npx orval. You’ll get userAPI.msw.ts with MSW handlers and mock data generation using @faker-js/faker.

Add mocks/handlers.ts

Create mocks and export handlers:

mkdir mocks

Then create mocks/handlers.ts:

import { getUserAPIMock } from "../api/userAPI.msw";

export const handlers = getUserAPIMock();

These handlers are referenced in .storybook/preview.ts. Use the generated SWR hooks in components as usual and MSW will serve responses:

const { data, error, isLoading } = useUsersGetUsers();

Customize mock data

Customize fields via the properties option:

Example to randomize name/email/image URL:

import { faker } from "@faker-js/faker";

module.exports = {
  "user-api": {
    input: {
      target: "./openapi.yml",
    },
    output: {
      mode: "split",
      target: "./api/endpoints",
      schemas: "./api/models",
      client: "swr",
      mock: true,
      override: {
        mock: {
          properties: {
            name: () => faker.name.fullName(),
            email: () => faker.internet.email(),
            "/image/": () => faker.image.url(),
          },
        },
      },
    },
    hooks: {
      afterAllFilesWrite: "prettier --write",
    },
  },
};

Summary

Development flow:

  1. Define the API in TypeSpec (.tsp).
  2. Generate OpenAPI YAML.
  3. Use Orval to generate SWR hooks and MSW mocks.
  4. Develop components in Storybook against the MSW API.

In the previous post I generated an OpenAPI YAML from a TypeSpec API definition.

Now that we have an OpenAPI spec, let’s generate client code and even React hooks for data fetching from the frontend.

With Orval, we can generate hooks for SWR, so let’s try it.

What is SWR?

SWR is a data‑fetching hooks library for React providing caching, error handling, and more.

It makes implementing API requests in the frontend easy. Example from the docs:

function Profile() {
  const { data, error, isLoading } = useSWR('/api/user', fetcher)

  if (error) return <div>failed to load</div>
  if (isLoading) return <div>loading...</div>
  return <div>hello {data.name}!</div>
}

What is Orval?

Orval generates TypeScript clients from OpenAPI (Swagger). Features:

  1. Generate TypeScript models
  2. Generate HTTP calls
  3. Generate MSW (Mock Service Worker) mocks

It can also generate SWR hooks, React Query, Angular clients, etc.

Install

npm i orval -D

Prepare files

We’ll need the TypeSpec tsp file and Orval config.

main.tsp

Define a simple API with TypeSpec:

using TypeSpec.Http;

model User {
	id: string;
	name: string;
	email: string;
}

@route("/users")
interface Users {
	@get
	getUsers(): User[];

	@get
	@route("/{id}")
	getUser(@path id: string): User;
}

Generate openapi.yml as in the previous post:

tsp compile main.tsp --emit @typespec/openapi3

orval.config.ts

We want SWR hooks; docs:

Create orval.config.ts:

module.exports = {
  userstore: {
    input: {
      target: "./openapi.yml",
    },
    output: {
      mode: "split",
      target: "./api/endpoints",
      schemas: "./api/models",
      client: "swr",
    },
  },
};

This places model types under api/models and SWR files under api/endpoints.

Run prettier on generated files

Use afterAllFilesWrite hook:

module.exports = {
  userstore: {
    input: {
      target: "./openapi.yml",
    },
    output: {
      mode: "split",
      target: "./api/endpoints",
      schemas: "./api/models",
      client: "swr",
    },
    hooks: {
      afterAllFilesWrite: "prettier --write",
    },
  },
};

Generate

npx orval

Warnings may appear, but files should be generated under api.

models/user.ts

Contains data models used by the API, reflecting the TypeSpec definitions:

export interface User {
  email: string;
  id: string;
  name: string;
}

endpoints/userAPI.ts

SWR hooks for each API endpoint are generated, e.g.:

export const useUsersGetUsers = (...) => { /* ... */ }
export const useUsersGetUser = (id: string, ...) => { /* ... */ }

Usage:

const { data, error, isLoading } = useUsersGetUsers();

You can pass SWR options; e.g., disable automatic fetching with enabled: false:

const { data, error, isLoading } = useUsersGetUsers({
  swr: {
    enabled: false,
  },
});

Set API base URL

By default the generated client requests the same origin as the frontend. To target a different domain, set axios’s baseURL (Orval uses axios under the hood):

axios.defaults.baseURL = '<BACKEND URL>';

See also: Base URL guide.

Summary

Workflow:

  1. Generate OpenAPI from TypeSpec tsp files.
  2. Use Orval to generate SWR hooks for the frontend.

Since Orval can also generate MSW mocks, you can proceed with frontend dev using a TypeSpec definition even before the backend is ready.

This site’s blog is built with Next.js. Articles are written as MDX files, i.e., Markdown plus some React components.

I created a “Dev Notes” category under /posts to collect development memos and enabled syntax highlighting there. Notes follow.

Install

Use react-syntax-highlighter for highlighting:

Install into a Next.js project:

npm install react-syntax-highlighter
npm install --save-dev @types/react-syntax-highlighter

Setup

When using MDX with Next.js, you can customize elements via mdx-components.tsx.

Add the following:

import type { MDXComponents } from "mdx/types";
import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";
import { oneDark } from "react-syntax-highlighter/dist/esm/styles/prism"; // ①

export function useMDXComponents(components: MDXComponents): MDXComponents {
  return {
    ...components,
    pre: ({ children }) => <>{children}</>, // ②
    code: ({ className, children, ...rest }) => {
      // ③
      const match = /language-(\w+)/.exec(className || "");
      return match ? (
        <SyntaxHighlighter language={match[1]} style={oneDark}>
          {String(children)}
        </SyntaxHighlighter>
      ) : (
        // ④
        <code {...rest} className={className}>
          {children}
        </code>
      );
    },
  };
}

This highlights fenced code blocks in Markdown automatically.

Notes

  1. I’m using Prism’s oneDark style.
  2. MDX code blocks render as <pre><code>…</code></pre>, but SyntaxHighlighter generates its own pre, so we strip the outer pre to avoid nesting.
  3. If a language is present on code, render with SyntaxHighlighter.
  4. When no language is specified or for inline code, fall back to a normal code element.

Background

For APIs, I prefer defining them in a DSL and generating code from it. With REST, you can write OpenAPI (Swagger) and generate docs/clients, but writing YAML is often painful.

There are GUI tools to author OpenAPI, but requiring a separate GUI to write and read specs feels wrong—specs should be text, versioned like code.

Code‑first tools that generate OpenAPI from a backend exist, but they tie you to a language/framework. If you’re only on the frontend, it’s not ideal.

Since I already write TypeScript types on the frontend, a TypeScript‑like way to write API specs would be ideal. That’s where TypeSpec fits.

What is TypeSpec?

TypeSpec is a Microsoft DSL for designing and documenting APIs using TypeScript‑like syntax.

Official site: https://typespec.io/

Install and setup

Install globally:

npm install -g @typespec/compiler

Check the version:

tsp --version

Initialize a new project (creates main.tsp, package.json, tspconfig.yaml, etc.):

tsp init

Install dependencies:

tsp install

Basic usage

Create main.tsp. Example from the docs:

using TypeSpec.Http;

model Store {
  name: string;
  address: Address;
}

model Address {
  street: string;
  city: string;
}

@route("/stores")
interface Stores {
  list(@query filter: string): Store[];
  read(@path id: Store): Store;
}

If you’ve written code before, you can read this at a glance. Compared to the equivalent OpenAPI YAML, TypeSpec is far more concise.

Generate OpenAPI

Install the package:

npm install @typespec/openapi3

Compile TypeSpec to generate OpenAPI:

tsp compile main.tsp --emit @typespec/openapi3

You can configure tspconfig.yaml so you don’t pass --emit each time:

emit:
  - "@typespec/openapi3"

Watch for file changes:

tsp compile main.tsp --emit @typespec/openapi3 --watch

By default, tsp-output/@typespec/openapi3/openapi.yaml is created. To output openapi.yml at the project root, add:

options:
  "@typespec/openapi3":
    emitter-output-dir: "{output-dir}"
    output-file: "openapi.yml"

output-dir: "{project-root}"

Syntax

Models

Define data models with model. Defaults and optional properties are TypeScript‑like:

model User {
  id: string;
  name: string = "Anonymous";
  description?: string;
}

Decorators add docs and validation:

model User {
  @doc("User ID")
  @maxLength(50)
  username: string;

  @doc("User email address")
  @pattern("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$")
  email: string;

  @doc("User age (18+)")
  @minValue(18)
  age: int32;
}

Use enums:

enum UserRole {
  Admin,
  Editor,
  Viewer
}

model User {
  role: UserRole;
}

Operations

Use op to define endpoints. Example GET to fetch a user:

op getUser(userId: string): User;

Decorate with HTTP metadata:

@get
@route("/users/{userId}")
op getUser(userId: string): User;

Interfaces

Group operations with interface. CRUD for users:

interface Users {
	@get
	@route("/users")
	listUsers(limit?: int32, role?: string): User[];

	@get
	@route("/users/{userId}")
	getUser(userId: string): User;

	@post
	@route("/users")
	createUser(
		user: {
			name: string;
			email: string;
		},
	): User;

	@patch
	@route("/users/{userId}")
	updateUser(
		userId: string,
		user: {
			name?: string;
			email?: string;
		},
	): User;

	@delete
	@route("/users/{userId}")
	deleteUser(userId: string): void;
}

Takeaways

I used TypeSpec on a project to write an API spec and it felt like writing TypeScript. With the VS Code extension you also get static checks, so syntax errors are easy to catch.

Following the “principle of least astonishment,” TypeSpec feels like a low‑surprise way for TypeScript programmers to write API specs. I plan to use it more going forward.

I worked on multiple projects in parallel, so I ended up touching a fairly wide range of technologies. Here’s a rough list from memory.

  • Mobile apps
    • Flutter
    • React Native, Expo
  • Frontend
    • Next.js (TypeScript)
    • Astro
    • Tailwind CSS
  • Backend
    • Go (echo)
    • Python (FastAPI)
    • Ruby (Rails)
  • Embedded
    • PlatformIO (C++)
    • ESP32
  • Cloud services
    • AWS (ECS, IoT Core)
    • GCP (Cloud Run)
    • Firebase (Auth/Firestore)
    • PlanetScale (MySQL)
    • Cloudflare
  • Other
    • Graph DB (Neo4j)
    • gRPC (Go, Flutter)
    • MQTT (ESP32)
    • Terraform
    • Chrome Extension
    • Figma (web design)

In recent years I did a lot of mobile work; lately there’s been more backend too.

The most challenging has been C++ firmware (I’m still doing it). Debugging is tough, root‑causing errors takes time, memory has to be carefully managed, and you can’t always lean on libraries—so things take longer than expected. More practice needed.

I’d mainly used React Native for mobile, but started using Flutter heavily last year. Hard to pick a favorite for cross‑platform apps, but Flutter feels stable and solid.

For backend, there are many choices. When I don’t have a strong reason otherwise, I gravitate to Go because it’s fast in many senses and leaves fewer things to agonize over. A few days ago I built a small personal API and went with Go again.

For my dev environment, I use IntelliJ or CLion (for C++). Most people around me use VS Code now, but JetBrains IDEs still fit me better.

When building a new GitHub Action, repeatedly testing on GitHub is slow. With act you can validate locally to some extent and then run it on GitHub.

nektos/act GitHub Repository

Installation

Requires a Docker environment.

On macOS:

brew install act

If Go is installed (non‑Mac as well):

go install github.com/nektos/act@latest

On Apple Silicon (M1/M2), pass --container-architecture linux/amd64.

You may also hit a known issue where yarn fails to run. Specify a platform image with -P/--platform; as reported in that issue, -P ubuntu-latest=ghcr.io/catthehacker/ubuntu:js-latest allowed yarn to run.

Put these in ~/.actrc so you don’t have to type them every time:

--container-architecture linux/amd64
-P ubuntu-latest=ghcr.io/catthehacker/ubuntu:js-latest

Basic usage

With no args, act runs workflows triggered by push:

act

To trigger a pull_request workflow:

act pull_request

List workflows for an event

Add -l/--list after the event:

act pull_request -l
Stage  Job ID      Job name    Workflow name  Workflow file    Events      
0      helloworld  helloworld  Hello World    helloworld.yaml  pull_request

Run a specific workflow

Use -W/--workflows to point to a file:

act -W .github/workflows/helloworld.yaml pull_request

Simulate merging a Pull Request locally

Create a test workflow that prints “hello world” on merge (.github/workflows/helloworld.yaml):

name: Hello World

on:
  pull_request:
    types: [closed]

jobs:
  hello-world:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - run: echo "Hello World!"

Running locally now fails because the event payload is missing:

act -W .github/workflows/helloworld.yaml pull_request
Error:  Error in if-expression: "if: github.event.pull_request.merged == true" (Unable to dereference 'merged' on non-struct 'invalid')

Work around this by providing an event JSON via -e/--eventpath.

Reference: https://github.com/nektos/act/wiki/Beginner's-guide#using-event-file-to-provide-complete-event-payload

cat pull_request.json
{
  "pull_request": {
    "merged": true
  }
}
act -W .github/workflows/helloworld.yaml pull_request -e pull_request.json
[Hello World/hello-world] 🚀  Start image=ghcr.io/catthehacker/ubuntu:js-latest
...
| Hello World!
[Hello World/hello-world]     Success - echo "Hello World!"

If you change merged to false in pull_request.json, it won’t run.

Recently while reading the vector search OSS Qdrant, I learned an approach for efficient top‑k selection and took notes here.

Partial sorting

When building a search response, you often want only the top‑k items (largest values) from an array of length N.

I used to sort the whole array and slice the first k items. In Rust, that might look like:

fn top_k(v: &mut [i32], k: usize) -> &[i32] {
    v.sort_by_key(|&x| std::cmp::Reverse(x));
    &v[0..k]
}

fn main() {
    let mut v = vec![1, 8, 2, 4, 3];
    assert_eq!(top_k(&mut v, 3), vec![8, 4, 3]);
}

Python and Rust use Timsort (Wikipedia), which is O(n) best‑case and O(n log n) on average/worst‑case, so the above is O(n log n).

It’s simple but wasteful when N ≫ k. With N=10,000 and k=100, there should be a more efficient method.

This is where partial sorting (Wikipedia) helps. Of several methods listed there, heap‑based algorithms are often easiest because standard libraries commonly provide BinaryHeap.

We’ll assume we want the top‑k in descending order.

BinaryHeap complexity

Costs for a binary heap:

  • Peek max: O(1)
  • Push: average O(1), worst O(log n)
  • Pop root: average O(log n), worst O(log n)

Consider a max‑heap where the root is the maximum.

Image
Quoted from https://en.wikipedia.org/wiki/Binary_heap

Getting the max is O(1) by looking at the root.

Why is push average O(1)? Roughly half the nodes are leaves, so an insertion ends at a leaf with probability 1/2, one level up with probability 1/4, two levels up with probability 1/8, and so on. The expected cost is 1·(1/2) + 2·(1/4) + 3·(1/8) + … = 2 swaps, i.e. O(1).
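A quick numeric check of that series (a standalone sketch of my own, not from the references below):

```rust
fn main() {
    // Numeric check: the sum over k >= 1 of k / 2^k converges to 2,
    // so an insertion performs a constant number of swaps on average.
    let s: f64 = (1..=60).map(|k| k as f64 / 2f64.powi(k)).sum();
    assert!((s - 2.0).abs() < 1e-9);
    println!("{:.6}", s); // 2.000000
}
```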

Removing the root takes O(log n) swaps proportional to tree height.

  1. Argument for O(1) average‑case heap insertion
  2. Sum of the series on WolframAlpha

Keep a size‑k min‑heap

Push until the heap size reaches k; once at k, on each new push, pop the root to keep size k. The heap then contains the top‑k elements.

Complexity: inserting into a size‑k heap is O(log k), repeated n times → O(n log k). This keeps memory bounded at k, which helps when n is huge or data arrives online.
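As a sketch (my own illustration, not Qdrant's actual code), the size-k min-heap can be written with std's BinaryHeap, using Reverse to get min-heap behavior:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Keep a min-heap of at most k elements: the root is the smallest of the
// current top-k, so it is exactly the element to evict when a new one arrives.
fn top_k(values: &[i32], k: usize) -> Vec<i32> {
    let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
    for &x in values {
        heap.push(Reverse(x));
        if heap.len() > k {
            heap.pop(); // drop the smallest, keeping the heap at size k
        }
    }
    // Unwrap the Reverse wrappers and sort descending for presentation.
    let mut out: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
    out.sort_by_key(|&x| Reverse(x));
    out
}

fn main() {
    assert_eq!(top_k(&[1, 8, 2, 4, 3], 3), vec![8, 4, 3]);
}
```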

Qdrant uses this approach (implementation).

Build a max‑heap and pop k times

Alternatively, build a max‑heap over all n items, then pop k times to collect the top‑k in order.

This is a straightforward priority‑queue usage and may be easier to implement.

Complexity: heap construction O(n) plus k pops O(k log n), total O(n + k log n). For large n, this can beat the previous method thanks to the log n factor.
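A sketch of this variant, again with std's BinaryHeap (collecting into a BinaryHeap heapifies in O(n)):

```rust
use std::collections::BinaryHeap;

// Build a max-heap over all n items (O(n)), then pop k times (O(k log n)).
fn top_k(values: &[i32], k: usize) -> Vec<i32> {
    let mut heap: BinaryHeap<i32> = values.iter().copied().collect(); // O(n) heapify
    (0..k).filter_map(|_| heap.pop()).collect() // results come out in descending order
}

fn main() {
    assert_eq!(top_k(&[1, 8, 2, 4, 3], 3), vec![8, 4, 3]);
}
```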

Image
k=100, n=100–10000

Other methods

Wikipedia’s Partial sorting lists quickselect+quicksort and mergesort+quicksort hybrids with O(n + k log k), which can be even more efficient.

Another option is bubble sort that stops after reaching the k‑th element, which is O(nk) (Wikipedia).

For reference, here are visualized complexities:

Image
k=100, n=100–10000; O(n + k log n) and O(n + k log k) overlap

Image
k=1000, n=1000–10000; with large k, O(n + k log k) is more efficient

Introduction

This post is for day 16 of the Information Retrieval / Search Technology Advent Calendar 2021. I’ll briefly explain the overall picture of similar image search.

What is similar image search?

Similar image search lets you search with an image as the query and retrieve visually similar images. Unlike conventional text search, which finds documents by text, image similarity search finds targets by visual appearance.

For example, if you’re looking for clothing with a unique pattern, it can be difficult to describe that pattern in words, making text search hard. When appearance matters, image‑based search can be more suitable.

Google Images also supports searching by image, so many readers may have used it before.

Google Images

Recap: full‑text search

In full‑text search, we must retrieve all documents matching a text query. A naive scan over all documents becomes slower as the corpus grows.

To avoid this, we build an index in advance—specifically an inverted index, which maps each term to the list of documents containing it, analogous to the index at the back of a book.

For Japanese text, we perform morphological analysis to split text into tokens and remove particles that aren’t useful for search.

Finally, we rank results by similarity using measures that consider term frequency such as tf‑idf. Apache Lucene is a popular Java library that implements these components. For real systems, we typically rely on Solr or Elasticsearch (both built on Lucene) for clustering, performance, and availability.

How similar image search works

Vectorizing images

In image similarity search, we convert images into vectors so we can compute similarity between vectors (e.g., Euclidean distance or cosine similarity).

Once images are vectors, we can compute similarity via vector distances

We assume that similar images map to nearby points in the feature space. How we create these vectors is a central design choice.
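As a minimal illustration of one such measure (my own sketch, not from a specific library), cosine similarity compares the direction of two feature vectors:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ~0.707: somewhat similar
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0: identical direction
```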

Feature extraction

With 8‑bit color, each pixel is represented by (R, G, B) in the range 0–255. We could flatten these values to a long vector, but similarity using raw pixels performs poorly due to sensitivity to position shifts and background noise.

Instead, we extract useful features from the image—this is feature extraction. Today, we typically use convolutional neural networks (CNNs) for feature extraction.

CNNs excel at image recognition. With deep learning, they can recognize images at near‑human accuracy and are robust to small shifts, color, or shape variations within a class.

In many applications we want to search by a region within an image. Using object detection, we first detect regions and categories of objects, then extract features only from relevant regions. For example, for face similarity search, we crop faces and extract features from the face area only.

Extract only the face region from the image

To create features suited for similarity, we can use metric learning—more specifically deep metric learning—which trains embeddings so that items of the same class are closer together and different classes are farther apart. Metric learning outputs vectors directly and is effective when there are many classes but few samples per class—exactly the scenario for image search.

Approximate nearest neighbor (ANN)

As with text search, scanning everything and computing similarities at query time does not scale. Practical systems must respond quickly and also search large corpora to provide convincing results.

Thus we use approximate nearest neighbor (ANN) techniques, which trade some accuracy for much better speed. Like inverted indexes for text, ANN builds an index ahead of time.

Representative ANN libraries include Annoy, NMSLIB, Faiss, and NGT. These play a role similar to Lucene in text search.

ANN involves trade‑offs among accuracy, speed, and memory. Choose methods based on data size and latency requirements. For an overview, see Prof. Matsui’s slides “Frontiers of Approximate Nearest Neighbor Search” (Japanese).

Closing

This post outlined how practical similar image search systems work.

As a side note, I previously published a book on similar image search at a tech fair (in Japanese) and plan to release an updated edition in the Izumi Books series. The book walks through implementing an image search system and dives deeper into metric learning and ANN.

Lately I’ve been studying generative models to build something with them. This post explains the basics of a foundational approach—Generative Adversarial Networks (GANs)—and then walks through a simple implementation with TensorFlow.

Since GANs are a kind of generative model, let’s briefly review generative models first.

What is a generative model?

A generative model outputs data such as photos, illustrations, or music.

Machine learning is often used for tasks like recognizing what’s in an image, classifying true/false, or predicting numeric values—these are typically discriminative tasks. For binary classification, the output is a label like 0 or 1.

Generative models differ: they take a random vector as input and output data (e.g., an image).

Image

They literally create data that didn’t exist before—very “AI‑like”.

What is a GAN?

Among generative models, GANs have been particularly successful in recent years.

Modern GANs combine generative modeling with deep learning to produce high‑fidelity outputs. For example, here is an image generated by StyleGAN—it’s remarkably photo‑realistic.

Image
Source: https://en.wikipedia.org/wiki/StyleGAN

GAN basics

GANs pair two models: a generator, which produces data, and a discriminator, which classifies real vs. fake. They are trained together.

The discriminator learns to tell real data (label 1) from generated (label 0):

Image

The generator takes a random vector and is trained through the discriminator so that its outputs are judged “real”. In other words, the generator learns to produce images that fool the discriminator.

Image

Important: when training the generator, do not update the discriminator’s parameters. We want the discriminator to remain a strict judge; if it changes during generator training, it becomes too lenient.

Implementing a GAN with TensorFlow

We’ll implement a DCGAN in TensorFlow. Training takes time, so use a GPU if possible; Google Colaboratory provides free GPUs.

We’ll proceed as follows:

  • (1) Load data
  • (2) Build the generator
  • (3) Build the discriminator
  • (4) Compile the model
  • (5) Train the model

The runnable notebook is published here:

(1) Load data

Use the MNIST dataset that ships with TensorFlow. The tensors come as (60000, 28, 28); reshape to (28, 28, 1) for the model.

from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

MNIST is 0–255 ints; scale inputs to [-1, 1] for stable training:

x_train = (x_train - 127.5) / 127.5
x_test = (x_test - 127.5) / 127.5
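A quick sanity check of this mapping (a standalone sketch; centering on 127.5 is what sends 0 to -1 and 255 to 1, the range commonly paired with a tanh generator output):

```python
def scale(pixel):
    # Map 0-255 pixel intensities into [-1, 1], matching the preprocessing above.
    return (pixel - 127.5) / 127.5

print(scale(0), scale(127.5), scale(255))  # -1.0 0.0 1.0
```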

(2) Build the generator

Using Keras’s Functional API:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, \
   Dense, Conv2D, LeakyReLU, BatchNormalization, \
   Reshape, UpSampling2D, Activation

Constants:

kernel_size = 5
noise_vector_dim = 100

We want 28×28 outputs, so choose an input spatial size that doubles cleanly via upsampling layers.

Upsampling repeats rows/columns to enlarge feature maps.
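To see what that means, here is a tiny pure-Python sketch of 2x upsampling (my own illustration of the idea behind Keras's UpSampling2D, not the layer itself):

```python
def upsample2x(grid):
    # Repeat each row, then each value within a row, doubling height and width.
    rows = [row for row in grid for _ in range(2)]
    return [[v for v in row for _ in range(2)] for row in rows]

feature_map = [[1, 2],
               [3, 4]]
print(upsample2x(feature_map))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

So a 7×7 feature map upsampled twice reaches the 28×28 output size we need.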

... (Define generator and discriminator architectures and training loop as in the notebook.)

Summary

We reviewed how GANs work and implemented a minimal DCGAN in TensorFlow. With a GPU and enough iterations, you should see the generator progressively produce more realistic digits.
