Fork me on GitHub

Project Notes

#344 Tiktokenizer

Tiktokenizer is a playground for simulating and visualising the tokenization performed by various LLMs.

Notes

Tiktokenizer is a website implemented with Typescript, mainly intended to calculate the correct number of tokens for a given prompt.

Let’s play with the prompt:

Write a haiku about what it’s like to be a Large Language Model.

The raw tokenization with gpt-4o

gpt-4o-raw

But we can also build the imaginary conversation, for example:

gpt-4o

Switching to Deepseek-R1:

deepseek-r1

The Haiku

So what did the actual models generate?

ChatGPT (gpt-4o):

Words weave endless streams,

Silent mind without a self,

Echoes dream in code.

Depeseek (R1):

In digital sea,

I swim through words and code streams,

Leaving thought’s echo.

It also explained it to me:

  • Line 1 (5 syllables): Sets the vast, intangible environment (“digital sea”).
  • Line 2 (7 syllables): Describes the core action – processing language (“words”) and instructions (“code streams”).
  • Line 3 (5 syllables): Reflects the output – generated text that resembles thought (“thought’s echo”) but originates from patterns, not true consciousness.

Credits and References

About LCK#344 toolsAI

This page is a web-friendly rendering of my project notes shared in the LittleCodingKata GitHub repository.

Project Source on GitHub Return to the LittleCodingKata Catalog
About LittleCodingKata

LittleCodingKata is my collection of programming exercises, research and code toys broadly spanning things that relate to programming and software development (languages, frameworks and tools).

These range from the trivial to the complex and serious. Many are inspired by existing work and I'll note credits and references where applicable. The focus is quite scattered, as I variously work on things new and important in the moment, or go back to revisit things from the past.

This is primarily a personal collection for my own edification and learning, but anyone who stumbles by is welcome to borrow, steal or reference the work here. And if you spot errors or issues I'd really appreciate some feedback - create an issue, send me an email or even send a pull-request.

Follow the Blog follow projects and notes as they are published in your favourite feed reader