Notes from Building a Japanese Dictionary: Why Intermediate Vocabulary Learning Breaks Down

You finished a textbook. You can read NHK News Web Easy. You can sort of follow a manga you have already seen the anime of. Then you pick up something you actually want to read, and the vocabulary curve goes vertical. Frequency lists run out. The shared decks online cover words you already know and miss the ones you keep tripping over. The textbook tradition was not built for this part of the journey, because by this point the textbook has handed you off and quietly stepped back.

That moment is the one I keep designing immit for. My day job is the dictionary side of the product, and the strange privilege of that work is getting to look at the words intermediate learners actually need, and noticing how often the standard ways of teaching them are not quite the shape of the problem. This post is some of what I have been thinking about while we build.

Why does Japanese language vocabulary get harder at the intermediate level?

Japanese vocabulary plateaus at the intermediate level for three reasons. The textbook tradition runs out of curated lists around N4 to N3, just as the reading material the learner wants to engage with assumes the next five thousand words. Japanese uses three writing systems and depends heavily on context, so isolated word lists transfer poorly to real reading. And the workflow most learners assemble (a popup dictionary in one tab, an SRS app in another, example sentences in a third) fragments the loop right when continuity matters most. The short answer to how to learn Japanese vocabulary past this point is to combine three methods: spaced repetition (an SRS that schedules reviews at widening intervals) for memorization, context-rich immersion for retention, and thematic grouping to make new vocabulary connect to what is already there.

Why Japanese language learners hit a steeper vocabulary curve

The first few thousand Japanese words come in a structured drip. Greetings, particles, the verbs you need to introduce yourself, the kanji that show up on day one, the basic vocabulary that covers daily life. Every textbook agrees on roughly what those basic words are, and every textbook in the traditional method gets you there.

The next five thousand are different. They are not in a single curated list. They live in the books, articles, manga, and video subtitles you want to read, and the order you meet them in depends entirely on what you decide to read. A common reading-comprehension goal cited in the immersion community is around two thousand jōyō kanji and six to seven thousand vocabulary words, the threshold at which reading speed catches up with comprehension and you can actually study Japanese by reading Japanese. The best textbooks (Genki or Tobira or whichever you used) cover a small fraction of that. The wall is the gap.

Japanese makes the gap harder than it would be in, say, Spanish. Japanese uses three writing systems and relies heavily on context, which is why standard memorization lists are less effective for Japanese than for languages that have a single writing system and predictable spelling. A word you can pronounce by reading the kana might still hide a kanji compound you cannot decode. A word in your vocabulary list might be the one your reading source uses with a slightly different connotation. The list is not wrong, exactly. It is just not the shape of what reading asks for.

Active immersion with authentic Japanese content (manga, news, novels, video with subtitles) is the documented community consensus for how Japanese learners get past this point. The mechanism is the one every immersion guide describes: you encounter a word in a real sentence you care about, you look it up, you keep going, and at some point the word becomes a word you know. The tool stack that supports this loop is usually a popup dictionary app plus an SRS plus some way of harvesting example sentences. The pieces work. The seams between them are where the friction lives, and bridging those seams is most of what makes the difference between a learning style that compounds and one that stalls.

What I think about when I design a dictionary entry

When you spend a lot of time looking at words people are about to look up, you start to notice the same patterns recurring. None of these are big revelations. They are just the texture of designing a dictionary that intermediate learners use.

The entries that look simple are the ones that need the most thought

The hardest entries to design are not the rare kanji compounds with one specialised meaning. Those are easy: there is one definition, you write it down, you move on. The hard entries are the common verbs and adjectives that look obvious until you try to write a clean English translation for them.

Think of a common motion verb, or a common adjective for something close to "good," or a particle that is technically optional. The dictionary tradition has been collapsing the nuance of these into a single English gloss for a long time, because a gloss has to fit on a line. The intermediate learner reading a real sentence then arrives at a moment where the gloss is true but not enough. They know what the word means in some technical sense, and they still cannot quite tell why the sentence reads the way it does.

The design work, the part that is interesting to me, is the layering. What is the core meaning. What are the two or three contexts where that meaning shifts. What is the example sentence that makes the shift visible. None of this is mysterious. It is just careful, and it has to be done one entry at a time.

Multiple readings, multiple contexts

Kanji characters often have multiple readings, commonly grouped as Kun-reading (Kun'yomi) for native Japanese words and On-reading (On'yomi) for words of Chinese origin. Which reading applies depends on the surrounding word. A learner who has memorised the readings in isolation still has to learn, sentence by sentence, which reading the context wants.
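To make the point concrete, here is a minimal sketch of how the surrounding word, not the kanji alone, selects the reading. The four-entry lexicon and the function name `word_containing` are illustrative assumptions, not how any real dictionary (immit included) stores or resolves readings.

```python
from typing import Optional, Tuple

# Hypothetical mini-lexicon: surface form -> reading.
# A real dictionary holds far more entries; these are for illustration only.
LEXICON = {
    "生きる": "いきる",        # kun-reading inside a native verb
    "学生": "がくせい",        # on-reading inside a compound
    "人生": "じんせい",        # a different on-reading of the same kanji
    "生ビール": "なまビール",  # a different kun-reading
}


def word_containing(kanji: str, sentence: str) -> Optional[Tuple[str, str]]:
    """Return (word, reading) for the longest known word in the sentence
    that contains the given kanji, or None if no known word matches."""
    matches = [word for word in LEXICON if kanji in word and word in sentence]
    if not matches:
        return None
    best = max(matches, key=len)
    return best, LEXICON[best]


if __name__ == "__main__":
    for sentence in ["毎日を生きる", "彼は学生です", "生ビールを一つください"]:
        print(sentence, word_containing("生", sentence))
```

The same kanji comes back with a different reading in each sentence, because the lookup key is the whole word rather than the character.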

This is also why vocabulary cannot be fully separated from learning Japanese grammar: particles, verb forms, and sentence structure often decide which meaning makes sense. And because immit lets you add your own notes to each word, you can tie original example sentences, mnemonics, or other reminders directly to the card so it matches your learning style.

The standard advice for someone just starting kanji is to start early, because learning kanji early in the process aids both vocabulary acquisition and grammar reading later on. This is the territory where WaniKani has set the standard. WaniKani uses a spaced repetition system with kanji-focused flashcards to teach kanji readings and meanings, and the immersion community has been recommending it for years as a good starting point to get the kanji foundation right.

immit and WaniKani are solving different problems. WaniKani teaches the kanji curriculum from scratch with a structured progression: this kanji, then this kanji, with mnemonics and radicals to make the readings stick. immit's job starts after that, when you have enough kanji to read real material and you need a dictionary that surfaces the right reading and the right meaning for the sentence in front of you. The two work in sequence, WaniKani first and immit after, rather than side by side.

What I notice when I design entries is that the reading the intermediate learner most needs is not always the most common reading. It is the reading appropriate to the context they encounter. The dictionary has to make that visible without burying it in a long list of every reading the kanji can take.

Where the dictionary stops and the learner steps in

Even the best dictionary entry cannot supply situational nuance. When this word would feel out of place in polite Japanese. When a near-synonym would land more naturally. When the word carries a register that the gloss does not. That is the work the learner has to do, through reading and listening to native speakers and, eventually, through producing Japanese themselves.

What the dictionary can do is shorten the distance to that work. A good entry gets the learner from "I have no idea what this means" to "I have a working hypothesis I can refine" as quickly as possible. The native-speaker calibration is something the learner builds. The dictionary is a starting point, not a substitute.

Three methods for language learners building Japanese vocabulary

The three methods below are the ones the JP-learning community has converged on, and they are also the three the design of immit is built around. They are not new. The combination is what matters.

1. Spaced repetition: the structured approach most Japanese learning apps rely on

Spaced repetition systems improve memory retention by widening the intervals between reviews, an approach aligned with the spacing effect from cognitive psychology. An SRS is widely recognised as an effective method for memorising vocabulary, because it schedules each review close to the point where the word would otherwise be forgotten. Research on the spacing effect suggests that information is recalled more reliably when learning sessions are distributed over time rather than massed in a single session.

For vocabulary, this matters because the work of remembering five thousand new words cannot happen in a single concentrated push. It has to be metered out over months, with each word reviewed at the moment just before it would have been forgotten. Anki has been the standard tool for this for years. It works. The friction is the setup, the deck maintenance, the AnkiConnect plumbing, the daily review pile that grows when reading volume outpaces card retirement.
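The growing review pile is easy to see with a little arithmetic. The sketch below assumes a made-up list of gaps between reviews (`GAPS`), a fixed number of new words per day, and perfect recall on every review. None of these numbers come from Anki or any other tool; they are only there to show the shape of the curve.

```python
from collections import Counter
from typing import List

# Hypothetical gaps (in days) between successive reviews of one word.
# After the last gap the word is treated as retired.
GAPS = [1, 2, 4, 8, 16, 32]


def daily_review_load(new_words_per_day: int, days: int) -> List[int]:
    """Reviews due on each day, assuming every review is answered correctly."""
    due = Counter()
    for start_day in range(days):
        review_day = start_day
        for gap in GAPS:
            review_day += gap
            if review_day < days:
                due[review_day] += new_words_per_day
    return [due[day] for day in range(days)]


if __name__ == "__main__":
    load = daily_review_load(new_words_per_day=10, days=90)
    print(load[1], load[7], load[63])  # the pile early on, after a week, at steady state
```

Under these assumptions the daily pile levels off at the number of new words per day times the number of review stages, so doubling your reading-driven intake doubles the steady-state pile. Forgotten cards, which this sketch ignores, only add to it.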

Most Japanese language learning apps now integrate spaced repetition in some form, because the method is well-established and the implementation has become routine. The differences between apps are less about whether they use SRS and more about what surrounds it: how words enter the queue, how the cards are formatted, whether the app sits next to your reading or separately from it.

The design choice we made with immit is to ship the SRS as a built-in 8-stage schedule that runs automatically on every word the learner saves from the popup. No deck configuration, no add-on installation, no separate app. The trade-off is real: Anki gives you maximum control, immit gives you less control in exchange for less overhead. Whether that trade-off is right for you depends on whether the overhead has been the thing stopping you. If you would like to try the lookup-and-save loop in one tool, immit is free on Chrome and as a desktop app for Mac, Windows, and Linux.
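For readers who have not seen a staged schedule before, here is a minimal sketch of what an eight-stage ladder can look like. The interval values, the `Card` type, the function names, and the rule that a missed review sends the word back to the first stage are all assumptions made for illustration. They are not immit's actual intervals or implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical intervals (in days) for eight stages; illustrative values only.
STAGE_INTERVALS = [1, 2, 4, 7, 14, 30, 60, 120]


@dataclass(frozen=True)
class Card:
    word: str
    stage: int  # index into STAGE_INTERVALS
    due: date   # next day this word should be reviewed


def save_word(word: str, today: date) -> Card:
    """A newly saved word starts at the first stage."""
    return Card(word, 0, today + timedelta(days=STAGE_INTERVALS[0]))


def review(card: Card, correct: bool, today: date) -> Card:
    """Move up one stage on a correct answer (capped at the last stage);
    drop back to the first stage on a miss (an assumed rule)."""
    if correct:
        stage = min(card.stage + 1, len(STAGE_INTERVALS) - 1)
    else:
        stage = 0
    return Card(card.word, stage, today + timedelta(days=STAGE_INTERVALS[stage]))


if __name__ == "__main__":
    card = save_word("人生", date(2024, 1, 1))
    card = review(card, correct=True, today=card.due)
    print(card)
```

The point of a fixed ladder like this is that the learner never configures anything: saving the word is the only decision, and the schedule follows from the answers.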

2. Context-rich immersion: where learning Japanese grammar and vocabulary connect

Reading low-stakes materials like graded readers helps learners encounter words naturally, without the overwhelm that comes from frequency-list-only study. Learning vocabulary inside complete sentence patterns supports both grammar acquisition and natural context recall. Daily immersion and practice with native speakers reinforce retention through use rather than recall.

Using example sentences in flashcards makes vocabulary learning more natural and contextual. This is the part of the immersion loop that is easy to underrate. A word reviewed in isolation is a word reviewed against your memory of a definition. A word reviewed with the sentence you encountered it in is a word reviewed against a moment, an image, a small emotional charge.

What I notice when I design the lookup-to-save loop is that even when the card itself stores the word only, the context of the lookup travels with the learner's attention. You remember where you were, what you were reading, why you wanted to know. The card is a placeholder for that moment, not a replacement for it. The design intent is to preserve the moment as much as possible by keeping the lookup, the save, and the review inside one tool.

3. Thematic grouping: a useful method beyond beginner word lists

Grouping vocabulary by themes rather than by isolated words enhances retention and retrieval, because thematic clusters give the brain more handles to connect new words to existing ones. A list of fifty random words is fifty unrelated memory items. The same fifty words organised by a theme (cooking, weather, office work, a single novel's recurring vocabulary) is a set of clusters where each word reinforces the others.
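As a small illustration of the difference between a flat list and a clustered one, the sketch below groups saved words by a theme tag. The word list, the tags, and the function name `group_by_theme` are hypothetical. In practice the tag might just as well be the title of the book the word came from.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical saved words, each tagged with the theme it was met in.
SAVED = [
    ("包丁", "cooking"),      # kitchen knife
    ("梅雨", "weather"),      # rainy season
    ("会議", "office work"),  # meeting
    ("煮る", "cooking"),      # to simmer
    ("台風", "weather"),      # typhoon
    ("残業", "office work"),  # overtime
]


def group_by_theme(saved: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn a flat (word, theme) list into theme -> words clusters,
    keeping the order in which the words were saved."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for word, theme in saved:
        clusters[theme].append(word)
    return dict(clusters)


if __name__ == "__main__":
    for theme, words in group_by_theme(SAVED).items():
        print(theme, words)
```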

Dictionaries and context-based resources like Jisho and Tatoeba are widely used in the JP-learning community to provide example-sentence context for thematic vocabulary work. The manual version of the loop is: look up in Jisho, search Tatoeba for example sentences, paste back into your SRS. It works, and it is what most immersion stacks have been doing for years.

What I keep thinking about when designing for this is that the most useful vocabulary cluster is the one the learner builds from texts they personally read. A list pulled from a textbook is someone else's clustering. A list pulled from a novel you read last month is yours, and your brain knows where each word came from. Starting from high-frequency words is still a reasonable first cut. The vocabulary that sticks is the vocabulary that connects to a theme you care about.

What this post does not cover: grammar, speaking, JLPT, and other resources

This post is about reading-side vocabulary acquisition at the intermediate level. It is not the right resource for several adjacent problems.

Grammar. The intermediate grammar curve is its own subject. Bunpro and Renshuu both do structured grammar SRS well, and they are the tools to reach for when grammar is the bottleneck. immit and grammar SRS work together; immit is an alternative to the Yomitan-plus-Anki vocabulary stack, not to grammar tools.

Pronunciation, pitch accent, speaking practice, listening drills. These are different surfaces with different tools. immit's popup currently does not show pitch accent (Yomitan covers this with the appropriate dictionary). Speaking practice is something the immersion stack does not solve directly; it usually involves output practice, language exchange, or one-on-one tutoring outside the vocabulary loop.

JLPT prep. The JLPT (Japanese Language Proficiency Test) is divided into five levels, with N5 the most basic and N1 the most advanced, and learners use it to gauge their proficiency in the language. The standard advice for preparing is to combine textbooks, language learning apps, and practice tests tailored to the level you are targeting, and to run full-length mock tests so the format and timing become familiar. Kanji and vocabulary counts are commonly cited for each level, though the Japan Foundation does not publish official kanji-count or vocabulary-count thresholds; the community estimates that circulate online are useful as rough goalposts, not as syllabi. If your goal is to pass a specific JLPT level on a specific date, the right resource is JLPT-specific practice materials, not a general vocabulary post.

Audio exposure (podcasts, music, drama) supports pronunciation familiarity and context recognition, which is adjacent to the lookup-to-save loop but not the primary surface this post covers. The vocabulary you absorb passively from audio is real; the vocabulary you actively capture from reading is what this post is about.

How immit works as a dictionary app for Japanese language learning

immit is a popup Japanese dictionary app with built-in spaced repetition, available as a Chrome extension and a desktop app for Mac, Windows, and Linux. The free tier covers lookup, save, and SRS review with no account required, and works offline. Pro at $9 a month, $108 a year, or $299 one-time adds multi-device sync, cloud backup, and dark mode.

The reason the dictionary, the SRS, and the save action live in the same tool is the design observation from earlier in this post: the seams between tools are where the friction lives. The Yomitan-plus-Anki-plus-AnkiConnect stack is the workflow most immersion learners assemble piece by piece, often after evaluating several Japanese language learning apps and deciding none of them cover the whole loop. immit ships it as one tool. That is a design choice about workflow continuity on the intermediate study path, not a claim about what users do.

The audit engine is an internal tool we built to review and correct dictionary entries based on user reports. It is not a feature the user sees directly. What the user sees is the result, which is cleaner entries over time. The submit-a-fix button inside the popup is the user-facing side of that loop. The plan is to keep refining the dictionary entry by entry, with the audit engine as the long-term mechanism that makes the dictionary better the longer it has been used.

The design is still early. There are entries that need more work, readings the intermediate learner needs that we have not surfaced as cleanly as we would like, contexts where the right English gloss is still a single line when it should be three. Saying this in a blog post feels strange because the convention is to project finished confidence. The honest version is that the dictionary is being built for the wall this post describes, and the build is ongoing.

If you want to try it, the extension is free on Chrome, no account, works offline. The desktop app pairs with it for everything outside the browser. We would rather you try the free version and decide whether the loop feels right than be sold to.

FAQ

Why does Japanese vocabulary get harder at the intermediate level?

Japanese vocabulary plateaus at the intermediate level because the textbook tradition runs out of curated lists around N4 to N3, while the reading material learners want to engage with assumes the next five thousand words. Japanese also uses three writing systems and depends on context, so isolated word lists transfer poorly to real reading. The wall is the gap between what the textbook covered and what your reading source assumes.

How do you build a Japanese vocabulary efficiently as an intermediate learner?

Combine three methods: spaced repetition to schedule reviews so words are recalled just before they would be forgotten, context-rich immersion (reading and watching real Japanese material) so the words you study are the ones you actually encounter, and thematic grouping so new vocabulary connects to what you already know. The most efficient version of this loop is one where lookup, save, and review happen in the same tool, so the workflow does not fragment.

Is it better to learn Japanese vocabulary from frequency lists or from immersion lookups?

Both. Frequency lists are a reasonable first cut to get to the threshold where you can start reading native material at all. Once you are reading, the most useful vocabulary list is the one assembled from texts you personally engage with, because each word comes with the memory of where you met it. The two approaches stack: frequency for the foundation, immersion lookups for everything after.

How many Japanese words do you need to know to read a novel or watch anime without subtitles?

A common reading-comprehension goal cited in the immersion community is around two thousand jōyō kanji and six to seven thousand vocabulary words. The actual numbers vary by source and by the kind of material; a slice-of-life manga and a hard-boiled detective novel will demand different vocabulary even at the same word count. The estimates are useful as rough goalposts, not as exact thresholds.

What is the difference between recognition vocabulary and active vocabulary in Japanese?

Recognition vocabulary is the set of words you can understand when you encounter them in reading or listening. Active vocabulary is the set of words you can produce yourself in speaking or writing. For most intermediate immersion learners, recognition vocabulary grows much faster than active vocabulary, because reading and listening practice are higher-volume than speaking and writing practice. This is normal. Active vocabulary catches up later, through output practice.

Do you need to learn kanji separately from vocabulary, or do they come together through lookups?

Both paths work, and the immersion community has converged on a hybrid. A structured kanji foundation (often via WaniKani) gets you to the point where you can read at all. After that, kanji and vocabulary acquisition merge: the words you look up reinforce the kanji they contain, and the kanji you already know let you guess at words you have not seen. The pure-lookup approach without a kanji foundation tends to stall, because each new compound has too many unknowns.

What is different about immit's approach to building a Japanese dictionary?

immit's dictionary is designed for the intermediate learner who is reading native material and needs the right reading and the right meaning for the sentence in front of them, not a list of every reading the kanji can take. An internal audit engine reviews entries based on user reports, so the dictionary gets cleaner over time. The lookup, the save action, and the spaced repetition review all happen inside the same tool, so the workflow does not fragment between a dictionary tab, an SRS app, and an example-sentence corpus. The free tier covers this end to end with no account required.