The Nomenclator
Date: 2025 Jun 08
Words: 970
Draft: 1 (Most recent)
THIS IS AN EXPLORATORY POST
the emergent problem
with the technological advances of the 2020s, the amount of research that researchers have to pore through has never been higher.1 at the same time, certain broad stand-alone fields like physics have hit apparent “dead ends” and gone without major breakthroughs, while cross-domain researchers and programs have become paramount to the emerging fields of the decade.2 the newer coding agents like AlphaEvolve3 have yielded cross-domain breakthroughs and shown what is possible with “agentic” computer-assisted human research teams. cross-domain breakthroughs are increasingly the main driver of academic progress in the 2020s.
one of the main obstacles actively preventing cross-domain breakthroughs is ambiguous terminology across disciplines; terminological chaos is becoming a bigger and bigger bottleneck. and if computer agents are getting better at cross-domain synthesis, humans need better coordination tools to keep up. so, to unlock the next steps in academic research methodology, to build even better computer systems such as AlphaEvolve, and to keep humans on the same cutting edge as these systems, resolving these ambiguities is crucial. this brings up the necessity for a nomenclator.
proposal
the nomenclator is a proposed human committee of academic leaders across disciplines, whose purpose is to name new concepts and to ensure that future terminologies do not conflict.
further on the problem
researchers waste enormous amounts of time figuring out if they’re talking about the same concepts when they use words like “complexity”4 or “emergence.” sometimes apparent disagreements are actually just people using the same word for completely different things. sometimes genuine insights get buried because they don’t have clear terminology to express them.
there are a lot of foundational concepts floating around academia that are genuinely irreducible - things like “monad,” “quark,” “axiom,” “datum” - but they’re scattered across fields without systematic organization. meanwhile, people keep borrowing everyday english words (“turtle,” “patch,” “observer” in netlogo) to describe fundamental computational concepts, creating namespace collisions and conceptual confusion - this poor terminology and these unusable concepts from netlogo5 fed roughly 20 years of research into “agents” that has proven largely useless for the agentic use cases of the 2020s.
the nomenclator would systematically identify these irreducible concepts and give them proper names, while cleaning up terminological chaos in existing fields.
precedents
it would be similar in function to:
- Academic Organizations – IUPAC, International Astronomical Union, the International Committee on Taxonomy of Viruses, Mathematics Subject Classification (MSC)
- Language Authorities – Académie française, Real Academia Española, Icelandic Language Committee, Pontifical Academy for Latin
- Standards Bodies – IEEE, ISO
- Computer Standards Bodies and Conventions – IETF, Unicode Consortium, POSIX, language specs
naming the naming committee
nomenclator was the title of the ancient roman attendants whose job was to supply names - announcing people by name to their patron.
nomenclator organization and structure
the nomenclator’s human committee would use codified, systematic criteria and a monolithic computer program (the “monolith,” described below) to decide which new concepts are important enough to name.
governance and procedures
~160 members total. small enough that everyone knows everyone (dunbar number considerations), large enough for field representation.
core committees:
- maintainer committee – handles the open source monolith system
- linguistics and cognitive science committee – phonetics, semantics, historical linguistics expertise, how language shapes thought, terminology adoption patterns
- field representatives – rotating 6-year terms, ~5-7 seats per major discipline, elected by professional societies (aps, acs, etc.); one way the rotation could be staggered is sketched below
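as a sketch only: one way the rotation could be staggered so that a discipline’s seats don’t all turn over at once. the 2-year cycle, the cohort scheme, and the seat names are my illustrative assumptions, not part of the proposal.

```python
# hypothetical staggering of field-representative seats: shortened first terms
# of 2/4/6 years so a third of a discipline's seats turn over every 2 years,
# with every renewal afterwards running the full 6-year term.
def initial_terms(seats: list[str], term_years: int = 6, cycle_years: int = 2) -> dict[str, int]:
    """assign each seat the length (in years) of its shortened first term."""
    n_cohorts = term_years // cycle_years  # 3 cohorts for 6-year terms
    return {seat: cycle_years * (i % n_cohorts + 1) for i, seat in enumerate(seats)}

print(initial_terms(["aps-1", "aps-2", "aps-3", "acs-1", "acs-2", "acs-3"]))
# {'aps-1': 2, 'aps-2': 4, 'aps-3': 6, 'acs-1': 2, 'acs-2': 4, 'acs-3': 6}
```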
naming criteria
- category 1: compound existing words for concepts that can be naturally represented with them (e.g. machine learning, social network)
- category 2: classical roots for important concepts or breakthroughs (e.g. genome, cybernetics, meme)
- category 3: entirely new coinages from unused phonic space, for foundational breakthroughs (e.g. quark, qubit); a toy decision rule for these three categories is sketched below
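a toy encoding of this routing as a decision rule; the two boolean inputs compress what would really be committee judgment:

```python
# hypothetical decision rule routing a concept to one of the three categories.
# the two boolean inputs are oversimplifications of committee judgment.
from enum import Enum

class NamingCategory(Enum):
    COMPOUND = 1      # compose existing words ("machine learning")
    CLASSICAL = 2     # coin from classical roots ("genome", "cybernetics")
    NEW_COINAGE = 3   # entirely new form from unused phonic space ("quark")

def assign_category(compoundable: bool, foundational: bool) -> NamingCategory:
    if compoundable:
        return NamingCategory.COMPOUND     # category 1 when plain words suffice
    if foundational:
        return NamingCategory.NEW_COINAGE  # reserve category 3 for foundations
    return NamingCategory.CLASSICAL        # category 2 for the important middle

print(assign_category(compoundable=False, foundational=True))  # NamingCategory.NEW_COINAGE
```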
pure anglosphere participation for now - coordinating across languages adds massive complexity. can expand internationally later.
the monolith
the monolith is an open-source computer program whose algorithmic infrastructure is handled by the core maintainer committee. it would be used both to score the importance of proposed new terminologies and to search for unused phonic space for category 3 terms (a toy sketch of that search appears after the oversight list below).
algorithmic importance calculation (one way to combine these signals is sketched after the list):
- network analysis in the field (co-citation patterns)
- citation velocity for concepts and authors
- nlp vector similarity search to map conceptual relationships
- automated detection of terminological conflicts
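as a sketch only, here is one plausible way to fold those four signals into a single score. every weight, squashing constant, and field name below is a placeholder assumption, not a specification of the monolith:

```python
# hypothetical composite importance score for a proposed concept.
# all weights, constants, and field names are placeholder assumptions.
import math
from dataclasses import dataclass

@dataclass
class ConceptSignals:
    cocitation_centrality: float  # 0-1, e.g. normalized centrality in the co-citation graph
    citation_velocity: float      # citations per month over a recent window
    cross_field_count: int        # distinct fields using the concept (e.g. from embedding maps)
    conflict_score: float         # 0-1, fraction of usages whose embeddings disagree

def importance(s: ConceptSignals,
               w_central=0.35, w_velocity=0.25,
               w_breadth=0.25, w_conflict=0.15) -> float:
    """weighted composite in [0, 1]; weights are illustrative, not calibrated."""
    velocity = 1 - math.exp(-s.citation_velocity / 50)  # squash unbounded velocity into [0, 1)
    breadth = min(s.cross_field_count / 10, 1.0)        # saturate at 10 fields
    # note: terminological conflict *raises* priority - confused terms need naming most
    return (w_central * s.cocitation_centrality
            + w_velocity * velocity
            + w_breadth * breadth
            + w_conflict * s.conflict_score)

print(importance(ConceptSignals(0.6, 120.0, 7, 0.8)))  # ≈ 0.73 → strong candidate
```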
human oversight for edge cases:
- appeals process for contested decisions
- evaluation of concepts the algorithm can’t handle
- final approval for category 3 (new coinage) assignments
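and the unused-phonic-space search mentioned above, as a toy sketch: enumerate pronounceable single-syllable forms from a small inventory and keep the ones no field already uses. the syllable inventories and the taken-terms set are illustrative stand-ins; a real system would need a proper phonotactic model and a full terminology database.

```python
# toy "unused phonic space" search for category 3 coinages.
# the inventories and the taken-terms set are illustrative stand-ins.
from itertools import product

ONSETS = ["b", "d", "fl", "gr", "k", "kw", "m", "pl", "t", "v", "z"]
NUCLEI = ["a", "e", "i", "o", "u", "ar"]
CODAS  = ["", "k", "n", "rk", "t"]

def candidate_coinages(taken: set[str], max_results: int = 8) -> list[str]:
    """enumerate single-syllable onset+nucleus+coda forms not already in use."""
    found = []
    for onset, nucleus, coda in product(ONSETS, NUCLEI, CODAS):
        form = onset + nucleus + coda
        if form not in taken:
            found.append(form)
            if len(found) >= max_results:
                break
    return found

print(candidate_coinages({"ba", "bak", "ban", "bark", "bat", "be"}))
```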
application process for new terms
researchers submit concepts that need terminology through a formal application (a minimal record sketch follows the list):
- provide evidence of conceptual importance (citations, cross-field usage)
- demonstrate existing terminological confusion or gaps
- submit to algorithmic evaluation + human review
- appeals process for contested decisions
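a minimal sketch of what such an application record might look like; every field name here is hypothetical, not a defined schema:

```python
# hypothetical application record; field names are illustrative, not a defined schema.
from dataclasses import dataclass

@dataclass
class TermApplication:
    concept_summary: str           # what the concept is, in one paragraph
    evidence_citations: list[str]  # DOIs / arXiv IDs demonstrating importance
    cross_field_usage: list[str]   # fields where the concept already circulates
    confusion_examples: list[str]  # clashing existing usages, if any
    requested_category: int        # 1, 2, or 3 per the naming criteria
    status: str = "submitted"      # submitted → algorithmic review → human review → decided

app = TermApplication(
    concept_summary="a shared notion of 'emergence' across physics and computer science",
    evidence_citations=["10.0000/placeholder-doi"],  # placeholder, not a real reference
    cross_field_usage=["physics", "computer science", "biology"],
    confusion_examples=["'emergence' in complexity theory vs. in philosophy of mind"],
    requested_category=2,
)
```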
necessity
this is a one-shot decision with permanent consequences. once a nomenclator exists and gains authority, it becomes the standard. any competitor would be like the avignon papacy - a schism that weakens the whole system.
the governance structure choice is permanent and shapes academic discourse forever. there won’t be a “nomenclator 2.0” if the first implementation has flawed governance - the network effects and institutional adoption make the initial design irreversible.
this means getting the governance structure right immediately is crucial. too democratic → paralysis and bikeshedding. too small → vulnerable to capture. the balance has to be perfect from day one.
most “impossible” coordination problems are actually just missing infrastructure. once you build the right tools, the “impossible” stuff becomes trivial.
arxiv has been rising in popularity. this link could simply illustrate that rise, but i use it here to show that the volume individual researchers consume is also rising: arXiv Monthly Downloads
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
a previous discussion the author took part in about “complexodynamics” spent most of its time clearing up term ambiguities: https://scottaaronson.blog/?p=762
NetLogo: Design and Implementation of a Multi-Agent Modeling Environment