Epistylometry

Date: 2025 Aug 07

Words: 1563

Draft: 1 (Most recent)

Tags: socdym

a forger’s skill becomes their signature.

epistylometry, from the greek “epistēmē” (“knowledge”) plus “stylometry”, is the knowledge signature of a person. what someone does demonstrates what they know, possibly what they don’t know, and possibly what they are trying to hide that they know. the term borrows from the commonplace word “stylometry”, the analysis of someone’s writing style based on various metrics.

you know how in “breaking bad” it should have been obvious to agent schrader that heisenberg was walter white, given the advanced chemistry knowledge needed to make blue meth? that chemistry is walter white’s epistylometric signature.

this terminology pulls together a few other similar-but-disparately-named terms and concepts from various places (see the need for a nomenclator), including:

in stylometry, analyzing writing patterns to identify authors. this is how they unmask anonymous writers - your word choices, sentence structures, even punctuation habits form a signature
in cyber attacks, security researchers analyze malware techniques, coding styles, and operational patterns to identify threat actors
in art forgery, forgers must study not just technique but the knowledge limitations of the period. anachronistic knowledge reveals fakes.
counterfeiting currency

context and terminology

this framework for epistylometry was written with high-stakes scenarios in mind, where it would be most necessary, such as corporate espionage, national intelligence work, and cybercrime. a more broadly applicable but medium-stakes scenario is criminal investigation. the examples here use staff at a restaurant because it is a low-stakes environment.

here we use:

alpha for an attacker, who wants to carry out something without it being known they did it
beta for a detective, who is trying to find who carried out an action
gamma for an attacker who is not known from the perspective of any detectives
tau for a skill in a skillset

we also use:

iso, from the greek for “equal”, for the first of two comparable agents where order does not matter
allo, from the greek for “other”, for the other of two comparable agents where order does not matter

formal definitions

epistylometric uniqueness

epistylometric uniqueness (EU) is a measure of how disparate your domains of expertise are.

the simple formula for EU is the mean pairwise distance of all your knowledge domains. $EU = \frac{\sum_{(i,j) \in \binom{D}{2}} d(i,j)}{|\binom{D}{2}|}$

where $D$ is your set of knowledge domains, $\binom{D}{2}$ is the set of all unordered pairs of domains, and $d(i,j)$ is the distance between domains $i$ and $j$.

someone who uses only {python, javascript, react} in an operation demonstrates minimal EU, because those domains are all adjacent. someone who uses {mechanical engineering, louisiana folklore, fashion design} in an operation demonstrates much higher EU, because those domains are far apart.
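a minimal python sketch of the mean pairwise distance. the domain names come from the two examples above; the numeric distances (on a 0 to 1 scale) and the names `mean_pairwise_distance`, `TOY_DISTANCES`, and `toy_distance` are assumptions made up for illustration, not measured values.

```python
from itertools import combinations


def mean_pairwise_distance(domains, distance):
    """Mean of distance(i, j) over all unordered pairs of distinct domains."""
    pairs = list(combinations(sorted(domains), 2))
    if not pairs:
        return 0.0  # assumption: fewer than two domains means no spread to measure
    return sum(distance(i, j) for i, j in pairs) / len(pairs)


# Hypothetical distances in [0, 1]; these numbers are invented for the example.
TOY_DISTANCES = {
    frozenset({"python", "javascript"}): 0.1,
    frozenset({"python", "react"}): 0.2,
    frozenset({"javascript", "react"}): 0.1,
    frozenset({"mechanical engineering", "louisiana folklore"}): 0.9,
    frozenset({"mechanical engineering", "fashion design"}): 0.7,
    frozenset({"louisiana folklore", "fashion design"}): 0.8,
}


def toy_distance(i, j):
    """Look up the (symmetric) toy distance between two domains."""
    return TOY_DISTANCES[frozenset({i, j})]


adjacent = mean_pairwise_distance({"python", "javascript", "react"}, toy_distance)
disparate = mean_pairwise_distance(
    {"mechanical engineering", "louisiana folklore", "fashion design"}, toy_distance
)
print(f"adjacent domains:  EU = {adjacent:.2f}")
print(f"disparate domains: EU = {disparate:.2f}")
```

how the distance between two domains is actually measured is left open here; any symmetric function of two domains can be passed in as `distance`.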

now if someone has a bunch of “domains” that are closely related (for example, 20 different programming languages) and then one or two really disparate ones (louisiana culture, clothing design), the mean pairwise distance formula gives a really low result (let’s arbitrarily say 0.1), whereas if you regard all the programming languages together as one domain, which they might as well be, then your EU would skyrocket (say, 0.9).

to deal with this problem, we use the spatio-parametric Rao’s quadratic entropy formula (Rocchini et al., 2021) to adjust for domain proximity:

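in the formula, $N$ is the number of domains, $d_{ij}$ is the distance between domains $i$ and $j$, and $\alpha$ is the parameter that gets adjusted. $\alpha$ sets how much the largest distances dominate the result: at $\alpha = 1$ the formula is a plain mean over all pairs, and as $\alpha$ grows the result approaches the single largest distance. (this $\alpha$ is a tuning exponent, not the attacker alpha defined above.)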

$EU = \left( \sum_{i,j=1}^{N} \frac{1}{N^2} d_{ij}^\alpha \right)^{\frac{1}{\alpha}}$
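a small python sketch of the formula above, applied to the “20 programming languages plus two distant domains” case. the function names, the toy distance values (0.05 between languages, 0.9 otherwise), and the restriction to $\alpha > 0$ are assumptions of this sketch. the sum runs over all ordered pairs, including $i = j$ with distance 0, exactly as the formula is written.

```python
def parametric_eu(dist, alpha):
    """(sum over i, j of d_ij**alpha / N**2) ** (1 / alpha), for alpha > 0."""
    if alpha <= 0:
        raise ValueError("this sketch only handles alpha > 0")
    n = len(dist)
    total = sum(dist[i][j] ** alpha for i in range(n) for j in range(n))
    return (total / n**2) ** (1 / alpha)


def toy_matrix(n_languages=20, n_far=2, near=0.05, far=0.9):
    """Hypothetical profile: many near-identical domains plus a few distant ones."""
    n = n_languages + n_far
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a domain is at distance 0 from itself
            both_languages = i < n_languages and j < n_languages
            dist[i][j] = near if both_languages else far
    return dist


dist = toy_matrix()
for alpha in (1, 2, 10, 50):
    print(f"alpha = {alpha:>2}: EU = {parametric_eu(dist, alpha):.3f}")
```

at $\alpha = 1$ the many near-zero language pairs drag the value down; raising $\alpha$ moves it toward the largest distance in the matrix, so the cluster of languages stops hiding the two distant domains.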

when comparing two or more people’s EU, normalize the scores so that the highest one is 1 and the others fall below it.
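one way to do that comparison, sketched in python. dividing every score by the maximum is an assumption: the text only requires that the highest score be 1 and the rest lower. the names and raw values are hypothetical.

```python
def normalize_eu(scores):
    """Rescale so the highest EU is 1 and the rest keep their ratio to it."""
    top = max(scores.values())
    if top == 0:
        # assumption: if nobody shows any spread, everyone stays at 0
        return {name: 0.0 for name in scores}
    return {name: value / top for name, value in scores.items()}


raw = {"iso": 0.45, "allo": 0.9, "gamma": 0.3}  # hypothetical raw EU values
normalized = normalize_eu(raw)
print(normalized)
```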

red teaming

several good strategies for “red teaming” emerge.

epistylometric camouflage

imagine agent iso studying agent allo’s knowledge signature so carefully that iso can carry out an operation that looks exactly as if allo did it. the higher your epistylometric uniqueness, the harder you are to impersonate fully.

when choosing a target to frame with epistylometric camouflage, two things about the target must be considered:

epistylometric similarity to you
epistylometric uniqueness

the more similar someone’s epistylometric profile, the easier it will be to mimic them.

epistylometric camouflage, if executed well, is most effective on very unique epistylometric profiles (“there is only one person in the world with this combination, so the culprit has to be them”), but it is harder to replicate on them. highly unique profiles are both the most valuable to impersonate and the hardest to fake.

a good heuristic is to avoid choosing someone whose epistylometric profile is too similar to your own, because then you become a suspect as well, even if no one considers epistylometric camouflage.

if there is someone whose knowledge base is fairly similar to your own, but not too similar, and who has a very high epistylometric uniqueness (i.e. they know a lot of the things you know, but they also know a couple of topics that are way out there), then they would be an ideal camouflage target.
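the two criteria can be sketched in python as a filter and a sort. everything concrete here is an assumption: similarity is taken to be the jaccard index of two skill sets, “fairly similar, but not too similar” is modelled as a band between two hypothetical cut-offs (`low`, `high`), and the profiles and EU scores are invented.

```python
def jaccard(a, b):
    """Share of skills two profiles have in common (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_targets(own_skills, candidates, low=0.3, high=0.7):
    """Keep candidates whose similarity lies in [low, high], highest EU first.

    candidates maps name -> (skill set, EU score). low and high are
    hypothetical cut-offs for "fairly similar, but not too similar".
    """
    kept = []
    for name, (skills, eu) in candidates.items():
        similarity = jaccard(own_skills, skills)
        if low <= similarity <= high:
            kept.append((name, similarity, eu))
    return sorted(kept, key=lambda row: row[2], reverse=True)


own = {"python", "c", "linux", "networking"}
candidates = {  # made-up profiles and EU scores
    "twin": ({"python", "c", "linux", "networking"}, 0.2),
    "stranger": ({"pottery", "sailing"}, 0.8),
    "neighbor": ({"python", "c", "linux", "louisiana folklore", "fashion design"}, 0.9),
    "colleague": ({"python", "c", "javascript"}, 0.3),
}
for name, similarity, eu in rank_targets(own, candidates):
    print(f"{name}: similarity = {similarity:.2f}, EU = {eu:.2f}")
```

the identical “twin” and the unrelated “stranger” both drop out: the first would point suspicion back at you, the second would be too hard to mimic.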

to demonstrate actual expertise, an impostor would have to be able to counterfeit at a low alpha parameter, showing enough specific expertise within subdomains or closely related domains. to demonstrate someone else’s signature clearly, an impostor would have to be able to counterfeit very disparate domains, that is, at a high alpha parameter, where the largest distances dominate.

epistylometric obfuscation

a passive strategy an attacker can take is to obscure what they know, particularly certain techniques. with attacker alpha and skill tau, if no one knows that alpha knows tau, then alpha can use tau without fear of being easily detected. in notation, where $K_{x}$ reads “x knows” and $\emptyset$ stands for no one:

$K_{\emptyset}(K_{\alpha} \tau)$

epistylometric handicapping

when carrying out a non-crucial attack without epistylometric camouflage, a good strategy is for the attacker to handicap themselves and take care to use only commonly known skillsets (no obscure programming languages from russia or odd lisp dialects), for example PEP 8-styled python or linux kernel-style c. this may limit their capabilities, but it also limits the chance of being detected.

epistylometric dilution

if an attacker has a specific skill that complements the rest of their skillset really well but is somewhat obscure and could be used to identify them, one strategy is to broadcast to many people how to do it. then it becomes much harder for detectives to use that skill to narrow down their search.

for example, if a restaurant manager tells a few trainees where the key to the back door is while the owner is present in the conversation, and the next day the manager sneaks into the store and empties the cash register, then the trainees are also suspects. the reasonable doubt this creates about attribution is hard to overcome.

epistylometric honeypot

complements epistylometric dilution. can also be used in blue teaming.

an attacker or a detective can create and thinly distribute honeypot techniques, which could be faulty techniques that don’t work, techniques that are easily identifiable, or something else. if an attacker is framing another attacker (maybe to eliminate competition), then they have to make sure that both the other attacker and a detective know the technique. when the other attacker uses the honeypot technique, they are trapped. for a detective, this is a classic setup. a honeypot is more effective the less it is spread around. (if it’s an intentionally faulty technique and the fault is discovered, a bridge channel could broadcast that to everyone, and then no one would use it.)

for example, if restaurant employee iso wants to get rid of another employee allo, iso can show allo a video of a hairpin lockpicking technique that only works on some locks and breaks the hairpin inside others, and that iso knows doesn’t work on the cash register. it is important that iso gets allo to demonstrate knowledge of the technique to at least one other person. now if allo attempts to open the cash register and the hairpin breaks, then allo has been epistylometrically honeypotted.

blue teaming

skill matching

if known agent $\alpha$’s skillset contains skills $\{A, B, C\}$, and detective $\beta$ knows that unknown attacker $\gamma$’s skillset contains $\{A, B, C\}$, then $\beta$ can suspect $\alpha$.

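(1) ${\alpha}_{\text{skillset}} \supseteq \{A, B, C\}$
$K_{\beta}({\gamma}_{\text{skillset}} \supseteq \{A, B, C\})$ and $K_{\beta}(1)$, or $K_{\beta}({\gamma}_{\text{skillset}} \cap {\alpha}_{\text{skillset}})$
here $K_{\beta}(1)$ means that $\beta$ knows statement (1), and the last form means that $\beta$ knows which skills $\gamma$ and $\alpha$ share.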
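skill matching reduces to a superset check, sketched below in python. the agent names and skill sets are hypothetical, and the sketch assumes the detective already holds a table of known skillsets.

```python
def suspects(observed_skills, known_skillsets):
    """Names whose known skillset contains every skill seen in the attack."""
    return sorted(
        name
        for name, skills in known_skillsets.items()
        if skills >= observed_skills  # set superset test
    )


known = {  # hypothetical agents and the skills the detective knows they have
    "alpha": {"A", "B", "C", "D"},
    "iso": {"A", "B"},
    "allo": {"B", "C", "E"},
}
matches = suspects({"A", "B", "C"}, known)
print(matches)
```

the fewer skills the attack reveals, or the more people are known to hold them, the longer the list gets, which is what obfuscation, handicapping, and dilution each exploit.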

schelling points

schelling points, first introduced by the American economist Thomas Schelling in his 1960 book The Strategy of Conflict, are natural convergence points where most competent people would arrive at the same solution to a problem. a criminologist can use schelling points to narrow down who the culprit is based on the level of competency the culprit displays.

novice move: obvious approach everyone would try
competent move: the “correct” textbook approach
expert move: elegant optimization most pros would converge on (schelling point for experts)

specific cases using epistylometry

in the unabomber case, Ted Kaczynski’s use of “you can’t eat your cake and have it too” instead of the common “have your cake and eat it too” helped identify him¹
when Isaac Newton anonymously solved Johann Bernoulli’s brachistochrone problem, Bernoulli reportedly said “tanquam ex ungue leonem”, Latin for “we know the lion by his claw”.
in the field of authorship attribution, much effort has famously been poured into attempting to identify Satoshi Nakamoto from his epistylometric signature
joe job attacks in cybersecurity - framing someone else by mimicking their digital signature

conclusion

epistylometry is about how knowledge itself creates identity. we are what we know and the unique combination of what we know. it’s not paranoia

¹ http://itre.cis.upenn.edu/~myl/languagelog/archives/002762.html


Diego Cabello