ENTS (Extendable Nested Tagging Schema)
One of the best pieces of advice I’ve ever gotten was “follow whatever you are interested in”.
About a year ago I started using Obsidian, but I was bothered by its limited tagging system and the lack of nested tags. There were add-ons for it, but it wasn’t native, and I came to find I preferred more minimalistic, CLI/UNIX-inspired solutions anyway. This has implications for researchers who have to comb through a lot of local files trying to find connections between disparate fields. Common file systems (including ext4, NTFS, exFAT, and APFS) treat tags as flat labels, which limits hierarchical tag organization and nuanced queries. So I made the Extendable Nested Tagging Schema, or ENTS.
ENTS is an implementable YAML and command schema. When implemented, the “S” in “ENTS” stands for “System”. Currently it comes in two implementations, MENTS (Metal ENTS) and LENTS (Local ENTS).
ENTS Schema
Tags Format in YAML
Example:
Organic Chemistry:
  - Medicinal Chemistry
  - Polymer Chemistry:
      - Plastic Materials
      - Carbon Nanomaterials:
          - Carbon Nanotubes
          - Graphene
  - Petrochemistry
Inorganic Chemistry:
  - Coordination Chemistry
  - Organometallic Chemistry
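To make the nesting concrete, here is a minimal Python sketch of how a tool could flatten such a tag file into a child-to-parent map. The nested dict/list literal is what a YAML loader would typically return for the example; the exact nesting (for instance, "Carbon Nanomaterials" sitting under "Polymer Chemistry") is my reading of the example, and the names `TAG_TREE`, `build_parent_map`, and `PARENTS` are hypothetical and not part of ENTS.

```python
# Structure a YAML loader would return for the example above.
# The nesting levels are an assumption about the intended hierarchy.
TAG_TREE = {
    "Organic Chemistry": [
        "Medicinal Chemistry",
        {"Polymer Chemistry": [
            "Plastic Materials",
            {"Carbon Nanomaterials": ["Carbon Nanotubes", "Graphene"]},
        ]},
        "Petrochemistry",
    ],
    "Inorganic Chemistry": [
        "Coordination Chemistry",
        "Organometallic Chemistry",
    ],
}


def build_parent_map(tree, parent=None, parents=None):
    """Walk the nested dict/list structure and record each tag's parent."""
    if parents is None:
        parents = {}
    if isinstance(tree, dict):
        # A mapping key is a tag that has children.
        for tag, children in tree.items():
            parents[tag] = parent
            build_parent_map(children, tag, parents)
    elif isinstance(tree, list):
        # A list holds sibling tags that share the same parent.
        for item in tree:
            build_parent_map(item, parent, parents)
    elif tree is not None:
        # A plain string is a leaf tag.
        parents[tree] = parent
    return parents


PARENTS = build_parent_map(TAG_TREE)
```

Top-level tags end up with a parent of `None`, and every other tag points to the tag one level above it.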
Commands
A generic xents command is used here as an example implementation.
Usage:
xents <command> <operator> <arguments>
ENTS commands use inverse function mapping: one to many, not many to one. This was chosen so that the first argument, which is of one type, can be followed by many arguments that all share a second type (for example, one tag followed by many files).
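The one-to-many argument shape can be sketched with Python's standard argparse module. This is only an illustration of the grammar, not how any ENTS implementation parses its arguments; `build_parser` is a hypothetical name, and I am assuming that `<files>*` and `<tags>*` mean "one or more".

```python
import argparse


def build_parser():
    """Hypothetical parser for the generic `xents` grammar (a sketch only)."""
    parser = argparse.ArgumentParser(prog="xents")
    commands = parser.add_subparsers(dest="command", required=True)

    # tagtofiles: ONE tag first, then MANY files.
    ttf = commands.add_parser("tagtofiles", aliases=["ttf"])
    ttf_ops = ttf.add_subparsers(dest="operator", required=True)
    for name, aliases in (("add", []), ("remove", ["rm"])):
        op = ttf_ops.add_parser(name, aliases=aliases)
        op.add_argument("tag")
        op.add_argument("files", nargs="+")  # assumption: at least one file

    # filetotags: ONE file first, then MANY tags.
    ftt = commands.add_parser("filetotags", aliases=["ftt"])
    ftt_ops = ftt.add_subparsers(dest="operator", required=True)
    for name, aliases in (("add", []), ("remove", ["rm"])):
        op = ftt_ops.add_parser(name, aliases=aliases)
        op.add_argument("file")
        op.add_argument("tags", nargs="+")  # assumption: at least one tag

    return parser


# The first positional is the "one", everything after it is the "many".
args = build_parser().parse_args(["ttf", "add", "graphene", "a.pdf", "b.pdf"])
```

Here `args.tag` holds the single tag and `args.files` holds the list of files that follow it.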
process
Usage:
xents process file.yaml
Description: Takes a YAML file and turns it into a .lent file. Takes no additional operators or arguments.
tagtofiles
Usage:
xents tagtofiles <add|remove|show> <arguments>*
Description: Manages relationships between one tag and many files
Alias: ttf
- add
  - Usage: xents tagtofiles add <tag> <files>*
  - Description: creates links from one tag to one or more files
- remove
  - Usage: xents tagtofiles remove <tag> <files>*
  - Description: removes links from one tag to one or more files
  - Alias: rm
- show
  - Usage: xents tagtofiles show <tags>*
  - Description: shows all files linked to each tag given. Can take more than one tag as input
filetotags
Usage:
xents filetotags <add|remove|show> <arguments>*
Description: Manages relationships between one file and many tags
Alias: ftt
- add
  - Usage: xents filetotags add <file> <tags>*
  - Description: creates links from one file to one or more tags
- remove
  - Usage: xents filetotags remove <file> <tags>*
  - Description: removes links from one file to one or more tags
  - Alias: rm
- show
  - Usage: xents filetotags show <files>*
  - Description: shows all tags linked to each file given. Can take more than one file as input
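Both tagtofiles and filetotags manage the same links, just from opposite ends. A minimal in-memory sketch of that idea in Python is below. It is not the storage format of MENTS or LENTS; the class `LinkStore` and its method names are hypothetical.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical in-memory store of tag <-> file links (illustration only)."""

    def __init__(self):
        self.files_by_tag = defaultdict(set)
        self.tags_by_file = defaultdict(set)

    def _link(self, tag, file):
        self.files_by_tag[tag].add(file)
        self.tags_by_file[file].add(tag)

    def _unlink(self, tag, file):
        self.files_by_tag[tag].discard(file)
        self.tags_by_file[file].discard(tag)

    # tagtofiles: one tag, many files
    def tagtofiles_add(self, tag, *files):
        for file in files:
            self._link(tag, file)

    def tagtofiles_remove(self, tag, *files):
        for file in files:
            self._unlink(tag, file)

    def tagtofiles_show(self, *tags):
        return {t: sorted(self.files_by_tag.get(t, ())) for t in tags}

    # filetotags: one file, many tags
    def filetotags_add(self, file, *tags):
        for tag in tags:
            self._link(tag, file)

    def filetotags_remove(self, file, *tags):
        for tag in tags:
            self._unlink(tag, file)

    def filetotags_show(self, *files):
        return {f: sorted(self.tags_by_file.get(f, ())) for f in files}


store = LinkStore()
store.tagtofiles_add("graphene", "a.pdf", "b.pdf")   # one tag, two files
store.filetotags_add("a.pdf", "carbon nanotubes")    # one file, one more tag
store.tagtofiles_remove("graphene", "b.pdf")
```

Because every link is recorded in both directions, either show command is a direct lookup and never has to scan the other side.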
filter
Usage:
xents filter [-flags] <tags>*
Description: Returns files associated with tags. This is the main command of MENTS and is where the nesting comes into play, unlike the tagtofiles command, where nesting does not come into play.
- Nested
  - Usage: xents filter <tags>
  - Description: the default mode.
  - Example: xents filter "organic chemistry" will return everything that is tagged with something within organic chemistry. If a file is tagged only with “carbon nanotubes”, it will still be returned, because “carbon nanotubes” is a sub-tag of organic chemistry.
- Explicit -e
  - Usage: xents filter -e <tags>
  - Description: returns a file only if it is explicitly tagged with the given tag
  - Example: xents filter -e "organic chemistry" will return a file marked with “organic chemistry, carbon nanotubes” but not one marked with just “carbon nanotubes”.
  - Note: Uses the same implementation as xents tagtofiles show <tags>* and is included here as the -e flag for mnemonic purposes
- Traverse Down by Count -td
  - Usage: xents filter -td <count> <tags>
  - Description: Returns files only if they are explicitly tagged with a tag at most <count> levels below the query tag.
  - Example: xents filter -td 2 "organic chemistry" will return everything explicitly tagged with any tag between 0 and 2 layers down from “organic chemistry”. In this example, files tagged with “carbon nanomaterials” will be returned but files tagged with only “carbon nanotubes” will not.
  - Note: Can be combined with Traverse Up by Count: -tu <count> -td <count>. For example, xents filter -tu 1 -td 2 <tags> will traverse one node up the tree, then return everything recursively down two layers from that.
- Traverse Up by Count -tu
  - Usage: xents filter -tu <count> <tags>
  - Description: Traverses <count> nodes up the tag tree before filtering. Recursive by default.
  - Example: xents filter -tu 1 "organic chemistry" will traverse one node up the tag tree, then return everything recursively down from that.
  - Note: Can be combined with Traverse Down by Count: -tu <count> -td <count>
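The filter modes can be summarized in a short Python sketch. This is my reading of the examples, not the MENTS implementation: the tree and file names are made up, `filter_files` and its keyword arguments are hypothetical stand-ins for the flags, and I am assuming that -tu stops at a top-level tag when it cannot climb any further.

```python
# Hypothetical tag tree (child -> parent) and explicit tags per file.
PARENT = {
    "organic chemistry": None,
    "polymer chemistry": "organic chemistry",
    "carbon nanomaterials": "polymer chemistry",
    "carbon nanotubes": "carbon nanomaterials",
}
FILE_TAGS = {
    "overview.pdf": {"organic chemistry"},
    "nanomaterials.pdf": {"carbon nanomaterials"},
    "nanotubes.pdf": {"carbon nanotubes"},
}


def children_of(tag):
    return [t for t, p in PARENT.items() if p == tag]


def tags_within(tag, max_depth=None):
    """Return `tag` plus its descendants, optionally limited to max_depth levels."""
    found, frontier, depth = {tag}, [tag], 0
    while frontier and (max_depth is None or depth < max_depth):
        frontier = [c for t in frontier for c in children_of(t)]
        found.update(frontier)
        depth += 1
    return found


def filter_files(tag, explicit=False, up=0, down=None):
    """Sketch of filter: explicit is -e, up is -tu <count>, down is -td <count>."""
    for _ in range(up):
        # -tu: climb toward the root (assumption: stop at a top-level tag).
        if PARENT.get(tag) is not None:
            tag = PARENT[tag]
    wanted = {tag} if explicit else tags_within(tag, down)
    return sorted(f for f, tags in FILE_TAGS.items() if tags & wanted)


nested = filter_files("organic chemistry")                   # default mode
explicit = filter_files("organic chemistry", explicit=True)  # -e
two_down = filter_files("organic chemistry", down=2)         # -td 2
one_up = filter_files("carbon nanotubes", up=1)              # -tu 1
```

With this toy data, the -td 2 query picks up the file tagged “carbon nanomaterials” (two levels down) but not the one tagged only “carbon nanotubes” (three levels down), matching the example in the text.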
MENTS (Metal ENTS)
MENTS is a close-to-the-metal implementation of ENTS written in C with a custom binary file format. It is the most hackerish version of ENTS, has the smallest database size, and was made as a low-level learning project. It is not very extendable and the number of tags it can hold is limited, so LENTS is also an option.
The MENTS file format
- Terminology and stylistic choices
- Some set theory will be used in combination with colloquial English because it is the best way to communicate some things clearly and unambiguously
- Hex-line: refers to a standard 16-byte row in a hex dump
- Metadata: the common term meaning “data about the data”, used here in that sense.
- Ipsodata: contrasted with metadata; “data that is the data itself”. Commonly used English terms for “the data itself”, such as {“core data”, “primary data”, “source data”, “raw data”}, do not capture what I mean here: the data itself, as opposed to the data about the data. And so the Latin root “ipso”, meaning “itself”, is used as a prefix to “data”.
- Offset: when “offset” is used here, it refers only to the length of the offset and not to the content at the offset.
- Over-arching design choices for MENTS
- MENTS is made for a nested tagging system with tree-shaped tag hierarchies. Each tag has a parent (except for the top-level tags) and can have children.
- MENTS is designed as a database for multi-user sessions with many simultaneous queries
- The file format is designed to be human-readable in a hex dump
- To my knowledge, no existing solution accomplishes these points
- MENTS is designed for quick lookup of child tags in a multi-user-session database. This means hash table lookup (O(1) on average) as opposed to tree traversal or linear traversal (O(N)).
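The lookup trade-off in the last point can be illustrated in Python, whose dicts are hash tables. This is only a sketch of the idea, not the MENTS on-disk structure (MENTS itself is written in C); `EDGES`, `CHILDREN`, and `children_of` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs taken from the chemistry example.
EDGES = [
    ("Polymer Chemistry", "Organic Chemistry"),
    ("Medicinal Chemistry", "Organic Chemistry"),
    ("Carbon Nanomaterials", "Polymer Chemistry"),
]

# Build the table once: parent -> list of children.
CHILDREN = defaultdict(list)
for child, parent in EDGES:
    CHILDREN[parent].append(child)


def children_of(tag):
    """Average-case O(1) hash lookup; no scan over all tags is needed."""
    return CHILDREN.get(tag, [])


def children_of_linear(tag):
    """The O(N) alternative: scan every edge on every query."""
    return [child for child, parent in EDGES if parent == tag]
```

Both functions return the same children, but the first does a single hash lookup per query while the second grows with the total number of tags.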
- File Header (32 Bytes, or 2 hex-lines)
- MENTS magic number in Unicode
- Version number
- Tags quantity denoter
- Tags Metadata (4a)
- Tags Metadata = {Tag metadata individual length indicator, Tag meta/ipsodata individual offset indicator, Tag name}
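To show what a 32-byte, two-hex-line header could look like in practice, here is a Python sketch using the standard struct module. The field widths and order are entirely my assumptions for illustration, since the text above only names the fields: a 16-byte magic string filling the first hex-line, then a 4-byte version, a 4-byte tag count, and 8 padding bytes filling the second. `HEADER_FORMAT`, `pack_header`, and `parse_header` are hypothetical names.

```python
import struct

# ASSUMED layout (not the real MENTS spec), little-endian, 32 bytes total:
#   16s  magic text, NUL-padded   -> first hex-line
#   I    version number (uint32)
#   I    tag quantity (uint32)
#   8x   padding                  -> completes the second hex-line
HEADER_FORMAT = "<16sII8x"
MAGIC = "MENTS".encode("utf-8")


def pack_header(version, tag_count):
    """Build a 32-byte header under the assumed layout."""
    return struct.pack(HEADER_FORMAT, MAGIC, version, tag_count)


def parse_header(data):
    """Read the header back and check the magic text."""
    magic, version, tag_count = struct.unpack(HEADER_FORMAT, data[:32])
    magic = magic.rstrip(b"\x00").decode("utf-8")
    if magic != "MENTS":
        raise ValueError("not a MENTS file")
    return magic, version, tag_count


header = pack_header(version=1, tag_count=11)
```

Padding the magic text to a full hex-line keeps it readable on its own row in a hex dump, which fits the human-readability goal stated earlier.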
LENTS (Local ENTS) - Planned Version
LENTS (Local ENTS) is an implementation of ENTS written with individual researchers and small research teams in mind. It will likely be a wrapper around Refsplitr and PartiQL/DynamoDB Local, which allows for more customizable queries.
- Refsplitr - “refsplitr is a package designed to assist researchers dealing with bibliometric data by providing tools for author name disambiguation, author georeferencing, and coauthorship network mapping using data from the Web of Science” [1]
- Author disambiguation
- Support for multiple authors per file
- Author/Institution timeline mapping
- Institution disambiguation
- PartiQL - “An expressive, SQL-compatible query language giving access to relational, semi-structured, and nested data.” [2]
- allows for more custom queries, including queries with the additional information above
- Potential Implementations: Metadata Extraction Tools
- ExifTool by Phil Harvey - “ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files.”
- Apache Tika - “The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.”
- Visualization tools