LENTS (Local Extendable Nested Tagging System) is an implementation of ENTS written in C. It reads input in the .ents file format and converts it into a custom binary file format, .led, which it uses as a database. It is the most hackerish version of ENTS, has the smallest database size, and was made as a low-level learning project. It is also named for the liturgical season it was started in, Lent.
Disambiguation
lents is the name of a computer program, invoked from the terminal, that implements the ENTS file format and command schema. It takes human-writable .ents text files, which are defined by the ENTS specification. For tag/file relationship filtering it outputs and uses the .led format, which stands for LEnts Data.
LENTS usage
process
The output of lents process tags.ents always goes to a hidden .led dot-file. This output destination cannot be changed, for the sake of parsing.
The LED file format
- Terminology and stylistic choices
  - Some set theory is used in combination with colloquial English because it is the best way to communicate some things clearly and unambiguously.
  - Hex-line: a standard 16-byte row in a hex dump.
  - Metadata: a common term meaning “data about the data”.
  - Ipsodata: contrasted with metadata; “data that is the data itself”. Commonly used English terms for “the data itself” such as “core data”, “primary data”, “source data”, and “raw data” do not capture what I mean here: the data itself, as opposed to the data-about-the-data. Uses the Latin root “ipso”, meaning “itself”, as a prefix.
  - Offset: when I say “offset” here, I am referring only to the length of the offset, not to its content.
- Over-arching design choices for LED
  - LED is a directed acyclic graph structure where hierarchically organized tags (nodes with parent-child relationships) map to files (destination nodes).
  - The file format is designed to be human-readable in a hex dump.
- File Header (32 bytes, or 2 hex-lines)
  - Magic Number, Version Number, Signature
  - Offsets
    - Start of tags ipsodata
    - Start of file-to-tags metadata
    - Start of file-to-tags ipsodata
  - Filler
Name | Length | Description | Example |
Magic Number, Version Number, Signature | 8 bytes | 3 bytes for the LED magic number, 3 bytes for the version (in hex, not ASCII), 2 bytes for the DC signature. The version maxes out at 255.255.255 | Version 0.1.0: 4c45 4400 0100 4443 |
Offsets | 6 bytes | 2 bytes for tags ipsodata (3.2.a), 2 bytes for file-to-tags metadata (3.2.b), 2 bytes for file-to-tags ipsodata (3.2.c). The offset distance is measured in bytes (for example, a header offset value of 0x2710 means an offset of 10000 bytes) | 2710 2710 2710 |
Filler | 18 bytes | Will probably be used for future versions of Lents | 0x000000000000000000000000000000000000 |
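The header layout in the table above can be read with a short C routine. The following is a minimal sketch, not the actual lents source: the names led_header, read_u16_be, and led_parse_header are hypothetical, and it assumes the 2-byte offsets are stored big-endian. That assumption matches the example where the bytes 27 10 mean 10000, but the byte order is not stated explicitly.

```c
#include <stdint.h>
#include <string.h>

#define LED_HEADER_SIZE 32

/* Hypothetical in-memory view of the 32-byte LED file header. */
typedef struct {
    uint8_t  version[3];          /* major, minor, patch as raw byte values */
    uint16_t tags_ipsodata;       /* start of tags ipsodata, in bytes */
    uint16_t file_tags_metadata;  /* start of file-to-tags metadata, in bytes */
    uint16_t file_tags_ipsodata;  /* start of file-to-tags ipsodata, in bytes */
} led_header;

/* ASSUMPTION: multi-byte values are stored big-endian. */
static uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the magic number or signature is wrong. */
int led_parse_header(const uint8_t buf[LED_HEADER_SIZE], led_header *out)
{
    if (memcmp(buf, "LED", 3) != 0)          /* bytes 0-2: magic number */
        return -1;
    if (buf[6] != 'D' || buf[7] != 'C')      /* bytes 6-7: signature */
        return -1;

    memcpy(out->version, buf + 3, 3);        /* bytes 3-5: version */
    out->tags_ipsodata      = read_u16_be(buf + 8);
    out->file_tags_metadata = read_u16_be(buf + 10);
    out->file_tags_ipsodata = read_u16_be(buf + 12);
    /* bytes 14-31 are filler and are ignored here */
    return 0;
}
```

A caller would read the first 32 bytes of the .led file into a buffer and pass it to led_parse_header before seeking to any of the three offsets.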
- Data
Current implementations of LED use a two-level index built on two offset lookup tables (tags and file->tags); the first table (tags) also holds the tag-to-files relationships. This is the most space-efficient representation for sparse relationships, but as relationships get denser there is a threshold after which a table encoding becomes more space-efficient. The table encoding will be included in a future version of LED.
- Tags Data
Each tag ipsodata can vary significantly in length, so an offset table is used to save file space at a minimal cost in lookup time. The attributes in the ipsodata can also vary in length from each other, so a “tag individual ipsodata part length” field is needed in the metadata. Tag ipsodata offset is the distance measured from (3.3.a). The tag UUID is not necessary for the tag ipsodata, but is included for debugging purposes.
- Tags Metadata = {Tag Metadata}
- Tag Metadata = {Tag UUID, Tag ipsodata offset}
Name | Size | Description |
Tag Hash UID | 7 bytes | Hash of the tag name |
Space for flags | 1 byte | Unused for now. May be used in future versions of LED while still being backwards compatible |
Tag ipsodata offset | 4 bytes | Value of how far away from start-of-tags-ipsodata (3.3.a) the tag ipsodata is |
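The tag metadata table above describes a 12-byte entry (7-byte hash UID, 1 flag byte, 4-byte offset). As an illustration of how such an offset table could be searched, here is a minimal sketch. It is not taken from lents: the names led_find_tag and read_u32_be are hypothetical, and it assumes that entries are packed back to back, that the 4-byte offset is big-endian, and that a plain linear scan is acceptable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LED_UID_SIZE      7   /* Tag Hash UID */
#define LED_TAG_META_SIZE 12  /* 7-byte UID + 1 flag byte + 4-byte offset */

/* ASSUMPTION: the 4-byte offset is stored big-endian. */
static uint32_t read_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Scans `count` packed metadata entries for `uid`.
 * On a match, stores the absolute byte position of the tag ipsodata
 * (start of tags ipsodata + per-tag offset) in *pos and returns 0.
 * Returns -1 if the UID is not present.
 */
int led_find_tag(const uint8_t *table, size_t count,
                 const uint8_t uid[LED_UID_SIZE],
                 uint32_t tags_ipsodata_start, uint32_t *pos)
{
    for (size_t i = 0; i < count; i++) {
        const uint8_t *entry = table + i * LED_TAG_META_SIZE;
        if (memcmp(entry, uid, LED_UID_SIZE) == 0) {
            /* entry[7] is the flag byte; the offset starts at byte 8 */
            *pos = tags_ipsodata_start + read_u32_be(entry + 8);
            return 0;
        }
    }
    return -1;
}
```

If the entries were kept sorted by UID, the linear scan could be swapped for a binary search; nothing in the format description says whether they are.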
- Tags Ipsodata = {Tag Ipsodata}
- Tag Ipsodata = {Tag UUID, Tag name, Tag Ancestry UUIDs, Tag Children UUIDs}
Name | Size | Description |
Tag Hash UID | 7 bytes | Hash of the tag name |
Children Count | 1 byte | Counts the number of children this tag has. Used to calculate offset. This gives a max value of 255, but please be more organized and use nesting instead. The only reason it goes up to 255 is for the case of having each country as a tag. |
Tag Immediate Parent UID | 7 bytes | For traversal upwards |
Name Length | 4 bits | Values 0-15. The value is multiplied by 4 and then has 4 added to it (for the case of 0, since you cannot have a nameless tag) to get a final value between 4 and 64. Shares one byte with Tag Type Flags |
Tag Type Flags | 4 bits | Future feature to be implemented. Shares one byte with Name Length |
Tag-to-Files relationship lengths | 2 bytes | Currently holds a value from 0 to 2^16 - 1. There is space to be expanded for larger systems |
Blank Space | 6 bytes | For storing values in future versions. Also for the sake of visual padding when viewed from a hex dump |
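The Name Length encoding in the table above is simple arithmetic: a 4-bit field value v decodes to v * 4 + 4 bytes, so 0 gives 4 and 15 gives 64. The sketch below shows one way to pack and unpack that shared byte. The function names are hypothetical, and two things are assumed rather than specified: that Name Length occupies the high nibble and Tag Type Flags the low nibble, and that a name shorter than its decoded length is padded up to the next multiple of 4.

```c
#include <stdint.h>

/* ASSUMPTION: Name Length is the high nibble, Tag Type Flags the low nibble. */

/* Decodes the 4-bit Name Length field into a byte count between 4 and 64. */
static inline unsigned led_name_length(uint8_t packed)
{
    unsigned field = (packed >> 4) & 0x0Fu;  /* raw field value, 0-15 */
    return field * 4u + 4u;
}

/* Extracts the 4-bit Tag Type Flags field. */
static inline uint8_t led_tag_type_flags(uint8_t packed)
{
    return (uint8_t)(packed & 0x0Fu);
}

/*
 * Returns the smallest field value (0-15) whose decoded length can hold
 * a name of `len` bytes, or -1 if `len` is 0 or larger than 64.
 * ASSUMPTION: names are padded up to the next multiple of 4 bytes.
 */
int led_encode_name_length(unsigned len)
{
    if (len == 0 || len > 64)
        return -1;
    return (int)((len + 3u) / 4u) - 1;  /* ceil(len / 4) - 1 */
}
```

For example, under these assumptions a 9-byte tag name would be stored with field value 2, which decodes to 12 bytes of name space.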
- File-to-tags Relationship Data
The same heuristics about length variation and UUID-inclusion from (4) apply here. File-to-tags metadata starts at (3.3.b) and file ipsodata offset is the distance measured from (3.3.c).
- Files Metadata = {File Metadata}
- File Metadata = {File UUID, File individual ipsodata part length, File ipsodata offset}
- Files Ipsodata = {File Ipsodata}
- File Ipsodata = {File UUID, Tags UUIDs}