Diego Cabello

EXRPT is an XML schema for taking excerpts from a document, annotating them, and doing thematic analysis on them.

Its ideal use cases are...

It differs from TEI (the Text Encoding Initiative guidelines) because...

An example from Democracy in America by Alexis de Tocqueville:

<root>

<chapter>Vol. I, Pt. II, Ch. V: Government by Democracy in America</chapter>

<quote themes="archives">
The only historical remains in the United States are the newspapers; but if a number be wanting, the chain of time is broken, and the present is severed from the past. I am convinced that in fifty years it will be more difficult to collect authentic documents concerning the social conditions of the Americans at the present day than it is to find remains of the administration of France during the Middle Ages.
</quote>

<thought>
Throughout the country today, most information is still transferred from person to person instead of codified. The closest thing to an "ultimate archive" today is the internet, but it lives in data centers that would become inaccessible if internet infrastructure fell into disrepair.
</thought>

</root>
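Each <quote> carries a themes attribute and is followed by <thought> commentary, so the thematic-analysis step can be sketched with Python's standard-library xml.etree.ElementTree. The sketch below groups quotes by theme and attaches each thought to a quote. The function names (flatten, quotes_by_theme) are hypothetical. Two conventions are assumptions rather than part of the schema: a <thought> annotates the nearest preceding <quote>, and several themes in one themes attribute are separated by whitespace (the DTD only declares themes as CDATA). The sample in the demo is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def flatten(elem):
    """Collapse an element's text, including nested elements, to one line.

    Omission markers are empty elements, so they contribute no text here.
    """
    return " ".join("".join(elem.itertext()).split())


def quotes_by_theme(xml_text):
    """Group quotes by theme, keeping each quote's chapter, page, and thoughts.

    Assumptions (not defined by the schema): a <thought> comments on the
    nearest preceding <quote>, and themes are whitespace-separated.
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    chapter = None
    current = None  # most recent quote entry, so later thoughts attach to it
    for elem in root:
        if elem.tag == "chapter":
            chapter = flatten(elem)
        elif elem.tag == "quote":
            current = {
                "chapter": chapter,
                "page": elem.get("page"),  # None when the attribute is absent
                "text": flatten(elem),
                "thoughts": [],
            }
            # The same entry is shared by every theme it is tagged with.
            for theme in elem.get("themes", "").split():
                index[theme].append(current)
        elif elem.tag == "thought" and current is not None:
            current["thoughts"].append(flatten(elem))
    return dict(index)


if __name__ == "__main__":
    sample = """<root>
<chapter>Vol. I, Pt. II, Ch. V: Government by Democracy in America</chapter>
<quote themes="archives">The only historical remains in the United States are the newspapers.</quote>
<thought>The closest thing to an "ultimate archive" today is the internet.</thought>
</root>"""
    for theme, entries in quotes_by_theme(sample).items():
        print(theme)
        for entry in entries:
            print("  ", entry["chapter"], "|", entry["text"])
            for thought in entry["thoughts"]:
                print("    ->", thought)
```

Keeping the whole entry (chapter, page, text, thoughts) under each theme means a quote tagged with several themes appears in each group without being copied.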

The schema, as an XML DTD (document type definition):


<!DOCTYPE root [

<!-- Root element containing all analysis content -->
<!ELEMENT root (quote | thought | reference | chapter | date | question | note)*>

<!-- Quote element with optional page and themes attributes -->
<!ELEMENT quote (#PCDATA | intra-paragraph-omission | end-paragraph-omission | 
                 paragraph-omission | omission | 
                 sentence-omission | paragraph-break )*>
<!ATTLIST quote 
    page CDATA #IMPLIED
    themes CDATA #IMPLIED>

<!-- Thought and note elements for analytical commentary -->
<!ELEMENT thought (#PCDATA | reference)*>
<!ELEMENT note (#PCDATA | reference)*>

<!-- Reference elements for citations and cross-references -->
<!ELEMENT reference (#PCDATA)>
<!ATTLIST reference 
    target CDATA #IMPLIED>

<!-- Question elements for analytical queries -->
<!ELEMENT question (#PCDATA)>

<!-- Chapter and date markers for document structure -->
<!ELEMENT chapter (#PCDATA)>
<!ELEMENT date (#PCDATA)>

<!-- Omission markers for editorial clarity -->
<!ELEMENT intra-paragraph-omission EMPTY>
<!ELEMENT end-paragraph-omission EMPTY>
<!ELEMENT paragraph-omission EMPTY>
<!ELEMENT omission EMPTY>
<!ELEMENT sentence-omission EMPTY>
<!ELEMENT paragraph-break EMPTY>

]>
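Python's standard library parses XML but cannot validate it against a DTD; the third-party lxml package can, through lxml.etree.DTD and its validate method. As a dependency-free alternative, here is a minimal sketch that transcribes the declarations above into Python tables by hand and checks a parsed document against them: which children each element allows, where text may appear, which elements are EMPTY, and which attributes are declared. The names CHILDREN, ATTRIBUTES, check, and validate are hypothetical, and the tables have to be kept in sync with the DTD manually.

```python
import xml.etree.ElementTree as ET

# Hand-transcribed from the EXRPT DTD; update these tables if the DTD changes.
OMISSIONS = {
    "intra-paragraph-omission", "end-paragraph-omission", "paragraph-omission",
    "omission", "sentence-omission", "paragraph-break",
}
# Allowed child elements per element; None marks an EMPTY element.
CHILDREN = {
    "root": {"quote", "thought", "reference", "chapter", "date", "question", "note"},
    "quote": OMISSIONS,
    "thought": {"reference"},
    "note": {"reference"},
    "reference": set(),
    "question": set(),
    "chapter": set(),
    "date": set(),
    **{name: None for name in OMISSIONS},
}
# Declared attributes; all are CDATA #IMPLIED, i.e. optional free text.
ATTRIBUTES = {"quote": {"page", "themes"}, "reference": {"target"}}


def check(elem, errors):
    """Append a message to errors for every violation at or below elem."""
    name = elem.tag
    if name not in CHILDREN:
        errors.append(f"undeclared element <{name}>")
        return
    extra = set(elem.attrib) - ATTRIBUTES.get(name, set())
    if extra:
        errors.append(f"<{name}> has undeclared attribute(s) {sorted(extra)}")
    allowed = CHILDREN[name]
    # EMPTY means no content at all, not even whitespace.
    if allowed is None and elem.text:
        errors.append(f"<{name}> is EMPTY but contains text")
    # root has element-only content: whitespace between children is fine, text is not.
    if name == "root":
        texts = [elem.text] + [child.tail for child in elem]
        if any(t and t.strip() for t in texts):
            errors.append("<root> may not contain text directly")
    for child in elem:
        if allowed is None or child.tag not in allowed:
            errors.append(f"<{child.tag}> is not allowed inside <{name}>")
        check(child, errors)


def validate(xml_text):
    """Return a list of problems; an empty list means the document conforms."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "root":
        errors.append(f"document element is <{root.tag}>, expected <root>")
    check(root, errors)
    return errors
```

Because validate returns messages rather than stopping at the first problem, one pass reports everything. For instance, a <thought> nested inside a <quote> is flagged, since the quote content model allows only text and the omission markers.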