wiki:HemlockProgrammer/RepresentationOfText

Version 5 (modified by gz, 6 years ago)

2. Representation of Text

In Hemlock, text is represented as a sequence of lines. Newline characters are never stored but are implicit between lines. The implicit newline character is treated as the single character #\Newline by the text primitives.

Text is broken into lines when it is first introduced into Hemlock. Text enters Hemlock from the outside world in two ways: reading a file, or pasting text from the system clipboard. Hemlock uses heuristics (not yet documented here) to decide which newline convention to assume when converting the incoming text into its internal representation as a sequence of lines. Similarly, it uses heuristics (also not yet documented here) to convert the internal representation into a string with embedded newlines in order to write a file or paste a region into the clipboard.

2.1. Lines

A line is an object representing a sequence of characters with no line breaks.

linep line [Function]

This function returns t if line is a line object, otherwise nil.

line-string line [Function]

Given a line, this function returns as a simple string the characters in the line. This is setf'able to set the line-string to any string that does not contain newline characters. It is an error to destructively modify the result of line-string or to destructively modify any string after the line-string of some line has been set to that string.

line-previous line [Function]
line-next line [Function]

Given a line, line-previous returns the previous line or nil if there is no previous line. Similarly, line-next returns the line following line or nil.
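
As an illustration of following the line chain, here is a minimal sketch. The name example-collect-line-strings is hypothetical, and the code assumes it is evaluated in a package where the Hemlock interface symbols are accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-collect-line-strings is a hypothetical helper, not part of Hemlock.
(defun example-collect-line-strings (first-line)
  "Return a list of the strings of FIRST-LINE and of every line after it."
  (loop for line = first-line then (line-next line)
        while line                      ; line-next returns nil after the last line
        collect (line-string line)))
```

The returned strings are the line-string values themselves, so they must not be destructively modified.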

line-buffer line [Function]

This function returns the buffer which contains this line. A line may not be associated with any buffer, in which case line-buffer returns nil.

line-length line [Function]

This function returns the number of characters in the line. This excludes the newline character at the end.

line-character line index [Function]

This function returns the character at position index within line. It is an error for index to be greater than the length of the line or less than zero. If index is equal to the length of the line, this returns a #\newline character.
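
The boundary case can be seen in a short sketch. The helper name example-line-characters is hypothetical, and the code assumes the Hemlock interface symbols are accessible in the current package.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-line-characters is a hypothetical helper, not part of Hemlock.
(defun example-line-characters (line)
  "Return a list of the characters of LINE followed by the implicit newline."
  ;; The loop bound is inclusive: index (line-length line) is the one
  ;; position past the stored characters that is still legal, and it
  ;; yields the newline character.
  (loop for index from 0 to (line-length line)
        collect (line-character line index)))
```

For a line whose line-string is "ab", this should return a list of three characters: a, b and the newline.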

line-plist line [Function]

This function returns the property-list for line. setf, getf, putf and remf can be used to change properties. This is typically used in conjunction with line-signature to cache information about the line's contents.

line-signature line [Function]

This function returns an object that serves as a signature for a line's contents. It is guaranteed that any modification of text on the line will result in the signature changing so that it is not eql to any previous value. The signature may change even when the text remains unmodified, but this does not happen often.
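
The caching idiom mentioned under line-plist might look like the following sketch. The property name example-nonspace-count and the function name are hypothetical. The code assumes that line-plist is a valid setf place, as the description above states, and that the Hemlock interface symbols are accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; The function and the property key are hypothetical names.
(defun example-cached-nonspace-count (line)
  "Return the number of non-space characters on LINE, caching the result."
  (let ((signature (line-signature line))
        (cached (getf (line-plist line) 'example-nonspace-count)))
    (if (and cached (eql (car cached) signature))
        (cdr cached)                    ; signature unchanged, so the cache is valid
        (let ((total (count-if (lambda (char) (char/= char #\Space))
                               (line-string line))))
          ;; Store (signature . value) so a later call can detect staleness.
          (setf (getf (line-plist line) 'example-nonspace-count)
                (cons signature total))
          total))))
```

Because the signature may occasionally change without the text changing, the cache can be recomputed needlessly, but it should never return a stale value.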

2.2. Marks

A mark indicates a specific position within the text represented by a line and a character position within that line. Although a mark is sometimes loosely referred to as pointing to some character, it in fact points between characters. If the charpos is zero, the previous character is the newline character separating the previous line from the mark's line. If the charpos is equal to the number of characters in the line, the next character is the newline character separating the current line from the next. If the mark's line has no previous line, a mark with charpos of zero has no previous character; if the mark's line has no next line, a mark with charpos equal to the length of the line has no next character.

This section discusses the very basic operations involving marks, but a lot of Hemlock programming is built on altering some text at a mark. For more extended uses of marks see Altering And Searching Text.

2.2.1. Kinds of Marks

A mark may have one of two lifetimes: temporary or permanent. Permanent marks remain valid after arbitrary operations on the text; temporary marks do not. Temporary marks are used because less bookkeeping overhead is involved in their creation and use. If a temporary mark is used after the text it points to has been modified, the results will be unpredictable. Permanent marks continue to point between the same two characters regardless of insertions and deletions made before or after them.

There are two different kinds of permanent marks, which differ only in their behavior when text is inserted at the position of the mark: text is inserted to the left of a left-inserting mark and to the right of a right-inserting mark.
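
The difference can be sketched with two permanent marks at the same position. This is an illustration under assumptions: example-mark-kinds is a hypothetical name, insert-string comes from Altering And Searching Text rather than from this chapter, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; insert-string is described in Altering And Searching Text.
;; example-mark-kinds is a hypothetical helper, not part of Hemlock.
(defun example-mark-kinds ()
  "Insert text where two permanent marks coincide; return the line text and both charpos values."
  (let* ((region (string-to-region "ac"))
         (line (mark-line (region-start region)))
         (left (mark line 1 :left-inserting))
         (right (mark line 1 :right-inserting)))
    (insert-string left "b")            ; insert between the a and the c
    (prog1 (list (line-string line)
                 (mark-charpos left)
                 (mark-charpos right))
      ;; Permanent marks should be deleted once they are no longer needed.
      (delete-mark left)
      (delete-mark right))))
```

Going by the description above, the left-inserting mark should end up after the inserted text (charpos 2) and the right-inserting mark before it (charpos 1).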

2.2.2. Mark Functions

markp mark [Function]

This function returns t if mark is a mark object, otherwise nil.

mark-line mark [Function]

This function returns the line to which mark points.

mark-charpos mark [Function]

This function returns the character position in the line of the character after mark, i.e. the number of characters before the mark in the mark's line.

mark-buffer mark [Function]

This function returns the buffer containing mark.

mark-absolute-position mark [Function]

This function returns the character position in the buffer of the character after the mark, i.e. the number of characters before the mark in the mark's buffer.

mark-kind mark [Function]

This function returns one of :right-inserting, :left-inserting or :temporary depending on the mark's kind. A corresponding setf form changes the mark's kind.

previous-character mark [Function]
next-character mark [Function]

These functions return the character immediately before (after) the position of the mark, or nil if there is no previous (next) character. These characters may be set with setf when they exist; the setf methods for these forms signal errors when there is no previous or next character.
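
Since a mark points between characters, the four boundary cases are worth spelling out. In this sketch, example-neighbors and example-neighbor-cases are hypothetical names, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; Both functions are hypothetical helpers, not part of Hemlock.
(defun example-neighbors (mark)
  "Return a list of the characters before and after MARK."
  (list (previous-character mark) (next-character mark)))

(defun example-neighbor-cases ()
  "Probe the start and end of each line of a two-line text."
  (let* ((region (string-to-region (format nil "ab~%cd")))
         (first-line (mark-line (region-start region)))
         (second-line (line-next first-line)))
    (list (example-neighbors (mark first-line 0))     ; no previous character
          (example-neighbors (mark first-line 2))     ; newline comes next
          (example-neighbors (mark second-line 0))    ; newline comes before
          (example-neighbors (mark second-line 2))))) ; no next character
```

The first and last results contain nil because the text has no character before its start or after its end.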

2.2.3. Making Marks

mark line charpos &optional kind [Function]

This function returns a mark object that points to the charpos'th character of the line. Kind is the kind of mark to create, one of :temporary, :left-inserting, or :right-inserting. The default is :temporary.

copy-mark mark &optional kind [Function]

This function returns a new mark pointing to the same position and of the same kind, or of kind kind if it is supplied.

delete-mark mark [Function]

This function deletes mark. Delete any permanent mark when you are finished using it.

with-mark ({(mark pos [kind])}*) {form}* [Macro]

This macro binds to each variable mark a mark of kind kind, which defaults to :temporary, pointing to the same position as the mark pos. On exit from the scope the mark is deleted. The value of the last form is the value returned.
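
A typical use of with-mark is to probe the text without disturbing the caller's mark. The following is a minimal sketch: example-rest-of-line-length is a hypothetical name, line-end is described under Moving Marks, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-rest-of-line-length is a hypothetical helper, not part of Hemlock.
(defun example-rest-of-line-length (mark)
  "Return the number of characters between MARK and the end of its line."
  (with-mark ((end mark))               ; temporary mark at the same position
    (line-end end)                      ; moves only the copy, never MARK
    (- (mark-charpos end) (mark-charpos mark))))
```

For a mark at charpos 2 on a line containing "hello", this returns 3.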

2.2.4. Moving Marks

These functions destructively modify marks to point to new positions. Other sections of this document describe mark moving routines specific to higher level text forms than characters and lines, such as words, sentences, paragraphs, Lisp forms, etc.

move-to-position mark charpos &optional line [Function]

This function changes the mark to point to the given character position on the line line. Line defaults to mark's line.

move-to-absolute-position mark position [Function]

This function changes the mark to point to the given character position in the buffer.

move-mark mark new-position [Function]

This function moves mark to the same position as the mark new-position and returns it.

line-start mark &optional line [Function]
line-end mark &optional line [Function]

These functions change mark to point to the beginning or the end of line and return it. Line defaults to mark's line.

buffer-start mark &optional buffer [Function]
buffer-end mark &optional buffer [Function]

These functions change mark to point to the beginning or end of buffer, which defaults to the buffer mark currently points into. If buffer is unsupplied, then it is an error for mark to be disassociated from any buffer.

mark-before mark [Function]
mark-after mark [Function]

These functions change mark to point one character before or after the current position. If there is no character before/after the current position, then they return nil and leave mark unmodified.

character-offset mark n [Function]

This function changes mark to point n characters after (n before if n is negative) the current position. If there are fewer than n characters after (before) the mark, then this returns nil and mark is unmodified.
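
The nil return value makes character-offset convenient as a guard. In this sketch, example-char-at-offset is a hypothetical name, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-char-at-offset is a hypothetical helper, not part of Hemlock.
(defun example-char-at-offset (mark n)
  "Return the character N positions after MARK, or nil if there is none.
MARK itself is left unmodified."
  (with-mark ((probe mark))
    ;; character-offset returns nil when the move is impossible.
    (when (character-offset probe n)
      (next-character probe))))
```

The implicit newline between lines counts as one character, so in the text "ab", newline, "cd", offset 2 from the start yields the newline and offset 3 yields c.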

line-offset mark n &optional charpos [Function]

This function changes mark to point n lines after (n before if n is negative) the current position. The character position of the resulting mark is (min (line-length resulting-line) (mark-charpos mark)) if charpos is unspecified, or (min (line-length resulting-line) charpos) if it is. As with character-offset, if there are not n lines then nil is returned and mark is not modified.
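
The clamping rule can be observed with a short sketch. The name example-charpos-after-line-offset is hypothetical, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-charpos-after-line-offset is a hypothetical helper.
(defun example-charpos-after-line-offset (mark n)
  "Return the charpos MARK would have after moving N lines, or nil if it cannot move."
  (with-mark ((probe mark))
    (when (line-offset probe n)
      (mark-charpos probe))))
```

With the two lines "hello" and "hi", a mark at charpos 4 on the first line moves to charpos 2 on the second, since (min 2 4) is 2. Asking for two lines down returns nil.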

2.3. Regions

A region is simply a pair of marks: a starting mark and an ending mark. The text in a region consists of the characters following the starting mark and preceding the ending mark (keep in mind that a mark points between characters on a line, not at them). By modifying the starting or ending mark in a region it is possible to produce regions with a start and end which are out of order or even in different buffers. The use of such regions is undefined and may result in arbitrarily bad behavior.

2.3.1. Region Functions

region start end [Function]

This function returns a region constructed from the marks start and end. It is an error for the marks to point to non-contiguous lines or for start to come after end.
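
As a sketch of building a region from two marks, the following extracts part of a single line. The name example-subregion-string is hypothetical, region-to-string is described later in this section, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-subregion-string is a hypothetical helper, not part of Hemlock.
(defun example-subregion-string (line start end)
  "Return the text of LINE between character positions START and END."
  ;; Both marks are temporary and on the same line, with START before END,
  ;; which satisfies the constraints stated for the region function.
  (region-to-string (region (mark line start) (mark line end))))
```

For a line containing "hello world", positions 6 and 11 select "world".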

regionp region [Function]

This function returns t if region is a region object, otherwise nil.

make-empty-region [Function]

This function returns a region with start and end marks pointing to the start of one empty line. The start mark is a :right-inserting mark, and the end is a :left-inserting mark.

copy-region region [Function]

This function returns a region containing a copy of the text in the specified region. The resulting region is completely disjoint from region with respect to data references: marks, lines, text, etc.

region-to-string region [Function]
string-to-region string [Function]

These functions coerce regions to Lisp strings and vice versa. Within the string, lines are delimited by newline characters.
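
A round trip shows how the two representations correspond. This is a sketch only: example-round-trip is a hypothetical name, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-round-trip is a hypothetical helper, not part of Hemlock.
(defun example-round-trip (text)
  "Convert TEXT to a region and back; return the string and the first line's text."
  (let ((region (string-to-region text)))
    (values (region-to-string region)
            ;; Inside the region the newline is implicit, so the first
            ;; line holds only the characters before it.
            (line-string (mark-line (region-start region))))))
```

Called on the string made of "first", a newline, and "second", the first value should be equal to the input and the second value should be "first".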

line-to-region line [Function]

This function returns a region containing all the characters on line. The first mark is :right-inserting and the last is :left-inserting.

region-start region [Function]
region-end region [Function]

These functions return the start or end mark of region.

region-bounds region [Function]

This function returns as multiple values the starting and ending marks of region.

set-region-bounds region start end [Function]

This function sets the start and end of region to start and end. It is an error for start to be after or in a different buffer from end.

count-lines region [Function]

This function returns the number of lines in the region, first and last lines inclusive. A newline is associated with the line it follows, thus a region containing some number of non-newline characters followed by one newline is one line, but if a newline were added at the beginning, it would be two lines.

count-characters region [Function]

This function returns the number of characters in a given region. This counts line breaks as one character.
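
The two counting functions can be compared on a small region. In this sketch, example-region-size is a hypothetical name, and the Hemlock interface symbols are assumed to be accessible.

```lisp
;; Assumption: the current package can see the Hemlock interface symbols.
;; example-region-size is a hypothetical helper, not part of Hemlock.
(defun example-region-size (string)
  "Return the line count and character count of the region made from STRING."
  (let ((region (string-to-region string)))
    (values (count-lines region)
            (count-characters region))))
```

For the text "one", newline, "two", this gives 2 lines and 7 characters: three characters on each line plus one for the line break.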
