GEDCOM 5.5.2 Implementation Notes

Editor:
John Cardinal (Family History Hosting, LLC)

Document Information

Abstract

This document provides GEDCOM 5.5.2 implementation advice intended for software authors.

Document Status

This document is a Draft. Feedback and comments are welcome. You may create a pull request to propose and collaborate on changes to the document. A future version of this document will remove its Draft status.

GitHub Project: GEDCOM-5.5.2
Public URL of Document: https://jfcardinal.github.io/GEDCOM-5.5.2/implementation-notes.html
Associated documents: https://jfcardinal.github.io/GEDCOM-5.5.2

Introduction

The GEDCOM 5.5.2 specification is a minor revision to GEDCOM 5.5.1. For software authors, there are relatively few changes to implement, which was an explicit goal for GEDCOM 5.5.2.

GEDCOM readers that import GEDCOM 5.5.1 documents using UTF-8 encoding should require few changes. Most full-featured programs already tolerate common deviations from GEDCOM 5.5 and GEDCOM 5.5.1, such as reasonable field values that exceed the length restrictions of those versions but are valid in GEDCOM 5.5.2.

GEDCOM writers that export GEDCOM 5.5.1 documents using UTF-8 encoding should also require few changes to export GEDCOM 5.5.2 documents. However, it is vital that GEDCOM writers that export GEDCOM 5.5.2 documents follow the GEDCOM 5.5.2 specification completely.

System Changes

UTF-8

GEDCOM 5.5.2 specifies only a single valid character encoding: UTF-8.

Full Support

Most modern GEDCOM-processing systems support UTF-8, and for those systems, supporting this requirement will be straightforward:

  • GEDCOM Readers
    • Verify that GEDCOM 5.5.2 documents (A) are encoded with UTF-8 and (B) specify UTF-8 on the HEAD.CHAR subrecord. Reject documents that do not meet those two requirements.
  • GEDCOM Writers
    • Restrict output encoding to UTF-8 when writing GEDCOM 5.5.2 documents. Do not allow users to choose another output encoding when writing GEDCOM 5.5.2 documents.
    • Write the UTF-8 Byte Order Mark prefix (the bytes 0xEF, 0xBB, 0xBF, which encode the character U+FEFF) as the first bytes in the document.
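
The reader and writer requirements above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names check_utf8_document and encode_for_writing are hypothetical, and the header scan assumes one gedcom_line per physical line with a single space between the level, the tag, and the line_value.

```python
import codecs
import re


def check_utf8_document(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    # Drop the optional Byte Order Mark (bytes EF BB BF) before decoding.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]

    # Requirement (A): the bytes must decode as strict UTF-8.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"document is not valid UTF-8 (byte offset {exc.start} after any BOM)"]

    # Requirement (B): HEAD.CHAR must specify UTF-8.
    in_head = False
    char_value = None
    for line in re.split(r"\r\n|\r|\n", text):
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break  # left the HEAD record without finding CHAR
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "CHAR":
            char_value = parts[2] if len(parts) > 2 else ""
            break

    if char_value != "UTF-8":
        return [f"HEAD.CHAR is {char_value!r}, expected 'UTF-8'"]
    return []


def encode_for_writing(text: str) -> bytes:
    """Encode a GEDCOM 5.5.2 document as UTF-8 with the BOM as its first bytes."""
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

A reader built this way would reject the document whenever check_utf8_document returns a non-empty list.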

Limited Support

Systems that can read documents encoded with UTF-8 but do not support Unicode internally should be capable of limited support for GEDCOM 5.5.2:

  • GEDCOM Readers
    • Decode the UTF-8 text to the encoding used internally in the program. If the input document contains characters that cannot be represented accurately in the encoding used internally:
      1. Write an error message for each affected subrecord so the end user is aware of each decoding failure.
      2. Convert the character(s) that cannot be decoded accurately to a character or character sequence that enables the end user to locate the incorrect text.
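
A sketch of the two steps above, assuming (purely for illustration) that the program stores text internally as Windows-1252. The function name to_internal_encoding and the "[?U+XXXX]" marker format are hypothetical choices; any marker that the end user can search for would serve the same purpose.

```python
def to_internal_encoding(value: str, context: str, encoding: str = "cp1252"):
    """Convert a decoded line_value to text that fits the internal encoding.

    Returns (converted_text, error_messages). Characters that the internal
    encoding cannot represent are replaced by a searchable marker, and one
    error message is produced for the affected subrecord.
    """
    converted = []
    lost = []
    for ch in value:
        try:
            ch.encode(encoding)
            converted.append(ch)
        except UnicodeEncodeError:
            marker = f"[?U+{ord(ch):04X}]"
            converted.append(marker)
            lost.append(marker)

    errors = []
    if lost:
        errors.append(
            f"{context}: {len(lost)} character(s) cannot be represented "
            f"in {encoding}: {' '.join(lost)}"
        )
    return "".join(converted), errors
```

For example, to_internal_encoding("Dvo\u0159\u00e1k", "INDI.NAME") keeps the "á", which Windows-1252 can represent, and replaces the "ř" with the marker "[?U+0159]".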

Systems that can write documents encoded with UTF-8 but do not support Unicode internally should be capable of full support for GEDCOM 5.5.2:

  • GEDCOM Writers
    • The requirements are the same as for Full Support described above.

No Support

Systems that cannot read or write documents encoded with UTF-8 should not attempt to support GEDCOM 5.5.2.

CONT and CONC

In GEDCOM 5.5.2, the CONT and CONC subrecords are allowed with almost all records or subrecords that have a line_value. The exceptions are the HEAD.CHAR and HEAD.GEDC.VERS subrecords: CONT and CONC are not valid with those subrecords.

While CONT and CONC may appear in more places, that does not imply that line_values may be longer than stated in the specification or that line-ending characters are valid in line_values. All line_values must meet the specific requirements described for their subrecords.

Full Support

  • GEDCOM Readers
    • Implement CONT and CONC processing prior to evaluating the line_value in a gedcom_line so that CONT and CONC can be used with any subrecord.
    • Validate the line_value after processing CONT and CONC subrecords.
  • GEDCOM Writers
    • Do not use a CONC subrecord when the line_value will not overflow the maximum length of a gedcom_line.
    • Do not use a CONC subrecord with subrecords that have keyword line_values.
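
The reader-side ordering described above (merge first, validate afterwards) might look like the following sketch. It assumes the document has already been split into (level, tag, line_value) tuples, and that CONT and CONC lines directly follow the line they continue; the function name merge_continuations is hypothetical.

```python
def merge_continuations(lines):
    """Fold CONT and CONC subrecords into the line_value of their parent.

    CONT appends a line break followed by its value; CONC appends its value
    with no separator. Validation of the line_value belongs after this step.
    """
    merged = []
    for level, tag, value in lines:
        is_continuation = (
            tag in ("CONT", "CONC")
            and merged
            and level == merged[-1][0] + 1
        )
        if is_continuation:
            separator = "\n" if tag == "CONT" else ""
            merged[-1][2] += separator + value
        else:
            merged.append([level, tag, value])
    return [tuple(entry) for entry in merged]
```

Because the merge does not look at the parent's tag, the same code path handles a NOTE, an ADDR, or any other subrecord that carries a line_value.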

Limited Support

Systems that support CONT and CONC with only some subrecords will usually meet the requirements for limited support. This will be common for systems that support CONT and CONC only in the typical contexts where they are used in GEDCOM 5.5 and 5.5.1, such as with the NOTE and ADDR subrecords.

  • GEDCOM Readers
    • When a CONT or CONC subrecord occurs in a context that is not supported, write an error message. This should be no different from the current behavior of systems that claim support for GEDCOM 5.5 or 5.5.1.
  • GEDCOM Writers
    • The requirements are the same as for Full Support described above.

White Space Handling

White space handling has been simplified and clarified in GEDCOM 5.5.2. Reading and writing systems must implement the rules described in the Space Handling section of the specification.

HEAD.CHAR.VERS

HEAD.CHAR.VERS has been removed in GEDCOM 5.5.2. GEDCOM 5.5.2 uses the UTF-8 character encoding exclusively and a useful version number does not apply to UTF-8.

  • GEDCOM Readers
    • If a HEAD.CHAR.VERS subrecord is present, ignore the subrecord and write an error message.
  • GEDCOM Writers
    • Do not write a HEAD.CHAR.VERS subrecord.

HEAD.DEST

HEAD.DEST is not mandatory in GEDCOM 5.5.2 and its use is discouraged.

  • GEDCOM Readers
    • If a HEAD.DEST subrecord is present, ignore the subrecord.
  • GEDCOM Writers
    • Do not write a HEAD.DEST subrecord.

FAM.SUBM and INDI.SUBM

The FAM.SUBM and INDI.SUBM subrecords are not valid in GEDCOM 5.5.2.

  • GEDCOM Readers
    • If FAM.SUBM or INDI.SUBM subrecords are present, ignore the subrecords and write a single error message or an error message for each occurrence.
  • GEDCOM Writers
    • Do not write FAM.SUBM or INDI.SUBM subrecords.
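
A reader has to distinguish these subrecords from other uses of the SUBM tag. The sketch below does so by tracking the tag of the current level-0 record. It assumes lines parsed into (level, tag, line_value) tuples with any record ID already separated from the tag, and it takes the "one message per occurrence" option; the function name drop_invalid_subm is hypothetical.

```python
def drop_invalid_subm(lines):
    """Remove level-1 SUBM subrecords of FAM and INDI records.

    Returns (kept_lines, error_messages). SUBM lines in any other context
    are passed through unchanged.
    """
    kept = []
    errors = []
    record_tag = None
    for number, (level, tag, value) in enumerate(lines, start=1):
        if level == 0:
            record_tag = tag  # remember which record we are inside
        if level == 1 and tag == "SUBM" and record_tag in ("FAM", "INDI"):
            errors.append(
                f"line {number}: {record_tag}.SUBM is not valid in "
                f"GEDCOM 5.5.2; subrecord ignored"
            )
            continue
        kept.append((level, tag, value))
    return kept, errors
```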

INDI.EVEN <EVENT_DESCRIPTOR>

The INDI.EVEN subrecord in GEDCOM 5.5.2 includes an <EVENT_DESCRIPTOR> line_value.

  • GEDCOM Readers
    • Process the INDI.EVEN line_value in the same or similar manner as the application currently processes the FAM.EVEN line_value.
  • GEDCOM Writers
    • Allow end users to specify an INDI.EVEN line_value.
    • Write INDI.EVEN line_values when the end user has specified a value.

INDI.RFN

INDI.RFN has been removed in GEDCOM 5.5.2. The INDI.RFN subrecord in GEDCOM 5.5.1 and earlier versions was intended to hold a PERMANENT_RECORD_FILE_NUMBER, a record number using a "registered network resource". The registration system was never implemented.

  • GEDCOM Readers
    • If an INDI.RFN subrecord is present, ignore the subrecord and write an error message.
  • GEDCOM Writers
    • Do not write an INDI.RFN subrecord.

LDS Subrecords

Subrecords that were specific to The Church of Jesus Christ of Latter-day Saints have been removed from GEDCOM 5.5.2. This includes the following records and subrecords: AFN, ANCE, BAPL, CONL, DESC, ENDL, FAMF, ORDI, SLGC, SLGS, SUBN, TEMP.

For more information, see Obsolete LDS Definitions.

  • GEDCOM Readers
    • Ignore the subrecords listed above and issue error messages or convert the subrecords to the appropriate internal structures in the reading system.
  • GEDCOM Writers
    • Do not write the subrecords listed above.
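
The sketch below illustrates the first reader option (ignore and report); it does not attempt the conversion option. Since several of the listed tags have subordinate lines, it skips the whole subtree beneath each removed line. It assumes lines parsed into (level, tag, line_value) tuples with any record ID already separated from the tag; the names LDS_TAGS and drop_lds_structures are hypothetical.

```python
LDS_TAGS = frozenset({
    "AFN", "ANCE", "BAPL", "CONL", "DESC", "ENDL",
    "FAMF", "ORDI", "SLGC", "SLGS", "SUBN", "TEMP",
})


def drop_lds_structures(lines):
    """Remove lines whose tag is in LDS_TAGS, together with their subtrees.

    Returns (kept_lines, error_messages), with one message per removed
    record or subrecord.
    """
    kept = []
    errors = []
    skip_level = None  # level of the removed line whose subtree is being skipped
    for number, (level, tag, value) in enumerate(lines, start=1):
        if skip_level is not None:
            if level > skip_level:
                continue  # still inside the removed subtree
            skip_level = None
        if tag in LDS_TAGS:
            errors.append(
                f"line {number}: {tag} is not valid in GEDCOM 5.5.2; ignored"
            )
            skip_level = level
            continue
        kept.append((level, tag, value))
    return kept, errors
```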

MAP.LATI and MAP.LONG

The sizes of the MAP.LATI and MAP.LONG line_values have been changed in GEDCOM 5.5.2. The PLACE_LATITUDE size in GEDCOM 5.5.2 is "{Size=2:10}". The PLACE_LONGITUDE size in GEDCOM 5.5.2 is "{Size=2:11}".

  • GEDCOM Readers
    • Validate MAP.LATI and MAP.LONG line_values using the GEDCOM 5.5.2 size restrictions.
  • GEDCOM Writers
    • Write MAP.LATI and MAP.LONG line_values that meet GEDCOM 5.5.2 size restrictions.
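
As a minimal sketch, the size restrictions can be checked with a small lookup table, reading "{Size=2:10}" as a minimum of 2 and a maximum of 10 characters. The names MAP_SIZE_LIMITS and coordinate_size_ok are hypothetical, and the check covers only the length, not the format of the coordinate.

```python
# (minimum, maximum) length in characters, taken from the sizes quoted above.
MAP_SIZE_LIMITS = {
    "LATI": (2, 10),
    "LONG": (2, 11),
}


def coordinate_size_ok(tag: str, value: str) -> bool:
    """Return True when the line_value length is within the 5.5.2 limits."""
    minimum, maximum = MAP_SIZE_LIMITS[tag]
    return minimum <= len(value) <= maximum
```

The same function can serve a reader (to validate input) and a writer (to refuse or round a value before output).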

OBJE.FILE

The OBJE.FILE subrecord's <MULTIMEDIA_FILE_REFERENCE> line_value has been expanded from 30 characters in GEDCOM 5.5.1 to 1048 characters in GEDCOM 5.5.2.

  • GEDCOM Readers
    • Validate OBJE.FILE line_values using the GEDCOM 5.5.2 size restrictions.
  • GEDCOM Writers
    • Write OBJE.FILE line_values that meet GEDCOM 5.5.2 size restrictions.
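
One detail worth illustrating is the unit of the limit. The sketch below assumes, following the wording above, that the limit counts characters rather than encoded bytes; a file reference with non-ASCII characters occupies more bytes in UTF-8 than it has characters. The names OBJE_FILE_MAX_CHARS and check_file_reference are hypothetical, and only the maximum is checked because the minimum is not stated here.

```python
OBJE_FILE_MAX_CHARS = 1048  # maximum quoted in the text above


def check_file_reference(value: str) -> list[str]:
    """Return a list of problems with an OBJE.FILE line_value (empty if none)."""
    problems = []
    # len() on a Python str counts characters (code points), not UTF-8 bytes.
    if len(value) > OBJE_FILE_MAX_CHARS:
        problems.append(
            f"OBJE.FILE is {len(value)} characters long; "
            f"the maximum is {OBJE_FILE_MAX_CHARS}"
        )
    return problems
```

Under that assumption, a 1048-character reference made entirely of two-byte characters passes the check even though it occupies 2096 bytes in the document.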

APPROVED_SYSTEM_ID and RECEIVING_SYSTEM_NAME

The <APPROVED_SYSTEM_ID> and <RECEIVING_SYSTEM_NAME> primitive values in GEDCOM 5.5.1 have been merged into <SYSTEM_ID> in GEDCOM 5.5.2, and the maximum length has been expanded from 20 characters to 60 characters.

  • GEDCOM Readers
    • Validate HEAD.SOUR and HEAD.DEST line_values using the GEDCOM 5.5.2 size restrictions.
  • GEDCOM Writers
    • Write HEAD.SOUR and HEAD.DEST line_values that meet GEDCOM 5.5.2 size restrictions.

Pointer Characters

Prior versions of GEDCOM allowed a wide range of characters in ID values. GEDCOM 5.5.2 restricts valid ID characters to a smaller subset.