What is an LSID?

The Life Sciences Identifier (LSID) is an I3C and OMG Life Sciences Research (LSR) Uniform Resource Name (URN) specification in progress.

The LSID concept introduces a straightforward approach to naming and identifying data resources stored in multiple, distributed data stores in a manner that overcomes the limitations of naming schemes in use today. Almost every public, internal, or department-level data store today has its own way of naming individual data resources, making integration between different data sources a tedious, never-ending chore for informatics developers and researchers.

By defining a simple, common way to identify and access biologically significant data, whether that data is stored in files, relational databases, applications, or internal or public data sources, LSID provides a naming standard that underpins wide-area science and interoperability.

A detailed LSID URN naming specification is available at the OMG LSR.

What does an LSID look like?

An LSID conforms to the URN standards defined by the IETF.

Following the urn: prefix, every LSID consists of up to five parts: the Network Identifier (NID), which is always lsid; the root DNS name of the issuing authority; the namespace chosen by the issuing authority; the object id, unique within that namespace; and finally an optional revision id for storing versioning information. The parts are separated by colons to make LSIDs easy to parse.

Here are a few examples:
urn:lsid:pdb.org:1AFT:1
This is the first version of the 1AFT protein in the Protein Data Bank.
urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434
References a PubMed article.
urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2
Refers to the second version of an entry in GenBank.
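
Because the parts are colon-separated, splitting an LSID into its fields takes only a few lines. The following Python sketch is illustrative only: the names LSID and parse_lsid are hypothetical, and the checks are a simplification of the full URN syntax rules, assuming the layout described above (urn, NID, authority, namespace, object id, optional revision id).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The five parts of an LSID described above (hypothetical helper type)."""
    nid: str
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string on colons and return its parts.

    Simplified check: expects "urn", the NID "lsid", then authority,
    namespace and object id, with an optional trailing revision id.
    """
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    scheme, nid, authority, namespace, object_id = parts[:5]
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty authority, namespace or object id: {text!r}")
    # The revision id is optional; treat a missing or empty field as absent.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(nid.lower(), authority, namespace, object_id, revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434"))
    print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2"))
```

The optional revision id comes back as None when it is absent, so callers can tell "no revision given" apart from a specific version.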

Each of these LSIDs names and refers to exactly one unchanging data object. Unlike the familiar URLs of the World Wide Web, LSIDs are location independent. This means that if two copies of an object, obtained from different places, carry the same LSID, a program or a user can be certain that they contain exactly the same data.

The problem with URLs is that they always point to a particular web server (which may not always be in service) and, worse, that the contents referred to by a URL often change - think about your favorite news URL. Researchers and legal authorities must be able to reproduce exactly any observations and experiments based on a data object, so it is essential that data be uniquely named and available from many cached sources.

What is a Resolver?

An LSID Resolver is a software system that implements an agreed LSID resolution protocol so that higher-level software can locate and access the data uniquely named by any LSID URN.

At a minimum, this software system usually comprises two parts that communicate over a network. The first part is server software operated by any party that wishes to make data available and has assigned LSID names to that data. This party is also known as the LSID issuing authority. The second part is client software that communicates with the LSID authority server over the network, using an agreed protocol, to retrieve the data or metadata associated with a particular LSID. A schematic of the client and server network interaction can be found at the I3C.

The amount of data generated in the Life Sciences field is estimated to be doubling every month. The general adoption of the LSID naming specification and an agreed resolution protocol will provide a standard method for locating and accessing these resources for the industry.

Online Resolver

A basic LSID resolver can be accessed on this web site. Append the LSID to http://www.lsid.info/, for example http://www.lsid.info/urn:lsid:marinespecies.org:taxname:138474. The format of the data returned depends on what the issuing authority supports and on what your client requests. Many authorities provide only machine-readable data; in that case, the detailed resolver may be useful.
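
As a rough illustration of the URL scheme just described, the Python sketch below appends an LSID to the resolver address and prepares an HTTP request. The function name build_resolver_request is hypothetical. The Accept value application/rdf+xml is an assumption about what a client might ask for, not a documented feature of this resolver, which may handle the header differently or ignore it. The request is only constructed here; nothing is sent over the network.

```python
from urllib.request import Request

# Base address of the online resolver, as given in the text above.
RESOLVER_BASE = "http://www.lsid.info/"


def build_resolver_request(lsid: str, accept: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a request that resolves `lsid`.

    The default Accept value is an assumption for illustration; the
    resolver and the issuing authority decide which formats exist.
    """
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID URN: {lsid!r}")
    # The resolver URL is simply the base address followed by the LSID.
    return Request(RESOLVER_BASE + lsid, headers={"Accept": accept})


if __name__ == "__main__":
    req = build_resolver_request("urn:lsid:marinespecies.org:taxname:138474")
    print(req.full_url)
    # To actually fetch the data, pass `req` to urllib.request.urlopen().
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers first, or to hand the request to a different HTTP client.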

The source code for the resolver is available on GitHub.