Recently, the NCBI retired the well known GI numbers in favor of the more structured accession numbers. To allow users to still convert existing data, they provide a 40gb database dump along with a little python program to extract the data. However, since it’s rather tedious to pull such a massive file, and to make sure to have all required python dependencies, we would like to wrap this conversion model into a small REST service. We also discuss in this post how to integrate this microservice with R, bash and kscript.

Prepare the mapping model

First, we pull the massive data file - which is a python Lightning database - to our local system and test the provided python tool.

cd ~/gi_acc
wget ftp://ftp.ncbi.nlm.nih.gov/genbank/livelists/gi2acc_mapping/gi2acc_lmdb.db.2017.01.04.0001.gz.*
gunzip gi2acc_lmdb.db.2017.01.04.0001.gz

ln -s gi2acc_lmdb.db.2017.01.04.0001 gi2acc_lmdb.db

## also download the provided script
wget ftp://ftp.ncbi.nlm.nih.gov/genbank/livelists/gi2acc_mapping/gi2accession.py

## test the provided tool
chmod u+x gi2accession.py 

## install required python modules
#sudo apt-get install python-dev
pip install lmdb

## try an example to test the installation
echo 42 | ./gi2accession.py 
#> 42	CAA44840.1	416

To avoid this tedious setup whenever we need to convert GIs, we now would like to expose it via a tiny REST API.

Setup the REST application

The general concept about to get started with REST, Spring-Boot and Kotlin is described in

So essentially all we need is a single method that accepts one or more GIs as input and which returns a mapping scheme.

// value type to model python-script output
data class IdPair(val gi: Long, val accession: String?, val seqLength: Long?)

// installation dir of ncbi provided pyton script and database
//val INSTALL_DIR = File(System.getProperty("user.home"), "projects/gi_acc")
val INSTALL_DIR = File("/local/web/files/gi2acc_service/")



@RestController
class IdConversionController {

    @RequestMapping("/gi2acc")
    fun mapGI(@RequestParam(value = "gi") giNumbers: String): List<IdPair> {
        val queryGis = giNumbers.split(',', ';').map(String::toInt).toList()

        val idListFile = createTempFile()

        queryGis.saveAs(idListFile)

        // run the python script over the ids
        val cmd = "cat ${idListFile.absolutePath} | ${INSTALL_DIR}/gi2accession.py"
        val result = evalBash(cmd, wd = INSTALL_DIR)

        val convertedIds: List<IdPair> = result.stdout.
                filter(String::isNotBlank).
                map {
                    // if id was not mappable return NA instead (example 5353)
                    if (it.contains("not found")) {
                        IdPair(it.split(" ")[0].toLong(), null, null)
                    } else {
                        // example line: 34	X17614.1	1632
                        with(it.split('\t')) { 
                            IdPair(this[0].toLong(), this[1], this[2].toLong()) 
                        }
                    }
                }

        idListFile.delete()

        return convertedIds
    }
}

@SpringBootApplication
open class Application

fun main(args: Array<String>) {
    // http://stackoverflow.com/questions/21083170/spring-boot-how-to-configure-port
    System.getProperties().put("server.port", 7050);

    SpringApplication.run(Application::class.java, *args)
}

There’s just a single method that accepts a list of comma/semicolon separated GIs and returns a json array with the mapping table. Unmappable IDs are mapped to null.

We notice and welcome the surprisingly little amount of boilerplate code required to turn it into a Spring-Boot ready application. Only a specially annotated Application class is needed which is used as an argument to SpringApplication.run in the main function of the kts. Kotlin makes it possible to keep everything in a single class here.

To test the app locally we can use use http://localhost:7050/gi2acc?gi=42 or http://localhost:7050/gi2acc?gi=123,222, 232,3 for multiple IDs.

To check if also invalid GI are handled gracefully we can mix in an invalid id http://localhost:7050/gi2acc?gi=23,5353,34, which gives:

[
  {
    "gi": 23,
    "accession": "X53811.1",
    "seqLength": 422
  },
  {
    "gi": 5353,
    "accession": null,
    "seqLength": null
  },
  {
    "gi": 34,
    "accession": "X17614.1",
    "seqLength": 1632
  }
]

How to deploy the app into production?

To deploy our micro-service into production we simply follow the spring-boot deployment guidelines.

## Build it
gradle build

## Copy to server (this steo will depend on your local setup)
scp build/libs/gi2acc_service-1.0-SNAPSHOT.jar java-srv1:/local/web/files/gi2acc_service/gi2acc_service.jar

## Change to deployment server target directory, and then 
chmod o+x gi2acc_service.jar  ## make it executable

## Use special user created with "sudo adduser bootapp" to increase app-security 
## (see spring-boot docs link from above for details) 
sudo chown bootapp:bootapp gi2acc_service.jar
sudo chmod 500 gi2acc_service.jar  ## only owner can read and write


## Now we could just run it directly...
sudo su bootapp
./gi2acc_service.jar


## ... or install as an init.d service (recommended)
sudo ln -s $(readlink -f gi2acc_service.jar) /etc/init.d/gi2acc

## start the service
sudo service gi2acc start
## or stop it
sudo service gi2acc stop

Test the installation with simply with curl "http://java-srv1.mpi-cbg.de:7050/gi2acc?gi=23,5353,34" or in your browser.

Workflow integration

Finally, we’d lik to use our new GI to accession conversion microservice. Since most bioinformatic workflows live in R or the shell, we’ll show integrations for both here. Let’s start with R:

How to integrate with R?

Using the conversion webservice from R can be easily done using httr + dplyr mixed with a bit of purr:

library(httr)
library(tidyverse)

## define the queries
GIs = list(23,5353,34)

## iterate over the queries, call the service, and bind the results into a data.frame
idMap = map_df(GIs, function(gi_nr){
#    gi_nr=5353
    paste0("http://bioinfo.mpi-cbg.de:7050/gi2acc?gi=", gi_nr) %>% 
        GET %>% 
        content %>% 
        flatten %>%
        with(data.frame(gi=gi, accession= ifelse(is.null(accession), NA, accession)))
}) 

idMap
#     gi accession
# 1   23  X53811.1
# 2 5353      <NA>
# 3   34  X17614.1

How to use it in bash?

Since the shell is not really made for json we’d like to convert the output into csv to allow more bash-style processing of the converted GIs. There are various solutions to process json in the shell. We recommend jq:

# install jq if not yet present: sudo apt-get install jq
gi_nr=24,323

curl -s "http://bioinfo.mpi-cbg.de:7050/gi2acc?gi=$gi_nr" | \
    jq -r '(.[0] | keys) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'

which gives

"accession","gi","seqLength"
"X53812.1",24,422
"CAA32192.1",323,155

How to use it in Kotlin?

Since we used kotlin for the service backend, we might want it for the client-side as well. To do so we use Fuel and Klaxon:

import com.beust.klaxon.*
import com.github.kittinunf.fuel.httpGet

// define list of query GIs
val gis = listOf(23,5353,34)

val queryURL = "http://bioinfo.mpi-cbg.de:7050/gi2acc?gi=${gis.joinToString(",")}"

val json = String(queryURL.httpGet().response().second.data)

// use fuel library to call the service (see https://github.com/kittinunf/Fuel)
val jsonArray = Parser().parse(json.byteInputStream())!! as JsonArray<*>

// use klaxon library to parse the json result (see https://github.com/cbeust/klaxon)
val idMap = jsonArray.map { (it as JsonObject) }.map { 
    it.int("gi") to it.string("accession") 
}

// print conversion table
idMap.forEach { println(it) }

which outputs

(23, X53811.1)
(5353, null)
(34, X17614.1)

This solution could be also easily wrapped into a self-contained client-side application using kscript. This can be done by just adding (a) a dependency header and by (b) reading the query GI list from the provided arguments.

## define a convenience wrapper for the remote scriplet
alias gi2acc="kscript https://raw.githubusercontent.com/holgerbrandl/gi2accession/master/parse_json.kts"

## use it!
gi2acc 23 324 534
#> (23, X53811.1)
#> (324, X53689.1)
#> (534, X67823.1)

Summary

With little effort we could build, and deploy a spring-boot application providing a REST service for GI to accession number conversion. Because of Kotlin’s flexible language design we could keep things concise together in a single source file. We walked through different integrations using R, the shell, and Kotlin.

The complete code is available unter https://github.com/holgerbrandl/gi2accession

The described conversion service can be used via the following URL:

http://java-srv1.mpi-cbg.de:7050/gi2acc?gi=23,5353,34

Feel welcome to post comments or suggestions.