Facilitating the Creation of Arlequin (.arp) Files with R

genetics

packages

Learn how to generate a .arp file directly in R for Arlequin

Author

Bruno Mioto

Published

March 10, 2025

The Problem

This is a very specific post, but it might be useful for those conducting population analyses with genetic data. So I decided to write a simple post to help with this process.

During my thesis, I needed to perform population analyses (AMOVA, FST, etc.), but since I am not an expert in this field, I had never used the Arlequin software before. Arlequin has been around since 1995 and is still widely used in scientific studies.

The problem arises when we need to create an input file .arp for use in Arlequin. Some programs (like DNAsp) do save an .arp file, but it often does not work well and usually requires manual editing.

Tutorials

While looking for help, I found this comprehensive tutorial (in Portuguese-BR) by Natália E. de Lima.

Natália clearly understands this topic much better than I do, so she can manually edit population data in a easy way.

Since I am more comfortable with R than with population genetics, I looked for other tutorials using R and found this video by Josh Banta.

In this video, he demonstrates how to convert a FASTA file into an .arp file while also adding groupings to the data. The entire process is done using an R script, and the data are available in the video above.

A Simpler Approach

However, the script for this task is quite confusing and may be challenging for those who are not very proficient in R. With that in mind, and to make things easier for my colleagues (and myself) in the future, I adapted this code into a simple R function called create_arlequin().

This function is available in the package I created for my lab, Nupgen! This is a highly experimental package with extremely specific functions, so don’t worry about the other ones.

To install the package, simply open R and run:

if (!requireNamespace("remotes", quietly = TRUE)){
    install.packages("remotes")
  }
remotes::install_github("brunomioto/nupgen")

Done! Now let’s create the .arp file we need. First, we need to load the required packages. We will use the ape package to import sequences and nupgen to create the file we want.

library(ape)
library(nupgen)

Now, let’s load a FASTA alignment and a group file. We will use the example files from Josh Banta’s tutorial, but you can use your own as well!

The FASTA file is a common alignment file, trimmed in software like MEGA, following this structure:

>C1NFaBCy  
AATCATCCCCCACATAACCTCCACACTTATCACATACCTTCTAATCTTATTAGGCGTAGC
ATTCTTTACCCTTCTTGAACGCAAAGCTTTAGGGTACTTTCAAATCCGAAAAGGCCCAAA
CAAAGTTGGAATTATAGGAATCCCACAACCACTAGCAGACGCCCTAAAACTTTTTGTGAA
AGAATGAGTAATGCCCACATCTTCAAACTACTTACCATTTATTTTAACCCCAACAATCAT
ATTAATTTTAGCACTTAGACTATGACAACTATTTCCATCCTTTATACTCTCATTTCAAAT
AGCCCTAGGAATACTCTTATTCTTATGTATTTCTTCCTTAACCGTCTATACAACCTTAAT
AGCAGGTTGGGCCTCAAACTCGAAGTATGCTCTACTAGGGGCCATTCGAGCCATGGCCCA
AACCATCTCATATGAGGTAACAATAACACTAATTATCATCTTCTACCTATTCTTAATTAT
ACAAATAGACATAGTAACAATCCGCTCAGTTAACACCTCTATACCAACCTTTGCCCTCTC
CGCACCATTAGCTATTATATGGACTGTTGTCATCTTAGCAGAAACAAACCGAGCCCCATT
TGACTTT
>C2NPrBCy  
AATCATCCCCCACATAACCTCCACACTTATCACATACCTTCTAATCTTATTAGGCGTAGC
ATTCTTTACCCTTCTTGAACGCAAAGCTTTAGGGTACTTTCAAATCCGAAAAGGCCCAAA
CAAAGTTGGAATTATAGGAATCCCACAACCACTAGCAGACGCCCTAAAACTTTTTGTGAA
AGAATGAGTAATGCCCACATCTTCAAACTACTTACCATTTATTTTAACCCCAACAATCAT
ATTAATTTTAGCACTTAGACTATGACAACTATTTCCATCCTTTATACTCTCATTTCAAAT
AGCCCTAGGAATACTCTTATTCTTATGTATTTCTTCCTTAACCGTCTATACAACCTTAAT
AGCAGGTTGGGCCTCAAACTCGAAGTATGCTCTACTAGGGGCCATTCGAGCCATGGCCCA
AACCATCTCATATGAGGTAACAATAACACTAATTATCATCTTCTACCTATTCTTAATTAT
ACAAATAGACATAGTAACAATCCGCTCAGTTAACACCTCTATACCAACCTTTGCCCTCTC
CGCACCATTAGCTATTATATGAACTGTTGTTATCTTAGCAGAAACAAACCGAGCCCCATT
TGACTTT

Meanwhile, the group file is a .csv file with two columns: group (containing the group names) and name (containing the sequence names), following this example:

group,name
1,C1NFaBCy  
1,C2NPrBCy  
1,C3NPrBCy  
1,C4NPrBCy  
1,C5NPrBCy  
1,C6NPrBCy  
1,C7NPrBCy  
1,C8NPrBCy  
1,C9NPrNec  
2,C10NPrNec 
2,C11NPrNec 
2,C12NPrNec 
2,C13NPrNec 
3,C14NPrSab 
3,C15NPrSab

If needed, you can create this file using Excel and export it as a .csv file, but make sure the separator is a comma (,) instead of a semicolon (;).

Note that my files are inside the data folder, so I include that in the file path:

alignment <- read.dna("data/fasta_file.fas", format = "fasta")

groups <- read.csv("data/groups_file.csv")

Agora que temos os arquivos carregados, é só rodar a função. Perceba que, além dos argumentos do alinhamento e grupos, temos um chamado output.dir, este argumento define qual diretório você deseja salvar o arquivo output.arp. O padrão é o diretório atual (“.”), mas aqui vou salvar na pasta data, junto com os outros arquivos.

nupgen::create_arlequin(fasta = alignment, groups = groups, output.dir = "./data")

ℹ Creating .arp file

ℹ Saving .arp file

Done! Now you have an output.arp file ready to use in Arlequin!

It’s worth checking if your data is correct. The file should look something like this:

[Profile] 

 

Title="data" 

NBSamples=14
 

DataType=DNA 

GenotypicData=0 

LocusSeparator=WHITESPACE 

 

[Data] 

[[Samples]] 

 

SampleName="1"

SampleSize=9

SampleData={ 

 

c1nfabcy   1 A A T C A T C C C C C A C A T A A C C T C C A C A C T T A T C A C A T A C C T T C T A A T C T T A T T A G G C G T A G C A T T C T T T A C C C T T C T T G A A C G C A A A G C T T T A G G G T A C T T T C A A A T C C G A A A A G G C C C A A A C A A A G T T G G A A T T A T A G G A A T C C C A C A A C C A C T A G C A G A C G C C C T A A A A C T T T T T G T G A A A G A A T G A G T A A T G C C C A C A T C T T C A A A C T A C T T A C C A T T T A T T T T A A C C C C A A C A A T C A T A T T A A T T T T A G C A C T T A G A C T A T G A C A A C T A T T T C C A T C C T T T A T A C T C T C A T T T C A A A T A G C C C T A G G A A T A C T C T T A T T C T T A T G T A T T T C T T C C T T A A C C G T C T A T A C A A C C T T A A T A G C A G G T T G G G C C T C A A A C T C G A A G T A T G C T C T A C T A G G G G C C A T T C G A G C C A T G G C C C A A A C C A T C T C A T A T G A G G T A A C A A T A A C A C T A A T T A T C A T C T T C T A C C T A T T C T T A A T T A T A C A A A T A G A C A T A G T A A C A A T C C G C T C A G T T A A C A C C T C T A T A C C A A C C T T T G C C C T C T C C G C A C C A T T A G C T A T T A T A T G G A C T G T T G T C A T C T T A G C A G A A A C A A A C C G A G C C C C A T T T G A C T T T

 

c2nprbcy   1 A A T C A T C C C C C A C A T A A C C T C C A C A C T T A T C A C A T A C C T T C T A A T C T T A T T A G G C G T A G C A T T C T T T A C C C T T C T T G A A C G C A A A G C T T T A G G G T A C T T T C A A A T C C G A A A A G G C C C A A A C A A A G T T G G A A T T A T A G G A A T C C C A C A A C C A C T A G C A G A C G C C C T A A A A C T T T T T G T G A A A G A A T G A G T A A T G C C C A C A T C T T C A A A C T A C T T A C C A T T T A T T T T A A C C C C A A C A A T C A T A T T A A T T T T A G C A C T T A G A C T A T G A C A A C T A T T T C C A T C C T T T A T A C T C T C A T T T C A A A T A G C C C T A G G A A T A C T C T T A T T C T T A T G T A T T T C T T C C T T A A C C G T C T A T A C A A C C T T A A T A G C A G G T T G G G C C T C A A A C T C G A A G T A T G C T C T A C T A G G G G C C A T T C G A G C C A T G G C C C A A A C C A T C T C A T A T G A G G T A A C A A T A A C A C T A A T T A T C A T C T T C T A C C T A T T C T T A A T T A T A C A A A T A G A C A T A G T A A C A A T C C G C T C A G T T A A C A C C T C T A T A C C A A C C T T T G C C C T C T C C G C A C C A T T A G C T A T T A T A T G A A C T G T T G T T A T C T T A G C A G A A A C A A A C C G A G C C C C A T T T G A C T T T

I hope this post was helpful to you! If you have any questions, suggestions, or feedback, send me an email!