Clump variants in a GWAS using PLINK2 and an appropriate reference panel. For example, the 1000 Genomes Phase 3 data can be downloaded from the PLINK website (https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg). To remove variants with duplicate IDs from the reference you can run:

plink2 \
  --pfile all_phase3 \
  --rm-dup force-first \
  --make-pgen \
  --out all_phase3_nodup

The path to the reference (without the plink extensions) should be passed as the plink_ref argument. The path to the plink2 executable should be passed as the plink2 argument.
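Because plink_ref is a file prefix rather than a file name, it can help to confirm that all three pfile components exist before calling clump(). The sketch below assumes the uncompressed .pgen, .pvar and .psam files written by --make-pgen; pfile_complete() is a hypothetical helper for illustration and is not part of genepi.utils.

```r
# Hypothetical helper (not part of genepi.utils): checks that a pfile prefix
# points at a complete .pgen/.pvar/.psam set.
pfile_complete <- function(prefix) {
  paths <- paste0(prefix, c(".pgen", ".pvar", ".psam"))
  present <- file.exists(paths)
  names(present) <- basename(paths)
  if (!all(present)) {
    message(
      "Missing pfile components: ",
      paste(names(present)[!present], collapse = ", ")
    )
  }
  all(present)
}

# Demonstration with empty placeholder files in a temporary directory
ref_prefix <- file.path(tempdir(), "all_phase3_nodup")
file.create(paste0(ref_prefix, c(".pgen", ".pvar", ".psam")))
pfile_complete(ref_prefix)
```

The same prefix (here ref_prefix) is what would be passed as plink_ref.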

clump(
  gwas,
  p1 = 1,
  p2 = 1,
  r2 = 0.1,
  kb = 250,
  plink2 = genepi.utils::which_plink2(),
  plink_ref = genepi.utils::which_1000G_reference(build = "GRCh37"),
  logging = TRUE,
  parallel_cores = parallel::detectCores()
)

Arguments

gwas

a data.frame-like object with at least the columns rsid, ea, oa, and p

p1

a numeric, the p-value threshold for inclusion as a clump index variant

p2

a numeric, the p-value threshold for incorporation into a clump

r2

a numeric, the r2 (linkage disequilibrium) threshold for clumping

kb

an integer, the clumping window in kilobases

plink2

a string, path to the plink2 executable

plink_ref

a string, path to the pfile genome reference (without the plink extensions)

logging

a logical, whether to set the plink logging information as attributes (log, missing_id, missing_allele) on the returned data.table

parallel_cores

an integer, how many cores / threads to use

Value

a data.table with additional columns index (logical, whether the variant is an index SNP) and clump (integer, the clump the variant belongs to)
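Examples

A minimal sketch of a call and of how the returned index column might be used. The toy gwas values, the p1 threshold of 5e-8, and the plink2_path and ref_prefix locations are placeholders chosen for illustration; the call is only attempted when genepi.utils and both paths are available, and real input would need rsids present in the reference panel.

```r
# Toy input with the required columns (values are invented for illustration)
gwas <- data.frame(
  rsid = c("rs0000001", "rs0000002", "rs0000003"),
  ea   = c("A", "C", "G"),
  oa   = c("G", "T", "A"),
  p    = c(1e-9, 3e-4, 0.2),
  stringsAsFactors = FALSE
)
stopifnot(all(c("rsid", "ea", "oa", "p") %in% names(gwas)))

# Placeholder paths: adjust to the local installation
plink2_path <- "/path/to/plink2"
ref_prefix  <- "/path/to/all_phase3_nodup"

# Only run when the package, executable and reference are all present
can_run <- requireNamespace("genepi.utils", quietly = TRUE) &&
  file.exists(plink2_path) &&
  file.exists(paste0(ref_prefix, ".pgen"))

if (can_run) {
  clumped <- genepi.utils::clump(
    gwas,
    p1 = 5e-8, p2 = 1, r2 = 0.1, kb = 250,
    plink2 = plink2_path,
    plink_ref = ref_prefix
  )
  # Keep only the index variants, one per clump
  index_snps <- clumped[clumped$index %in% TRUE, ]
}
```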