Multiple processes

Pyfastx can be used with the multiprocessing module to speed up random access. Before reading sequences from subprocesses, you have to make sure the index file has been created in the main process. The pyfastx index file is just a SQLite3 database file, which supports concurrent reads. We provide two simple examples of using pyfastx with a multiprocessing pool.

Example one

import random
import pyfastx
import multiprocessing as mp

# process worker
# randomly fetch five sequences and print to stdout
def worker(worker_num, seq_counts):
    #recreate the Fasta object in subprocess
    fa = pyfastx.Fasta('test.fa')

    for i in random.sample(range(seq_counts), 5):
        print("worker {} print:\n{}".format(worker_num, fa[i].raw))

if __name__ == '__main__':
    #ensure index file has been created in main process
    fa = pyfastx.Fasta('test.fa')

    #get sequence counts
    c = len(fa)

    #start the process pool
    pool = mp.Pool()

    #add five task workers to run
    for n in range(5):
        pool.apply_async(worker, args=(n, c))

    #wait for tasks to finish
    pool.close()
    pool.join()
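
The same pattern also works with concurrent.futures. The following is a minimal sketch rather than one of the documented examples: it assumes the same 'test.fa' file and uses a hypothetical seq_length worker that reopens the index in each subprocess and returns the name and length of one sequence.

import pyfastx
from concurrent.futures import ProcessPoolExecutor

# task worker: reopen the existing index in the subprocess
# and return (name, length) for the sequence at the given position
def seq_length(index):
    fa = pyfastx.Fasta('test.fa')
    seq = fa[index]
    return seq.name, len(seq)

if __name__ == '__main__':
    # build the index once in the main process
    fa = pyfastx.Fasta('test.fa')

    # map every sequence position to a subprocess task
    with ProcessPoolExecutor() as executor:
        for name, length in executor.map(seq_length, range(len(fa))):
            print(name, length)

Because executor.map preserves input order, the results are printed in the same order as the sequences appear in the file.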