Changelog ========= Version 2.1.0 (2024-02-28) -------------------------- - Added support for Python 3.12 - Fixed fasta sequence composition error - Fixed fastq continuous reading error Version 2.0.2 (2023-11-25) -------------------------- - Fixed subsequence return None error Version 2.0.1 (2023-09-18) -------------------------- - Speedup the gzip index writing to index file Version 2.0.0 (2023-09-05) -------------------------- - Added support for file name with wide char - Added support for specifying index file path - Added support for more characters in DNA sequence - Added reverse complement function for DNA conversion - Improved the performance of kseq library - Optimized gzip index importing and saving without temp file - Fixed segmentation fault when using sequence composition - Fixed memory leak in Fastq read quality integer - Fixed zlib download url broken error when building Version 1.1.0 (2023-04-19) -------------------------- - Fixed unicode error when reading fastq file Version 1.0.1 (2023-03-28) -------------------------- - Fixed invalid uppercase when iterating fastx Version 1.0.0 (2023-03-24) -------------------------- - Added support for fasta header without space - Fixed some files missing in pypi tar.gz file Version 0.9.1 (2022-12-31) -------------------------- - Fixed unicode decode error when parsing large fasta/q file - Fixed sequence retrival error when using sequence object from loop after break Version 0.9.0 (2022-12-30) -------------------------- - Added support for Python3.10, 3.11 - Added support for aarch64 and musllinux - Added using tab as fasta sequence name splitter - Fixed repeat sequence comment error - Fixed the quality score parsing error from fastq - Fixed the reference of sequence returned from function Older versions -------------- Version 0.8.4 (2021-06-30) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added slice feature to FastaKeys - Fixed FastaKeys and FastqKeys iteration memory leak - Optimized FastaKeys and FastqKeys creation Version 0.8.3 (2021-04-25) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed Fastx iteration for next function - Fixed Fastx uppercase for reading fasta Version 0.8.2 (2021-01-02) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed sample segfault error caused by fastq iteration error - Fixed gzip index import error in multiple processes - Fixed fastq iteration segfault error with full_name=True - Fixed all objects iteration to support built-in next function Version 0.8.1 (2020-12-16) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed pip install error from source code - Removed support for python39 32bit due to dll load error Version 0.8.0 (2020-12-15) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added Fastx object as a simple sequence iterator - Added FastqKeys object to obtain read names - Added full_name option to Fastq object - Added support for Python 3.9 - Fixed Fasta object error identifier order - Optimized speed of containing test and iteration - Changed Identifier object to FastaKeys object Version 0.7.0 (2020-09-20) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for extracting flank sequences - Added support for indexing super large gzip file - Reduced memory consumption when building gzip index - Improved the speed of random access to reads from fastq - Fixed sequence dealloc error cuasing no fasta delloc trigger - Fixed fastq max and min quality score return value Version 0.6.17 (2020-08-31) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed gzip index loading error when no write permission Version 0.6.16 (2020-08-27) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Increased the buff size of kseq to speedup sequence iteration - Removed warning message from fasta.c when building full index Version 0.6.15 (2020-08-25) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed key_func error caused by free operation - Fixed full name error when reading sequence without whitespace in names - Fixed a hidden bug in fasta/q iteration when reading attributes (not seq) - Fixed fasta/fastq size and sequence count error on Windows when parsing large file - Fixed zlib 2gb and 4gb limit on windows x64 to support large file - Reduced seek point span size to speedup random access from gzip file Version 0.6.14 (2020-07-31) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for using full header as identifier without building index - Improved the speed of fasta sequence iteration - Improved the speed of gzipped fastq read iteration - Fixed a bug in fastq read reader Version 0.6.13 (2020-07-09) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed fastq read iteration error - Fixed fastq cache buffer reader - Added cache for mean, median and N50 length - Speedup fasta iteration by reduced seeks Version 0.6.12 (2020-06-14) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed DeprecationWarning on py38 caused by '#' formats args - Fixed some memory leak bugs - Cached sequence name to speedup fetch method - Used random string as gzip index temp file to support multiple processes Version 0.6.11 (2020-05-18) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed iteration error on Windows - Fixed test error on Windows - Fixed fastq composition error on 32bit OS - Improved the speed of fasta identifier sort and filter Version 0.6.10 (2020-04-22) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Improved the speed of sequence reading - Improved the speed of sequence line iteration - Added avglen, minlen, maxlen, minqual and maxqual to Fastq object - Fixed read retrieval error - Fixed some hidden memory leaks - Changed fastq index file structure to save more information Version 0.6.9 (2020-04-12) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added buffreader to improve speed for reading from gzipped file - Added extract subcommand to extract sequences from fasta/q file - Added build subcommand to just build index - Changed info subcommand output to a tab seperated table - Changed Fastq object composition parameter to full_index Version 0.6.8 (2020-03-14) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed large offset seek error on windows - Fixed PyUnicode_AsUTF8 const char type warning - Changed sequence read line by line function - Changed gzread to fread for fastq information Version 0.6.7 (2020-03-03) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added check for fasta/q format when open file - Added benchmark scripts for evaluating performance - Speed up the fasta/q object iteration - Optimzed str length warning caused by strlen Version 0.6.6 (2020-02-15) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed incorrect sliced sequence name - Fixed seq,identifier,read object memory dealloc - Changed description text into description length in index file Version 0.6.5 (2020-01-31) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Reduced memory usage when building index for large fasta - Removed rebuild_index method from Fasta object due to segmentation fault - Optimized compatibility between sqlite3 and python GIL Version 0.6.4 (2020-01-14) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed last sequence fetching error caused by missing \n - Improved fasta/q object key error message to make it more human Version 0.6.3 (2020-01-08) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added .raw attribute to sequence object to get seq raw string - Added .raw attribute to read object to get read raw string - Added .description to read object to get full header line - Added iteration for sequence object from FASTA object - Added iteration for tuple from FASTQ object - Changed FASTA class parameter composition to full_index Version 0.6.2 (2020-01-04) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed sample sequence index error - Fixed ci deploy error Version 0.6.1 (2020-01-03) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added sample sequences command line - Added get subsequence command line Version 0.6.0 (2020-01-02) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed FASTA object parameter error - Fixed identifier sprintf warning - Fixed fasta description end \r retained - Fixed error byte length when slice sequence - Removed support for python2.7 and python3.4 - Removed python2 compat - Disabled export gzip index when building memory index Version 0.5.10 (2019-11-20) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added identifier filter function - Remove tp_new for Read, Sequence and Identifier - Fixed module method error Version 0.5.9 (2019-11-17) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added get longest and shortest sequence object - Added composition argument to speedup getting GC content - Added memory index to keep index in memory rather than local file - Fixed command line error - Changed sqlite to higher version - Removed journal_mode OFF - Speedup index building Version 0.5.8 (2019-11-10) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed fasta NL function parameter check - Fixed read id error when fastq iteration Version 0.5.7 (2019-11-09) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed SystemError caused caused by Python 2.7 seperated int and long type - Fixed String type check on Python 2.7 - Fixed objects memory deallocation Version 0.5.6 (2019-11-08) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Optimized random access from plain file - Reduced memory consumption Version 0.5.5 (2019-11-07) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added Support for IUPAC code complement - Speedup reverse complement - Speedup space removing and uppercase Version 0.5.4 (2019-11-04) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added guess fasta type (DNA, RNA, protein) - Added support for calculating protein sequence composition - Optimized the speed of index building - Calculate sequence composition when get gc content or composition - Fixed char return in python 2.7 Version 0.5.3 (2019-10-23) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for coverting fastq to fasta - Updated command line interface docs - Fixed command line entry points Version 0.5.2 (2019-10-18) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed command line interface running error Version 0.5.1 (2019-10-17) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added key function for custom sequence identifier - Optimized speed of fasta indexing - Fixed bool args parsing error in py2.7 Version 0.5.0 (2019-10-13) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for python 2.7 and 3.4 - Added command line tool to manipulate fasta and fastq file - Added gzip attribute to fasta and fastq object to check whether compressed - Added sort function for identifier object - Fixed python bool argument parsing error caused by uint16_t - Fixed identifier sort key initialization Version 0.4.1 (2019-10-05) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed fastq quality encoding system guesser - Fixed gzip index insertion error Version 0.4.0 (2019-09-29) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for parsing FASTQ - Added random access to reads from FASTQ Version 0.3.10 (2019-09-27) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed GC skew exception caused by mixing unsigned with signed for division Version 0.3.9 (2019-09-26) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed sequence read line by line error - Fixed last sequence build index error when fasta file ended without \n - Fixed GC skew error Version 0.3.8 (2019-09-25) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed large offset became negative error - Fixed slice step - Fixed uncorrect median length - Fixed strand compare error - Added GC skew calculation - Updated test script Version 0.3.7 (2019-09-24) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Changed int type to standard type - Added support for processing large fasta file - Added id number for each sequence - Fixed SQL fetch error - Used 50 as default value of nl to calculate N50 and L50 Version 0.3.6 (2019-09-20) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for searching subsequence from a sequence - Added support for checking subsequence weather in a sequence - Fixed gzip index import error - Fixed subsequence parent length for full sequence extraction Version 0.3.5 (2019-09-08) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed unicode error caused by sqlite3_finalize Version 0.3.4 (2019-09-07) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed seq description unicode string error Version 0.3.3 (2019-09-07) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed sequence description encoding error Version 0.3.2 (2019-09-07) ^^^^^^^^^^^^^^^^^^^^^^^^^^ Deleted Version 0.3.1 (2019-09-07) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added support for geting sequence description Version 0.3.0 (2019-09-07) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Added read sequence from fasta file line by line - Added support for calculating assembly N50 and L50 - Added support for calculating median and average length - Added support for getting longest and shortest sequence - Added support for calculating counts of sequence - removed support for Python34 Version 0.2.11 (2019-08-31) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Support for Python 3.4 Version 0.2.10 (2019-08-28) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Changed fseek and fread into gzseek and gzread - Fixed sequence cache name comparision - Fixed last sequence read error without line end - Fixed subsequence slice error in normal FASTA file Version 0.2.9 (2019-08-27) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed bad line calculate error - Changed rewind to fseek for subsequence extraction Version 0.2.8 (2019-08-26) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Changed kseq.h library from li to attractivechaos - Improved fasta parser Version 0.2.7 (2019-08-26) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed no gzip index wrote to sqlite index file Version 0.2.6 (2019-08-26) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Optimized speed of gzip random access Version 0.2.5 (2019-08-25) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed segmentation fault raised when loading gzip index - Changed fasta object method get_seq to fetch Version 0.2.4 (2019-08-25) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed fasta iter error after building new index Version 0.2.3 (2019-08-24) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed fasta iter error when end of file is not \n Version 0.2.2 (2019-07-19) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed identifier contain error Version 0.2.1 (2019-07-15) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - Fixed sequence name always end with 0 - Fixed fasta iterable for flat fasta Version 0.2.0 (2019-07-09) ^^^^^^^^^^^^^^^^^^^^^^^^^^ - First release to PyPI