Changelog#
Version 0.8.0#
Improve median runtime by ~68% (~52% on average) by: 1) more caching, 2) implementing a heavy function in C.
Redesign computation of consistency measure to a class: ConsistencyDetector.
Fix potential memory leak in C code for base abstraction
Fixes to escape sequences in regexes (thanks to @JakobGM!)
Various improvements to code quality
Switch documentation style to furo.
Version 0.7.7#
Use r-prefix for regex patterns (thanks to @JakobGM!)
Fix documentation typo (thanks to @Aritra8438!)
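The r-prefix matters because Python processes backslash escapes in a normal string literal before the regex engine ever sees the pattern. A minimal illustration with the standard re module follows; the pattern is made up for this example and is not taken from CleverCSV.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08),
# so the regex engine never receives the word-boundary escape.
plain = "\bcsv\b"

# With the r-prefix the backslashes reach the regex engine untouched.
raw = r"\bcsv\b"

text = "a csv file"
plain_match = re.search(plain, text)  # searches for backspace characters
raw_match = re.search(raw, text)      # searches for the whole word "csv"

print(plain_match)
print(raw_match.group(0))
```

Only the raw-string pattern finds the word, which is why regex patterns are conventionally written with the r-prefix.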
Version 0.7.6#
Simplify faust-cchardet import for Windows builds
Version 0.7.5#
Add support for Python 3.11 by fixing a bug regarding empty strings in dialects (thanks to @stefanor!)
Fix installation error due to change in internals at setuptools (thanks to @mweinelt!)
Migrate to faust-cchardet as cChardet fails to install on Python 3.11 (on Windows, currently only chardet will work for Python 3.11)
Migrate to packaging for version comparison
Version 0.7.4#
Add wrapper for writing a list of dictionaries (write_dicts)
Fix bug when writing CSVs using the csv module dialects
Add the builtin dialects to CleverCSV (e.g., clevercsv.excel)
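As a rough picture of what writing a list of dictionaries involves, here is a sketch built only on the standard library csv module. The helper name write_dicts_sketch and its signature are hypothetical; this is not CleverCSV's write_dicts implementation, whose exact signature is not given in this changelog.

```python
import csv
import io


def write_dicts_sketch(items, stream):
    """Hypothetical helper: write a list of dicts as CSV to a text stream."""
    if not items:
        return
    # Assumption: every dictionary has the same keys as the first one.
    fieldnames = list(items[0].keys())
    writer = csv.DictWriter(stream, fieldnames=fieldnames, dialect="excel")
    writer.writeheader()
    writer.writerows(items)


buf = io.StringIO()
write_dicts_sketch([{"name": "a", "value": 1}, {"name": "b", "value": 2}], buf)
output = buf.getvalue()
print(repr(output))
```

The "excel" dialect used here is the standard library's builtin dialect of that name.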
Version 0.7.3#
Release to build wheels for Python 3.10
Version 0.7.2#
Re-implement command line interface using Wilderness
Add man-pages to package
Version 0.7.1#
Remove deprecated wrapper functions
Expand URL regex to support localhost:<port> URLs
Minor changes to the TypeDetector API
Add cChardet as optional dependency (fixes #48)
Version 0.7.0#
Version 0.6.8#
Add a “bytearray” type to address a specific failure case (#35).
Minor clarifications to licensing.
Version 0.6.7#
Updates to release process. This version introduces pre-compiled wheels for Python 3.9.
Version 0.6.6#
Add an encoding argument to write_table to allow specifying the output encoding. Thanks to @mitchgrogg for reporting issue #27.
Version 0.6.5#
Add support for standardizing in-place and standardizing multiple files.
Add warning on duplicate field names in DictReader
Add return value to writers to match the standard library.
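For reference, the standard library behaviour that the writers now match can be seen with csv.writer directly: writerow returns whatever the underlying stream's write method returns. The snippet below uses only the standard library and does not call CleverCSV itself.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# writerow returns the return value of the stream's write() call;
# for a text stream that is the number of characters written.
written = writer.writerow(["a", "b"])

print(written)
print(repr(buf.getvalue()))
```

Here the row is written as "a,b" followed by a CRLF line ending, so five characters are reported.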
Version 0.6.4#
Various speed ups to constructing the list of potential dialects. This removes a costly step of the detection process that will likely add a few more potential dialects, but has the end result of making overall dialect detection faster.
Version 0.6.3#
Rename wrapper functions to a more coherent naming scheme. Old names will be available until 0.7.0, but now produce a FutureWarning.
Add stream_dicts wrapper function.
Improve handling of file encoding for the read_dataframe wrapper: detected encoding is now passed on to Pandas.
Fix handling of optional dependency error for TabView on non-Windows platforms.
Version 0.6.2#
Update URL regex to avoid catastrophic backtracking and increase performance. See issue #13 and issue #15. Thanks to @kaskawu for the fix and @jlumbroso for re-raising the issue.
Add num_chars keyword argument to read_as_dicts and csv2df wrappers.
Improve documentation w.r.t. handling large files. Thanks to @jlumbroso for raising this issue.
Version 0.6.1#
Add an explore command to the command line application for CleverCSV. This command makes it easy to start exploring a CSV file using the Python interactive shell.
Version 0.6.0#
Split the package into a “core” and “full” version. This allows users who only need the improved dialect detection functionality to download a version with a smaller footprint. Fixes issue #10. Thanks to @seperman.
Version 0.5.6#
Fix speed of unix_path regex used in type detection (issue #13). Thanks to @kaskawu.
Version 0.5.5#
Add stream_csv wrapper that returns a generator over rows
Minor update to the URL type detection
Documentation updates
Version 0.5.4#
Fix bugs discovered from fuzz testing (issue #7)
Minor changes to readme and code quality
Version 0.5.3#
Fix using nan as default value when skipping a dialect (issue #5)
Version 0.5.2#
Bump version to fix wheel building
Version 0.5.1#
Bump version to fix wheel building
Version 0.5.0#
Improve type detection for quoted alphanumeric cells (#4)
Pass strict dialect property to parser.
Version 0.4.7#
Bugfix for write_table wrapper on Windows.
Move building Windows platform wheels to Travis.
Use cibuildwheel version 1.0.0 for building wheels.
Version 0.4.6#
Add a wrapper function that writes a table to a CSV file.
Version 0.4.5#
Update CleverCSV to match updated clikit dependency
Fix dependency versions for clikit and cleo
Version 0.4.4#
Update standardize command to use CRLF line endings on all platforms.
Add workaround for Tabview being unavailable on Windows.
Remove packaging and dependency management with Poetry.
Add support for building platform wheels on Travis and AppVeyor.
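To illustrate what CRLF line endings on all platforms means in practice, the sketch below writes rows with an explicit "\r\n" line terminator using the standard library csv module. It is not the code behind the standardize command, only a minimal demonstration of the same idea.

```python
import csv
import io

rows = [["id", "name"], ["1", "x"]]

# newline="" disables newline translation on the stream, so the
# line terminator chosen below is written exactly as given.
buf = io.StringIO(newline="")
writer = csv.writer(buf, lineterminator="\r\n")
writer.writerows(rows)

data = buf.getvalue()
print(repr(data))
```

When writing to a real file, opening it with newline="" serves the same purpose: it keeps the platform from translating the "\n" in each terminator a second time.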
Version 0.4.3#
Add optional method parameter to dialect detector.
Bugfix for clevercsv code command when the delimiter is tab.
Version 0.4.2#
Fix a failing build due to dependency version mismatch
Version 0.4.1#
Allow underscore in alphanumeric strings
Update unix path regular expression
Add more integration tests and log detection method
Version 0.4.0#
Update URL regular expression and add unit tests
Add IPv4 type detection
Add tie-breaker for combined quotechar and escapechar ties
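The changelog does not say how IPv4 cells are recognised. As one possible approach, the sketch below checks a cell with the standard library ipaddress module; the function name is_ipv4 is hypothetical and the actual detection in CleverCSV may work differently (for example with a regular expression).

```python
import ipaddress


def is_ipv4(cell):
    """Hypothetical check: does this cell hold a dotted-quad IPv4 address?"""
    try:
        ipaddress.IPv4Address(cell.strip())
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True


for cell in ["10.0.0.1", "256.1.1.1", "1.2.3", "hello"]:
    print(cell, is_ipv4(cell))
```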
Version 0.3.7#
Bugfix for console script code command
Update readme
Version 0.3.6#
Cleanly handle failure to detect dialect in console application
Remove any (partial) support for Python 2
Version 0.3.5#
Remove Python parser - this speeds up file reading and tie breaking
Version 0.3.4#
Ensure the C parser is used in the reader.
Update integration tests to improve error handling
Readme updates
Version 0.3.3#
Ensure detected encoding is in the generated Python code for the clevercsv code command.
Ensure encoding is detected in wrappers.detect_dialect.
Bugfix in integration test
Expand readme
Version 0.3.2#
Add documentation on Read the Docs
Use requirements.txt file for dependencies when packaging
Version 0.3.1#
Add help description to each CLI command
Update README
Add transpose flag for standardize and view commands
Version 0.3.0#
Rewrite console application using Cleo
Add unit tests for console application
Add detect_dialect wrapper function
Add support for “unix_path” data type in type detection
Add encoding and num_chars options to read_csv wrapper
Add -p/--pandas flag to code command to generate Pandas output.
Version 0.2.5#
Rename read_as_lol to read_csv.
Version 0.2.4#
Allow setting the number of characters to read
Simplify printing of skipped potential dialects
Version 0.2.3#
Add read_as_lol wrapper function.
Version 0.2.2#
Add code command to clevercsv command line program.
Version 0.2.1#
Bugfix to update executable to new name
Version 0.2.0#
Rename package to clevercsv