Changelog

Version 0.8.5

Fix reference counting bug in C extension (#162), thanks to @wr-web
Fix additional issues in C extensions
Migrate away from setup.py to pyproject.toml

Version 0.8.4

Bump minimal Python version to 3.9
Minor typing fixes

Version 0.8.3

Allow users to specify output encoding for some CLI commands (thanks to @jbdesbas)
Optimize the normal-form detection (thanks to @no23reason)
Internal: fix names of C modules

Version 0.8.2

Add more type hints to CleverCSV
Move the import of the optional tabview dependency to where it’s needed (for #101)
Allow inspecting more rows for header detection (fixes #98)

Version 0.8.1

Add type hints to CleverCSV
Disable 32-bit builds on Windows and Linux
Bump minimal Python version to 3.8
Minor documentation improvements

Version 0.8.0

Improve median runtime by ~68% (~52% on average) by: 1) more caching, 2) implementing a heavy function in C.
Redesign computation of consistency measure to a class: ConsistencyDetector.
Fix potential memory leak in C code for base abstraction
Fixes to escape sequences in regexes (thanks to @JakobGM!)
Various improvements to code quality
Switch documentation style to furo.

Version 0.7.7

Use r-prefix for regex patterns (thanks to @JakobGM!)
Fix documentation typo (thanks to @Aritra8438!)
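
Why the r-prefix matters can be shown with a minimal sketch that uses only the standard library `re` module. The pattern and sample text are illustrative assumptions, not patterns taken from CleverCSV.

```python
import re

# Without the r-prefix, Python processes backslash escapes in the string
# literal first: "\b" becomes a backspace character (0x08) before the
# regex engine ever sees the pattern.
plain = re.compile("\bcsv\b")

# With the r-prefix the backslashes reach the regex engine unchanged,
# so "\b" is interpreted as a word boundary.
raw = re.compile(r"\bcsv\b")

text = "a csv file"  # illustrative sample text
plain_match = plain.search(text)  # looks for literal backspace characters
raw_match = raw.search(text)      # looks for the word "csv"
```

Escapes that Python does not recognize, such as `\d`, currently pass through unchanged in a plain string but trigger a warning on recent Python versions, so raw strings are the safer default for patterns.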

Version 0.7.6

Simplify faust-cchardet import for Windows builds

Version 0.7.5

Add support for Python 3.11 by fixing a bug regarding empty strings in dialects (thanks to @stefanor!)
Fix installation error due to change in internals at setuptools (thanks to @mweinelt!)
Migrate to faust-cchardet as cChardet fails to install on Python 3.11 (on Windows, currently only chardet will work for Python 3.11)
Migrate to packaging for version comparison

Version 0.7.4

Add wrapper for writing a list of dictionaries (write_dicts)
Fix bug when writing CSVs using the csv module dialects
Add the builtin dialects to CleverCSV (e.g., clevercsv.excel)

Version 0.7.3

Release to build wheels for Python 3.10

Version 0.7.2

Re-implement command line interface using Wilderness
Add man-pages to package

Version 0.7.1

Remove deprecated wrapper functions
Expand URL regex to support localhost:<port> URLs
Minor changes to the TypeDetector API
Add cChardet as optional dependency (fixes #48)

Version 0.7.0

Add a JSON object data type to address a specific failure case (#37).
Add support for timezones for time data type
Add support for building wheels on non-native architectures (#39).
Add a command line flag that disables the skipping of type detection.

Version 0.6.8

Add a “bytearray” type to address a specific failure case (#35).
Minor clarifications to licensing.

Version 0.6.7

Updates to release process. This version introduces pre-compiled wheels for Python 3.9.

Version 0.6.6

Add an encoding argument to write_table to allow specifying the output encoding. Thanks to @mitchgrogg for reporting issue #27.

Version 0.6.5

Add support for standardizing in-place and standardizing multiple files.
Add warning on duplicate field names in DictReader
Add return value to writers to match the standard library.
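
As background for the duplicate field name warning, the sketch below shows what the standard library `csv.DictReader` does with a repeated column name: the later value silently replaces the earlier one. It uses only the standard library; `duplicate_field_names` is a hypothetical helper for illustration and is not CleverCSV's implementation.

```python
import csv
import io
from collections import Counter

sample = "id,name,id\n1,alice,2\n"  # illustrative data with a repeated "id" column

# The standard library reader keeps only the last value for a repeated name.
rows = list(csv.DictReader(io.StringIO(sample)))


def duplicate_field_names(fieldnames):
    """Return the field names that occur more than once (hypothetical helper)."""
    counts = Counter(fieldnames)
    return [name for name, n in counts.items() if n > 1]


header = next(csv.reader(io.StringIO(sample)))
duplicates = duplicate_field_names(header)
```

Here `rows[0]` holds `"2"` under `"id"` and the value `"1"` is lost, which is the kind of silent data loss a warning makes visible.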

Version 0.6.4

Various speed-ups to constructing the list of potential dialects. This removes a costly step of the detection process. Removing that step will likely add a few more potential dialects, but the end result is faster overall dialect detection.

Version 0.6.3

Rename wrapper functions to a more coherent naming scheme. Old names will be available until 0.7.0, but now produce a FutureWarning.
Add stream_dicts wrapper function.
Improve handling of file encoding for the read_dataframe wrapper: detected encoding is now passed on to Pandas.
Fix handling of optional dependency error for TabView on non-Windows platforms.

Version 0.6.2

Update URL regex to avoid catastrophic backtracking and increase performance. See issue #13 and issue #15. Thanks to @kaskawu for the fix and @jlumbroso for re-raising the issue.
Add num_chars keyword argument to read_as_dicts and csv2df wrappers.
Improve documentation w.r.t. handling large files. Thanks to @jlumbroso for raising this issue.
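
Catastrophic backtracking, the problem behind the URL regex fix, can be illustrated with a textbook pattern. The patterns below are assumptions chosen for illustration and are unrelated to the actual URL regex in CleverCSV.

```python
import re

# Nested quantifiers: on a non-matching input the engine tries every way of
# splitting the run of "a" characters between the inner and the outer "+",
# so the work grows exponentially with the length of the run.
slow = re.compile(r"^(a+)+$")

# Accepts exactly the same strings with a single quantifier, so there is
# only one way to consume the run and failure is detected quickly.
fast = re.compile(r"^a+$")

ok = "a" * 16
bad = "a" * 16 + "b"  # kept short so the slow pattern still finishes quickly

results = {
    "slow_ok": slow.match(ok) is not None,
    "fast_ok": fast.match(ok) is not None,
    "slow_bad": slow.match(bad) is not None,
    "fast_bad": fast.match(bad) is not None,
}
```

Both patterns agree on every input. Only the time needed to reject `bad` differs, and for the nested pattern it roughly doubles with each additional `a`.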

Version 0.6.1

Add an explore command to the command line application for CleverCSV. This command makes it easy to start exploring a CSV file using the Python interactive shell.

Version 0.6.0

Split the package into a “core” and “full” version. This allows users who only need the improved dialect detection functionality to download a version with a smaller footprint. Fixes issue #10. Thanks to @seperman.

Version 0.5.6

Fix speed of unix_path regex used in type detection (issue #13). Thanks to @kaskawu.

Version 0.5.5

Add stream_csv wrapper that returns a generator over rows
Minor update to the URL type detection
Documentation updates

Version 0.5.4

Fix bugs discovered from fuzz testing (issue #7)
Minor changes to readme and code quality

Version 0.5.3

Fix using nan as default value when skipping a dialect (issue #5)

Version 0.5.2

Bump version to fix wheel building

Version 0.5.1

Bump version to fix wheel building

Version 0.5.0

Improve type detection for quoted alphanumeric cells (#4)
Pass strict dialect property to parser.

Version 0.4.7

Bugfix for write_table wrapper on Windows.
Move building Windows platform wheels to Travis.
Use cibuildwheel version 1.0.0 for building wheels.

Version 0.4.6

Add a wrapper function that writes a table to a CSV file.

Version 0.4.5

Update CleverCSV to match updated clikit dependency
Fix dependency versions for clikit and cleo

Version 0.4.4

Update standardize command to use CRLF line endings on all platforms.
Add workaround for TabView being unavailable on Windows.
Remove packaging and dependency management with Poetry.
Add support for building platform wheels on Travis and AppVeyor.
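
For context on the CRLF change to the standardize command, the sketch below shows how the standard library `csv` writer produces CRLF line endings regardless of platform. It illustrates the general technique with the standard library only and is not the code of the standardize command.

```python
import csv
import io

# newline="" disables newline translation, so the writer's own line
# terminator ends up in the output unchanged on every platform.
buffer = io.StringIO(newline="")

# The default "excel" dialect uses "\r\n" as its line terminator.
writer = csv.writer(buffer)
writer.writerow(["a", "b"])
writer.writerow(["1", "2"])

output = buffer.getvalue()
```

When writing to a real file, opening it with `newline=""` serves the same purpose: it prevents Windows from turning `\r\n` into `\r\r\n`.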

Version 0.4.3

Add optional method parameter to dialect detector.
Bugfix for clevercsv code command when the delimiter is tab.

Version 0.4.2

Fix a failing build due to dependency version mismatch

Version 0.4.1

Allow underscore in alphanumeric strings
Update unix path regular expression
Add more integration tests and log detection method

Version 0.4.0

Update URL regular expression and add unit tests
Add IPv4 type detection
Add tie-breaker for combined quotechar and escapechar ties

Version 0.3.7

Bugfix for console script code command
Update readme

Version 0.3.6

Cleanly handle failure to detect dialect in console application
Remove any (partial) support for Python 2

Version 0.3.5

Remove Python parser - this speeds up file reading and tie breaking

Version 0.3.4

Ensure the C parser is used in the reader.
Update integration tests to improve error handling
Readme updates

Version 0.3.3

Ensure detected encoding is in the generated Python code for the clevercsv code command.
Ensure encoding is detected in wrappers.detect_dialect.
Bugfix in integration test
Expand readme

Version 0.3.2

Add documentation on Read the Docs
Use requirements.txt file for dependencies when packaging

Version 0.3.1

Add help description to each CLI command
Update README
Add transpose flag for standardize and view commands

Version 0.3.0

Rewrite console application using Cleo
Add unit tests for console application
Add detect_dialect wrapper function
Add support for “unix_path” data type in type detection
Add encoding and num_chars options to read_csv wrapper
Add -p/--pandas flag to code command to generate Pandas output.

Version 0.2.5

Rename read_as_lol to read_csv.

Version 0.2.4

Allow setting the number of characters to read
Simplify printing of skipped potential dialects

Version 0.2.3

Add read_as_lol wrapper function.

Version 0.2.2

Add code command to clevercsv command line program.

Version 0.2.1

Bugfix to update executable to new name

Version 0.2.0

Rename package to clevercsv