clevercsv package#
Subpackages#
Submodules#
clevercsv.break_ties module#
Break ties in the data consistency measure.
Author: Gertjan van den Burg
- clevercsv.break_ties.break_ties_four(data: str, dialects: List[SimpleDialect]) SimpleDialect | None #
Break ties between four dialects.
This function works by breaking the ties between pairs of dialects that result in the same parsing result (if any). If this reduces the number of dialects, then break_ties_three() or break_ties_two() is used; otherwise, the tie can’t be broken.
Ties are only broken if all dialects have the same delimiter.
- Parameters:
data (str) – The data of the file as a string
dialects (list) – List of SimpleDialect objects
- Returns:
dialect – The chosen dialect if the tie can be broken, None otherwise.
- Return type:
Optional[SimpleDialect]
Notes
We have only observed one case during development where this function was needed. It may need to be revisited in the future if other examples are found.
- clevercsv.break_ties.break_ties_three(data: str, A: SimpleDialect, B: SimpleDialect, C: SimpleDialect) SimpleDialect | None #
Break ties between three dialects.
If the delimiters and the escape characters are all equal, then we look for the dialect that has no quotechar. The tie is broken by calling break_ties_two() for the dialect without quotechar and another dialect that gives the same parsing result.
If only the delimiter is the same for all dialects, then use break_ties_two() on the dialects that do not have a quotechar, provided there are only two of these.
- Parameters:
data (str) – The data of the file as a string
A (SimpleDialect) – a dialect
B (SimpleDialect) – a dialect
C (SimpleDialect) – a dialect
- Returns:
dialect – The chosen dialect if the tie can be broken, None otherwise.
- Return type:
Optional[SimpleDialect]
Notes
We have only observed one tie for each case during development, so this may need to be improved in the future.
- clevercsv.break_ties.break_ties_two(data: str, A: SimpleDialect, B: SimpleDialect) SimpleDialect | None #
Break ties between two dialects.
This function breaks ties between two dialects that give the same score. We distinguish several cases:
1. The delimiter and escapechar are the same and one of the quote characters is the empty string. We parse the file with both dialects and check whether the parsing result is the same. If it is, the correct dialect is the one with no quotechar; otherwise it’s the other one.
2. The quotechar and escapechar are the same and the delimiters are comma and space. In that case we go for comma. Alternatively, if either of the delimiters is the hyphen, we assume it’s the other dialect.
3. The delimiter and quotechar are the same and one dialect uses the escapechar and the other doesn’t. We break this tie by checking whether the escapechar has an effect and whether it occurs an even or odd number of times.
If it’s none of these cases, we don’t break the tie and return None.
- Parameters:
data (str) – The data of the file as a string.
A (SimpleDialect) – A potential dialect
B (SimpleDialect) – A potential dialect
- Returns:
dialect – The chosen dialect if the tie can be broken, None otherwise.
- Return type:
SimpleDialect or None
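As an illustration of case 2 only, the rule can be sketched in a few lines. This is not the library's implementation: Dialect is a hypothetical stand-in for SimpleDialect and break_delimiter_tie is an invented name.

```python
from typing import NamedTuple, Optional


class Dialect(NamedTuple):
    """Hypothetical stand-in for SimpleDialect, used only in this sketch."""

    delimiter: str
    quotechar: str
    escapechar: str


def break_delimiter_tie(A: Dialect, B: Dialect) -> Optional[Dialect]:
    """Sketch of case 2: same quotechar and escapechar, different delimiters."""
    if A.quotechar != B.quotechar or A.escapechar != B.escapechar:
        return None  # not case 2
    delims = {A.delimiter, B.delimiter}
    if delims == {",", " "}:
        # Comma versus space: prefer the comma.
        return A if A.delimiter == "," else B
    if "-" in delims and len(delims) == 2:
        # One delimiter is the hyphen: assume the other dialect is correct.
        return B if A.delimiter == "-" else A
    return None  # this rule cannot break the tie


comma = Dialect(",", '"', "")
space = Dialect(" ", '"', "")
hyphen = Dialect("-", '"', "")
chosen = break_delimiter_tie(space, comma)
```

Returning None for every unhandled combination mirrors the documented behaviour of leaving a tie unbroken when no rule applies.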
- clevercsv.break_ties.reduce_pairwise(data: str, dialects: List[SimpleDialect]) List[SimpleDialect] | None #
Reduce the set of dialects by breaking pairwise ties
- Parameters:
data (str) – The data of the file as a string
dialects (list) – List of SimpleDialect objects
- Returns:
dialects – List of SimpleDialect objects.
- Return type:
list
- clevercsv.break_ties.tie_breaker(data: str, dialects: List[SimpleDialect]) SimpleDialect | None #
Break ties between dialects.
This function is used to break ties where possible between two, three, or four dialects that receive the same value for the data consistency measure.
- Parameters:
data (str) – The data as a single string
dialects (list) – Dialects that are tied
- Returns:
dialect – One of the dialects from the list provided or None.
- Return type:
Optional[SimpleDialect]
clevercsv.consistency module#
Detect the dialect using the data consistency measure.
Author: Gertjan van den Burg
- class clevercsv.consistency.ConsistencyDetector(skip: bool = True, verbose: bool = False, cache_capacity: int = 100000)#
Bases:
object
Detect the dialect with the data consistency measure
This class uses the data consistency measure to detect the dialect. See the paper for details.
- Parameters:
skip (bool) – Skip computation of the type score for dialects with a low pattern score.
verbose (bool) – Print out the dialects considered and their scores.
cache_capacity (int) – The size of the cache for type detection. Caching the type detection result greatly speeds up the computation of the consistency measure. The size of the cache can be changed to trade off memory use and speed.
- compute_consistency_scores(data: str, dialects: List[SimpleDialect]) Dict[SimpleDialect, ConsistencyScore] #
Compute the consistency score for each dialect
This function computes the consistency score for each dialect. This is done by first computing the pattern score for a dialect. If the class is instantiated with skip set to False, it also computes the type score for each dialect. If skip is True (the default), the type score is only computed if the pattern score is greater than or equal to the current best combined score.
- Parameters:
data (str) – The data of the file as a string
dialects (Iterable[SimpleDialect]) – An iterable of dialects to consider.
- Returns:
scores – A map with a ConsistencyScore object for each dialect provided as input.
- Return type:
Dict[SimpleDialect, ConsistencyScore]
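The effect of skip can be sketched as follows. This is a simplified illustration, not the class's code: it assumes the combined score is the product Q = P * T with T at most 1, and the scoring callables and the name consistency_scores are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Score = Tuple[float, Optional[float], Optional[float]]  # (P, T, Q)


def consistency_scores(
    dialects: Iterable[Hashable],
    pattern_score: Callable[[Hashable], float],
    type_score: Callable[[Hashable], float],
    skip: bool = True,
) -> Dict[Hashable, Score]:
    """Compute (P, T, Q) per dialect, skipping T when it cannot matter."""
    scores: Dict[Hashable, Score] = {}
    best_q = float("-inf")
    for dialect in dialects:
        P = pattern_score(dialect)
        if skip and P < best_q:
            # Assumption: T <= 1, so Q = P * T cannot exceed the current best.
            scores[dialect] = (P, None, None)
            continue
        T = type_score(dialect)
        Q = P * T
        best_q = max(best_q, Q)
        scores[dialect] = (P, T, Q)
    return scores


# Toy scores for three hypothetical dialects.
P = {"comma": 4.0, "semicolon": 0.5, "tab": 3.8}
T = {"comma": 0.9, "semicolon": 1.0, "tab": 0.5}
result = consistency_scores(["comma", "semicolon", "tab"], P.__getitem__, T.__getitem__)
```

In this toy run the type score for "semicolon" is never computed, because its pattern score is already below the best combined score seen so far.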
- compute_type_score(data: str, dialect: SimpleDialect, eps=1e-10) float #
Compute the type score
- detect(data: str, delimiters: Iterable[str] | None = None) SimpleDialect | None #
Detect the dialect using the consistency measure
- Parameters:
data (str) – The data of the file as a string
delimiters (iterable) – List of delimiters to consider. If None, the get_delimiters() function is used to automatically detect this (as described in the paper).
- Returns:
dialect – The detected dialect. If no dialect could be detected, returns None.
- Return type:
Optional[SimpleDialect]
- static get_best_dialects(scores: Dict[SimpleDialect, ConsistencyScore]) List[SimpleDialect] #
Identify the dialects with the highest consistency score
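A minimal sketch of selecting the top-scoring dialects, assuming that scores left uncomputed (None) are ignored. The function name and the plain dictionary of Q values are hypothetical simplifications of the real ConsistencyScore mapping.

```python
from typing import Dict, Hashable, List, Optional


def best_dialects(q_scores: Dict[Hashable, Optional[float]]) -> List[Hashable]:
    """Return every dialect whose Q equals the maximum computed Q."""
    computed = {d: q for d, q in q_scores.items() if q is not None}
    if not computed:
        return []
    best = max(computed.values())
    return [d for d, q in computed.items() if q == best]


tied = best_dialects({"a": 1.0, "b": None, "c": 1.0, "d": 0.2})
```

When more than one dialect is returned, as here, the result is a tie of the kind the break_ties module is meant to resolve.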
- class clevercsv.consistency.ConsistencyScore(P: float, T: float | None, Q: float | None)#
Bases:
object
Container to track the consistency score calculation
- Parameters:
P (float) – The pattern score
T (Optional[float]) – The type score. Can be None if not computed for speed.
Q (Optional[float]) – The consistency score. Can be None if not computed for speed.
- P: float#
- Q: float | None#
- T: float | None#
- clevercsv.consistency.detect_dialect_consistency(data: str, delimiters: Iterable[str] | None = None, skip: bool = True, verbose: bool = False)#
Helper function that wraps ConsistencyDetector
clevercsv.cparser_util module#
Python utility functions that wrap the C parser.
- clevercsv.cparser_util.field_size_limit(*args: Any, **kwargs: Any) int #
Get/Set the limit to the field size.
This function is adapted from the one in the Python CSV module. See the documentation there.
- clevercsv.cparser_util.parse_data(data: Iterable[str], dialect: SimpleDialect | None = None, delimiter: str | None = None, quotechar: str | None = None, escapechar: str | None = None, strict: bool | None = None, return_quoted: bool = False) Iterator[List[str] | List[Tuple[str, bool]]] #
Parse the data given a dialect using the C parser
- Parameters:
data (iterable) – The data of the CSV file as an iterable
dialect (SimpleDialect) – The dialect to use for the parsing. If None, the dialect with each component set to the empty string is used.
delimiter (str) – The delimiter to use. If not None, overwrites the delimiter in the dialect.
quotechar (str) – The quote character to use. If not None, overwrites the quote character in the dialect.
escapechar (str) – The escape character to use. If not None, overwrites the escape character in the dialect.
strict (bool) – Enable strict mode or not. If not None, overwrites the strict mode set in the dialect.
return_quoted (bool) – For each cell, return a tuple “(field, is_quoted)” where the second element indicates whether the cell was a quoted cell or not.
- Yields:
rows (list) – The rows of the file as a list of cells.
- Raises:
clevercsv.exceptions.Error – When an error occurs during parsing.
- clevercsv.cparser_util.parse_string(data: str, dialect: SimpleDialect, return_quoted: bool = False) Iterator[List[str] | List[Tuple[str, bool]]] #
Utility for when the CSV file is encoded as a single string
clevercsv.detect module#
Drop-in replacement for Python Sniffer object.
Author: Gertjan van den Burg
- class clevercsv.detect.Detector#
Bases:
object
Detect the Dialect of CSV files with normal forms or the data consistency measure. This class provides a drop-in replacement for the Python dialect Sniffer from the standard library.
Note
We call the object Detector just to mark the difference in the implementation and avoid naming issues. You can import it as “from ccsv import Sniffer” nonetheless.
- detect(sample, delimiters=None, verbose=False, method='auto', skip=True)#
Detect the dialect of a CSV file
This method detects the dialect of the CSV file using the specified detection method.
- Parameters:
sample (str) – A sample of text from the CSV file. For best results and if time allows, use the entire contents of the CSV file as the sample.
delimiters (Optional[Iterable[str]]) – Set of delimiters to consider for dialect detection. The potential dialects will be constructed by analyzing the sample and these delimiters. If omitted, the set of potential delimiters will be constructed from the sample.
verbose (bool) – Enable verbose mode.
method (str) – The method to use for dialect detection. Valid options are “auto” (the default), “normal”, or “consistency”. The “auto” option first attempts to detect the dialect using normal-form detection, and uses the consistency measure if normal-form detection is inconclusive. The “normal” method uses normal-form detection exclusively, and the “consistency” method uses the consistency measure exclusively.
skip (bool) – Whether to skip potential dialects that have too low a pattern score in the consistency detection. See ConsistencyDetector.compute_consistency_scores() for more details.
- Returns:
dialect – The detected dialect. Can be None if dialect detection was inconclusive.
- Return type:
Optional[SimpleDialect]
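The three values of method described above amount to a simple fallback. The sketch below shows that control flow only; detect_with_method and the two detector callables are hypothetical placeholders, not part of the package.

```python
from typing import Callable, Optional


def detect_with_method(
    sample: str,
    normal_detector: Callable[[str], Optional[str]],
    consistency_detector: Callable[[str], Optional[str]],
    method: str = "auto",
) -> Optional[str]:
    """Dispatch between normal-form and consistency detection."""
    if method not in ("auto", "normal", "consistency"):
        raise ValueError(f"Unknown method: {method!r}")
    if method in ("auto", "normal"):
        dialect = normal_detector(sample)
        # "normal" never falls back; "auto" falls back only when inconclusive.
        if dialect is not None or method == "normal":
            return dialect
    return consistency_detector(sample)


inconclusive = lambda sample: None  # normal-form detection finds nothing
by_consistency = lambda sample: "consistency-dialect"
found = detect_with_method("a,b\n1,2\n", inconclusive, by_consistency)
```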
- has_header(sample)#
Detect if a file has a header from a sample.
This function is copied from CPython! The only change we’ve made is to use our dialect detection method.
- sniff(sample, delimiters=None, verbose=False)#
clevercsv.detect_pattern module#
Code for computing the pattern score.
Author: Gertjan van den Burg
- clevercsv.detect_pattern.fill_empties(abstract: str) str #
Fill empty cells in the abstraction
The way the row patterns are constructed assumes that empty cells are marked by the letter C as well. This function fills those in. The function also removes duplicate occurrences of CC and replaces these with C.
- Parameters:
abstract (str) – The abstract representation of the file.
- Returns:
abstraction – The abstract representation with empties filled.
- Return type:
str
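The filling step can be sketched with plain string replacement. The sketch assumes an abstraction alphabet in which D marks a delimiter and R a row separator; only C (cell) and Q (quote) are named in this reference, so D and R are assumptions, as is the function name.

```python
def fill_empties_sketch(abstract: str) -> str:
    """Insert C for empty cells and collapse repeated C markers."""
    # An empty cell sits between two delimiters, or between a delimiter
    # and a row separator (D and R are assumed symbols, see above).
    for pair, filled in (("DD", "DCD"), ("DR", "DCR"), ("RD", "RCD")):
        while pair in abstract:
            abstract = abstract.replace(pair, filled)
    # Collapse duplicate cell markers.
    while "CC" in abstract:
        abstract = abstract.replace("CC", "C")
    # A leading or trailing delimiter implies an empty first or last cell.
    if abstract.startswith("D"):
        abstract = "C" + abstract
    if abstract.endswith("D"):
        abstract += "C"
    return abstract


filled = fill_empties_sketch("CDDC")
```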
- clevercsv.detect_pattern.make_abstraction(data: str, dialect: SimpleDialect) str #
Create an abstract representation of the CSV file based on the dialect.
This function constructs the basic abstraction used to compute the row patterns.
- Parameters:
data (str) – The data of the file as a string.
dialect (SimpleDialect) – A dialect to parse the file with.
- Returns:
abstraction – An abstract representation of the CSV file.
- Return type:
str
- clevercsv.detect_pattern.merge_with_quotechar(S: str, dialect: SimpleDialect | None = None) str #
Merge quoted blocks in the abstraction
This function takes the abstract representation and merges quoted blocks (QC...CQ) to a single cell (C). The function takes nested quotes into account.
- Parameters:
S (str) – The data of a file as a string
dialect (SimpleDialect) – The dialect used to make the abstraction. This is not used but kept for backwards compatibility. Will be removed in a future version.
- Returns:
abstraction – A simplified version of the abstraction with quoted blocks merged.
- Return type:
str
- clevercsv.detect_pattern.pattern_score(data: str, dialect: SimpleDialect, eps: float = 0.001) float #
Compute the pattern score for given data and a dialect.
- Parameters:
data (str) – The data of the file as a raw character string
dialect (dialect.Dialect) – The dialect object
- Returns:
score – the pattern score
- Return type:
float
- clevercsv.detect_pattern.strip_trailing(abstract: str) str #
Strip trailing row separator from abstraction.
clevercsv.detect_type module#
Code for computing the type score.
Author: Gertjan van den Burg
- class clevercsv.detect_type.TypeDetector(patterns: Dict[str, Pattern] | None = None, strip_whitespace=True)#
Bases:
object
- detect_type(cell: str, is_quoted: bool = False)#
- is_bytearray(cell: str, is_quoted: bool = False) bool #
- is_currency(cell: str, is_quoted: bool = False) bool #
- is_date(cell: str, is_quoted: bool = False) bool #
- is_datetime(cell: str, is_quoted: bool = False) bool #
- is_email(cell: str, is_quoted: bool = False) bool #
- is_empty(cell: str, is_quoted: bool = False) bool #
- is_ipv4(cell: str, is_quoted: bool = False) bool #
- is_json_obj(cell: str, is_quoted: bool = False) bool #
- is_known_type(cell: str, is_quoted: bool = False) bool #
- is_nan(cell: str, is_quoted: bool = False) bool #
- is_number(cell: str, is_quoted: bool = False) bool #
- is_percentage(cell: str, is_quoted: bool = False) bool #
- is_time(cell: str, is_quoted: bool = False) bool #
- is_unicode_alphanum(cell: str, is_quoted: bool = False) bool #
- is_unix_path(cell: str, is_quoted: bool = False) bool #
- is_url(cell: str, is_quoted: bool = False) bool #
- list_known_types() List[str] #
- clevercsv.detect_type.gen_known_type(cells)#
Utility that returns a generator indicating, for each of the provided cells, whether it is of a known type.
- clevercsv.detect_type.type_score(data, dialect, eps=1e-10)#
Compute the type score as the ratio of cells with a known type.
- Parameters:
data (str) – the data as a single string
dialect (SimpleDialect) – the dialect to use
eps (float) – the minimum value of the type score
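The ratio with its lower bound can be sketched as below. The sketch works on already-parsed rows and uses a toy type check; type_score_sketch and looks_known are hypothetical names, and the real function parses the data with the dialect and uses TypeDetector.

```python
from typing import Callable, Iterable, List


def looks_known(cell: str) -> bool:
    """Toy type check: empty cells and numbers count as known types."""
    if cell == "":
        return True
    try:
        float(cell)
        return True
    except ValueError:
        return False


def type_score_sketch(
    rows: Iterable[List[str]],
    is_known_type: Callable[[str], bool] = looks_known,
    eps: float = 1e-10,
) -> float:
    """Fraction of cells with a known type, never lower than eps."""
    total = 0
    known = 0
    for row in rows:
        for cell in row:
            total += 1
            if is_known_type(cell):
                known += 1
    if total == 0:
        return eps
    return max(eps, known / total)


score = type_score_sketch([["1", "2"], ["x", "3.5"]])
```

Bounding the score below by eps keeps it strictly positive even when no cell has a known type.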
clevercsv.dialect module#
Definitions for the dialect object.
Author: Gertjan van den Burg
- class clevercsv.dialect.SimpleDialect(delimiter: str | None, quotechar: str | None, escapechar: str | None, strict: bool = False)#
Bases:
object
The simplified dialect object.
For the delimiter, quotechar, and escapechar the empty string means no delimiter/quotechar/escapechar in the file. None is used to mark it undefined.
- Parameters:
delimiter (str) – The delimiter of the CSV file.
quotechar (str) – The quotechar of the file.
escapechar (str) – The escapechar of the file.
strict (bool) – Whether strict parsing should be enforced. Same as in the csv module.
- classmethod deserialize(obj: str) SimpleDialect #
Deserialize dialect from a JSON object
- classmethod from_csv_dialect(d: Dialect) SimpleDialect #
- classmethod from_dict(d: Dict[str, Any]) SimpleDialect #
- serialize() str #
Serialize dialect to a JSON object
- to_csv_dialect() Dialect #
- to_dict() Dict[str, str | bool | None] #
- validate() None #
clevercsv.dict_read_write module#
DictReader and DictWriter.
This code is entirely copied from the Python csv module. The only exception is that it uses the reader and writer classes from our package.
Author: Gertjan van den Burg
- class clevercsv.dict_read_write.DictReader(f: Iterable[str], fieldnames: Sequence[_T] | None = None, restkey: str | None = None, restval: str | None = None, dialect: _DialectLike = 'excel', *args: Any, **kwds: Any)#
Bases:
Generic[_T], Iterator[_DictReadMapping[Union[_T, Any], Union[str, Any]]]
- property fieldnames: Sequence[_T]#
- class clevercsv.dict_read_write.DictWriter(f: SupportsWrite[str], fieldnames: Collection[_T], restval: Any | None = '', extrasaction: Literal['raise', 'ignore'] = 'raise', dialect: _DialectLike = 'excel', *args: Any, **kwds: Any)#
Bases:
Generic[_T]
- writeheader() Any #
- writerow(rowdict: Mapping[_T, Any]) Any #
- writerows(rowdicts: Iterable[Mapping[_T, Any]]) None #
clevercsv.encoding module#
Functionality to detect file encodings
Author: G.J.J. van den Burg
License: See the LICENSE file
This file is part of CleverCSV.
- clevercsv.encoding.get_encoding(filename: str | bytes | os.PathLike[str] | os.PathLike[bytes] | int, try_cchardet: bool = True) str | None #
Get the encoding of the file
This function uses the chardet package for detecting the encoding of a file.
- Parameters:
filename (str) – Path to a file
try_cchardet (bool) – Whether to run detection using cChardet if it is available. This can be faster, but may give different results than using chardet.
- Returns:
encoding – Encoding of the file.
- Return type:
str
clevercsv.escape module#
Common functions for dealing with escape characters.
Author: Gertjan van den Burg
Date: 2018-11-06
- clevercsv.escape.DEFAULT_BLOCK_CHARS: Set[str] = {'!', '"', '#', '%', '&', "'", '*', ',', '.', ':', ';', '?'}#
Set of default characters to never consider as escape character
- clevercsv.escape.UNICODE_PO_CHARS: Set[str] = {'!', '"', '#', '%', '&', "'", '*', ',', '.', '/', ':', ';', '?', '@', '\\', '¡', '§', '¶', '·', '¿', ';', '·', '՚', '՛', '՜', '՝', '՞', '՟', '։', '׀', '׃', '׆', '׳', '״', '؉', '؊', '،', '؍', '؛', '؞', '؟', '٪', '٫', '٬', '٭', '۔', '܀', '܁', '܂', '܃', '܄', '܅', '܆', '܇', '܈', '܉', '܊', '܋', '܌', '܍', '߷', '߸', '߹', '࠰', '࠱', '࠲', '࠳', '࠴', '࠵', '࠶', '࠷', '࠸', '࠹', '࠺', '࠻', '࠼', '࠽', '࠾', '࡞', '।', '॥', '॰', '৽', '੶', '૰', '౷', '಄', '෴', '๏', '๚', '๛', '༄', '༅', '༆', '༇', '༈', '༉', '༊', '་', '༌', '།', '༎', '༏', '༐', '༑', '༒', '༔', '྅', '࿐', '࿑', '࿒', '࿓', '࿔', '࿙', '࿚', '၊', '။', '၌', '၍', '၎', '၏', '჻', '፠', '፡', '።', '፣', '፤', '፥', '፦', '፧', '፨', '᙮', '᛫', '᛬', '᛭', '᜵', '᜶', '។', '៕', '៖', '៘', '៙', '៚', '᠀', '᠁', '᠂', '᠃', '᠄', '᠅', '᠇', '᠈', '᠉', '᠊', '᥄', '᥅', '᨞', '᨟', '᪠', '᪡', '᪢', '᪣', '᪤', '᪥', '᪦', '᪨', '᪩', '᪪', '᪫', '᪬', '᪭', '᭚', '᭛', '᭜', '᭝', '᭞', '᭟', '᭠', '᯼', '᯽', '᯾', '᯿', '᰻', '᰼', '᰽', '᰾', '᰿', '᱾', '᱿', '᳀', '᳁', '᳂', '᳃', '᳄', '᳅', '᳆', '᳇', '᳓', '‖', '‗', '†', '‡', '•', '‣', '․', '‥', '…', '‧', '‰', '‱', '′', '″', '‴', '‵', '‶', '‷', '‸', '※', '‼', '‽', '‾', '⁁', '⁂', '⁃', '⁇', '⁈', '⁉', '⁊', '⁋', '⁌', '⁍', '⁎', '⁏', '⁐', '⁑', '⁓', '⁕', '⁖', '⁗', '⁘', '⁙', '⁚', '⁛', '⁜', '⁝', '⁞', '⳹', '⳺', '⳻', '⳼', '⳾', '⳿', '⵰', '⸀', '⸁', '⸆', '⸇', '⸈', '⸋', '⸎', '⸏', '⸐', '⸑', '⸒', '⸓', '⸔', '⸕', '⸖', '⸘', '⸙', '⸛', '⸞', '⸟', '⸪', '⸫', '⸬', '⸭', '⸮', '⸰', '⸱', '⸲', '⸳', '⸴', '⸵', '⸶', '⸷', '⸸', '⸹', '⸼', '⸽', '⸾', '⸿', '⹁', '⹃', '⹄', '⹅', '⹆', '⹇', '⹈', '⹉', '⹊', '⹋', '⹌', '⹍', '⹎', '⹏', '、', '。', '〃', '〽', '・', '꓾', '꓿', '꘍', '꘎', '꘏', '꙳', '꙾', '꛲', '꛳', '꛴', '꛵', '꛶', '꛷', '꡴', '꡵', '꡶', '꡷', '꣎', '꣏', '꣸', '꣹', '꣺', '꣼', '꤮', '꤯', '꥟', '꧁', '꧂', '꧃', '꧄', '꧅', '꧆', '꧇', '꧈', '꧉', '꧊', '꧋', '꧌', '꧍', '꧞', '꧟', '꩜', '꩝', '꩞', '꩟', '꫞', '꫟', '꫰', '꫱', '꯫', '︐', '︑', '︒', '︓', '︔', '︕', '︖', '︙', '︰', '﹅', '﹆', '﹉', '﹊', '﹋', '﹌', '﹐', '﹑', '﹒', '﹔', '﹕', '﹖', '﹗', '﹟', '﹠', '﹡', '﹨', '﹪', '﹫', '!', '"', 
'#', '%', '&', ''', '*', ',', '.', '/', ':', ';', '?', '@', '\', '。', '、', '・', '𐄀', '𐄁', '𐄂', '𐎟', '𐏐', '𐕯', '𐡗', '𐤟', '𐤿', '𐩐', '𐩑', '𐩒', '𐩓', '𐩔', '𐩕', '𐩖', '𐩗', '𐩘', '𐩿', '𐫰', '𐫱', '𐫲', '𐫳', '𐫴', '𐫵', '𐫶', '𐬹', '𐬺', '𐬻', '𐬼', '𐬽', '𐬾', '𐬿', '𐮙', '𐮚', '𐮛', '𐮜', '𐽕', '𐽖', '𐽗', '𐽘', '𐽙', '𑁇', '𑁈', '𑁉', '𑁊', '𑁋', '𑁌', '𑁍', '𑂻', '𑂼', '𑂾', '𑂿', '𑃀', '𑃁', '𑅀', '𑅁', '𑅂', '𑅃', '𑅴', '𑅵', '𑇅', '𑇆', '𑇇', '𑇈', '𑇍', '𑇛', '𑇝', '𑇞', '𑇟', '𑈸', '𑈹', '𑈺', '𑈻', '𑈼', '𑈽', '𑊩', '𑑋', '𑑌', '𑑍', '𑑎', '𑑏', '𑑛', '𑑝', '𑓆', '𑗁', '𑗂', '𑗃', '𑗄', '𑗅', '𑗆', '𑗇', '𑗈', '𑗉', '𑗊', '𑗋', '𑗌', '𑗍', '𑗎', '𑗏', '𑗐', '𑗑', '𑗒', '𑗓', '𑗔', '𑗕', '𑗖', '𑗗', '𑙁', '𑙂', '𑙃', '𑙠', '𑙡', '𑙢', '𑙣', '𑙤', '𑙥', '𑙦', '𑙧', '𑙨', '𑙩', '𑙪', '𑙫', '𑙬', '𑜼', '𑜽', '𑜾', '𑠻', '𑧢', '𑨿', '𑩀', '𑩁', '𑩂', '𑩃', '𑩄', '𑩅', '𑩆', '𑪚', '𑪛', '𑪜', '𑪞', '𑪟', '𑪠', '𑪡', '𑪢', '𑱁', '𑱂', '𑱃', '𑱄', '𑱅', '𑱰', '𑱱', '𑻷', '𑻸', '𑿿', '𒑰', '𒑱', '𒑲', '𒑳', '𒑴', '𖩮', '𖩯', '𖫵', '𖬷', '𖬸', '𖬹', '𖬺', '𖬻', '𖭄', '𖺗', '𖺘', '𖺙', '𖺚', '𖿢', '𛲟', '𝪇', '𝪈', '𝪉', '𝪊', '𝪋', '𞥞', '𞥟'}#
Set of characters in the Unicode “Po” category
- clevercsv.escape.is_potential_escapechar(char: str, encoding: str, block_char: Iterable[str] | None = None) bool #
Check if a character is a potential escape character.
A character is considered a potential escape character if it is in the “Punctuation, Other” Unicode category and not in the list of blocked characters.
- Parameters:
char (str) – The character to check
encoding (str) – The encoding of the character
block_char (Optional[Iterable[str]]) – Characters that are in the Punctuation Other category but that should not be considered as escape character. If None, the default set is used, which is defined in DEFAULT_BLOCK_CHARS.
- Returns:
is_escape – Whether the character is considered a potential escape or not.
- Return type:
bool
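The check described above can be sketched with the standard unicodedata module, whose category() function returns "Po" for “Punctuation, Other”. The sketch omits the encoding argument of the real function, and potential_escapechar is a hypothetical name; the default block set is copied from DEFAULT_BLOCK_CHARS as listed above.

```python
import unicodedata
from typing import Iterable, Optional

# Copied from DEFAULT_BLOCK_CHARS as documented in this module.
DEFAULT_BLOCK = {"!", '"', "#", "%", "&", "'", "*", ",", ".", ":", ";", "?"}


def potential_escapechar(char: str, block_char: Optional[Iterable[str]] = None) -> bool:
    """True if char is in Unicode category Po and is not blocked."""
    blocked = DEFAULT_BLOCK if block_char is None else set(block_char)
    return unicodedata.category(char) == "Po" and char not in blocked


backslash_ok = potential_escapechar("\\")
```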
clevercsv.exceptions module#
Exceptions for CleverCSV
Author: Gertjan van den Burg
- exception clevercsv.exceptions.Error#
Bases:
Error
- exception clevercsv.exceptions.NoDetectionResult#
Bases:
Exception
clevercsv.normal_form module#
Detect the dialect with very strict functional tests.
This module uses so-called “normal forms” to detect the dialect of CSV files. Normal forms are detected with strict functional tests. The normal forms are used as a pre-test to check if files are simple enough that computing the data consistency measure is not necessary.
Author: Gertjan van den Burg
- clevercsv.normal_form.detect_dialect_normal(data, encoding='UTF-8', delimiters=None, verbose=False)#
Detect the normal form of a file from a given sample
- Parameters:
data (str) – The data as a single string
encoding (str) – The encoding of the data
- Returns:
dialect – The dialect detected using normal forms, or None if no such dialect can be found.
- Return type:
Optional[SimpleDialect]
- clevercsv.normal_form.even_rows(rows, dialect)#
- clevercsv.normal_form.every_row_has_delim(rows, dialect)#
- clevercsv.normal_form.has_delimiter(string, delim)#
- clevercsv.normal_form.has_nested_quotes(string, quotechar)#
- clevercsv.normal_form.is_any_empty(cell)#
- clevercsv.normal_form.is_any_partial_quoted_cell(cell)#
- clevercsv.normal_form.is_any_quoted_cell(cell)#
- clevercsv.normal_form.is_elementary(cell)#
- clevercsv.normal_form.is_empty_quoted(cell, quotechar)#
- clevercsv.normal_form.is_empty_unquoted(cell)#
- clevercsv.normal_form.is_form_1(data, dialect=None)#
- clevercsv.normal_form.is_form_2(data, dialect)#
- clevercsv.normal_form.is_form_3(data, dialect)#
- clevercsv.normal_form.is_form_4(data, dialect)#
- clevercsv.normal_form.is_form_5(data, dialect)#
- clevercsv.normal_form.is_quoted_cell(cell, quotechar)#
- clevercsv.normal_form.maybe_has_escapechar(data, encoding, delim, quotechar)#
- clevercsv.normal_form.split_file(data)#
- clevercsv.normal_form.split_row(row, dialect)#
- clevercsv.normal_form.strip_trailing_crnl(data)#
clevercsv.potential_dialects module#
Code for selecting the potential dialects of a file.
Author: Gertjan van den Burg
- clevercsv.potential_dialects.filter_urls(data)#
Filter URLs from the data
- clevercsv.potential_dialects.get_delimiters(data, encoding, delimiters=None, block_cat=None, block_char=None)#
Get potential delimiters
The set of potential delimiters is constructed as follows. For each unique character of the file, we check if its Unicode character category is in the set block_cat of prohibited categories. If it is, we don’t allow it to be a delimiter, with the exception of Tab (which is in the Control category). We furthermore block characters in block_char from being delimiters.
- Parameters:
data (str) – The data of the file
encoding (str) – The encoding of the file
delimiters (iterable) – Allowed delimiters. If provided, it overrides the block_cat/block_char mechanism and only the provided characters will be considered delimiters (if they occur in the file). If None, all characters can be considered delimiters subject to the block_cat and block_char parameters.
block_cat (list) – List of Unicode categories (2-letter abbreviations) for characters that should not be considered as delimiters. If None, the following default set is used:
["Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "Nl", "No", "Ps", "Pe", "Co"]
block_char (list) – Explicit list of characters that should not be considered delimiters. If None, the following default set is used:
[".", "/", '"', "'", "\n", "\r"]
- Returns:
delims – Set of potential delimiters. The empty string is added by default.
- Return type:
set
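The selection procedure can be sketched with the standard unicodedata module. This is a simplified illustration that ignores the encoding argument; potential_delimiters is a hypothetical name, and the defaults are the ones listed above.

```python
import unicodedata
from typing import Iterable, Optional, Set

DEFAULT_BLOCK_CAT = ["Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "Nl", "No", "Ps", "Pe", "Co"]
DEFAULT_BLOCK_CHAR = [".", "/", '"', "'", "\n", "\r"]


def potential_delimiters(
    data: str,
    delimiters: Optional[Iterable[str]] = None,
    block_cat: Optional[Iterable[str]] = None,
    block_char: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return candidate delimiters; the empty string is always included."""
    if delimiters is not None:
        # Explicit delimiters override the blocking mechanism entirely.
        return {c for c in delimiters if c in data} | {""}
    cats = set(DEFAULT_BLOCK_CAT if block_cat is None else block_cat)
    chars = set(DEFAULT_BLOCK_CHAR if block_char is None else block_char)
    found = {""}
    for c in set(data):
        # Tab is exempt from category blocking.
        if c != "\t" and unicodedata.category(c) in cats:
            continue
        if c in chars:
            continue
        found.add(c)
    return found


delims = potential_delimiters("a,b;c\n1,2;3\n")
```

With the default settings, letters and digits are excluded by category and the newline by the explicit character list, which leaves the comma and semicolon as candidates.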
- clevercsv.potential_dialects.get_dialects(data, encoding='UTF-8', delimiters=None, test_masked_by_quotes=False)#
Return the possible dialects for the given data.
We consider as escape characters those characters for which is_potential_escapechar() is True and that occur at least once before a quote character or delimiter in the dialect.
One may wonder if self-escaping is an issue here (i.e. “\\”, two times backslash). It is not. In a file where a single backslash is desired and escaping with a backslash is used, it only makes sense to do this if the backslash is already used as an escape character (in which case we include it). If it is never used as escape for the delimiter or quotechar, then it is not necessary to self-escape. This is an assumption, but it holds in general and it reduces noise.
- Parameters:
data (str) – The data for the file
encoding (str) – The encoding of the file
delimiters (iterable) – Set of delimiters to consider. See get_delimiters() for more info.
test_masked_by_quotes (bool) – Remove dialects where the delimiter is always masked by the quote character. Enabling this typically removes a number of potential dialects from the list, which can remove false positives. It is, however, not a very fast operation, so it is disabled by default.
- Returns:
dialects – List of SimpleDialect objects that are considered potential dialects.
- Return type:
list
- clevercsv.potential_dialects.get_quotechars(data, quote_chars=None)#
Get potential quote characters
Quote characters are those that occur in the quote_chars set and are found at least once in the file.
- Parameters:
data (str) – The data of the file as a string
quote_chars (iterable) – Characters that should be considered quote characters. If it is None, the following default set is used:
["'", '"', "~", "`"]
- Returns:
quotes – Set of potential quote characters. The empty string is added by default.
- Return type:
set
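As a minimal sketch of this rule (potential_quotechars is a hypothetical name, and the default list is the one given above):

```python
from typing import Iterable, Optional, Set

DEFAULT_QUOTE_CHARS = ["'", '"', "~", "`"]


def potential_quotechars(data: str, quote_chars: Optional[Iterable[str]] = None) -> Set[str]:
    """Candidate quote characters that occur in data, plus the empty string."""
    candidates = DEFAULT_QUOTE_CHARS if quote_chars is None else quote_chars
    return {q for q in candidates if q in data} | {""}


quotes = potential_quotechars('a,"b",c')
```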
- clevercsv.potential_dialects.masked_by_quotechar(data, quotechar, escapechar, test_char)#
Test if a character is always masked by quote characters
This function tests if a given character is always within quoted segments (defined by the quote character). Double quoting and escaping are supported.
- Parameters:
data (str) – The data of the file as a string
quotechar (str) – The quote character
escapechar (str) – The escape character
test_char (str) – The character to test
- Returns:
masked – Returns True if the test character is never outside quoted segments, False otherwise.
- Return type:
bool
- clevercsv.potential_dialects.unicode_category(x, encoding=None)#
Return the Unicode category of a character
- Parameters:
x (str) – character
encoding (str) – Encoding of the character
- Returns:
category – The Unicode category of the character.
- Return type:
str
clevercsv.read module#
Drop-in replacement for the Python csv reader class. This is a wrapper for the Parser class, defined in cparser.
Author: Gertjan van den Burg
- class clevercsv.read.reader(csvfile: Iterable[str], dialect: str | Dialect | Type[Dialect] | SimpleDialect = 'excel', **fmtparams: Any)#
Bases:
object
- property dialect: Dialect#
clevercsv.utils module#
Various utilities
Author: Gertjan van den Burg
- clevercsv.utils.pairwise(iterable)#
s -> (s0, s1), (s1, s2), (s2, s3), …
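The mapping shown above matches the well-known itertools recipe. A sketch of that recipe follows; it is not necessarily the exact code used in this module, and the name pairwise_sketch is hypothetical.

```python
from itertools import tee
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def pairwise_sketch(iterable: Iterable[T]) -> Iterator[Tuple[T, T]]:
    """Yield overlapping pairs of consecutive items."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)


pairs = list(pairwise_sketch(["s0", "s1", "s2", "s3"]))
```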
- clevercsv.utils.sha1sum(filename)#
Compute the SHA1 checksum of a given file
- Parameters:
filename (str) – Path to a file
- Returns:
checksum – The SHA1 checksum of the file contents.
- Return type:
str
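A checksum of this kind can be computed with the standard hashlib module. The sketch below reads the file in blocks so that large files need not fit in memory; the name sha1_of_file and the block_size parameter are assumptions, not part of this module.

```python
import hashlib


def sha1_of_file(filename: str, block_size: int = 65536) -> str:
    """Return the hexadecimal SHA1 digest of the file contents."""
    digest = hashlib.sha1()
    with open(filename, "rb") as fh:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```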
clevercsv.wrappers module#
Wrappers for some loading/saving functionality.
Author: Gertjan van den Burg
- clevercsv.wrappers.detect_dialect(filename: FileDescriptorOrPath, num_chars: int | None = None, encoding: str | None = None, verbose: bool = False, method: str = 'auto', skip: bool = True) SimpleDialect #
Detect the dialect of a CSV file
This is a utility function that simply returns the detected dialect of a given CSV file.
- Parameters:
filename (str) – The filename of the CSV file.
num_chars (int) – Number of characters to read for the detection. If None, the entire file will be read. Note that limiting the number of characters can reduce the accuracy of the detected dialect.
encoding (str) – The file encoding of the CSV file. If None, it is detected.
verbose (bool) – Enable verbose mode during detection.
method (str) – Dialect detection method to use. Either ‘normal’ for normal form detection, ‘consistency’ for the consistency measure, or ‘auto’ for first normal and then consistency.
skip (bool) – Skip computation of the type score for dialects with a low pattern score.
- Returns:
dialect – The detected dialect as a SimpleDialect, or None if detection failed.
- Return type:
- clevercsv.wrappers.read_dataframe(filename: FileDescriptorOrPath, *args: Any, num_chars: int | None = None, **kwargs: Any) pd.DataFrame #
Read a CSV file to a Pandas dataframe
This function uses CleverCSV to detect the dialect, and then passes this to the read_csv function in pandas. Additional arguments and keyword arguments are passed to read_csv as well.
- Parameters:
filename (str) – The filename of the CSV file. At the moment, only local files are supported.
*args – Additional arguments for the pandas.read_csv function.
num_chars (int) – Number of characters to use for dialect detection. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
**kwargs – Additional keyword arguments for the pandas.read_csv function. You can specify the file encoding here if needed, and it will be used during dialect detection.
- clevercsv.wrappers.read_dicts(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) List[_DictReadMapping] #
Read a CSV file as a list of dictionaries
This function returns the rows of the CSV file as a list of dictionaries. The keys of the dictionaries are assumed to be in the first row of the CSV file. The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.DictReader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns rows of the file as a list of dictionaries.
- Return type:
list
- Raises:
NoDetectionResult – When the dialect detection fails.
- clevercsv.wrappers.read_table(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) List[List[str]] #
Read a CSV file as a table (a list of lists)
This is a convenience function that reads a CSV file and returns the data as a list of lists (= rows). The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.reader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns rows as a list of lists.
- Return type:
list
- Raises:
NoDetectionResult – When the dialect detection fails.
- clevercsv.wrappers.stream_dicts(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) Iterator['_DictReadMapping'] #
Read a CSV file as a generator over dictionaries
This function streams the rows of the CSV file as dictionaries. The keys of the dictionaries are assumed to be in the first row of the CSV file. The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.DictReader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns file as a generator over rows as dictionaries.
- Return type:
generator
- Raises:
NoDetectionResult – When the dialect detection fails.
- clevercsv.wrappers.stream_table(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) Iterator[List[str]] #
Read a CSV file as a generator over rows of a table
This is a convenience function that reads a CSV file and returns the data as a generator of rows. The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.reader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns file as a generator over rows.
- Return type:
generator
- Raises:
NoDetectionResult – When the dialect detection fails.
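The difference between the list-returning and the streaming functions is the usual one between a list and a generator: rows are produced on demand rather than held in memory all at once. A minimal sketch of that pattern with the standard library's csv.reader follows; the function name stream_rows is hypothetical and no dialect detection takes place.

```python
import csv
import io
from typing import Iterator, List


def stream_rows(text: str, delimiter: str = ",") -> Iterator[List[str]]:
    """Yield rows one at a time instead of building the whole table."""
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        yield row


stream = stream_rows("a,b\n1,2\n3,4\n")
first = next(stream)      # pull a single row from the generator
remaining = list(stream)  # consume whatever is left
```

A generator can only be iterated once, so wrap it in list() if the rows are needed more than once.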
- clevercsv.wrappers.write_dicts(items: Iterable[Mapping[_T, Any]], filename: FileDescriptorOrPath, dialect: _DialectLike = 'excel', encoding: str | None = None) None #
Write a list of dicts to a file
This is a convenience function to write dicts to a file. The header is extracted from the keys of the first item, so an OrderedDict is recommended to control the order of the headers in the output. If the list of items is empty, no output file is created.
- Parameters:
items (list) – List of dicts to export
filename (str) – The filename of the CSV file to write the table to
dialect (str, SimpleDialect, or csv.Dialect) – The dialect to use. The default is the ‘excel’ dialect, which corresponds to RFC4180.
encoding (str) – Encoding to use to write the data to the file. Note that the default encoding is platform dependent, which ensures compatibility with the Python open() function. It thus defaults to locale.getpreferredencoding().
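The two behaviours described above (header taken from the keys of the first item, and no file created for an empty list) can be sketched with the standard library's csv.DictWriter. This is an illustration under stated assumptions, not CleverCSV's code: the name write_dicts_sketch is hypothetical, and the encoding is fixed to UTF-8 here for reproducibility, whereas the documented default is platform dependent.

```python
import csv
import os
import tempfile


def write_dicts_sketch(items, filename):
    """Write dicts to `filename`; the header comes from the first item's keys."""
    items = list(items)
    if not items:
        return  # nothing to write, so no file is created
    fieldnames = list(items[0].keys())
    # newline="" lets the csv module control line endings itself.
    with open(filename, "w", newline="", encoding="utf-8") as fp:
        writer = csv.DictWriter(fp, fieldnames=fieldnames, dialect="excel")
        writer.writeheader()
        writer.writerows(items)


tmpdir = tempfile.mkdtemp()
out_path = os.path.join(tmpdir, "people.csv")
write_dicts_sketch([{"name": "alice", "age": 30}, {"name": "bob", "age": 25}], out_path)
with open(out_path, newline="", encoding="utf-8") as fp:
    written = fp.read()

empty_path = os.path.join(tmpdir, "empty.csv")
write_dicts_sketch([], empty_path)  # leaves no file behind
```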
- clevercsv.wrappers.write_table(table: Iterable[Iterable[Any]], filename: FileDescriptorOrPath, dialect: _DialectLike = 'excel', transpose: bool = False, encoding: str | None = None) None #
Write a table (a list of lists) to a file
This is a convenience function for writing a table to a CSV file. If the table has no rows, no output file is created.
- Parameters:
table (list) – A table as a list of lists. The table must have the same number of cells in each row (taking the transpose flag into account).
filename (str) – The filename of the CSV file to write the table to.
dialect (SimpleDialect or csv.Dialect) – The dialect to use. The default is the ‘excel’ dialect, which corresponds to RFC4180. This is done to encourage more standardized CSV files.
transpose (bool) – Transpose the table before writing.
encoding (str) – Encoding to use to write the data to the file. Note that the default encoding is platform dependent, which ensures compatibility with the Python open() function. It thus defaults to locale.getpreferredencoding().
- Raises:
ValueError – When the length of the rows is not constant.
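To make the transpose flag and the row-length check concrete, here is a small sketch that renders a table to CSV text in memory with the standard library. The function table_to_csv is hypothetical and returns a string rather than writing a file; the order of the checks and the error message are assumptions, not taken from CleverCSV.

```python
import csv
import io


def table_to_csv(table, transpose=False):
    """Return CSV text for `table`, optionally transposed (stdlib sketch)."""
    rows = [list(row) for row in table]
    if not rows:
        return ""  # an empty table produces no output
    if len({len(row) for row in rows}) > 1:
        raise ValueError("Table doesn't have constant row length.")
    if transpose:
        rows = [list(col) for col in zip(*rows)]  # columns become rows
    buffer = io.StringIO()
    csv.writer(buffer, dialect="excel").writerows(rows)
    return buffer.getvalue()


table = [["a", "b"], [1, 2]]
plain = table_to_csv(table)
flipped = table_to_csv(table, transpose=True)

try:
    table_to_csv([["a", "b"], [1]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
```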
clevercsv.write module#
Drop-in replacement for the Python csv writer class.
Author: Gertjan van den Burg
Module contents#
- class clevercsv.Detector#
Bases: object
Detect the Dialect of CSV files with normal forms or the data consistency measure. This class provides a drop-in replacement for the Python dialect Sniffer from the standard library.
Note
We call the object Detector just to mark the difference in the implementation and avoid naming issues. You can import it as from ccsv import Sniffer nonetheless.
- detect(sample, delimiters=None, verbose=False, method='auto', skip=True)#
Detect the dialect of a CSV file
This method detects the dialect of the CSV file using the specified detection method.
- Parameters:
sample (str) – A sample of text from the CSV file. For best results and if time allows, use the entire contents of the CSV file as the sample.
delimiters (Optional[Iterable[str]]) – Set of delimiters to consider for dialect detection. The potential dialects will be constructed by analyzing the sample and these delimiters. If omitted, the set of potential delimiters will be constructed from the sample.
verbose (bool) – Enable verbose mode.
method (str) – The method to use for dialect detection. Valid options are “auto” (the default), “normal”, or “consistency”. The “auto” option first attempts to detect the dialect using normal-form detection, and uses the consistency measure if normal-form detection is inconclusive. The “normal” method uses normal-form detection exclusively, and the “consistency” method uses the consistency measure exclusively.
skip (bool) – Whether to skip potential dialects that have too low a pattern score in the consistency detection. See ConsistencyDetector.compute_consistency_scores() for more details.
- Returns:
dialect – The detected dialect. Can be None if dialect detection was inconclusive.
- Return type:
Optional[SimpleDialect]
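The three method options amount to a simple dispatch with a fallback. The sketch below shows that control flow only; the detector callables, the string used as a stand-in for a dialect, and the ValueError for unknown methods are all assumptions for illustration and do not reflect CleverCSV's internals.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: each detector maps a sample to a dialect description
# (here just a string), or None when it cannot decide.
DetectorFn = Callable[[str], Optional[str]]


def detect_with_method(sample: str, normal_form: DetectorFn,
                       consistency: DetectorFn, method: str = "auto") -> Optional[str]:
    """Dispatch between two detectors following the documented method options."""
    if method == "normal":
        return normal_form(sample)
    if method == "consistency":
        return consistency(sample)
    if method == "auto":
        dialect = normal_form(sample)
        # Fall back to the consistency measure only when normal form is inconclusive.
        return dialect if dialect is not None else consistency(sample)
    raise ValueError(f"Unknown detection method: {method!r}")  # assumed behaviour


def inconclusive(sample: str) -> Optional[str]:
    return None


def always_semicolon(sample: str) -> Optional[str]:
    return "delimiter=';'"


result = detect_with_method("a;b\n1;2\n", inconclusive, always_semicolon)
normal_only = detect_with_method("a;b\n1;2\n", inconclusive, always_semicolon, method="normal")
```

With method="normal" the fallback never runs, which is why the result can be None in that case.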
- has_header(sample)#
Detect if a file has a header from a sample.
This function is copied from CPython! The only change we’ve made is to use our dialect detection method.
- sniff(sample, delimiters=None, verbose=False)#
- class clevercsv.DictReader(f: Iterable[str], fieldnames: Sequence[_T] | None = None, restkey: str | None = None, restval: str | None = None, dialect: _DialectLike = 'excel', *args: Any, **kwds: Any)#
Bases: Generic[_T], Iterator[_DictReadMapping[Union[_T, Any], Union[str, Any]]]
- property fieldnames: Sequence[_T]#
- class clevercsv.DictWriter(f: SupportsWrite[str], fieldnames: Collection[_T], restval: Any | None = '', extrasaction: Literal['raise', 'ignore'] = 'raise', dialect: _DialectLike = 'excel', *args: Any, **kwds: Any)#
Bases: Generic[_T]
- writeheader() Any #
- writerow(rowdict: Mapping[_T, Any]) Any #
- writerows(rowdicts: Iterable[Mapping[_T, Any]]) None #
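The constructor signature above mirrors the standard library's csv.DictWriter. The sketch below uses the standard-library class to show what restval and extrasaction do, on the assumption that the CleverCSV class behaves the same way for these parameters; the field names and rows are made up.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["name", "age"], restval="", extrasaction="ignore"
)
writer.writeheader()
# "city" is not a field name; extrasaction="ignore" drops it silently.
writer.writerow({"name": "alice", "age": 30, "city": "Utrecht"})
# "age" is missing; it is filled with restval (an empty string).
writer.writerows([{"name": "bob"}])
output = buffer.getvalue()
```

With the default extrasaction='raise', the first writerow call would raise a ValueError instead.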
- exception clevercsv.Error#
Bases: Error
- clevercsv.detect_dialect(filename: FileDescriptorOrPath, num_chars: int | None = None, encoding: str | None = None, verbose: bool = False, method: str = 'auto', skip: bool = True) SimpleDialect #
Detect the dialect of a CSV file
This is a utility function that simply returns the detected dialect of a given CSV file.
- Parameters:
filename (str) – The filename of the CSV file.
num_chars (int) – Number of characters to read for the detection. If None, the entire file will be read. Note that limiting the number of characters can reduce the accuracy of the detected dialect.
encoding (str) – The file encoding of the CSV file. If None, it is detected.
verbose (bool) – Enable verbose mode during detection.
method (str) – Dialect detection method to use. Either ‘normal’ for normal form detection, ‘consistency’ for the consistency measure, or ‘auto’ for first normal and then consistency.
skip (bool) – Skip computation of the type score for dialects with a low pattern score.
- Returns:
dialect – The detected dialect as a SimpleDialect, or None if detection failed.
- Return type:
SimpleDialect
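The num_chars parameter trades accuracy for speed by limiting how much of the file is used as the detection sample. A minimal sketch of that sampling step is shown below. The helper read_sample is hypothetical, and the encoding is fixed to UTF-8 here, whereas the documented function detects it when None is given.

```python
import os
import tempfile
from typing import Optional


def read_sample(filename: str, num_chars: Optional[int] = None,
                encoding: str = "utf-8") -> str:
    """Return the first `num_chars` characters of a file, or all of it if None."""
    with open(filename, "r", newline="", encoding=encoding) as fp:
        return fp.read(num_chars) if num_chars is not None else fp.read()


# Write a small hypothetical file to sample from.
sample_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as fp:
    fp.write("a;b;c\n1;2;3\n4;5;6\n")

partial = read_sample(sample_path, num_chars=8)
full = read_sample(sample_path)
```

A short sample may cut off in the middle of a row, which is one reason detection on partial data can be less reliable.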
- class clevercsv.excel#
Bases: Dialect
Describe the usual properties of Excel-generated CSV files.
- delimiter = ','#
- doublequote = True#
- lineterminator = '\r\n'#
- quotechar = '"'#
- quoting = 0#
- skipinitialspace = False#
- class clevercsv.excel_tab#
Bases: excel
Describe the usual properties of Excel-generated TAB-delimited files.
- delimiter = '\t'#
- clevercsv.field_size_limit(*args: Any, **kwargs: Any) int #
Get/Set the limit to the field size.
This function is adapted from the one in the Python CSV module. See the documentation there.
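For reference, the standard-library function this one is adapted from works as both getter and setter: called without arguments it returns the current limit, and called with a new limit it installs it and returns the previous one. The sketch below uses csv.field_size_limit directly, assuming the CleverCSV version follows the same calling convention.

```python
import csv

previous = csv.field_size_limit(1_000_000)  # set a new limit; returns the old one
current = csv.field_size_limit()            # no argument: query only
csv.field_size_limit(previous)              # restore the original limit
restored = csv.field_size_limit()
```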
- clevercsv.read_dataframe(filename: FileDescriptorOrPath, *args: Any, num_chars: int | None = None, **kwargs: Any) pd.DataFrame #
Read a CSV file to a Pandas dataframe
This function uses CleverCSV to detect the dialect, and then passes this to the read_csv function in pandas. Additional arguments and keyword arguments are passed to read_csv as well.
- Parameters:
filename (str) – The filename of the CSV file. At the moment, only local files are supported.
*args – Additional arguments for the pandas.read_csv function.
num_chars (int) – Number of characters to use for dialect detection. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
**kwargs – Additional keyword arguments for the pandas.read_csv function. You can specify the file encoding here if needed, and it will be used during dialect detection.
- clevercsv.read_dicts(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) List[_DictReadMapping] #
Read a CSV file as a list of dictionaries
This function returns the rows of the CSV file as a list of dictionaries. The keys of the dictionaries are assumed to be in the first row of the CSV file. The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.DictReader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns rows of the file as a list of dictionaries.
- Return type:
list
- Raises:
NoDetectionResult – When the dialect detection fails.
- clevercsv.read_table(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) List[List[str]] #
Read a CSV file as a table (a list of lists)
This is a convenience function that reads a CSV file and returns the data as a list of lists (= rows). The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.reader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns rows as a list of lists.
- Return type:
list
- Raises:
NoDetectionResult – When the dialect detection fails.
- class clevercsv.reader(csvfile: Iterable[str], dialect: str | Dialect | Type[Dialect] | SimpleDialect = 'excel', **fmtparams: Any)#
Bases: object
- property dialect: Dialect#
- clevercsv.stream_dicts(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) Iterator['_DictReadMapping'] #
Read a CSV file as a generator over dictionaries
This function streams the rows of the CSV file as dictionaries. The keys of the dictionaries are assumed to be in the first row of the CSV file. The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.DictReader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns file as a generator over rows as dictionaries.
- Return type:
generator
- Raises:
NoDetectionResult – When the dialect detection fails.
- clevercsv.stream_table(filename: FileDescriptorOrPath, dialect: _DialectLike | None = None, encoding: str | None = None, num_chars: int | None = None, verbose: bool = False) Iterator[List[str]] #
Read a CSV file as a generator over rows of a table
This is a convenience function that reads a CSV file and returns the data as a generator of rows. The dialect will be detected automatically, unless it is provided.
- Parameters:
filename (str) – Path of the CSV file
dialect (str, SimpleDialect, or csv.Dialect object) – If the dialect is known, it can be provided here. This function uses the CleverCSV clevercsv.reader object, which supports various dialect types (string, SimpleDialect, or csv.Dialect). If None, the dialect will be detected.
encoding (str) – The encoding of the file. If None, it is detected.
num_chars (int) – Number of characters to use to detect the dialect. If None, use the entire file. Note that using less than the entire file will speed up detection, but can reduce the accuracy of the detected dialect.
verbose (bool) – Whether or not to show detection progress.
- Returns:
rows – Returns file as a generator over rows.
- Return type:
generator
- Raises:
NoDetectionResult – When the dialect detection fails.
- class clevercsv.unix_dialect#
Bases: Dialect
Describe the usual properties of Unix-generated CSV files.
- delimiter = ','#
- doublequote = True#
- lineterminator = '\n'#
- quotechar = '"'#
- quoting = 1#
- skipinitialspace = False#
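The attribute values listed for the excel and unix_dialect classes match those of csv.excel and csv.unix_dialect in the standard library, where quoting = 0 is csv.QUOTE_MINIMAL and quoting = 1 is csv.QUOTE_ALL. The sketch below uses the standard-library classes to show the practical difference when writing a row; the helper render and the sample row are made up for the example.

```python
import csv
import io


def render(row, dialect):
    """Write a single row with the given dialect and return the raw text."""
    buffer = io.StringIO()
    csv.writer(buffer, dialect=dialect).writerow(row)
    return buffer.getvalue()


row = ["a", "b c", "d,e"]
excel_line = render(row, csv.excel)        # quotes only fields that need it
unix_line = render(row, csv.unix_dialect)  # quotes every field

print(repr(excel_line))
print(repr(unix_line))
```

Only the field containing the delimiter is quoted under the excel dialect, and the two dialects also differ in their line terminator.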
- clevercsv.write_table(table: Iterable[Iterable[Any]], filename: FileDescriptorOrPath, dialect: _DialectLike = 'excel', transpose: bool = False, encoding: str | None = None) None #
Write a table (a list of lists) to a file
This is a convenience function for writing a table to a CSV file. If the table has no rows, no output file is created.
- Parameters:
table (list) – A table as a list of lists. The table must have the same number of cells in each row (taking the transpose flag into account).
filename (str) – The filename of the CSV file to write the table to.
dialect (SimpleDialect or csv.Dialect) – The dialect to use. The default is the ‘excel’ dialect, which corresponds to RFC4180. This is done to encourage more standardized CSV files.
transpose (bool) – Transpose the table before writing.
encoding (str) – Encoding to use to write the data to the file. Note that the default encoding is platform dependent, which ensures compatibility with the Python open() function. It thus defaults to locale.getpreferredencoding().
- Raises:
ValueError – When the length of the rows is not constant.