parseval: A pythonic data validator
parseval is a data validation tool for Python. The following parsers are available:
FieldParser:
Signature: FieldParser(start: int = 0, end: int = 0, quoted: int = 0, enforce_type: bool = True)
Parameters:
- `start`: Start position of the data in the row.
- `end`: End position of the data in the row.
- `quoted`: Data quotation option - {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
- `enforce_type`: Type conversion control - {True: output data type will be `str`, False: data type of the input will be preserved}. Defaults to `True`.
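To make the `quoted` codes concrete, here is a minimal sketch of how a value could be unquoted according to the documented mapping. The helper `strip_quotes` and the `QUOTE_CHARS` table are hypothetical names used only for illustration; they are not part of parseval, and the library's own handling (including how `start` and `end` are applied) may differ.

```python
# Hypothetical helper, not part of parseval: illustrates the documented
# `quoted` codes {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
QUOTE_CHARS = {0: "", 1: '"', 2: "'"}


def strip_quotes(raw: str, quoted: int = 0) -> str:
    """Remove one pair of enclosing quote characters selected by `quoted`."""
    if quoted not in QUOTE_CHARS:
        raise ValueError(f"unsupported quoted option: {quoted}")
    quote = QUOTE_CHARS[quoted]
    # Only strip when the value is actually wrapped in the expected quote.
    if quote and len(raw) >= 2 and raw.startswith(quote) and raw.endswith(quote):
        return raw[1:-1]
    return raw


print(strip_quotes('"abc"', quoted=1))  # abc
print(strip_quotes("'abc'", quoted=2))  # abc
print(strip_quotes('"abc"', quoted=0))  # "abc" (left untouched)
```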
Available APIs:
Signature: not_null(default_value: any = None)
Parameters:
- `default_value`: Default value for a column that must not be null.

Signature: value_set(values: typing.List, nullable: bool = True)
Parameters:
- `values`: Set of valid values for this column.
- `nullable`: If set to `True`, an empty string and `None` are treated as valid values, along with the provided value list.

Signature: max_value(value: any)
Parameters:
- `value`: Maximum allowed value for the column.

Signature: min_value(value: any)
Parameters:
- `value`: Minimum allowed value for the column.

Signature: range(lower_bound: any, upper_bound: any)
Parameters:
- `lower_bound`: Minimum allowed value for the column.
- `upper_bound`: Maximum allowed value for the column.

Signature: add_func(f: function)
Parameters:
- `f`: Custom function to be added to the parser.
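The validators listed above can be pictured as functions applied to a value one after another. The following is a rough, self-contained sketch of that idea. `MiniFieldParser` is a hypothetical stand-in, not parseval's implementation. Method chaining, substituting `default_value` for nulls, inclusive bounds in `range`, and calling the parser object directly are all assumptions made for illustration.

```python
from typing import Any, Callable, List


class MiniFieldParser:
    """Hypothetical stand-in for illustration only; not parseval's code."""

    def __init__(self) -> None:
        self._funcs: List[Callable[[Any], Any]] = []

    def add_func(self, f: Callable[[Any], Any]) -> "MiniFieldParser":
        # Each function receives the current value and returns the next one.
        self._funcs.append(f)
        return self  # returning self enables chaining (an assumption)

    def not_null(self, default_value: Any = None) -> "MiniFieldParser":
        def check(value: Any) -> Any:
            if value is None or value == "":
                if default_value is None:
                    raise ValueError("null value in a not-null column")
                return default_value  # assumption: default replaces nulls
            return value
        return self.add_func(check)

    def value_set(self, values: List[Any], nullable: bool = True) -> "MiniFieldParser":
        def check(value: Any) -> Any:
            if nullable and (value is None or value == ""):
                return value
            if value not in values:
                raise ValueError(f"{value!r} is not one of {values!r}")
            return value
        return self.add_func(check)

    def range(self, lower_bound: Any, upper_bound: Any) -> "MiniFieldParser":
        def check(value: Any) -> Any:
            # Inclusive bounds are an assumption.
            if not (lower_bound <= value <= upper_bound):
                raise ValueError(f"{value!r} is outside the allowed range")
            return value
        return self.add_func(check)

    def __call__(self, value: Any) -> Any:
        # Run every registered validator in the order it was added.
        for f in self._funcs:
            value = f(value)
        return value


parser = MiniFieldParser().not_null(default_value="N/A").value_set(["A", "B", "N/A"])
print(parser("A"))   # A
print(parser(None))  # N/A
```

Because each method only appends a function to a list, a custom check passed to `add_func` runs in the same pipeline as the built-in validators.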
StringParser:
Signature: StringParser(start: int = 0, end: int = 0, quoted: int = 0, enforce_type: bool = True)
Parameters:
- `start`: Start position of the column in the row.
- `end`: End position of the column in the row.
- `quoted`: Data quotation option - {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
- `enforce_type`: Type conversion control - {True: output data type will be `str`, False: data type of the input will be preserved}. Defaults to `True`.
Available APIs:
Signature: not_null(default_value: str = None, allow_white_space: bool = False)
Parameters:
- `default_value`: Default value for a column that must not be null.
- `allow_white_space`: If set to `True`, whitespace-only values are not treated as null.

Signature: value_set(values: typing.List[str], nullable: bool = True)
Parameters:
- `values`: Set of valid values for this column.
- `nullable`: If set to `True`, an empty string and `None` are treated as valid values, along with the provided value list.

Signature: max_value(value: str)
Parameters:
- `value`: Maximum allowed value for the column.

Signature: min_value(value: str)
Parameters:
- `value`: Minimum allowed value for the column.

Signature: regex_match(pattern: str, nullable=True)
Parameters:
- `pattern`: Pattern to match against the data.
- `nullable`: If set to `True`, an empty string and `None` are treated as valid values, along with values that match the provided `pattern`.

Signature: change_case(case_type: str = 'S')
Parameters:
- `case_type`: Target case - {'U'/'u': UPPERCASE, 'L'/'l': lowercase, 'S'/'s': Sentence Case}.

Signature: range(lower_bound: str, upper_bound: str)
Parameters:
- `lower_bound`: Minimum allowed value for the column.
- `upper_bound`: Maximum allowed value for the column.

Signature: add_func(f: function)
Parameters:
- `f`: Custom function to be added to the parser.
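The string-specific options (`allow_white_space`, `regex_match` with `nullable`, and `change_case`) can be illustrated with plain standard-library code. The functions below are hypothetical helpers written for this sketch, not parseval's API. Two points are assumptions: that a pattern must match the whole value, and that "Sentence Case" means an uppercase first letter followed by lowercase text.

```python
import re
from typing import Optional


def is_null(value: Optional[str], allow_white_space: bool = False) -> bool:
    """None and '' are null; whitespace-only is null unless explicitly allowed."""
    if value is None or value == "":
        return True
    return not allow_white_space and value.strip() == ""


def regex_match(value: Optional[str], pattern: str, nullable: bool = True) -> bool:
    """Return True when the value is acceptable under `pattern` and `nullable`."""
    if value is None or value == "":
        return nullable
    # Assumption: the whole value has to match the pattern.
    return re.fullmatch(pattern, value) is not None


def change_case(value: str, case_type: str = "S") -> str:
    """Apply the documented case codes 'U', 'L', 'S' (case-insensitive)."""
    code = case_type.upper()
    if code == "U":
        return value.upper()
    if code == "L":
        return value.lower()
    if code == "S":
        # Assumption: sentence case = first letter upper, the rest lower.
        return value.capitalize()
    raise ValueError(f"unsupported case_type: {case_type!r}")


print(is_null("   "))                          # True
print(is_null("   ", allow_white_space=True))  # False
print(regex_match("AB12", r"[A-Z]{2}\d{2}"))   # True
print(change_case("hello WORLD"))              # Hello world
```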
FloatParser:
Signature: FloatParser(start: int = 0, end: int = 0, quoted: int = 0, enforce_type: bool = True)
Parameters:
- `start`: Start position of the column in the row.
- `end`: End position of the column in the row.
- `quoted`: Data quotation option - {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
- `enforce_type`: Type conversion control - {True: output data type will be `float`, False: data type of the input will be preserved}. Defaults to `True`.
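As a small illustration of the `enforce_type` switch, the sketch below converts a value to `float` when the flag is set and returns the input unchanged otherwise. `convert_float` is a hypothetical helper, not a parseval function, and the library may validate the value before deciding whether to convert it.

```python
from typing import Any


def convert_float(value: Any, enforce_type: bool = True) -> Any:
    """Hypothetical helper mirroring the documented enforce_type behaviour."""
    if enforce_type:
        return float(value)  # raises ValueError for non-numeric text
    return value  # input data type is preserved


print(convert_float("3.5"))                      # 3.5
print(convert_float("3.5", enforce_type=False))  # 3.5 (still a str)
```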
Available APIs:
Signature: not_null(default_value: float = None)
Parameters:
- `default_value`: Default value for a column that must not be null.

Signature: value_set(values: typing.List[float], nullable: bool = True)
Parameters:
- `values`: Set of valid values for this column.
- `nullable`: If set to `True`, an empty string and `None` are treated as valid values, along with the provided value list.

Signature: max_value(value: float)
Parameters:
- `value`: Maximum allowed value for the column.

Signature: min_value(value: float)
Parameters:
- `value`: Minimum allowed value for the column.

Signature: range(lower_bound: float, upper_bound: float)
Parameters:
- `lower_bound`: Minimum allowed value for the column.
- `upper_bound`: Maximum allowed value for the column.

Signature: add_func(f: function)
Parameters:
- `f`: Custom function to be added to the parser.
IntegerParser:
Signature: IntegerParser(start: int = 0, end: int = 0, quoted: int = 0, enforce_type: bool = True)
Parameters:
- `start`: Start position of the column in the row.
- `end`: End position of the column in the row.
- `quoted`: Data quotation option - {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
- `enforce_type`: Type conversion control - {True: output data type will be `int`, False: data type of the input will be preserved}. Defaults to `True`.
Available APIs:
Signature: not_null(default_value: int = None)
Parameters:
- `default_value`: Default value for a column that must not be null.

Signature: value_set(values: typing.List[int], nullable: bool = True)
Parameters:
- `values`: Set of valid values for this column.
- `nullable`: If set to `True`, an empty string and `None` are treated as valid values, along with the provided value list.

Signature: max_value(value: int)
Parameters:
- `value`: Maximum allowed value for the column.

Signature: min_value(value: int)
Parameters:
- `value`: Minimum allowed value for the column.

Signature: range(lower_bound: int, upper_bound: int)
Parameters:
- `lower_bound`: Minimum allowed value for the column.
- `upper_bound`: Maximum allowed value for the column.

Signature: add_func(f: function)
Parameters:
- `f`: Custom function to be added to the parser.
BooleanParser:
Signature: BooleanParser(start: int = 0, end: int = 0, quoted: int = 0, enforce_type: bool = True)
Parameters:
- `start`: Start position of the column in the row.
- `end`: End position of the column in the row.
- `quoted`: Data quotation option - {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
- `enforce_type`: Type conversion control - {True: output data type will be `int`, False: data type of the input will be preserved}. Defaults to `True`.
DatetimeParser:
Signature: DatetimeParser(start: int = 0, end: int = 0, formats: typing.List =['%Y%m%d', '%Y%m%d%H%M%S'], quoted: int = 0, enforce_type: bool = True)
Parameters:
- `start`: Start position of the column in the row.
- `end`: End position of the column in the row.
- `formats`: Date/datetime formats used in the input data.
- `quoted`: Data quotation option - {0: Not Quoted, 1: Double Quoted, 2: Single Quoted}.
- `enforce_type`: Type conversion control - {True: output data type will be `datetime.datetime`, False: data type of the input will be preserved}. Defaults to `True`.
Available APIs:
Signature: not_null(default_value: typing.Union[str, datetime.datetime] = None, format: str = '%Y%m%d%H%M%S')
Parameters:
- `default_value`: Default value for a column that must not be null.
- `format`: Format of the provided default value. It has no effect if the default value is a datetime object.

Signature: value_set(values: typing.List[typing.Union[str, datetime.datetime]], format='%Y%m%d%H%M%S', nullable: bool = True)
Parameters:
- `values`: Set of valid values for this column.
- `format`: Format of the provided allowed values. It has no effect for allowed values given as datetime objects.
- `nullable`: If set to `True`, an empty string and `None` are treated as valid values, along with the provided value list.

Signature: max_value(value: typing.Union[str, datetime.datetime], format: str = '%Y%m%d%H%M%S')
Parameters:
- `value`: Maximum allowed value for the column.
- `format`: Format of the provided value. It has no effect if the value is a datetime object.

Signature: min_value(value: typing.Union[str, datetime.datetime], format: str = '%Y%m%d%H%M%S')
Parameters:
- `value`: Minimum allowed value for the column.
- `format`: Format of the provided value. It has no effect if the value is a datetime object.

Signature: range(lower_bound: typing.Union[str, datetime.datetime], upper_bound: typing.Union[str, datetime.datetime], format='%Y%m%d%H%M%S')
Parameters:
- `lower_bound`: Minimum allowed value for the column.
- `upper_bound`: Maximum allowed value for the column.
- `format`: Format of the provided bounds. It has no effect for bounds given as datetime objects.

Signature: add_func(f: function)
Parameters:
- `f`: Custom function to be added to the parser.
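The interplay between `formats` (for the input data) and `format` (for string bounds or allowed values) can be sketched with `datetime.strptime`. The helpers `parse_datetime`, `to_datetime`, and `in_range` are hypothetical names for this illustration and are not parseval functions. Trying the formats in order and treating the range as inclusive are assumptions.

```python
from datetime import datetime
from typing import Sequence, Union

DEFAULT_FORMATS = ("%Y%m%d", "%Y%m%d%H%M%S")  # defaults from the signature


def parse_datetime(raw: str, formats: Sequence[str] = DEFAULT_FORMATS) -> datetime:
    """Try each input format in order and return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"{raw!r} matches none of the formats {list(formats)!r}")


def to_datetime(value: Union[str, datetime], format: str = "%Y%m%d%H%M%S") -> datetime:
    """Datetime objects pass through; strings are parsed with `format`."""
    if isinstance(value, datetime):
        return value  # `format` has no effect here, as documented
    return datetime.strptime(value, format)


def in_range(
    raw: str,
    lower_bound: Union[str, datetime],
    upper_bound: Union[str, datetime],
    format: str = "%Y%m%d%H%M%S",
) -> bool:
    """Check an input value against bounds (inclusive, by assumption)."""
    value = parse_datetime(raw)
    return to_datetime(lower_bound, format) <= value <= to_datetime(upper_bound, format)


print(parse_datetime("20200131"))        # 2020-01-31 00:00:00
print(parse_datetime("20200131123000"))  # 2020-01-31 12:30:00
print(in_range("20200115", "20200101000000", datetime(2020, 1, 31)))  # True
```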
ConstantParser:
Signature: ConstantParser(value)
Parameters:
value
: The parser will always return this value irrespective of the input data and called methods.
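One way to picture "irrespective of the input data and called methods" is an object that accepts any validator-style call and ignores it. `MiniConstantParser` below is a hypothetical stand-in, not parseval's implementation. That the parser object is callable, and that ignored methods return the parser itself, are assumptions made for the sketch.

```python
from typing import Any


class MiniConstantParser:
    """Hypothetical stand-in for illustration only; not parseval's code."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def __getattr__(self, name: str):
        # Called only for attributes that do not exist, so any validator-style
        # method (not_null, max_value, ...) is accepted and ignored.
        def ignore(*args: Any, **kwargs: Any) -> "MiniConstantParser":
            return self
        return ignore

    def __call__(self, data: Any = None) -> Any:
        # The input is never inspected; the constant is always returned.
        return self._value


parser = MiniConstantParser("X").not_null().max_value(5)
print(parser("anything"))  # X
print(parser(None))        # X
```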
Parser:
Signature: Parser(schema: typing.List[typing.Tuple] = [], input_row_format: str = "delimited", input_row_sep: str = "|", parsed_row_format: str = "delimited", parsed_row_sep: str = None, stop_on_error: int = 0)
Parameters:
- `schema`: Parser schema for the row.
- `input_row_format`: Type of input row. Provide `json` for JSON rows and `delimited` for delimited rows.
- `input_row_sep`: Delimiter of the input row. Not required when `input_row_format = "json"`.
- `parsed_row_format`: Type of output row. Provide `json` for JSON output and `delimited` for delimited output. Note: "fixed-width" output is not supported.
- `parsed_row_sep`: Delimiter of the output row. Not required when `parsed_row_format = "json"`.
- `stop_on_error`: Controls how validation errors are handled. When set to 0, the process stops at the first validation error. When set to a negative number, the process skips all erroneous rows and returns only the valid rows. When set to a positive number, the process tolerates that many erroneous rows and fails once the number of erroneous rows exceeds it.
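The three `stop_on_error` modes described above can be sketched as a simple error counter. `parse_rows` and `must_be_digits` are hypothetical helpers for this illustration, not parseval functions, and the exception types are assumptions.

```python
from typing import Callable, Iterable, List


def parse_rows(
    rows: Iterable[str],
    validate: Callable[[str], str],
    stop_on_error: int = 0,
) -> List[str]:
    """Return valid rows, tolerating errors according to `stop_on_error`."""
    parsed: List[str] = []
    errors = 0
    for row in rows:
        try:
            parsed.append(validate(row))
        except ValueError as exc:
            errors += 1
            # 0: fail on the first error; N > 0: fail once errors exceed N;
            # negative: never fail, erroneous rows are skipped.
            if stop_on_error >= 0 and errors > stop_on_error:
                raise RuntimeError(f"too many erroneous rows ({errors})") from exc
    return parsed


def must_be_digits(row: str) -> str:
    """Toy validator: accept only rows made of digits."""
    if not row.isdigit():
        raise ValueError(f"invalid row: {row!r}")
    return row


rows = ["1", "x", "2", "y", "3"]
print(parse_rows(rows, must_be_digits, stop_on_error=-1))  # ['1', '2', '3']
print(parse_rows(rows, must_be_digits, stop_on_error=2))   # ['1', '2', '3']
```

With `stop_on_error=1` or `stop_on_error=0`, the same input raises, because it contains two erroneous rows.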
Available APIs:
Signature: parse(data: typing.Union[typing.List[typing.Union[str, typing.Dict]], typing.TextIO])
Parameters:
data
: Input data set
For detailed installation and usage information, please see https://github.com/saumalya75/parseval .
For any further queries reach out to saumalya75@gmail.com or http://linkedin.com/in/saumalya-sarkar-b3712817b .