Miscellaneous Functions
Utility functions for validating, formatting, and generating Brazilian company tax identifiers (CNPJ), and for validating international securities identifiers (ISIN). Functions marked Vectorized accept either a single value or a NumPy array; given an array, they operate element-wise.
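To make "element-wise" concrete, here is a minimal sketch of how a scalar function can be lifted to work on either a single value or a NumPy array. The decorator elementwise and the predicate is_digits are hypothetical and are not the library's actual mechanism; object dtype is used so that mixed results (strings alongside None, as in the format_cnpj example) fit in one array.

```python
import numpy as np

def elementwise(func):
    """Hypothetical decorator (not the library's real mechanism): apply a
    scalar function to a single value, or element-wise to a NumPy array."""
    def wrapper(value, *args):
        if isinstance(value, np.ndarray):
            results = [func(v, *args) for v in value.ravel()]
            # dtype=object so mixed results (e.g. strings and None) fit in one array
            return np.array(results, dtype=object).reshape(value.shape)
        return func(value, *args)
    return wrapper

@elementwise
def is_digits(value):
    # Hypothetical scalar predicate, used only to demonstrate the wrapper.
    return isinstance(value, str) and value.isascii() and value.isdigit()

print(is_digits("123"))  # True
print(is_digits(np.array(["123", "abc", None], dtype=object)).tolist())  # [True, False, False]
```

The scalar function never has to know about arrays; None simply flows through as an ordinary element.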
CNPJ
Brazilian companies are identified by a 14-digit CNPJ number. The functions below work with both the raw numeric string ('17210185000161') and the formatted mask ('17.210.185/0001-61').
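A CNPJ consists of an 8-digit root identifying the company, a 4-digit number identifying the establishment (typically 0001 for the head office), and 2 check digits. The sketch below, built around a hypothetical split_cnpj helper that is not part of the library, shows one way to accept both the raw and the masked form with a single regular expression and split out these parts.

```python
import re

# Hypothetical pattern: the mask separators (. / -) are optional, so both
# '17210185000161' and '17.210.185/0001-61' match.
CNPJ_PATTERN = re.compile(r"([0-9]{2})\.?([0-9]{3})\.?([0-9]{3})/?([0-9]{4})-?([0-9]{2})")

def split_cnpj(value: str) -> dict:
    """Hypothetical helper: split a CNPJ into root, branch and check digits."""
    match = CNPJ_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a CNPJ: {value!r}")
    a, b, c, branch, check = match.groups()
    return {"root": a + b + c, "branch": branch, "check_digits": check}

print(split_cnpj("17.210.185/0001-61"))
# {'root': '17210185', 'branch': '0001', 'check_digits': '61'}
```

Because each separator is optional on its own, this pattern also accepts partially masked input such as '17210185/0001-61'.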
is_valid_cnpj
Vectorized. Returns True if value is a structurally valid CNPJ. When check_dv=True, also validates the two check digits.
expression_engine.solve('is_valid_cnpj("17210185000161")', {})
# True
expression_engine.solve('is_valid_cnpj("00000000000000")', {})
# True (structurally valid, even if not officially issued)
expression_engine.solve('is_valid_cnpj("INVALID")', {})
# False
import numpy as np
expression_engine.solve('is_valid_cnpj(values)', {
'values': np.array(['17210185000161', '12345678000195', 'INVALID', None])
})
# [True, True, False, False]
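The two CNPJ check digits are computed with a mod-11 scheme: a weighted sum of the first 12 digits gives the first check digit, and a weighted sum of those 12 digits plus the first check digit gives the second. The following is a minimal sketch of that calculation for purely numeric CNPJs; cnpj_check_digits and has_valid_check_digits are hypothetical helpers and not necessarily how is_valid_cnpj is implemented.

```python
import re

WEIGHTS_1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
WEIGHTS_2 = [6] + WEIGHTS_1  # the second digit also weighs the first check digit

def _mod11_digit(digits: str, weights: list) -> str:
    remainder = sum(int(d) * w for d, w in zip(digits, weights)) % 11
    return "0" if remainder < 2 else str(11 - remainder)

def cnpj_check_digits(base12: str) -> str:
    """Hypothetical helper: the two check digits for a 12-digit base."""
    first = _mod11_digit(base12, WEIGHTS_1)
    second = _mod11_digit(base12 + first, WEIGHTS_2)
    return first + second

def has_valid_check_digits(cnpj) -> bool:
    """Hypothetical helper: expects exactly 14 ASCII digits, no mask."""
    if not isinstance(cnpj, str) or re.fullmatch(r"[0-9]{14}", cnpj) is None:
        return False
    return cnpj_check_digits(cnpj[:12]) == cnpj[12:]

print(has_valid_check_digits("17210185000161"))  # True
print(has_valid_check_digits("17210185000162"))  # False
```

Note that '00000000000000' also passes this check: every weighted sum is 0, so both computed check digits are 0.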
sanitize_cnpj
Vectorized. Strips all non-numeric characters from value and returns the remaining digits as a string. When zfill=True (the default), the result is left-padded with zeros to 14 digits, which restores leading zeros lost when the CNPJ is passed as an integer.
expression_engine.solve('sanitize_cnpj("17.210.185/0001-61")')
# '17210185000161'
expression_engine.solve('sanitize_cnpj(17210185000161)')
# '17210185000161'
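A rough plain-Python equivalent, assuming sanitizing amounts to "keep the digits, then optionally pad". sanitize_cnpj_sketch is a hypothetical name; the integer example shows why zero-padding matters, since an int cannot carry leading zeros.

```python
import re

def sanitize_cnpj_sketch(value, zfill: bool = True) -> str:
    """Hypothetical re-implementation for illustration, not the library code."""
    digits = re.sub(r"[^0-9]", "", str(value))  # drop dots, slash, dash, spaces
    return digits.zfill(14) if zfill else digits

print(sanitize_cnpj_sketch("17.210.185/0001-61"))        # 17210185000161
print(sanitize_cnpj_sketch(1234567000189))               # 01234567000189
print(sanitize_cnpj_sketch(1234567000189, zfill=False))  # 1234567000189
```

This sketch assumes str or int input: str() of a float would append '.0', adding a spurious digit.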
normalize_cnpj
Vectorized. Normalizes value to a canonical 14-digit CNPJ string. Unlike sanitize_cnpj, it also validates the result; on invalid input it either raises an exception or returns None, depending on errors:
| errors | Behavior on invalid input |
|---|---|
| 'raise' | Raises an exception |
| 'coerce' | Returns None |
expression_engine.solve('normalize_cnpj("17.210.185/0001-61")')
# '17210185000161'
expression_engine.solve('normalize_cnpj("invalid", "coerce")')
# None
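The errors contract in the table can be sketched as follows. normalize_cnpj_sketch is hypothetical: it treats "valid" as "exactly 14 digits after stripping the mask" (the real function may also verify check digits), treats non-string input as invalid, and assumes errors='raise' as the default, which this page does not state.

```python
import re
from typing import Optional

def normalize_cnpj_sketch(value, errors: str = "raise") -> Optional[str]:
    """Hypothetical sketch of the documented errors='raise'/'coerce' behavior."""
    if errors not in ("raise", "coerce"):
        raise ValueError(f"errors must be 'raise' or 'coerce', got {errors!r}")
    digits = re.sub(r"[^0-9]", "", value) if isinstance(value, str) else ""
    if len(digits) != 14:  # structural check only, no check-digit validation
        if errors == "raise":
            raise ValueError(f"invalid CNPJ: {value!r}")
        return None
    return digits

print(normalize_cnpj_sketch("17.210.185/0001-61"))  # 17210185000161
print(normalize_cnpj_sketch("invalid", "coerce"))   # None
```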
format_cnpj
Vectorized. Formats value as XX.XXX.XXX/XXXX-XX. On invalid input, raises an exception or returns None, depending on errors (as in normalize_cnpj).
expression_engine.solve('format_cnpj("17210185000161")')
# '17.210.185/0001-61'
expression_engine.solve('format_cnpj("invalid", "coerce")')
# None
import numpy as np
expression_engine.solve('format_cnpj(values, "coerce")', {
'values': np.array(['17210185000161', 'invalid'])
})
# ['17.210.185/0001-61', None]
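Formatting reduces to slicing the 14 digits into the mask. A minimal sketch, with the hypothetical format_cnpj_sketch standing in for the library function and a list comprehension standing in for its vectorization; the default errors='raise' is an assumption.

```python
import re
from typing import Optional

import numpy as np

def format_cnpj_sketch(value, errors: str = "raise") -> Optional[str]:
    """Hypothetical sketch: render 14 digits as XX.XXX.XXX/XXXX-XX."""
    digits = re.sub(r"[^0-9]", "", value) if isinstance(value, str) else ""
    if len(digits) != 14:
        if errors == "coerce":
            return None
        raise ValueError(f"invalid CNPJ: {value!r}")
    return f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"

values = np.array(["17210185000161", "invalid"])
formatted = [format_cnpj_sketch(v, "coerce") for v in values]
print(formatted)  # ['17.210.185/0001-61', None]
```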
generate_cnpj
Generates a random 14-digit CNPJ string for testing. When valid_dv=True, the generated CNPJ will have correct check digits.
expression_engine.solve('generate_cnpj()')
# e.g. '48291037000158' (output varies)
expression_engine.solve('generate_cnpj(True)')
# a 14-digit string that passes is_valid_cnpj with check_dv=True
ISIN
An ISIN (International Securities Identification Number, defined by ISO 6166) is a 12-character code: a 2-letter country prefix, a 9-character alphanumeric national identifier, and 1 numeric check digit.
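The layout above can be expressed as a format check. The sketch below uses a hypothetical isin_parts helper and an assumed uppercase-only pattern; it checks the shape only, not the check digit, and does not verify that the country prefix is an assigned code.

```python
import re

# Assumed pattern: 2 letters, 9 letters/digits, 1 digit (uppercase only).
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")

def isin_parts(value) -> tuple:
    """Hypothetical helper: split an ISIN into (country, national id, check digit)."""
    if not isinstance(value, str) or ISIN_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not an ISIN: {value!r}")
    return value[:2], value[2:11], value[11]

print(isin_parts("US0378331005"))  # ('US', '037833100', '5')
print(isin_parts("BRPETRACNPR6"))  # ('BR', 'PETRACNPR', '6')
```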
is_valid_isin
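Vectorized. Returns True if value is a valid ISIN. When check_digit=True (the default), it also validates the check digit: letters are converted to numbers (A=10 through Z=35) and the Luhn algorithm is applied to the resulting digit string.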
expression_engine.solve('is_valid_isin("US0378331005")')
# True (Apple Inc.)
expression_engine.solve('is_valid_isin("BRPETRACNPR6")')
# True (Petrobras PN)
expression_engine.solve('is_valid_isin("INVALID")')
# False
expression_engine.solve('is_valid_isin("US0378331004")')
# False (wrong check digit)
expression_engine.solve('is_valid_isin("US0378331004", False)')
# True (valid format, check digit skipped)
import numpy as np
expression_engine.solve('is_valid_isin(values)', {
'values': np.array(['US0378331005', 'BRPETRACNPR6', 'INVALID', None])
})
# [True, True, False, False]
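The check-digit rule can be sketched as follows: each letter is replaced by its two-digit value (A=10 through Z=35, which is exactly what int(ch, 36) returns), and the Luhn algorithm runs over the resulting digit string. isin_check_digit_ok is a hypothetical helper, not the library's implementation.

```python
import re

def isin_check_digit_ok(value) -> bool:
    """Hypothetical helper: shape check plus Luhn check over the expanded digits."""
    if not isinstance(value, str) or re.fullmatch(r"[A-Z]{2}[A-Z0-9]{9}[0-9]", value) is None:
        return False
    # Letters become two-digit numbers (A=10 ... Z=35); digits stay as they are.
    expanded = "".join(str(int(ch, 36)) for ch in value)
    total = 0
    for position, ch in enumerate(reversed(expanded)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit, starting left of the check digit
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

values = ["US0378331005", "BRPETRACNPR6", "INVALID", None, "US0378331004"]
print([isin_check_digit_ok(v) for v in values])  # [True, True, False, False, False]
```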