xoutil.string - Common string operations.

Exposes all the functionality of the original string module, with some general additions.

In this module, the str and unicode types are not used directly, because Python 2.x and Python 3.x treat strings differently. Instead, bytes and text_type are used with the following conventions:

  • In Python 2.x, str is a synonym of bytes, and both unicode and str are string types inheriting from basestring.

  • In Python 3.x, str is always unicode, and the unicode and basestring types don’t exist. The bytes type can be used as an array in which each item is one byte.

    Many methods are adjusted to these conditions.
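
The conventions above can be illustrated in plain Python 3. In this sketch the text_type alias is defined locally for illustration only (xoutil.eight provides its own text_type, as used in the capitalize examples); the last lines show that indexing a bytes value yields one integer per byte.

```python
import sys

# Illustrative alias only; xoutil.eight ships its own text_type.
if sys.version_info[0] >= 3:
    text_type = str
else:  # Python 2: the built-in unicode type
    text_type = unicode  # noqa: F821

data = b'abc'
first = data[0]  # on Python 3 each item of a bytes value is an int (one byte)
```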

xoutil.string.capitalize(value, title=True)

Capitalizes value according to whether it should be title-like.

Title-like means that every word is capitalized except words of three letters or fewer, unless it is the first word:

>>> capitalize('a group is its own worst enemy')
'A Group is its own Worst Enemy'

(This may be odd because, in the example above, own should be capitalized.)

Returns bytes or unicode depending on the type of value.

>>> from xoutil.eight import text_type
>>> type(capitalize(text_type('something'))) is text_type
True
>>> type(capitalize(str('something'))) is str
True
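
The title-like rule can be sketched for Python 3 str values as follows. The name title_capitalize is hypothetical, the sketch only covers title=True, and it splits on single spaces; it is an illustration of the documented rule, not xoutil's implementation.

```python
def title_capitalize(value):
    """Hypothetical sketch of capitalize(value, title=True) for str values.

    Capitalizes the first word and every word longer than three letters.
    """
    words = value.split(' ')
    result = []
    for index, word in enumerate(words):
        if index == 0 or len(word) > 3:
            # Upper-case only the first character and keep the rest intact.
            word = word[:1].upper() + word[1:]
        result.append(word)
    return ' '.join(result)


print(title_capitalize('a group is its own worst enemy'))
# → A Group is its own Worst Enemy
```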

xoutil.string.capitalize_word(value)

Capitalizes the first character of value.

xoutil.string.cut_any_prefix(value, *prefixes)

Apply cut_prefix() for the first matching prefix.

xoutil.string.cut_any_suffix(value, *suffixes)

Apply cut_suffix() for the first matching suffix.

xoutil.string.cut_prefix(value, prefix)

Removes the leading prefix if it exists; otherwise returns value unchanged.

xoutil.string.cut_prefixes(value, *prefixes)

Apply cut_prefix() for all provided prefixes in order.

xoutil.string.cut_suffix(value, suffix)

Removes the trailing suffix if it exists; otherwise returns value unchanged.

xoutil.string.cut_suffixes(value, *suffixes)

Apply cut_suffix() for all provided suffixes in order.
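
The cut_* family can be illustrated with a standalone Python 3 sketch that reuses the documented names and signatures. The bodies are an assumed reconstruction from the descriptions above, not xoutil's source; in particular, skipping empty prefixes and suffixes is a choice made here.

```python
# Sketch only: assumed re-implementations of the documented cut_* behaviour.

def cut_prefix(value, prefix):
    # Drop prefix only when value really starts with it.
    if prefix and value.startswith(prefix):
        return value[len(prefix):]
    return value


def cut_suffix(value, suffix):
    # The emptiness check matters: value[:-0] would return an empty string.
    if suffix and value.endswith(suffix):
        return value[:-len(suffix)]
    return value


def cut_any_prefix(value, *prefixes):
    # Only the first prefix that matches is removed.
    for prefix in prefixes:
        if prefix and value.startswith(prefix):
            return value[len(prefix):]
    return value


def cut_any_suffix(value, *suffixes):
    # Only the first suffix that matches is removed.
    for suffix in suffixes:
        if suffix and value.endswith(suffix):
            return value[:-len(suffix)]
    return value


def cut_prefixes(value, *prefixes):
    # Every prefix is tried in order, each on the result of the previous cut.
    for prefix in prefixes:
        value = cut_prefix(value, prefix)
    return value


def cut_suffixes(value, *suffixes):
    # Every suffix is tried in order, each on the result of the previous cut.
    for suffix in suffixes:
        value = cut_suffix(value, suffix)
    return value
```

The difference between the two families is visible with the same input: cut_any_prefix('foobar', 'x', 'fo', 'foo') stops at the first match and returns 'obar', while cut_prefixes('foobar', 'foo', 'ba') applies both prefixes in turn and returns 'r'.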

xoutil.string.error2str(error)

Converts an error to a string.

xoutil.string.force_encoding(encoding=None)

Validates an encoding value: if it is None, uses locale.getlocale()[1]; otherwise returns the same value.

New in version 1.2.0.

xoutil.string.force_str(value, encoding=None)

Forces value to the native str type, which differs between Python 2 (bytes) and Python 3 (unicode).

Parameters:
  • value – The value to convert to str.
  • encoding – The encoding to use if value has to be encoded or decoded. The default is the same as for safe_encode() and safe_decode().

New in version 1.2.0.
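
A minimal Python 3 sketch of force_encoding and force_str follows. Any validation force_encoding performs on a non-None value is not shown; the 'utf-8' last resort (used when the locale reports no encoding) and the 'replace' error handler are assumptions of this sketch, not documented behaviour.

```python
import locale


def force_encoding(encoding=None):
    # Sketch: fall back to the encoding of the current locale when none is given.
    return encoding if encoding is not None else locale.getlocale()[1]


def force_str(value, encoding=None):
    # Sketch for Python 3, where the native str type is unicode text.
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # getlocale() may report no encoding (e.g. the "C" locale), so
        # 'utf-8' is an assumed last resort; 'replace' avoids raising.
        enc = force_encoding(encoding) or 'utf-8'
        return bytes(value).decode(enc, 'replace')
    return str(value)
```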

xoutil.string.hyphen_name(name)

Converts a name, normally an identifier, to a hyphenated slug.

All transitions from lowercase to uppercase letters (or from digits to letters) are joined with a hyphen.

Also, all invalid characters (those invalid in Python identifiers) are converted to hyphens.

For example:

>>> hyphen_name('BaseNode') == 'base-node'
True
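
The two rules above can be sketched with regular expressions in Python 3. This is an assumed reconstruction, not xoutil's code; it may differ from the library in edge cases such as runs of capitals (for instance, 'HTTPServer' gets no hyphen here) or repeated hyphens.

```python
import re


def hyphen_name(name):
    # Sketch: characters that cannot appear in a Python identifier become hyphens.
    name = re.sub(r'\W', '-', name)
    # Insert a hyphen at lower->upper and digit->letter transitions.
    name = re.sub(r'(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])', '-', name)
    return name.lower()
```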

xoutil.string.make_a10z(string)

Utility to find out that “internationalization” is “i18n”: it keeps the first and last characters and replaces the characters in between with their count.

Examples:

>>> print(make_a10z('parametrization'))
p13n
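
The abbreviation rule can be sketched in one line of Python 3. This is an illustration that assumes the input has at least two characters; the library's handling of shorter strings is not documented here.

```python
def make_a10z(string):
    # Sketch: keep the first and last characters and replace the middle by
    # its length. Assumes at least two characters.
    return string[0] + str(len(string) - 2) + string[-1]
```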

xoutil.string.normalize_ascii(value)

Returns the string normal form of value.

Converts all non-ASCII characters to valid ones using Unicode ‘NFKC’ normalization.
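
A common way to get ASCII text from accented input is sketched below in Python 3. Note that this sketch deliberately uses NFKD rather than the NFKC form named above: under NFKC an accented letter such as ‘Á’ stays a single non-ASCII code point and would be dropped entirely by the ASCII filter, while NFKD splits it into ‘A’ plus a combining accent. This is a swapped-in technique for illustration, not necessarily what xoutil does.

```python
import unicodedata


def normalize_ascii(value):
    # Sketch: decompose compatibility characters and accents (NFKD), then drop
    # every code point with no ASCII representation, e.g. combining marks.
    text = value if isinstance(value, str) else str(value)
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')
```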

xoutil.string.normalize_name(value)

xoutil.string.normalize_slug(value, replacement='-', invalids=None, valids=None)

Returns the string normal form of value, valid for slugs.

Converts all non-ASCII characters to valid ones using Unicode ‘NFKC’ normalization.

Lowercases the result.

Replaces unwanted characters with replacement; repeated occurrences of replacement are collapsed into a single instance.

Warning

There’s a known bug when replacement contains ‘’.

[_a-z0-9] are assumed to be valid characters. Extra arguments can modify this standard behaviour:

Parameters:
  • invalids – Any collection of characters to add to those that are normally invalid in value (non-ASCII characters or characters not among the valid ones). Boolean True can be passed as a synonym of "_" for compatibility with the old invalid_underscore argument. False or None are treated as an empty set of invalid characters.
  • valids – A collection of extra valid characters (all non-ASCII characters are ignored). This parameter can be a string, any iterable of strings of characters, or None to use only the default valid characters (see above).

Warning

The result may contain characters in invalids if replacement does.

The value and replacement parameters can be of any type, including non-string types; they are normalized and converted to lowercase ASCII strings.

Examples:

>>> normalize_slug('  Á.e i  Ó  u  ') == 'a-e-i-o-u'
True

>>> normalize_slug('  Á.e i  Ó  u  ', '.', invalids='AU') == 'e.i.o'
True

>>> normalize_slug('  Á.e i  Ó  u  ', valids='.') == 'a.e-i-o-u'
True

>>> normalize_slug('_x', '_') == '_x'
True

>>> normalize_slug('-x', '_') == 'x'
True

>>> normalize_slug(None) == 'none'
True

>>> normalize_slug(1 == 1) == 'true'
True

>>> normalize_slug(1.0) == '1-0'
True

>>> normalize_slug(135) == '135'
True

>>> normalize_slug(123456, '', invalids='52') == '1346'
True

Changed in version 1.5.5: Added the invalid_underscore parameter.

Changed in version 1.6.6: Replaced the invalid_underscore parameter by invalids. Added the valids parameter.

Changed in version 1.7.2: Clarified the role of invalids with regards to replacement.
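
The rules above can be sketched in Python 3 in a way that reproduces the documented examples. The strategy (split value into runs of allowed characters and join them with replacement, dropping unwanted runs at either end), the _ascii_lower helper and the DEFAULT_VALIDS set are assumptions of this sketch, not xoutil's implementation; accents are stripped with NFKD decomposition plus an ASCII filter.

```python
import re
import string
import unicodedata

# Sketch only. Default valid characters: [_a-z0-9].
DEFAULT_VALIDS = set(string.ascii_lowercase + string.digits + '_')


def _ascii_lower(value):
    # Hypothetical helper: any value (None, bool, float, ...) becomes text first.
    text = value if isinstance(value, str) else str(value)
    text = unicodedata.normalize('NFKD', text)
    return text.encode('ascii', 'ignore').decode('ascii').lower()


def normalize_slug(value, replacement='-', invalids=None, valids=None):
    value = _ascii_lower(value)
    replacement = _ascii_lower(replacement)
    if invalids is True:  # compatibility spelling for "_"
        invalids = '_'
    bad = set(_ascii_lower(''.join(invalids))) if invalids else set()
    extra = {c for c in ''.join(valids or ()) if ord(c) < 128}
    allowed = (DEFAULT_VALIDS | extra) - bad
    # Split value into runs of allowed characters; each run of unwanted
    # characters between them becomes one replacement, and runs at either
    # end are dropped.
    words, current = [], []
    for char in value:
        if char in allowed:
            current.append(char)
        elif current:
            words.append(''.join(current))
            current = []
    if current:
        words.append(''.join(current))
    result = replacement.join(words)
    if replacement:
        # Collapse repeated replacements; passing a function as repl
        # sidesteps backslash handling in re.sub replacement strings.
        pattern = '(?:%s)+' % re.escape(replacement)
        result = re.sub(pattern, lambda match: replacement, result)
    return result
```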

xoutil.string.normalize_str(value)

xoutil.string.normalize_title(value)

xoutil.string.normalize_unicode(value)

xoutil.string.parse_boolean(value)

Parses a boolean from any value, giving special treatment to strings.

>>> parse_boolean('trUe')
True
>>> parse_boolean('faLSe')
False
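
A Python 3 sketch of the idea: strings are matched case-insensitively against known spellings, and other values use ordinary truthiness. The accepted spellings in TRUE_WORDS and FALSE_WORDS, and raising ValueError for unrecognized strings, are hypothetical choices of this sketch; the words xoutil actually accepts may differ.

```python
# Hypothetical spellings; the words xoutil actually accepts may differ.
TRUE_WORDS = {'true', 'yes', 'on', '1'}
FALSE_WORDS = {'false', 'no', 'off', '0', ''}


def parse_boolean(value):
    # Sketch: bytes are decoded so they get the same treatment as text.
    if isinstance(value, bytes):
        value = value.decode('ascii', 'replace')
    if isinstance(value, str):
        word = value.strip().lower()
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
        raise ValueError('cannot parse %r as a boolean' % value)
    # Non-string values follow ordinary truthiness.
    return bool(value)
```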

xoutil.string.parse_url_int(value, default=None)

Parse an integer URL argument. Some operations treat simple arguments as a list of one element.
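
The one-element-list case arises, for example, with urllib.parse.parse_qs, which maps every key to a list of values. The sketch below is an assumed Python 3 reconstruction: rejecting lists of any other length and returning default on any conversion failure are choices made here, not documented behaviour.

```python
from urllib.parse import parse_qs


def parse_url_int(value, default=None):
    # Sketch: unwrap a one-element list such as ['7'].
    if isinstance(value, (list, tuple)):
        if len(value) != 1:
            return default  # assumption: only one-element lists are accepted
        value = value[0]
    try:
        return int(value)  # int() also strips surrounding whitespace
    except (TypeError, ValueError):
        return default


params = parse_qs('page=7&size=x')  # {'page': ['7'], 'size': ['x']}
page = parse_url_int(params.get('page'), 1)  # ['7'] -> 7
size = parse_url_int(params.get('size'), 10)  # 'x' is not an integer -> 10
```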

xoutil.string.safe_decode(s, encoding=None)

Similar to the bytes decode() method, returning unicode.

Decodes s using the given encoding, or one determined from the system.

The return type depends on the Python version: unicode in 2.x, str in 3.x.

New in version 1.1.3.

xoutil.string.safe_encode(u, encoding=None)

Similar to the unicode encode() method, returning bytes.

Encodes u using the given encoding, or one determined from the system.

The return type is always bytes (which in Python 2.x is also str).

New in version 1.1.3.
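
A minimal Python 3 sketch of the pair follows. How xoutil determines the system encoding is not shown here; the _system_encoding helper (locale encoding, then 'utf-8') and the 'replace' error handler, which makes both functions avoid raising on bad data, are assumptions of this sketch.

```python
import locale


def _system_encoding(encoding=None):
    # Hypothetical helper: use the locale's encoding, falling back to UTF-8
    # when the locale reports none.
    return encoding or locale.getlocale()[1] or 'utf-8'


def safe_decode(s, encoding=None):
    # Sketch: always returns str on Python 3.
    if isinstance(s, str):
        return s  # already text
    if isinstance(s, (bytes, bytearray)):
        # 'replace' substitutes U+FFFD instead of raising on bad bytes.
        return bytes(s).decode(_system_encoding(encoding), 'replace')
    return str(s)


def safe_encode(u, encoding=None):
    # Sketch: always returns bytes.
    if isinstance(u, bytes):
        return u  # already bytes
    if not isinstance(u, str):
        u = str(u)
    # 'replace' writes '?' for characters the encoding cannot represent.
    return u.encode(_system_encoding(encoding), 'replace')
```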

xoutil.string.safe_join(separator, iterable, encoding=None)

Similar to the join method of string objects, separator.join(iterable): returns a string that is the concatenation of the strings in iterable, with separator between elements. Returns unicode or bytes depending on the types of separator and of each item in iterable.

encoding is used to decode bytes when concatenating bytes and unicode would otherwise raise an error.

This function should be deprecated in Python 3.

New in version 1.1.3.

Warning

The force_separator_type parameter was removed in version 1.2.0.
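
The mixed-type behaviour can be sketched in Python 3: the result is bytes only when the separator and every item are bytes; otherwise every bytes value is decoded and the result is text. The _as_text helper, the 'utf-8' default and the 'replace' error handler are assumptions of this sketch, not xoutil's code.

```python
def _as_text(item, encoding):
    # Hypothetical helper: decode bytes-like values, leave text untouched.
    if isinstance(item, (bytes, bytearray)):
        return bytes(item).decode(encoding, 'replace')
    return item


def safe_join(separator, iterable, encoding=None):
    # Sketch: the iterable is materialized because it is inspected first.
    items = list(iterable)
    is_bytes = (bytes, bytearray)
    if isinstance(separator, is_bytes) and all(isinstance(i, is_bytes) for i in items):
        return bytes(separator).join(items)
    # Mixed input: decode every bytes value and join as text.
    enc = encoding or 'utf-8'  # assumed default
    return _as_text(separator, enc).join(_as_text(i, enc) for i in items)
```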

xoutil.string.safe_str(obj='')

Convert to normal string type in a safe way.

Most of our Python 2.x code uses unicode as the normal string type; also, in Python 3, converting bytes or byte-arrays to strings with str() includes the “b” prefix in the resulting value.

This function is useful in scenarios that require the str type (for example, the __name__ attribute of functions and types).

As str is bytes in Python 2, using str(value) handles these scenarios correctly in most cases, but in others it is not enough, for example:

>>> from xoutil.string import safe_str as sstr
>>> def inverted_partial(func, *args, **keywords):
...     def inner(*a, **kw):
...         a += args
...         kw.update(keywords)
...         return func(*a, **kw)
...     inner.__name__ = sstr(func.__name__.replace('lambda', u'λ'))
...     return inner

New in version 1.7.0.
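
On Python 3 the core problem is that str() of a bytes value returns its repr. A minimal sketch of the Python 3 side follows; decoding with 'utf-8' and 'replace' is an assumption of this sketch, not necessarily the encoding xoutil uses.

```python
def safe_str(obj=''):
    # Python 3 sketch: decode bytes-like values instead of letting str()
    # produce their repr with the b'' prefix.
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode('utf-8', 'replace')  # assumed encoding
    return str(obj)


plain = str(b'abc')   # "b'abc'" : the repr leaks into the text
safe = safe_str(b'abc')  # 'abc'
```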

xoutil.string.safe_strip(value)

Removes the leading and trailing whitespace from value if it is a string; otherwise returns value unchanged.

New in version 1.1.3.
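
The "otherwise unchanged" rule is the main point; a short Python 3 sketch of it, assuming both str and bytes count as strings:

```python
def safe_strip(value):
    # Sketch: only text and bytes are stripped; anything else passes through.
    if isinstance(value, (str, bytes)):
        return value.strip()
    return value
```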

xoutil.string.strfnumber(number, format_spec='%0.2f')