xoutil.string - Common string operations.
Exposes all original string module functionalities, with some general additions.
This module avoids the names str and unicode because Python 2.x and Python 3.x treat strings differently. bytes and text_type are used instead, with the following conventions:
In Python 2.x, str is a synonym of bytes, and both str and unicode are string types inheriting from basestring.
In Python 3.x, str is always unicode, and the unicode and basestring types do not exist. The bytes type can be used as an array of one-byte items.
Many functions are adjusted to these conventions.
-
xoutil.string.capitalize(value, title=True)
Capitalizes value according to whether it should be title-like.
Title-like means every word is capitalized except words of three letters or fewer, unless such a word is the first one:
>>> capitalize('a group is its own worst enemy')
'A Group is its own Worst Enemy'
(This may look odd because, in the example above, "own" should be capitalized.)
Returns bytes or unicode depending on the type of value.
>>> from xoutil.eight import text_type
>>> type(capitalize(text_type('something'))) is text_type
True
>>> type(capitalize(str('something'))) is str
True
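The title-like rule can be illustrated with a minimal sketch. The function title_like below is a hypothetical stand-in written from the description above, not xoutil's implementation, and it assumes words are separated by single spaces:

```python
def title_like(value):
    """Capitalize the first word and every word longer than three letters."""
    words = value.split(' ')
    result = []
    for index, word in enumerate(words):
        if index == 0 or len(word) > 3:
            # Upper-case only the first character; leave the rest untouched.
            result.append(word[:1].upper() + word[1:])
        else:
            # Short words that are not first are kept as they are.
            result.append(word)
    return ' '.join(result)


print(title_like('a group is its own worst enemy'))
# A Group is its own Worst Enemy
```

The three-letter threshold is why "own" stays lower-case in the documented example.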
-
xoutil.string.cut_any_prefix(value, *prefixes)
Apply cut_prefix() for the first matching prefix.
-
xoutil.string.cut_any_suffix(value, *suffixes)
Apply cut_suffix() for the first matching suffix.
-
xoutil.string.cut_prefix(value, prefix)
Removes the leading prefix if it exists; otherwise returns value unchanged.
-
xoutil.string.cut_prefixes(value, *prefixes)
Apply cut_prefix() for all provided prefixes in order.
-
xoutil.string.cut_suffix(value, suffix)
Removes the trailing suffix if it exists; otherwise returns value unchanged.
-
xoutil.string.cut_suffixes(value, *suffixes)
Apply cut_suffix() for all provided suffixes in order.
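The difference between the "any" and the plural variants is easiest to see side by side. The following is a sketch written from the descriptions above; the function bodies are assumptions, not xoutil's source, and only the prefix family plus cut_suffix is shown since the remaining suffix functions would mirror it:

```python
def cut_prefix(value, prefix):
    # Drop the prefix only when value actually starts with it.
    return value[len(prefix):] if value.startswith(prefix) else value


def cut_any_prefix(value, *prefixes):
    # Stop at the first prefix that matches.
    for prefix in prefixes:
        if value.startswith(prefix):
            return cut_prefix(value, prefix)
    return value


def cut_prefixes(value, *prefixes):
    # Try every prefix in order, each against the result so far.
    for prefix in prefixes:
        value = cut_prefix(value, prefix)
    return value


def cut_suffix(value, suffix):
    # Guard against an empty suffix: value[:-0] would return ''.
    if suffix and value.endswith(suffix):
        return value[:-len(suffix)]
    return value


print(cut_any_prefix('foobarbaz', 'foo', 'bar'))  # barbaz
print(cut_prefixes('foobarbaz', 'foo', 'bar'))    # baz
print(cut_suffix('report.txt', '.txt'))           # report
```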
-
xoutil.string.force_encoding(encoding=None)
Validates an encoding value: if it is None, use locale.getlocale()[1]; otherwise return the same value.
New in version 1.2.0.
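A minimal sketch of that fallback, assuming nothing beyond the standard locale module. The final 'UTF-8' default is an assumption added here for locales that report no encoding; the description above does not say what xoutil does in that case:

```python
import locale


def force_encoding(encoding=None):
    # Use the given encoding when present; otherwise ask the locale.
    # The trailing 'UTF-8' is an assumed last resort, since
    # locale.getlocale()[1] can be None.
    return encoding or locale.getlocale()[1] or 'UTF-8'


print(force_encoding('latin-1'))  # latin-1
```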
-
xoutil.string.force_str(value, encoding=None)
Force value to str; the resulting type differs between Python 2 and Python 3 (bytes or unicode).
Parameters:
- value – The value to convert to str.
- encoding – The encoding to use if value needs to be encoded or decoded. The default is the same default used by safe_encode() or safe_decode().
New in version 1.2.0.
-
xoutil.string.hyphen_name(name)
Convert a name, normally an identifier, to a hyphenated slug.
All transitions from lower-case to upper-case letters (or from digits to letters) are joined with a hyphen.
Also, all characters that are invalid in Python identifiers are converted to hyphens.
For example:
>>> hyphen_name('BaseNode') == 'base-node'
True
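The two rules above can be sketched with regular expressions. This is an illustrative re-implementation under assumptions, not xoutil's code: it treats only ASCII letters, digits and the underscore as valid identifier characters, and it lower-cases the result as the documented example suggests:

```python
import re


def hyphen_name(name):
    # Runs of characters that cannot appear in an (ASCII) identifier
    # become a single hyphen.
    text = re.sub(r'[^A-Za-z0-9_]+', '-', name)
    # Insert a hyphen at each lower-to-upper transition.
    text = re.sub(r'(?<=[a-z])(?=[A-Z])', '-', text)
    # Insert a hyphen at each digit-to-letter transition.
    text = re.sub(r'(?<=[0-9])(?=[A-Za-z])', '-', text)
    return text.lower()


print(hyphen_name('BaseNode'))   # base-node
print(hyphen_name('item2Name'))  # item2-name
print(hyphen_name('my.attr'))    # my-attr
```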
-
xoutil.string.make_a10z(string)
Utility to find out that “internationalization” is “i18n”.
Examples:
>>> print(make_a10z('parametrization'))
p13n
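The abbreviation keeps the first and last letters and replaces everything in between with its length. A minimal sketch, written from the examples above rather than from xoutil's source; returning strings shorter than three characters unchanged is an assumption:

```python
def make_a10z(string):
    # Assumed behaviour: nothing to abbreviate in very short strings.
    if len(string) < 3:
        return string
    # First letter, count of the letters in between, last letter.
    return '%s%d%s' % (string[0], len(string) - 2, string[-1])


print(make_a10z('internationalization'))  # i18n
print(make_a10z('parametrization'))       # p13n
```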
-
xoutil.string.normalize_ascii(value)
Return the normal string form of value.
Converts all non-ASCII characters to valid characters using Unicode ‘NFKC’ normalization.
-
xoutil.string.normalize_slug(value, replacement='-', invalids=None, valids=None)
Return the normal string form of value, valid for slugs.
Converts all non-ASCII characters to valid characters using Unicode ‘NFKC’ normalization.
Lower-cases the result.
Replaces unwanted characters with replacement; a run of consecutive unwanted characters is converted to a single instance of replacement.
Warning
There’s a known bug when replacement contains ‘’.
[_a-z0-9] are assumed to be valid characters. Extra arguments can modify this standard behaviour:
Parameters:
- invalids – Any collection of characters added to those that are normally invalid in the provided value (non-ASCII or not included in the valid characters). Boolean True can be passed as a synonym of "_" for compatibility with the old invalid_underscore argument. False or None are treated as an empty set of invalid characters.
- valids – A collection of extra valid characters (all non-ASCII characters are ignored). This parameter can be either a valid string, any iterator of valid strings of characters, or None to use only the default valid characters (see above).
Warning
The result may contain characters in invalids if replacement does.
Parameters value and replacement can be of any (non-string) type; these values are normalized and converted to lower-case ASCII strings.
Examples:
>>> normalize_slug(' Á.e i Ó u ') == 'a-e-i-o-u'
True
>>> normalize_slug(' Á.e i Ó u ', '.', invalids='AU') == 'e.i.o'
True
>>> normalize_slug(' Á.e i Ó u ', valids='.') == 'a.e-i-o-u'
True
>>> normalize_slug('_x', '_') == '_x'
True
>>> normalize_slug('-x', '_') == 'x'
True
>>> normalize_slug(None) == 'none'
True
>>> normalize_slug(1 == 1) == 'true'
True
>>> normalize_slug(1.0) == '1-0'
True
>>> normalize_slug(135) == '135'
True
>>> normalize_slug(123456, '', invalids='52') == '1346'
True
Changed in version 1.5.5: Added the invalid_underscore parameter.
Changed in version 1.6.6: Replaced the invalid_underscore parameter by invalids. Added the valids parameter.
Changed in version 1.7.2: Clarified the role of invalids with regards to replacement.
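The documented examples can be reproduced with a short sketch built on the standard unicodedata and re modules. The names to_ascii and slug_sketch are hypothetical, and this is not xoutil's implementation. Two assumptions are worth flagging: the text above names NFKC, but this sketch uses the decomposed form NFKD, because decomposition separates an accent from its base letter so that dropping non-ASCII code points leaves 'A' from 'Á'; and invalids and valids are handled only when given as strings (the boolean and iterator forms are left out):

```python
import re
import unicodedata

DEFAULT_VALID = '_abcdefghijklmnopqrstuvwxyz0123456789'


def to_ascii(value):
    # Non-string values are converted with str() first.
    text = value if isinstance(value, str) else str(value)
    # Assumption: NFKD (not NFKC) so accents split off and can be dropped.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def slug_sketch(value, replacement='-', invalids='', valids=''):
    text = to_ascii(value).lower()
    allowed = set(DEFAULT_VALID) | set(to_ascii(valids or '').lower())
    allowed -= set(to_ascii(invalids or '').lower())
    bad_run = '[^%s]+' % re.escape(''.join(sorted(allowed)))
    # Unwanted characters at either end are dropped, not replaced.
    text = re.sub('^%s|%s$' % (bad_run, bad_run), '', text)
    # Each remaining run of unwanted characters becomes one replacement.
    # A function is used so backslashes in replacement are taken literally.
    return re.sub(bad_run, lambda match: to_ascii(replacement), text)


print(slug_sketch(' Á.e i Ó u '))                     # a-e-i-o-u
print(slug_sketch(' Á.e i Ó u ', '.', invalids='AU'))  # e.i.o
print(slug_sketch(1.0))                               # 1-0
```

Stripping the ends before replacing is what keeps the leading underscore in slug_sketch('_x', '_') while removing the hyphen in slug_sketch('-x', '_').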
-
xoutil.string.parse_boolean(value)
Parse a boolean from any value, giving special treatment to strings.
>>> parse_boolean('trUe')
True
>>> parse_boolean('faLSe')
False
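One way such special treatment of strings could work is sketched below. Only the two doctests above come from the documentation; the handling of empty strings and of numeric strings is an assumption made for illustration:

```python
def parse_boolean(value):
    if isinstance(value, str):
        text = value.strip().lower()
        # Assumed rule: numeric strings follow the truth value of the number.
        if text.isdecimal():
            return bool(int(text))
        # Assumed rule: empty strings and 'false' (any case) are False.
        return text not in ('', 'false')
    # Non-string values use ordinary truthiness.
    return bool(value)


print(parse_boolean('trUe'))   # True
print(parse_boolean('faLSe'))  # False
print(parse_boolean('0'))      # False
```

Without the string branch, bool('false') would be True, since any non-empty string is truthy.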
-
xoutil.string.parse_url_int(value, default=None)
Parse an integer URL argument. Some operations treat simple arguments as a list of one element.
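A hedged sketch of what this could look like. Taking the first element of a list and returning default when the value cannot be parsed are both assumptions; the description above only says that a simple argument may arrive wrapped in a one-element list:

```python
def parse_url_int(value, default=None):
    # Unwrap the one-element list that some operations produce
    # for a simple argument (assumed: take the first element).
    if isinstance(value, (list, tuple)):
        value = value[0] if value else None
    try:
        return int(value)
    except (TypeError, ValueError):
        # Assumed: fall back to default on missing or malformed input.
        return default


print(parse_url_int(['42']))            # 42
print(parse_url_int('abc', default=0))  # 0
```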
-
xoutil.string.safe_decode(s, encoding=None)
Similar to the bytes decode method, returning unicode.
Decodes s using the given encoding, or determines one from the system.
The return type depends on the Python version: unicode in 2.x, str in 3.x.
New in version 1.1.3.
-
xoutil.string.safe_encode(u, encoding=None)
Similar to the unicode encode method, returning bytes.
Encodes u using the given encoding, or determines one from the system.
The return type is always bytes, which in Python 2.x is also str.
New in version 1.1.3.
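The encode and decode pair can be sketched for Python 3 as follows. This is an illustration, not xoutil's code: the helper _system_encoding is hypothetical, and both the choice of locale.getpreferredencoding() as the system encoding and the 'replace' error handler are assumptions:

```python
import locale


def _system_encoding():
    # Assumed source of the system encoding, with UTF-8 as a last resort.
    return locale.getpreferredencoding(False) or 'utf-8'


def safe_decode(s, encoding=None):
    # Text is returned as is; only bytes-like values are decoded.
    if isinstance(s, str):
        return s
    return bytes(s).decode(encoding or _system_encoding(), 'replace')


def safe_encode(u, encoding=None):
    # Bytes are returned as is; anything else is converted to text first.
    if isinstance(u, bytes):
        return u
    return str(u).encode(encoding or _system_encoding(), 'replace')


print(safe_decode(b'caf\xc3\xa9', 'utf-8'))  # café
print(safe_encode('café', 'utf-8'))          # b'caf\xc3\xa9'
```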
-
xoutil.string.safe_join(separator, iterable, encoding=None)
Similar to the join method of string objects, separator.join(iterable): returns a string that is the concatenation of the strings in the iterable, with separator between elements. Returns unicode or bytes depending on the type of separator and of each item in iterable.
encoding is used when an error occurs concatenating bytes and unicode.
This function must be deprecated in Python 3.
New in version 1.1.3.
Warning
The force_separator_type was removed in version 1.2.0.
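In Python 3, joining bytes with text raises TypeError, which is the error case the encoding parameter addresses. A minimal sketch of one possible strategy follows; decoding everything to text on failure and defaulting to UTF-8 are assumptions, not xoutil's documented behaviour:

```python
def safe_join(separator, iterable, encoding='utf-8'):
    items = list(iterable)
    try:
        # Works when separator and all items share one type.
        return separator.join(items)
    except TypeError:
        # Mixed bytes and text: decode the bytes and join as text.
        def as_text(item):
            if isinstance(item, bytes):
                return item.decode(encoding)
            return str(item)
        return as_text(separator).join(as_text(item) for item in items)


print(safe_join(', ', ['a', b'b']))   # a, b
print(safe_join(b'-', [b'a', b'b']))  # b'a-b'
```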
-
xoutil.string.safe_str(obj='')
Convert to the normal string type in a safe way.
Most of our Python 2.x code uses unicode as the normal string type; also, in Python 3, converting bytes or byte-arrays to strings includes the “b” prefix in the resulting value.
This function is useful in scenarios that require the str type (for example, the __name__ attribute of functions and types).
As str is bytes in Python 2, using str(value) handles these scenarios correctly in most cases, but in others it is not enough, for example:
>>> from xoutil.string import safe_str as sstr
>>> def inverted_partial(func, *args, **keywords):
...     def inner(*a, **kw):
...         a += args
...         kw.update(keywords)
...         return func(*a, **kw)
...     inner.__name__ = sstr(func.__name__.replace('lambda', u'λ'))
...     return inner
New in version 1.7.0.
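The “b” prefix issue mentioned above is easy to demonstrate in Python 3 with the standard library alone. The helper to_native_str is a hypothetical name used only for this illustration, and decoding with UTF-8 is an assumption:

```python
def to_native_str(obj, encoding='utf-8'):
    # Decode bytes-like values instead of calling str() on them.
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode(encoding)
    return str(obj)


raw = b'name'
print(str(raw))            # b'name'  (the repr leaks into the value)
print(to_native_str(raw))  # name
```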