Python gracefully dumps nonstandard types

A common task in Python is converting between Python data types and JSON data types.
There is an obvious problem, though. As a data exchange format, JSON has a fixed set of data types, whereas Python, as a programming language, lets you define custom data types in addition to the built-in ones.

For example, you have probably run into an error like this:

>>> import json
>>> import decimal
>>> 
>>> data = {'key1': 'string', 'key2': 10, 'key3': decimal.Decimal('1.45')}
>>> json.dumps(data)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    json.dumps(data)
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'Decimal' is not JSON serializable

So the question is how to convert arbitrary Python data types into JSON data types. A rather un-Pythonic approach is to first convert every value into something that maps directly to a JSON data type, and then dump it. This is direct and brute-force, but it copes poorly with the wide variety of custom data types.
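The pre-conversion approach described above might look like the following minimal sketch. The helper name to_jsonable is hypothetical (it is not part of the json module), and it only handles Decimal; every additional type would need another branch, which is why the approach scales poorly.

```python
import decimal
import json


def to_jsonable(value):
    """Hypothetical helper: recursively replace Decimal values with floats."""
    if isinstance(value, decimal.Decimal):
        return float(value)
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    # Everything else is passed through unchanged.
    return value


data = {'key1': 'string', 'key2': 10, 'key3': decimal.Decimal('1.45')}
result = json.dumps(to_jsonable(data))
print(result)
```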
Searching the web is one important way to solve problems. After a quick search, you will find that you can actually convert the data during the encoding stage of dumps.
So you have probably done something like this, and it solved the problem:

>>> class DecimalEncoder(json.JSONEncoder):
...     def default(self, obj):
...         if isinstance(obj, decimal.Decimal):
...             return float(obj)
...         return super(DecimalEncoder, self).default(obj)
...     
... 
>>> 
>>> json.dumps(data, cls=DecimalEncoder)
'{"key1": "string", "key2": 10, "key3": 1.45}'

Encoding process of JSON
The code below is taken from GitHub: com/python/cpyt…
Almost all docstrings have been removed. Because the code is long, only the important fragments are quoted. The link at the top of each fragment points to the complete code.

Anyone familiar with the json library knows that it has only four commonly used APIs: dump, dumps, load and loads.
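As a quick reminder of how these four functions relate, here is a small sketch. It uses an in-memory io.StringIO buffer in place of a real file, which is an assumption made only to keep the example self-contained.

```python
import io
import json

data = {'key1': 'string', 'key2': 10}

# dumps / loads work with strings.
text = json.dumps(data)
restored = json.loads(text)

# dump / load work with file-like objects.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)

print(text)
```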

The source code is located in cpython/Lib/json

# https://github.com/python/cpython/blob/master/Lib/json/__init__.py#L183-L238

def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        default=None, sort_keys=False, **kw):

    # cached encoder
    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        default is None and not sort_keys and not kw):
        return _default_encoder.encode(obj)

    if cls is None:
        cls = JSONEncoder

    # the key step: instantiate cls and call its encode method
    return cls(
        skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        separators=separators, default=default, sort_keys=sort_keys,
        **kw).encode(obj)

Look directly at the last return. If cls is not provided, JSONEncoder is used by default; the class is instantiated and its instance method encode is called.
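This equivalence can be checked with a short sketch: passing options to json.dumps should give the same result as building a JSONEncoder with those options and calling encode yourself. The sample data is made up for illustration.

```python
import json

data = {'b': [1, 2.5, None, True], 'a': 'text'}

# sort_keys=True bypasses the cached default encoder inside dumps.
via_dumps = json.dumps(data, sort_keys=True)
via_encoder = json.JSONEncoder(sort_keys=True).encode(data)

print(via_dumps == via_encoder)
```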
The encode method is also very simple:

# https://github.com/python/cpython/blob/191e993365ac3206f46132dcf46236471ec54bfa/Lib/json/encoder.py#L182-L202
def encode(self, o):
    # str type is returned directly after encoding
    if isinstance(o, str):
        if self.ensure_ascii:
            return encode_basestring_ascii(o)
        else:
            return encode_basestring(o)

    # chunks are parts of the data
    chunks = self.iterencode(o, _one_shot=True)
    if not isinstance(chunks, (list, tuple)):
        chunks = list(chunks)
    return ''.join(chunks)

As you can see, the final JSON string is joined together from chunks, which are obtained by calling the self.iterencode method.
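iterencode is also part of the public JSONEncoder API, so the relationship between chunks and the final string can be observed directly. A minimal sketch follows; how the output is split into chunks is an implementation detail, so only the joined result is compared.

```python
import json

encoder = json.JSONEncoder()
data = {'key1': 'string', 'key2': [1, 2.5, None]}

# iterencode yields string fragments; joining them gives the full document.
chunks = list(encoder.iterencode(data))
joined = ''.join(chunks)

print(joined == encoder.encode(data))
```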

# https://github.com/python/cpython/blob/191e993365ac3206f46132dcf46236471ec54bfa/Lib/json/encoder.py#L204-257
def iterencode(self, o, _one_shot=False):
    # ... (setup of markers, _encoder and floatstr omitted)
    if (_one_shot and c_make_encoder is not None
            and self.indent is None):
        _iterencode = c_make_encoder(
            markers, self.default, _encoder, self.indent,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, self.allow_nan)
    else:
        _iterencode = _make_iterencode(
            markers, self.default, _encoder, self.indent, floatstr,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, _one_shot)
    return _iterencode(o, 0)

The iterencode method is relatively long, and we only care about its last few lines.
The returned value comes from _iterencode, which is itself the return value of one of two higher-order functions: c_make_encoder or _make_iterencode.
c_make_encoder comes from the _json module, which is written in C. We don't care how that module is implemented; instead we study the equivalent _make_iterencode function.

# https://github.com/python/cpython/blob/191e993365ac3206f46132dcf46236471ec54bfa/Lib/json/encoder.py#L259-441
def _iterencode(o, _current_indent_level):
    if isinstance(o, str):
        yield _encoder(o)
    elif o is None:
        yield 'null'
    elif o is True:
        yield 'true'
    elif o is False:
        yield 'false'
    elif isinstance(o, int):
        # see comment for int/float in _make_iterencode
        yield _intstr(o)
    elif isinstance(o, float):
        # see comment for int/float in _make_iterencode
        yield _floatstr(o)
    elif isinstance(o, (list, tuple)):
        yield from _iterencode_list(o, _current_indent_level)
    elif isinstance(o, dict):
        yield from _iterencode_dict(o, _current_indent_level)
    else:
        if markers is not None:
            markerid = id(o)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = o
        o = _default(o)
        yield from _iterencode(o, _current_indent_level)
        if markers is not None:
            del markers[markerid]
# last line of _make_iterencode: the closure itself is the return value
return _iterencode

The only thing we need to care about is the returned function. The if/elif/else branches in the code convert the built-in types into JSON types one by one. When the type of the object is not recognized, the _default() function is called on it, and the result is then encoded recursively.
_default is the default method we overrode at the beginning.
At this point you fully understand how Python encodes JSON data.
To summarize the process: json.dumps() calls the instance method encode() of JSONEncoder, which uses iterencode() to convert the various types recursively, and finally joins the chunks into a string and returns it.
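The recursive step can be observed with a small sketch: whatever default() returns is fed back into the encoder, so it may itself contain further unsupported types. The Point class and the TraceEncoder with its seen list are hypothetical names introduced only for this illustration.

```python
import json
from decimal import Decimal


class Point:
    """Hypothetical custom type used only for this illustration."""
    def __init__(self, x, y):
        self.x, self.y = x, y


class TraceEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []  # records every type that reached default()

    def default(self, obj):
        self.seen.append(type(obj).__name__)
        if isinstance(obj, Point):
            # The returned dict may still contain a Decimal.
            return {'x': obj.x, 'y': obj.y}
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)


encoder = TraceEncoder()
text = encoder.encode({'p': Point(Decimal('1.5'), 2)})
print(text)
print(encoder.seen)
```

default() is called once for the Point and then again for the Decimal nested inside the dict it returned.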
An elegant solution
After the analysis above, you know why subclassing JSONEncoder and overriding the default method is enough to handle custom types.
Maybe in the future you will need to handle datetime values as well, and you will write something like this:

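import datetime
import decimal
import json

# Assumed format string; the original does not show how DATETIME_FORMAT is defined.
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

class ExtendJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # float keeps the fractional part (int would truncate it)
            return float(obj)

        if isinstance(obj, datetime.datetime):
            return obj.strftime(DATETIME_FORMAT)

        return super(ExtendJSONEncoder, self).default(obj)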

Finally, the parent class's default() method is called purely to raise the exception for unsupported types.
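A short sketch of that fallback: a type the encoder does not handle (a set here, chosen arbitrarily) ends up in JSONEncoder.default, which raises TypeError. The exact wording of the message varies slightly between Python versions.

```python
import decimal
import json


class DecimalEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            return float(obj)
        # Not handled here: the parent implementation raises TypeError.
        return super().default(obj)


try:
    json.dumps({'tags': {1, 2}}, cls=DecimalEncoder)
except TypeError as exc:
    message = str(exc)
    print(message)
```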
Python can use single dispatch (functools.singledispatch) to solve this kind of type-based dispatch problem.

import json

from datetime import datetime
from decimal import Decimal
from functools import singledispatch

class MyClass:
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

# Create three instances of non built-in types
mc = MyClass('i am class MyClass ')
dm = Decimal('11.11')
dt = datetime.now()

@singledispatch
def convert(o):
    raise TypeError('can not convert type')

@convert.register(datetime)
def _(o):
    return o.strftime('%b %d %Y %H:%M:%S') 

@convert.register(Decimal)
def _(o):
    return float(o)

@convert.register(MyClass)
def _(o):
    return o.get_value()

class ExtendJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return convert(obj)
        except TypeError:
            return super(ExtendJSONEncoder, self).default(obj)

data = {
    'mc': mc,
    'dm': dm,
    'dt': dt
}

json.dumps(data, cls=ExtendJSONEncoder)

# {"mc": "i am class MyClass ", "dm": 11.11, "dt": "Nov 10 2017 17:31:25"}

This style is more in line with good design practice. When a new type comes along, you don't need to modify the ExtendJSONEncoder class; you just register another handler with singledispatch. Quite Pythonic.
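To illustrate that claim, here is a self-contained sketch in which support for a new type is added after the encoder class has been written. Using set as the new type, and converting it to a sorted list, are assumptions made for this example.

```python
import json
from functools import singledispatch


@singledispatch
def convert(o):
    raise TypeError('can not convert type')


class ExtendJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return convert(obj)
        except TypeError:
            return super().default(obj)


# Later, possibly in another module: support a new type
# without touching ExtendJSONEncoder.
@convert.register(set)
def _(o):
    return sorted(o)  # sorted list, so the output is deterministic


result = json.dumps({'tags': {'b', 'a'}}, cls=ExtendJSONEncoder)
print(result)
```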

Keywords: Python

Added by Nukeum66 on Thu, 30 Dec 2021 11:20:56 +0200
