Official documentation of country converter (machine translation, manually checked)

Project description

Country converter (coco) is a Python package that converts and matches country names between different classifications and between different naming versions. Internally, it uses regular expressions to match country names. Coco can also be used to build aggregation concordance matrices between different classification schemes.


Motivation

To date, there is no single standard for how to name or specify individual countries in a (meta) data description. While some data sources follow ISO 3166, this standard defines two-letter and three-letter codes in addition to a numerical classification. To further complicate matters, many databases do not use one of the existing standards but rely on non-standardised country names to classify countries.

The country converter (coco) automates the conversion between different standards and versions of country names. Internally, coco is based on a table that specifies the different ISO and UN standards for each country, together with the official name and a regular expression that aims to match all English versions of a specific country name. In addition, coco includes classifications based on UN, EU and OECD membership, UN regional specifications, continents, and various MRIO and IAM databases (see Classification schemes below).
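To make the idea of "a table with codes plus one regular expression per country" concrete, here is a minimal sketch. The three-row table, its simplified patterns and the helper name regex_lookup are hypothetical and written only for this example; coco's real table has many more columns and far more elaborate expressions.

```python
import re

# Hypothetical three-row excerpt of a country table: each row pairs codes
# with ONE regular expression meant to match English spellings of the name.
# The patterns are simplified for illustration and are not coco's own.
COUNTRY_TABLE = [
    {"name_short": "Myanmar", "ISO2": "MM", "ISO3": "MMR", "regex": r"myanmar|burma"},
    {"name_short": "Tanzania", "ISO2": "TZ", "ISO3": "TZA", "regex": r"tanzania"},
    # The negative lookahead keeps "Dem. People's Rep. of Korea" from matching.
    {"name_short": "South Korea", "ISO2": "KR", "ISO3": "KOR", "regex": r"^(?!.*dem).*korea"},
]


def regex_lookup(name, to="ISO3", not_found="not found"):
    """Return column `to` of the first row whose regex matches `name`."""
    for row in COUNTRY_TABLE:
        if re.search(row["regex"], name, flags=re.IGNORECASE):
            return row[to]
    return not_found


for name in ["Burma", "United Rep. of Tanzania", "Korea, Republic of",
             "Dem. People's Rep. of Korea"]:
    print(name, "->", regex_lookup(name))
```

The last name comes back as "not found" only because this toy table has no row for North Korea.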

Installation

Country_converter is registered at PyPI. From the command line:

pip install country_converter --upgrade

The country converter is also available from conda-forge and can be installed using conda (if you do not have the conda-forge channel added to your conda configuration, add "-c conda-forge"; see the installation instructions here):

conda install country_converter

Alternatively, the source code is available on GitHub.

The package depends on pandas; for testing, pytest is required. For further information on running the tests, see CONTRIBUTING.rst.

Usage

Basic usage

Use within Python

Convert various country names to some standard names:

import country_converter as coco
some_names = ['United Rep. of Tanzania', 'DE', 'Cape Verde', '788', 'Burma', 'COG',
              'Iran (Islamic Republic of)', 'Korea, Republic of',
              "Dem. People's Rep. of Korea"]
standard_names = coco.convert(names=some_names, to='name_short')
print(standard_names)

This results in ['Tanzania', 'Germany', 'Cape Verde', 'Tunisia', 'Myanmar', 'Republic of the Congo', 'Iran', 'South Korea', 'North Korea']. The input format is determined automatically, based on ISO two-letter, ISO three-letter, ISO numeric or regular expression matching. In case of any ambiguity, the source format can be specified with the parameter 'src'.
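The detection rule described above can be sketched as a classifier on the shape of each input entry. This is a hypothetical illustration of the idea only (the function name guess_source_format is made up for this example); coco's actual detection may check codes against its table and treat edge cases differently.

```python
import re


def guess_source_format(name):
    """Guess the classification of one input entry from its shape.

    Hypothetical heuristic: two letters -> ISO2, three letters -> ISO3,
    one to three digits -> ISOnumeric, anything else -> regex matching.
    """
    entry = str(name).strip()
    if re.fullmatch(r"[A-Za-z]{2}", entry):
        return "ISO2"
    if re.fullmatch(r"[A-Za-z]{3}", entry):
        return "ISO3"
    if re.fullmatch(r"\d{1,3}", entry):
        return "ISOnumeric"
    return "regex"


for entry in ["DE", "COG", "788", "Burma", 276]:
    print(entry, "->", guess_source_format(entry))
```

A shape-based rule like this cannot tell, for example, a two-letter abbreviation that is not an ISO2 code from a real ISO2 code, which is the kind of ambiguity the 'src' parameter resolves.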

In case of multiple conversions, better performance can be achieved by instantiating a single CountryConverter object for all conversions:

import country_converter as coco
cc = coco.CountryConverter()

some_names = ['United Rep. of Tanzania', 'Cape Verde', 'Burma',
              'Iran (Islamic Republic of)', 'Korea, Republic of',
              "Dem. People's Rep. of Korea"]

standard_names = cc.convert(names=some_names, to='name_short')
UNmembership = cc.convert(names=some_names, to='UNmember')
print(standard_names)
print(UNmembership)

Conversion between classification schemes:

iso3_codes = ['USA', 'VUT', 'TKL', 'AUT', 'XXX' ]
iso2_codes = coco.convert(names=iso3_codes, to='ISO2')
print(iso2_codes)

This results in ['US', 'VU', 'TK', 'AT', 'not found'].

The "not found" indication can be specified (e.g. not_found='not there'); if None is passed for not_found, the original entry is passed through:

iso2_codes = coco.convert(names=iso3_codes, to='ISO2', not_found=None)
print(iso2_codes)

which results in ['US', 'VU', 'TK', 'AT', 'XXX'].
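The not_found semantics described above can be summarised in a few lines. This sketch uses a hypothetical four-entry lookup and a made-up helper convert_codes instead of coco itself, so it only mirrors the documented behaviour: a string replaces unmatched entries, while None passes them through unchanged.

```python
# Hypothetical ISO3 -> ISO2 excerpt, covering only the codes used above.
ISO3_TO_ISO2 = {"USA": "US", "VUT": "VU", "TKL": "TK", "AUT": "AT"}


def convert_codes(codes, not_found="not found"):
    """Map ISO3 codes to ISO2, mimicking the documented not_found behaviour.

    Unmatched entries are replaced by `not_found`, or passed through
    unchanged when `not_found` is None.
    """
    result = []
    for code in codes:
        if code in ISO3_TO_ISO2:
            result.append(ISO3_TO_ISO2[code])
        elif not_found is None:
            result.append(code)  # pass the original entry through
        else:
            result.append(not_found)
    return result


iso3_codes = ["USA", "VUT", "TKL", "AUT", "XXX"]
print(convert_codes(iso3_codes))
print(convert_codes(iso3_codes, not_found=None))
print(convert_codes(iso3_codes, not_found="not there"))
```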

Internally, the data is stored in a pandas DataFrame, which can be accessed directly. For example, this can be used to filter countries by their membership in organisations (per year). Note: this requires an instance of CountryConverter.

import country_converter as coco
cc = coco.CountryConverter()

some_countries = ['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic',
                  'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary',
                  'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania',
                  'Luxembourg', 'Malta', 'Romania', 'Russia', 'Turkey', 'United Kingdom',
                  'United States']

oecd_since_1995 = cc.data[(cc.data.OECD >= 1995) & cc.data.name_short.isin(some_countries)].name_short
eu_until_1980 = cc.data[(cc.data.EU <= 1980) & cc.data.name_short.isin(some_countries)].name_short
print(oecd_since_1995)
print(eu_until_1980)
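Judging from the filters above, the membership columns appear to store the year in which a country joined, with missing values for non-members. The following standard-library sketch reproduces that filtering logic on a few illustrative rows typed in for this example (use cc.data for the actual values); it relies on the fact that any comparison with NaN is False, so non-members drop out automatically.

```python
import math

# Illustrative rows in the spirit of cc.data: "OECD" holds an accession
# year, NaN marks a non-member. Values typed in for this example only.
rows = [
    {"name_short": "Japan", "OECD": 1964},
    {"name_short": "Czech Republic", "OECD": 1995},
    {"name_short": "Estonia", "OECD": 2010},
    {"name_short": "India", "OECD": math.nan},
]

# Analogous to cc.data[cc.data.OECD >= 1995].name_short:
# NaN >= 1995 evaluates to False, so India is excluded.
oecd_since_1995 = [r["name_short"] for r in rows if r["OECD"] >= 1995]
print(oecd_since_1995)
```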

All classifications can be accessed directly via:

cc.EU28
cc.OECD

cc.EU27as('ISO3')

and the available classification schemes via:

cc.valid_class

You can also get only the country classifications (thus omitting any grouping of countries) with:

cc.valid_country_classifications

If you rather need a dictionary describing the classification/membership, use:

import country_converter as coco
cc = coco.CountryConverter()
cc.get_correspondence_dict('EXIO3', 'ISO3')

To also include countries that are not assigned within a specific classification, use:

cc.get_correspondence_dict('EU27', 'ISO2', replace_nan='NonEU')
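The shape of such a correspondence dictionary, and the role of replace_nan, can be sketched with the standard library. The helper correspondence_dict and the four-row toy table are hypothetical; the sketch assumes, as described above, that keys come from the first classification, values are lists of entries in the second, and unassigned countries (NaN) are either dropped or grouped under the replace_nan label.

```python
import math
from collections import defaultdict


def correspondence_dict(rows, class_a, class_b, replace_nan=None):
    """Group class_b entries by their class_a value (hypothetical helper).

    Rows whose class_a value is NaN are skipped unless replace_nan is
    given, in which case they are collected under that label.
    """
    result = defaultdict(list)
    for row in rows:
        key = row[class_a]
        if isinstance(key, float) and math.isnan(key):
            if replace_nan is None:
                continue  # country not assigned in class_a
            key = replace_nan
        result[key].append(row[class_b])
    return dict(result)


# Toy table: NaN marks countries outside the EU27 grouping.
rows = [
    {"ISO2": "AT", "EU27": "EU27"},
    {"ISO2": "DE", "EU27": "EU27"},
    {"ISO2": "NO", "EU27": math.nan},
    {"ISO2": "CH", "EU27": math.nan},
]
print(correspondence_dict(rows, "EU27", "ISO2"))
print(correspondence_dict(rows, "EU27", "ISO2", replace_nan="NonEU"))
```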

The regular expressions can also be used to match any list of countries to any other list. For example:

match_these = ['norway', 'united_states', 'china', 'taiwan']
master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too',
               'Peoples Republic of China', 'Republic of China' ]

matching_dict = coco.match(match_these, master_list)
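One way to think about this matching is to map both lists onto a shared country key via regular expressions and then pair entries with the same key. The sketch below follows that assumption; the REGEXES table and the helpers to_key and match_lists are hypothetical and not coco's implementation, but with the example lists above they pair each name in match_these with its counterpart in master_list.

```python
import re

# Hypothetical regexes mapping free-form names onto a shared ISO3 key.
# Order matters: the first matching pattern wins.
REGEXES = {
    "NOR": r"norw",
    "USA": r"united.?states|^usa$",
    "SWE": r"swed",
    "CHN": r"people.*china|^china$",
    "TWN": r"taiwan|^republic of china",
}


def to_key(name):
    """Return the key of the first pattern matching `name`, or None."""
    for key, pattern in REGEXES.items():
        if re.search(pattern, name, flags=re.IGNORECASE):
            return key
    return None


def match_lists(list_a, list_b, not_found="not_found"):
    """Pair each entry of list_a with the list_b entry naming the same country."""
    b_by_key = {}
    for entry in list_b:
        key = to_key(entry)
        if key is not None:
            b_by_key.setdefault(key, entry)
    return {entry: b_by_key.get(to_key(entry), not_found) for entry in list_a}


match_these = ['norway', 'united_states', 'china', 'taiwan']
master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too',
               'Peoples Republic of China', 'Republic of China']
print(match_lists(match_these, master_list))
```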

By default, country converter issues a warning to the Python logging logger if no match is found. The following example demonstrates how to configure coco's logging behaviour.

import logging
import country_converter as coco
logging.basicConfig(level=logging.INFO)
coco.convert("asdf")
# WARNING:country_converter.country_converter:asdf not found in regex
# Out: 'not found'

coco_logger = coco.logging.getLogger()
coco_logger.setLevel(logging.CRITICAL)
coco.convert("asdf")
# Out: 'not found'
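Raising the level of the root logger, as above, also silences warnings from every other library. A more targeted option is to raise the level of the package's own logger only. The snippet below demonstrates this standard-library mechanism with a stand-in logger named after the one in the warning line above, so it runs without coco installed; assuming coco logs under that name, logging.getLogger("country_converter").setLevel(logging.CRITICAL) should have the same effect on it.

```python
import logging

# Stand-in for the logger that produced the warning above; the name is taken
# from the log line "WARNING:country_converter.country_converter:...".
child = logging.getLogger("country_converter.country_converter")

# Collect emitted messages in a list so the effect of the level is visible.
records = []


class ListHandler(logging.Handler):
    def emit(self, record):
        records.append(record.getMessage())


child.addHandler(ListHandler())

child.warning("asdf not found in regex")  # emitted: effective level is WARNING

# Raising the parent's level changes the child's effective level,
# because the child itself has no level set (NOTSET).
logging.getLogger("country_converter").setLevel(logging.CRITICAL)
child.warning("asdf not found in regex")  # suppressed

print(records)
```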

For more information, see the IPython notebook (country_converter_examples.ipynb).

Command line usage

The country converter package also provides a command line interface called coco.

Minimal example:

coco Cyprus DE Denmark Estonia 4 'United Kingdom' AUT

This converts the given names to ISO3 codes by matching the input against ISO2, ISO3, ISO numeric or regular expressions. The names must be separated by spaces, and country names consisting of multiple words must be put in quotes ('').

The input classification can be specified with '--src' or '-s' (otherwise it is determined automatically), and the target classification with '--to' or '-t'.

The default output is a space-separated list; this can be changed by passing a separator with '--output_sep' or '-o' (e.g. -o '|').

Thus, to convert from ISO3 to UN numeric codes and receive the output as a comma-separated list, use:

coco AUT DEU VAT AUS -s ISO3 -t UNcode -o ', '

The command line tool also allows specifying the output for entries that are not found, including passing them through to the output with None:

coco CAN Peru US Mexico Venezuela UK Arendelle --not_found=None

and specifying an additional data file, which overwrites existing country matchings:

coco Congo --additional_data path/to/datafile.csv

See https://github.com/konstantinstadler/country_converter/tree/master/tests/custom_data_example.txt for an example of an additional data file.

The flags --UNmember_only (-u) and --include_obsolete (-i) restrict the search to UN member states only, or extend it to also include currently obsolete countries. For example, the Netherlands Antilles were dissolved in 2010.

Therefore:

coco "Netherlands Antilles"

results in not found. The search, however, can be extended to recently dissolved countries with:

coco "Netherlands Antilles" -i

which results in 'ANT'.

In addition to countries, the coco command line tool also accepts various country classifications (EXIO1, EXIO2, EXIO3, WIOD, Eora, MESSAGE, OECD, EU27, EU28, UN, obsolete, Cecilia2050, BRIC, APEC, BASIC, CIS, G7, G20). One of these can be passed by

coco G20

which lists all countries in that classification.

For the classifications covering almost all countries (MRIO and IAM classifications),

coco EXIO3

lists the unique classification names. When passing a --to parameter, a simplified correspondence of the chosen classification is printed:

coco EXIO3 --to ISO3

For further information, call the help with

coco -h

Use within Matlab

Recent versions of Matlab (tested in 2016) allow calling Python functions and libraries directly. This requires Python version >= 3.4 installed in the system path (e.g. through Anaconda).

To test, try this in Matlab:

py.print(py.sys.version)

If this works, you can also use coco after installing it through pip (at the Windows command line; see the installation instructions above):

pip install country_converter --upgrade

And in Matlab:

coco = py.country_converter.CountryConverter()
countries = {'The Swedish Kingdom', 'Norway is a Kingdom too', 'Peoples Republic of China', 'Republic of China'};
ISO2_pythontype = coco.convert(countries, pyargs('to', 'ISO2'));
ISO2_cellarray = cellfun(@char,cell(ISO2_pythontype),'UniformOutput',false);

Or, more concisely, as a single line:

short_names = cellfun(@char, cell(py.country_converter.convert({56, 276}, pyargs('src', 'UNcode', 'to', 'name_short'))), 'UniformOutput',false);

All properties of coco explained above are also available in Matlab:

coco = py.country_converter.CountryConverter();
coco.EU27
EU27ISO3 = coco.EU27as('ISO3');

These functions return a pandas DataFrame. The underlying values can be accessed with .values, e.g.:

EU27ISO3.values

I leave it to professional Matlab users to find out how to deal with them further.

See also the IPython notebook (country_converter_examples.ipynb) for more information. All functions available in Python (for example, passing additional data files or specifying the output in case of missing data) also work in Matlab by passing the arguments through the pyargs function.

Building concordances for country aggregation

Coco provides functions for building concordance vectors, matrices and dictionaries between different classifications. These can be used in Python as well as in Matlab. For further information, see country_converter_aggregation_helper.ipynb.
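To show what an aggregation concordance matrix is, here is a standard-library sketch that builds one from a country-to-region mapping and uses it to aggregate a per-country vector. The mapping and the values are hypothetical, and the code illustrates the general technique rather than coco's own concordance functions, which are described in the notebook mentioned above.

```python
# Hypothetical mapping from ISO3 countries to aggregate regions.
country_to_region = {"AUT": "EU", "DEU": "EU", "USA": "NorthAmerica",
                     "CAN": "NorthAmerica", "JPN": "Other"}

countries = list(country_to_region)                 # column order
regions = sorted(set(country_to_region.values()))   # row order

# Concordance matrix: one row per region, one column per country,
# entry 1 if the country belongs to the region, else 0.
# Every column therefore contains exactly one 1.
concordance = [
    [1 if country_to_region[c] == r else 0 for c in countries]
    for r in regions
]

# Aggregating a per-country vector is a matrix-vector product.
values = {"AUT": 1.0, "DEU": 2.0, "USA": 3.0, "CAN": 4.0, "JPN": 5.0}
vector = [values[c] for c in countries]
aggregated = {
    r: sum(w * v for w, v in zip(row, vector))
    for r, row in zip(regions, concordance)
}
print(regions)
print(aggregated)
```

With numpy or pandas the same product would be a single matrix multiplication; plain lists are used here only to keep the sketch dependency-free.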

Classification schemes

Currently the following classification schemes are available (see also Data sources below for further information):

  1. ISO2 (ISO 3166-1 alpha-2)
  2. ISO3 (ISO 3166-1 alpha-3)
  3. ISO numeric (ISO 3166-1 numeric)
  4. UN numeric code (M.49, which largely follows ISO numeric)
  5. A standard or short name
  6. The "official" name
  7. Continent
  8. UN region
  9. EXIOBASE 1 classification
  10. EXIOBASE 2 classification
  11. EXIOBASE 3 classification
  12. WIOD classification
  13. Eora
  14. OECD membership (per year)
  15. MESSAGE 11-region classification
  16. IMAGE
  17. REMIND
  18. UN membership (per year)
  19. EU membership (including EU12, EU15, EU25, EU27, EU27_2007, EU28)
  20. EEA membership
  21. Schengen region
  22. Cecilia 2050 classification (source: https://cecilia2050.eu/system/files/De Koning et al. (2014)_Scenarios for 2050_0.pdf)
  23. APEC
  24. BRIC
  25. BASIC
  26. CIS (as of 2019, excluding Turkmenistan)
  27. G7
  28. G20 (listing all EU member states as individual members)
  29. FAO codes (numeric)
  30. GBD codes (numeric, Global Burden of Disease country codes)

Coco contains officially recognised codes as well as non-standard codes for disputed or dissolved countries. To restrict the set to only officially recognised UN member states, or to include obsolete countries, pass:

import country_converter as coco
cc = coco.CountryConverter()
cc_UN = coco.CountryConverter(only_UNmember=True)
cc_all = coco.CountryConverter(include_obsolete=True)

cc.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short')
cc_UN.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short')
cc_all.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short')

cc results in ['Palestine', 'Kosovo', 'not found', 'France'], whereas cc_UN converts all but France to 'not found' (['not found', 'not found', 'not found', 'France']) and cc_all converts to ['Palestine', 'Kosovo', 'Zanzibar', 'France']. Note that the underlying DataFrame is available at the attribute data (e.g. cc_all.data).
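The effect of the two constructor options can be mimicked by filtering a table on two flags before building the lookup. In this sketch the table, its UNmember and obsolete flags and the factory make_converter are hypothetical; the flag values were chosen so that the three converters reproduce the results described above, and they say nothing about how coco stores this information internally.

```python
# Hypothetical rows with two flags, chosen to reproduce the results above.
TABLE = [
    {"ISO3": "PSE", "name_short": "Palestine", "UNmember": False, "obsolete": False},
    {"ISO3": "XKX", "name_short": "Kosovo", "UNmember": False, "obsolete": False},
    {"ISO3": "EAZ", "name_short": "Zanzibar", "UNmember": False, "obsolete": True},
    {"ISO3": "FRA", "name_short": "France", "UNmember": True, "obsolete": False},
]


def make_converter(only_UNmember=False, include_obsolete=False):
    """Return an ISO3 -> name_short converter restricted by the two flags."""
    rows = [r for r in TABLE if include_obsolete or not r["obsolete"]]
    if only_UNmember:
        rows = [r for r in rows if r["UNmember"]]
    lookup = {r["ISO3"]: r["name_short"] for r in rows}
    return lambda codes: [lookup.get(c, "not found") for c in codes]


codes = ["PSE", "XKX", "EAZ", "FRA"]
print(make_converter()(codes))
print(make_converter(only_UNmember=True)(codes))
print(make_converter(include_obsolete=True)(codes))
```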

Data sources and further reading

Most of the underlying data can be found on Wikipedia; the page describing ISO 3166-1 is a good starting point. UN regions/codes are given on the United Nations Statistics Division (unstats) webpage. The differences between the ISO numeric and UN (M.49) codes are also explained on Wikipedia. The EXIOBASE, WIOD and Eora classifications were extracted from the respective databases. For Eora, the names are based on the "Country names" csv file provided on the webpage, but updated for the different names used in the Eora26 database. The MESSAGE classification follows the 11-region aggregation given in the MESSAGE model regions description. The IMAGE classification is based on the "region classification map"; for REMIND, we received a country mapping from the model developers.

The membership of the OECD and the UN can be found on the webpages of the membership organisations; information about obsolete country codes is available on the Statoids webpage.

The situation for the EU became complicated due to the Brexit process. For the naming, coco follows the Eurostat glossary; thus EU27 refers to the EU without the UK, whereas EU27_2007 refers to the EU without Croatia (the status after the 2007 enlargement). The shortcut EU always links to the most recent classification. The EEA agreement is still valid for the UK (status September 2020, with the UK in the Brexit transition period, as described here); thus the UK is currently included in the EEA.

The Global Burden of Disease country codes were extracted from the GBD code book available here.

Communication, issues, bugs and enhancements

Please use the issue tracker for documenting bugs, proposing enhancements and all other communication related to coco.

You can follow me on Twitter to get the latest news about all my open source and research projects (and occasionally some random tweets).

Related software

A related package provides access to the official ISO databases for historic countries, country subdivisions, languages and currencies. In case you need to convert non-English country names, countrynames includes an extensive database of country names in different languages and functions to convert them to the different ISO 3166 standards. Python-iso3166 focuses on the conversion between the two-letter, three-letter and three-digit codes defined in the ISO 3166 standard.

If you are using R, you should have a look at countrycode.

Keywords: Python

Added by blink359 on Tue, 21 Dec 2021 12:06:22 +0200