Project description
The country converter (coco) is a Python package for converting and matching country names between different classifications and between different naming versions. Internally, it uses regular expressions to match country names. Coco can also be used to build aggregation concordance matrices between different classification schemes.
Table of Contents
- Motivation
- Installation
- Usage
- Classification schemes
- Data sources and further reading
- Communication, issues, bugs and enhancements
- Contributing
- Related software
- Citing the country converter
- Acknowledgements
Motivation
To date, there is no single standard for how to name or specify individual countries in a (meta) data description. While some data sources follow ISO 3166, this standard defines a two-letter and a three-letter code in addition to a numerical classification. To complicate matters further, many databases use none of the existing standards and instead rely on non-standardized country names to classify countries.
The country converter (coco) automates the conversion of country names between different standards and versions. Internally, coco is based on a table that specifies the different ISO and UN standards for each country or region, together with the official name and a regular expression designed to match all English versions of a specific country name. In addition, coco includes classifications based on UN, EU and OECD membership, UN regional specifications, continents, and the country groupings of various MRIO and IAM databases (see Classification schemes below).
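To illustrate the idea of a lookup table with one regular expression per country, here is a minimal sketch. The three-row table and its patterns are made up for this example and are far simpler than the data coco ships; the helper name `to_short_name` is hypothetical and not part of coco.

```python
import re

# Hypothetical mini table: (short name, ISO3 code, regular expression).
# The patterns are simplified stand-ins, not coco's actual data.
TABLE = [
    ("Tanzania", "TZA", r"tanzania"),
    ("South Korea", "KOR", r"^(?!.*(dem|north)).*korea"),
    ("North Korea", "PRK", r"(dem.*|north.*)korea"),
]


def to_short_name(name, not_found="not found"):
    """Return the short name of the first row whose pattern matches."""
    for short_name, _iso3, pattern in TABLE:
        if re.search(pattern, name, flags=re.IGNORECASE):
            return short_name
    return not_found


print(to_short_name("United Rep. of Tanzania"))
print(to_short_name("Korea, Republic of"))
print(to_short_name("Dem. People's Rep. of Korea"))
```

The negative lookahead in the South Korea pattern shows why one carefully written expression per country is needed: a plain `korea` pattern would match both Koreas.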
Installation
country_converter is registered on PyPI. From the command line:
pip install country_converter --upgrade
The country converter is also available from the conda-forge channel and can be installed using conda (if the conda-forge channel is not in your conda configuration, add "-c conda-forge" to the command; see the conda-forge installation instructions):
conda install country_converter
Alternatively, the source code is available on GitHub.
The package depends on pandas; pytest is required for testing. For more information about running the tests, see CONTRIBUTING.rst.
Usage
Basic usage
Use within Python
Convert various country names to some standard names:
import country_converter as coco
some_names = ['United Rep. of Tanzania', 'DE', 'Cape Verde', '788', 'Burma', 'COG',
              'Iran (Islamic Republic of)', 'Korea, Republic of',
              "Dem. People's Rep. of Korea"]
standard_names = coco.convert(names=some_names, to='name_short')
print(standard_names)
This results in ['Tanzania', 'Germany', 'Cape Verde', 'Tunisia', 'Myanmar', 'Republic of the Congo', 'Iran', 'South Korea', 'North Korea']. The input format is determined automatically, based on ISO two-letter, ISO three-letter, ISO numeric or regular expression matching. In case of any ambiguity, the source format can be specified with the parameter "src".
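As a rough illustration of how such automatic detection could work, the sketch below guesses the source format from the shape of each entry. This is an assumption about the general approach, not coco's actual rules; the function name `guess_source_format` is hypothetical.

```python
def guess_source_format(entry):
    """Guess the classification of one input entry (simplified heuristic)."""
    text = str(entry).strip()
    if text.isdigit():
        return "ISOnumeric"
    if len(text) == 2 and text.isalpha():
        return "ISO2"
    if len(text) == 3 and text.isalpha():
        return "ISO3"
    # Anything else is treated as a free-text name for regex matching.
    return "regex"


for entry in ["DE", "COG", "788", "Burma"]:
    print(entry, "->", guess_source_format(entry))
```

A heuristic like this cannot tell a three-letter name from an ISO3 code, which is the kind of ambiguity the "src" parameter resolves.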
When performing multiple conversions, better performance can be achieved by instantiating a single CountryConverter object and using it for all conversions:
import country_converter as coco
cc = coco.CountryConverter()
some_names = ['United Rep. of Tanzania', 'Cape Verde', 'Burma',
              'Iran (Islamic Republic of)', 'Korea, Republic of',
              "Dem. People's Rep. of Korea"]
standard_names = cc.convert(names=some_names, to='name_short')
UNmembership = cc.convert(names=some_names, to='UNmember')
print(standard_names)
print(UNmembership)
Conversion between classification schemes:
iso3_codes = ['USA', 'VUT', 'TKL', 'AUT', 'XXX']
iso2_codes = coco.convert(names=iso3_codes, to='ISO2')
print(iso2_codes)
This results in ['US', 'VU', 'TK', 'AT', 'not found'].
The "not found" indication can be changed (for example, not_found='not there'). If None is passed for "not_found", the original entry is passed through:
iso2_codes = coco.convert(names=iso3_codes, to='ISO2', not_found=None)
print(iso2_codes)
This results in ['US', 'VU', 'TK', 'AT', 'XXX'].
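The pass-through behaviour can be pictured with a small stand-alone sketch. The dictionary below is a hand-written excerpt covering only the codes from the example above, and `convert_codes` is a hypothetical helper, not coco's implementation.

```python
# Hand-written ISO3 -> ISO2 excerpt, just enough for the example above.
ISO3_TO_ISO2 = {"USA": "US", "VUT": "VU", "TKL": "TK", "AUT": "AT"}


def convert_codes(codes, not_found="not found"):
    """Map ISO3 to ISO2; unknown entries become `not_found`,
    or are passed through unchanged when `not_found` is None."""
    result = []
    for code in codes:
        if code in ISO3_TO_ISO2:
            result.append(ISO3_TO_ISO2[code])
        elif not_found is None:
            result.append(code)
        else:
            result.append(not_found)
    return result


codes = ["USA", "VUT", "TKL", "AUT", "XXX"]
print(convert_codes(codes))
print(convert_codes(codes, not_found=None))
```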
The internal data is stored in a pandas DataFrame, which can be accessed directly. For example, this can be used to filter countries by membership of an organisation (per year). Note: this requires an instance of CountryConverter.
import country_converter as coco
cc = coco.CountryConverter()
some_countries = ['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic',
                  'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary',
                  'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania',
                  'Luxembourg', 'Malta', 'Romania', 'Russia', 'Turkey', 'United Kingdom',
                  'United States']
oecd_since_1995 = cc.data[(cc.data.OECD >= 1995) & cc.data.name_short.isin(some_countries)].name_short
eu_until_1980 = cc.data[(cc.data.EU <= 1980) & cc.data.name_short.isin(some_countries)].name_short
print(oecd_since_1995)
print(eu_until_1980)
All classifications can be accessed directly:
cc.EU28
cc.OECD
cc.EU27as('ISO3')
The available classification schemes are listed by:
cc.valid_class
There is also an attribute that lists only the country classifications (thus omitting any grouping of countries):
cc.valid_country_classifications
If you need a dictionary describing the classification or membership instead, use:
import country_converter as coco
cc = coco.CountryConverter()
cc.get_correspondence_dict('EXIO3', 'ISO3')
To also include countries that are not assigned within a specific classification, use:
cc.get_correspondence_dict('EU27', 'ISO2', replace_nan='NonEU')
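The effect of a replacement label for unassigned countries can be sketched in plain Python. The membership excerpt and the helper `correspondence` below are hypothetical, and the exact shape of the dictionary coco returns may differ from this simplified version.

```python
# Hypothetical excerpt: ISO2 code -> EU27 label (None = not assigned).
MEMBERSHIP = {"AT": "EU27", "DE": "EU27", "NO": None, "CH": None}


def correspondence(table, replace_nan=None):
    """Group codes by classification label; unassigned codes are
    dropped unless a replacement label is given."""
    groups = {}
    for code, label in table.items():
        if label is None:
            if replace_nan is None:
                continue
            label = replace_nan
        groups.setdefault(label, []).append(code)
    return groups


print(correspondence(MEMBERSHIP))
print(correspondence(MEMBERSHIP, replace_nan="NonEU"))
```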
The regular expressions can also be used to match any list of countries against any other list of countries. For example:
match_these = ['norway', 'united_states', 'china', 'taiwan']
master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too',
               'Peoples Republic of China', 'Republic of China']
matching_dict = coco.match(match_these, master_list)
By default, the country converter emits a warning through the Python logging module if no match is found. The following example demonstrates how to configure the coco logging behaviour.
import logging
import country_converter as coco
logging.basicConfig(level=logging.INFO)
coco.convert("asdf")
# WARNING:country_converter.country_converter:asdf not found in regex
# Out: 'not found'
coco_logger = coco.logging.getLogger()
coco_logger.setLevel(logging.CRITICAL)
coco.convert("asdf")
# Out: 'not found'
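Calling getLogger() without a name returns the root logger, so the setting above affects every logger in the application. A narrower variant, sketched below with the standard library only, raises the level on the parent of the logger named in the warning ("country_converter.country_converter"). It assumes the package logger does not set its own level; the "my_app" logger is a made-up stand-in for application code.

```python
import logging

logging.basicConfig(level=logging.INFO)

# Raise the level only for the "country_converter" logger hierarchy.
package_logger = logging.getLogger("country_converter")
package_logger.setLevel(logging.CRITICAL)

# Stand-in for the module-level logger that emitted the warning above.
module_logger = logging.getLogger("country_converter.country_converter")
module_logger.warning("asdf not found in regex")  # suppressed

# Loggers outside the hierarchy keep reporting as before.
other_logger = logging.getLogger("my_app")
other_logger.warning("still visible")
```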
For more information, see the IPython notebook (country_converter_examples.ipynb).
Command line usage
The country converter package also provides a command line interface called coco.
Minimal example:
coco Cyprus DE Denmark Estonia 4 'United Kingdom' AUT
This converts the given names to ISO3 codes by matching the input against ISO2, ISO3, ISOnumeric or regular expressions. The names must be separated by spaces, and country names consisting of multiple words must be put in quotation marks ('').
The input classification can be specified with "--src" or "-s" (otherwise it is determined automatically), and the target classification with "--to" or "-t".
The default output is a space separated list. This can be changed by passing a separator with "--output_sep" or "-o" (for example, -o '|').
Thus, to convert from ISO3 to UN numeric codes and receive the output as a comma separated list, use:
coco AUT DEU VAT AUS -s ISO3 -t UNcode -o ', '
The command line tool also allows you to specify the output for entries that are not found, including passing them through to the output by passing None:
coco CAN Peru US Mexico Venezuela UK Arendelle --not_found=None
It also allows you to specify an additional data file, which overwrites the existing country matching:
coco Congo --additional_data path/to/datafile.csv
See https://github.com/konstantinstadler/country_converter/tree/master/tests/custom_data_example.txt for an example of an additional data file.
The flags --UNmember_only (-u) and --include_obsolete (-i) restrict the search to UN member states only, or extend it to include countries that are now obsolete. For example, the Netherlands Antilles were dissolved in 2010.
Thus:
coco "Netherlands Antilles"
results in "not found". The search can, however, be extended to recently dissolved countries:
coco "Netherlands Antilles" -i
which results in "ANT".
In addition to countries, the coco command line tool also accepts various country classifications (EXIO1, EXIO2, EXIO3, WIOD, Eora, MESSAGE, OECD, EU27, EU28, UN, obsolete, Cecilia2050, BRIC, APEC, BASIC, CIS, G7, G20). One of these can be passed by
coco G20
which lists all countries in that classification.
For the classifications covering almost all countries (MRIO and IAM classifications),
coco EXIO3
lists the unique classification names. When a --to parameter is passed, a simplified correspondence of the chosen classification is printed:
coco EXIO3 --to ISO3
For more information, call the help with
coco -h
Use in Matlab
Newer versions of Matlab (tested in 2016) allow direct calls to Python functions and libraries. This requires Python >= 3.4 to be installed in the system path (e.g. through Anaconda).
To test, try this in Matlab:
py.print(py.sys.version)
If this works, you can also use coco after installing it through pip (at the Windows command line, see the installation instructions above):
pip install country_converter --upgrade
And in Matlab:
coco = py.country_converter.CountryConverter()
countries = {'The Swedish Kingdom', 'Norway is a Kingdom too', 'Peoples Republic of China', 'Republic of China'};
ISO2_pythontype = coco.convert(countries, pyargs('to', 'ISO2'));
ISO2_cellarray = cellfun(@char,cell(ISO2_pythontype),'UniformOutput',false);
Alternatively, as a long one-liner:
short_names = cellfun(@char, cell(py.country_converter.convert({56, 276}, pyargs('src', 'UNcode', 'to', 'name_short'))), 'UniformOutput',false);
All properties of coco explained above are also available in Matlab:
coco = py.country_converter.CountryConverter();
coco.EU27
EU27ISO3 = coco.EU27as('ISO3');
These functions return a pandas DataFrame. The underlying values can be accessed with .values, e.g.
EU27ISO3.values
I leave it to professional Matlab users to figure out how to process them further.
See also the IPython notebook (country_converter_examples.ipynb) for more information. All functions available in Python (for example, passing additional data files or specifying the output in case of missing data) also work in Matlab by passing arguments through the pyargs function.
Building concordances for country aggregation
Coco provides a function for building concordance vectors, matrices and dictionaries between different classifications. This can be used in Python as well as in Matlab. For more information, see (country_converter_aggregation_helper.ipynb)
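To make the term concrete: a concordance matrix has one row per source entry and one column per target group, with a 1 where the entry belongs to the group. The sketch below builds such a matrix in plain Python and uses it to aggregate a country-level vector. The helper `concordance_matrix` and the numbers are illustrative only and do not reflect coco's own interface.

```python
def concordance_matrix(mapping, targets):
    """Build a 0/1 matrix with one row per source entry and one
    column per target group, from a source -> group mapping."""
    sources = list(mapping)
    matrix = [
        [1 if mapping[src] == tgt else 0 for tgt in targets]
        for src in sources
    ]
    return sources, matrix


# Small illustrative grouping of three ISO3 codes into two continents.
mapping = {"AUT": "Europe", "DEU": "Europe", "JPN": "Asia"}
targets = ["Europe", "Asia"]
rows, matrix = concordance_matrix(mapping, targets)
for name, row in zip(rows, matrix):
    print(name, row)

# Aggregating a country-level vector: group totals = values x matrix.
values = [10.0, 20.0, 5.0]  # made-up numbers, one per country
totals = [
    sum(v * row[j] for v, row in zip(values, matrix))
    for j in range(len(targets))
]
print(dict(zip(targets, totals)))
```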
Classification schemes
The following classification schemes are currently available (see also Data sources below for more information):
- ISO2 (ISO 3166-1 alpha-2)
- ISO3 (ISO 3166-1 alpha-3)
- ISO numeric (ISO 3166-1 numeric)
- UN numeric code (M.49, which to a large extent follows ISO numeric)
- A standard or short name
- The "official" name
- Continent
- UN region
- EXIOBASE 1 classification
- EXIOBASE 2 classification
- EXIOBASE 3 classification
- WIOD classification
- Eora
- OECD membership (per year)
- MESSAGE 11-region classification
- IMAGE
- REMIND
- UN membership (per year)
- EU membership (including EU12, EU15, EU25, EU27, EU27_2007, EU28)
- EEA membership
- Schengen region
- Cecilia 2050 classification (https://cecilia2050.eu/system/files/De Koning et al. (2014)_Scenarios for 2050_0.pdf)
- APEC
- BRIC
- BASIC
- CIS (as of 2019, excluding Turkmenistan)
- G7
- G20 (listing all EU member states as individual members)
- FAO codes (numeric)
- GBD codes (numeric, Global Burden of Disease country codes)
Coco contains officially recognized codes as well as non-standard codes for disputed or dissolved countries. To restrict the set to officially recognized UN member states only, or to include obsolete countries, pass
import country_converter as coco
cc = coco.CountryConverter()
cc_UN = coco.CountryConverter(only_UNmember=True)
cc_all = coco.CountryConverter(include_obsolete=True)
cc.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short')
cc_UN.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short')
cc_all.convert(['PSE', 'XKX', 'EAZ', 'FRA'], to='name_short')
cc results in ['Palestine', 'Kosovo', 'not found', 'France'], whereas cc_UN converts to ['not found', 'not found', 'not found', 'France'] and cc_all converts to ['Palestine', 'Kosovo', 'Zanzibar', 'France']. Note that the underlying DataFrame is available through the attribute .data (e.g. cc_all.data).
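The three results can be reproduced with a small filter over a hand-written table. The sketch below only mirrors the behaviour described above; the table holds just the four codes from the example, and `lookup` is a hypothetical helper rather than coco's implementation.

```python
# Mini table for the four codes above: code -> (name, UN member?, obsolete?)
TABLE = {
    "PSE": ("Palestine", False, False),
    "XKX": ("Kosovo", False, False),
    "EAZ": ("Zanzibar", False, True),
    "FRA": ("France", True, False),
}


def lookup(codes, only_UNmember=False, include_obsolete=False):
    """Return short names, hiding entries excluded by the two flags."""
    names = []
    for code in codes:
        name, un_member, obsolete = TABLE[code]
        if only_UNmember and not un_member:
            names.append("not found")
        elif obsolete and not include_obsolete:
            names.append("not found")
        else:
            names.append(name)
    return names


codes = ["PSE", "XKX", "EAZ", "FRA"]
print(lookup(codes))
print(lookup(codes, only_UNmember=True))
print(lookup(codes, include_obsolete=True))
```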
Data sources and further reading
Most of the underlying data can be found on Wikipedia; the page describing ISO 3166-1 is a good starting point. The UN regions and codes are given on the United Nations Statistics Division (unstats) webpage. The differences between the ISO numeric and UN (M.49) codes are also explained on Wikipedia. The EXIOBASE, WIOD and Eora classifications were extracted from the respective databases. For Eora, the names are based on the "Country names" csv file provided on the webpage, but updated for the different names used in the Eora26 database. The MESSAGE classification follows the 11-region aggregation given in the MESSAGE model regions description. The IMAGE classification is based on the "region classification map"; for REMIND, we received a country mapping from the model developers.
The membership of the OECD and the UN can be found on the webpages of these organisations, and information about obsolete country codes on the Statoids webpage.
The situation for the EU became complicated due to the Brexit process. For the naming, coco follows the Eurostat glossary: EU27 refers to the EU without the UK, whereas EU27_2007 refers to the EU without Croatia (the status after the 2007 enlargement). The shortcut EU always links to the most recent classification. The EEA agreement still applies to the UK (status as of September 2020, during the UK's Brexit transition period), so the UK is currently included in the EEA.
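The naming convention can be written down as set arithmetic on ISO2 codes. The sketch below is a plain Python illustration of the convention and does not use coco; the variable names simply mirror the classification names.

```python
# The 28 EU member states before Brexit, as ISO2 codes.
EU28 = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE", "GB",
}

EU27 = EU28 - {"GB"}       # EU without the UK
EU27_2007 = EU28 - {"HR"}  # EU after the 2007 enlargement, before Croatia joined

# The two 27-member sets differ in exactly these two countries.
print(sorted(EU27 ^ EU27_2007))
```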
The Global Burden of Disease country codes were extracted from the GBD code book.
Communication, issues, bugs and enhancements
Please use the issue tracker for documenting bugs, proposing enhancements and all other communication related to coco.
You can follow me on Twitter to get the latest news about all my open source and research projects (and occasionally some random tweets).
Related software
The package pycountry provides access to the official ISO databases for historic countries, country subdivisions, languages and currencies. If you need to convert non-English country names, countrynames includes an extensive database of country names in different languages, along with functions to convert them to the different ISO 3166 standards. Python-iso3166 focuses on conversion between the two-letter, three-letter and three-digit codes defined in the ISO 3166 standard.
If you are using R, you should have a look at countrycode.