In the big data era, the value of data is self-evident. For testing we often need to simulate a realistic environment but cannot use the real data directly, so we have to generate some ourselves.
Compared with Excel, I find that generating such "virtual" data with Python saves far more time and effort.
The requirement: the boss asked me to simulate a batch of data for project experiments. Because some of the real data cannot be shown, I need to generate substitutes covering name, province, address, phone number, ID number, date of birth, email, and so on.
This batch of data also has to be written to Excel and handed to the boss in one go. Given such a requirement, how would you do it?
Hands-on: simulate 10,000 records and write them to Excel
Before covering the basics, let's go straight to a hands-on example so you can see how generated data is written to an Excel file.
from faker import Faker
import pandas as pd

fake = Faker(["zh_CN"])
Faker.seed(0)  # fixed seed so that each run produces the same data

COLUMNS = ["full name", "Detailed address", "Province", "cell-phone number",
           "ID number", "date of birth", "mailbox"]

def get_data():
    """Generate one simulated person as a dict keyed by column name."""
    name = fake.name()
    address = fake.address()
    province = address[:3]      # first three characters of the address
    number = fake.phone_number()
    id_card = fake.ssn()
    birth_date = id_card[6:14]  # characters 7-14 of the ID number (YYYYMMDD)
    email = fake.email()
    info_list = [name, address, province, number, id_card, birth_date, email]
    return dict(zip(COLUMNS, info_list))

# Build all rows first and create the DataFrame once; calling pd.concat
# inside the loop copies the growing frame on every iteration.
df = pd.DataFrame([get_data() for _ in range(10000)], columns=COLUMNS)

# Writing .xlsx needs an Excel engine such as openpyxl to be installed.
df.to_excel("Analog data.xlsx", index=False)
The results are as follows:
The Faker library explained
So how do we use such a handy Python library?
The library can be installed with the following command.
pip install Faker -i https://pypi.tuna.tsinghua.edu.cn/simple/
Before using it, import the library with the following code.
from faker import Faker
Let's go through the functions used in the script above one by one.
1. Generate name
fake = Faker(locale='zh_CN')
name = fake.name()
name
The results are as follows:
2. Generate detailed address
address = fake.address()
address
The results are as follows:
3. Generate province
province = address[:3]
province
The results are as follows:
Because fake.address() returns a different value on every call, I slice the province out of the address generated above so that the two stay consistent. There is also a dedicated function for generating a province on its own.
fake.province()
The results are as follows:
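One caveat on the slice address[:3]: it only fits province-level names that are exactly three characters long, such as 上海市 or 河北省. Longer names such as 黑龙江省 or 内蒙古自治区 get cut off. Below is a minimal sketch of a more forgiving extraction, assuming (as the slice does) that the address starts with the province-level name. The helper extract_province and the sample addresses are hypothetical and not part of Faker.

```python
import re

# Province-level suffixes: province, autonomous region,
# special administrative region, municipality.
PROVINCE_PATTERN = re.compile(r"^.+?(?:省|自治区|特别行政区|市)")

def extract_province(address):
    """Return the leading province-level name, or None if no suffix is found."""
    match = PROVINCE_PATTERN.match(address)
    return match.group(0) if match else None

# Made-up addresses for illustration only.
samples = [
    "黑龙江省哈尔滨市南岗区示例路1号",
    "内蒙古自治区呼和浩特市示例街2号",
    "上海市浦东新区示例路3号",
]
for address in samples:
    # Compare the fixed-width slice with the suffix-based match.
    print(address[:3], "->", extract_province(address))
```

The non-greedy match stops at the first province-level suffix, so it keeps the full name regardless of its length.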
4. Generate mobile phone number
number = fake.phone_number()
number
The results are as follows:
5. Generate ID number
id_card = fake.ssn()
id_card
The results are as follows:
6. Date of birth
birth_date = id_card[6:14]
birth_date
The results are as follows:
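The slice id_card[6:14] returns an eight-character string in YYYYMMDD form. If a real date value is more convenient, for example to sort or to compute ages, it can be parsed with the standard library. A minimal sketch; the helper birth_date_from_id and the sample ID string are made up for illustration:

```python
from datetime import datetime

def birth_date_from_id(id_card):
    """Parse characters 7-14 (YYYYMMDD) of an 18-character ID number into a date."""
    return datetime.strptime(id_card[6:14], "%Y%m%d").date()

# Made-up ID-like string for illustration; not a real person's number.
sample_id = "110101199003071233"
birth = birth_date_from_id(sample_id)
print(birth.isoformat())  # 1990-03-07
```

strptime raises ValueError when the eight characters do not form a valid date, which doubles as a basic sanity check on the generated string.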
7. Generate email address
email = fake.email()
email
The results are as follows:
Supplement
Beyond the fields above, the Faker library offers many other methods. They fall into the following categories:
- address: addresses
- person: gender, name, etc.
- barcode: barcodes
- color: colors
- company: company name, email, company name prefix, etc.
- credit_card: bank cards (card number, validity period, type, etc.)
- currency: currencies
- date_time: date, year, month, etc.
- file: file name, file type, file extension, etc.
- internet: internet-related data
- job: job titles
- lorem: random placeholder text
- misc: miscellaneous
- phone_number: mobile phone number and carrier number prefix
- python: Python data
- profile: personal profile information (name, gender, address, company, etc.)
- ssn: social security code (ID number)
- user_agent: user agent strings
For details on these methods, refer to the provider list in Faker's official documentation, which is very convenient to use.
faker.readthedocs.io/en/master/providers.html
1. address
fake.country()         # country
fake.city()            # city
fake.city_suffix()     # city suffix (city or county)
fake.address()         # full address
fake.street_address()  # street address
fake.street_name()     # street name
fake.postcode()        # postal code
fake.latitude()        # latitude
fake.longitude()       # longitude
2. person
fake.name()             # full name
fake.last_name()        # surname
fake.first_name()       # given name
fake.name_male()        # male full name
fake.last_name_male()   # male surname
fake.first_name_male()  # male given name
fake.name_female()      # female full name
3. color
fake.hex_color()        # color in hexadecimal notation
fake.rgb_css_color()    # RGB color in CSS format
fake.rgb_color()        # RGB color as a string
fake.color_name()       # color name
fake.safe_hex_color()   # web-safe hex color
fake.safe_color_name()  # web-safe color name
4. company
fake.company()         # company name
fake.company_suffix()  # company name suffix
5. credit_card bank card
fake.credit_card_number(card_type=None)         # card number
fake.credit_card_provider(card_type=None)       # card provider
fake.credit_card_security_code(card_type=None)  # card security code
fake.credit_card_expire()                       # card expiry date
fake.credit_card_full(card_type=None)           # full card details
6. date_time date
fake.date_time(tzinfo=None)  # random datetime
fake.iso8601(tzinfo=None)    # datetime formatted per ISO 8601
fake.date_time_this_month(before_now=True, after_now=False, tzinfo=None)    # a datetime in this month
fake.date_time_this_year(before_now=True, after_now=False, tzinfo=None)     # a datetime in this year
fake.date_time_this_decade(before_now=True, after_now=False, tzinfo=None)   # a datetime in this decade
fake.date_time_this_century(before_now=True, after_now=False, tzinfo=None)  # a datetime in this century
fake.date_time_between(start_date="-30y", end_date="now", tzinfo=None)      # a random datetime between two points in time
fake.timezone()                # time zone
fake.time(pattern="%H:%M:%S")  # time (customizable format)
fake.am_pm()                   # random AM or PM
fake.month()                   # random month
fake.month_name()              # random month name
fake.year()                    # random year
fake.day_of_week()             # random day of the week
fake.day_of_month()            # random day of the month
fake.time_delta()              # random timedelta
fake.date_object()             # random date object
fake.time_object()             # random time object
fake.unix_time()               # random Unix timestamp
fake.date(pattern="%Y-%m-%d")  # random date (customizable format)
fake.date_time_ad(tzinfo=None) # random datetime in the AD era
7. file
fake.file_name(category="image", extension="png")  # file name (with a given file type and extension)
fake.file_name()                     # file name of a random type
fake.file_extension(category=None)   # file extension
fake.mime_type(category=None)        # MIME type
8. internet
fake.ipv4(network=False)   # IPv4 address
fake.ipv6(network=False)   # IPv6 address
fake.uri_path(deep=None)   # URI path
fake.uri_extension()       # URI extension
fake.uri()                 # URI
fake.url()                 # URL
fake.image_url(width=None, height=None)  # image URL
fake.domain_word()         # domain word (the name without the suffix)
fake.domain_name()         # domain name
fake.tld()                 # top-level domain
fake.user_name()           # user name
fake.user_agent()          # user agent string
fake.mac_address()         # MAC address
fake.safe_email()          # safe email address
fake.free_email()          # email address at a free provider
fake.company_email()       # company email address
fake.email()               # email address
9. job
fake.job()  # job title
10. lorem random placeholder text
fake.text(max_nb_chars=200)  # random block of text
fake.word()                  # random word
fake.words(nb=3)             # list of random words
fake.sentence(nb_words=6, variable_nb_words=True)            # random sentence
fake.sentences(nb=3)                                         # list of random sentences
fake.paragraph(nb_sentences=3, variable_nb_sentences=True)   # random paragraph (string)
fake.paragraphs(nb=3)                                        # several random paragraphs (list)
11. phone_number phone number
fake.phone_number()        # phone number
fake.phonenumber_prefix()  # carrier prefix (first three digits of a mobile number)
12. ssn social security code (ID card)
fake.ssn()  # random ID number (18 characters)
13. user_agent user agent
fake.user_agent()