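Preface
Hello everyone, Mowang (魔王) here!
I got bored, so I'm tinkering with code again~
Today I'm sharing an analysis of life, work, and death data for 1.22 million people.
Ready? Let's get going!!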
Required materials
Click to get the materials
Code
- import pandas as pd
- df = pd.read_csv(r'.\data\AgeDatasetV1.csv')
- df.info()
- df.describe().to_excel(r'.\result\describe.xlsx')
- df.isnull().sum().to_excel(r'.\result\nullsum.xlsx')
- df[df.duplicated()].to_excel(r'.\result\duplicated.xlsx')
- df.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)
- print(df.columns)
- df[df['Birth_year'] < 0].to_excel(r'.\result\biryear0.xlsx') # a negative birth year means BCE
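The inspection steps above can be tried without the CSV or an Excel writer. Here is a minimal sketch on a made-up frame: the rows are invented, and the raw header names `Birth year` and `Age of death` (plus a hypothetical `Name` column) are assumptions inferred from the renamed `Birth_year` and `Age_of_death`, not taken from the actual file.

```python
import pandas as pd

# Toy stand-in for AgeDatasetV1.csv; rows and the "Name" column are invented.
toy = pd.DataFrame({
    "Name": ["A", "B", "B", "C"],
    "Birth year": [-428, 1879, 1879, 1920],
    "Age of death": [80, 76, 76, None],
})

# Same renaming rule as above: spaces and hyphens become underscores.
toy = toy.rename(columns=lambda c: c.replace(" ", "_").replace("-", "_"))

null_counts = toy.isnull().sum()    # missing values per column
duplicates = toy[toy.duplicated()]  # second and later copies of repeated rows
bce = toy[toy["Birth_year"] < 0]    # negative birth year = BCE

print(list(toy.columns))
print(null_counts.to_dict())
print(len(duplicates), len(bce))
```

Keeping the results in variables instead of writing them straight to `.xlsx` makes it easy to eyeball them first; `to_excel` can still be called on each one afterwards.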
- import pandas as pd
- import matplotlib.pyplot as plt
- import seaborn as sns
- plt.rcParams['font.sans-serif'] = ['SimHei'] # render Chinese labels
- plt.rcParams['axes.unicode_minus'] = False
- df1 = pd.read_csv('./data/AgeDatasetV1.csv')
- # Normalize column names by renaming
- df1.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)
- print(df1.columns)
- # print(df1.select_dtypes('number').corr()) # correlation between numeric columns
- # print(df1['Gender'].unique()) # gender values
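The two commented-out checks above (correlation and distinct gender values) are quick to try on a small frame. A rough sketch follows, assuming the renamed columns `Gender`, `Birth_year`, and `Age_of_death`; the rows are invented for illustration. Selecting numeric columns first keeps `corr()` from tripping over text columns.

```python
import pandas as pd

# Invented rows; only the column names mirror the renamed dataset columns.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", None],
    "Birth_year": [1800, 1850, 1900, 1950],
    "Age_of_death": [60, 70, 80, 90],
})

# Distinct gender values, ignoring missing entries.
genders = toy["Gender"].dropna().unique().tolist()

# Pairwise Pearson correlation over numeric columns only.
corr = toy.select_dtypes("number").corr()

print(genders)
print(corr)
```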
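- # Percentage of deaths by age range
- plt.figure(figsize=(12, 10))
- count = [
-     df1[df1['Age_of_death'] > 100].shape[0],
-     df1[(df1['Age_of_death'] <= 100) & (df1['Age_of_death'] > 90)].shape[0],
-     df1[(df1['Age_of_death'] <= 90) & (df1['Age_of_death'] > 70)].shape[0],
-     df1[(df1['Age_of_death'] <= 70) & (df1['Age_of_death'] > 50)].shape[0],
-     df1.shape[0] - (df1[df1['Age_of_death'] > 100].shape[0]
-                     + df1[(df1['Age_of_death'] <= 100) & (df1['Age_of_death'] > 90)].shape[0]
-                     + df1[(df1['Age_of_death'] <= 70) & (df1['Age_of_death'] > 50)].shape[0]
-                     + df1[(df1['Age_of_death'] <= 90) & (df1['Age_of_death'] > 70)].shape[0]),
- ]
- age = ['> 100', '> 90 & <= 100', '> 70 & <= 90', '> 50 & <= 70', …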
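The age-range counting above can also be written with `pd.cut`, which bins a column in one pass instead of filtering once per band. This is an alternative sketch, not the code from the post: the helper names `age_band_counts` and `plot_age_bands`, the `<= 50` label for the last band, and the toy ages are all assumptions. Unlike the remainder trick above, it drops missing ages rather than folding them into the lowest band.

```python
import pandas as pd

def age_band_counts(ages):
    """Count deaths per age band; bands are right-closed, e.g. (50, 70]."""
    bins = [float("-inf"), 50, 70, 90, 100, float("inf")]
    labels = ["<= 50", "> 50 & <= 70", "> 70 & <= 90", "> 90 & <= 100", "> 100"]
    bands = pd.cut(ages.dropna(), bins=bins, labels=labels)
    # value_counts sorts by frequency; reindex restores the band order.
    return bands.value_counts().reindex(labels)

def plot_age_bands(counts):
    """Draw the band shares as a pie chart (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily so counting works without it
    fig, ax = plt.subplots(figsize=(12, 10))
    ax.pie(counts.values, labels=list(counts.index), autopct="%.1f%%")
    ax.set_title("Deaths by age range")
    return fig

# Invented ages, including boundary values and one missing entry.
toy_ages = pd.Series([45, 50, 51, 70, 71, 90, 95, 100, 101, None])
counts = age_band_counts(toy_ages)
print(counts.to_dict())
```

With the real data, something like `plot_age_bands(age_band_counts(df1['Age_of_death']))` followed by `plt.show()` should produce the chart, assuming `Age_of_death` is numeric.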