```python
import pandas as pd
import numpy as np
```
Create a DataFrame from a dictionary

```python
data = {
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "score": [1, 2, np.nan, 4, 5, 6, 7, 10],
}
```
```python
df = pd.DataFrame(data)
df
```

|   | grammer | score |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 2 | Java | NaN |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |

Select the rows where grammer is "Python"

```python
df[df['grammer'] == 'Python']
```

|   | grammer | score |
|---|---|---|
| 0 | Python | 1.0 |
| 7 | Python | 10.0 |

Print all column names
Index(['grammer', 'score'], dtype='object')
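
The source shows only the output for this step. A minimal sketch of how it could be produced, assuming the call was `df.columns` on the DataFrame built above (the variable `column_names` is introduced here for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame so the snippet runs on its own.
data = {
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "score": [1, 2, np.nan, 4, 5, 6, 7, 10],
}
df = pd.DataFrame(data)

# df.columns is an Index of column labels.
print(df.columns)

# list() turns the Index into plain Python strings.
column_names = list(df.columns)
print(column_names)
```
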
Rename the second column to fraction

```python
df.rename(columns={'score': 'fraction'}, inplace=True)
```
Count how many times each programming language appears in the grammer column

```python
df['grammer'].value_counts()
```
Python 2
C++ 1
Java 1
Go 1
SQL 1
PHP 1
Name: grammer, dtype: int64
Fill the missing values in the fraction column with the column mean

```python
df['fraction'] = df['fraction'].fillna(df['fraction'].mean())
```

|   | grammer | fraction |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |

Select the rows where fraction is greater than 3
0 False
1 False
2 True
3 True
4 True
5 True
6 True
7 True
Name: fraction, dtype: bool

|   | grammer | fraction |
|---|---|---|
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |

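
The code for this step is missing from the source; only the boolean Series and the filtered table survive. A sketch that would produce both, assuming the usual boolean-mask pattern used elsewhere in this post (`mask` and `over_three` are illustrative names):

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands after the rename and the mean fill.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# Comparing a column with a scalar yields a boolean Series, one entry per row.
mask = df["fraction"] > 3
print(mask)

# Indexing with the mask keeps only the rows where it is True.
over_three = df[mask]
print(over_three)
```
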
Drop duplicate rows based on the fraction column

```python
df.drop_duplicates(['fraction'])
```

|   | grammer | fraction |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |

Compute the mean of the fraction column
5.0
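
The call behind this result is not shown in the source. A minimal sketch, assuming it was `Series.mean()` on the filled column (`mean_fraction` is an illustrative name):

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands after the rename and the mean fill.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# mean() averages the column; the eight values sum to 40.
mean_fraction = df["fraction"].mean()
print(mean_fraction)
```
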
Convert the grammer column to a list
['Python', 'C++', 'Java', 'Go', nan, 'SQL', 'PHP', 'Python']
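
The source keeps only the resulting list. One way to get it, assuming `Series.tolist()` was used (`grammer_list` is an illustrative name):

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands at this point; grammer still has one missing value.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# tolist() returns a plain Python list; the missing entry stays as nan.
grammer_list = df["grammer"].tolist()
print(grammer_list)
```
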
Save the DataFrame as a CSV file
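
No code survives for this step. A sketch using `DataFrame.to_csv`; the file name `pandas_exercise.csv`, the temporary directory, and the `index=False` choice are all assumptions made for illustration, not taken from the source:

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# The DataFrame as it stands at this point.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# Hypothetical output location; a temp directory keeps the example tidy.
out_path = Path(tempfile.mkdtemp()) / "pandas_exercise.csv"

# index=False leaves the row labels out of the file.
df.to_csv(out_path, index=False)

# Read the file back to check that it round-trips.
restored = pd.read_csv(out_path)
print(restored.shape)
```

Leaving out `index=False` would write the row labels as an extra unnamed first column, which then shows up again when the file is read back.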
Check the number of rows and columns
(8, 2)
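
The tuple above matches what `df.shape` returns; the call itself is missing from the source, so this is a sketch under that assumption (`n_rows` and `n_cols` are illustrative names):

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands at this point.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# shape is an attribute, not a method: (number of rows, number of columns).
n_rows, n_cols = df.shape
print(df.shape)
```
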
Select the rows where fraction is greater than 2 and less than 7

```python
df[(df['fraction'] > 2) & (df['fraction'] < 7)]
```

|   | grammer | fraction |
|---|---|---|
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |

Select the row with the maximum fraction value

10.0

```python
df[df['fraction'] == df['fraction'].max()]
```

|   | grammer | fraction |
|---|---|---|
| 7 | Python | 10.0 |

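
The bare `10.0` in this step looks like the result of a `max()` call whose code was lost. A sketch of that call, plus `idxmax()` as an alternative way to reach the row; both the reconstruction and the alternative are assumptions, not the author's code:

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands at this point.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# The largest value in the column.
max_fraction = df["fraction"].max()
print(max_fraction)

# idxmax() gives the index label of the first maximum; loc fetches that row.
max_label = df["fraction"].idxmax()
top_row = df.loc[max_label]
print(top_row)
```

The boolean comparison returns every row tied for the maximum as a DataFrame, while `idxmax()` returns only the first one, as a Series.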
View the last 5 rows

|   | grammer | fraction |
|---|---|---|
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |

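
The code is missing here as well. A minimal sketch, assuming `DataFrame.tail()` was used (`last_five` is an illustrative name):

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands at this point.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# tail() returns the last n rows; n defaults to 5.
last_five = df.tail()
print(last_five)
```
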
View the first 5 rows

|   | grammer | fraction |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |

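
As with the previous step, only the output remains. A minimal sketch, assuming `DataFrame.head()` was used (`first_five` is an illustrative name):

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands at this point.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# head() returns the first n rows; n defaults to 5.
first_five = df.head()
print(first_five)
```
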
Delete the last row

|   | grammer | fraction |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |

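
The code for deleting the last row is missing from the source. A sketch of three common ways to do it; which of them the author used is an assumption. None of them modifies `df` in place, which is consistent with row 7 still being present in the next step.

```python
import numpy as np
import pandas as pd

# The DataFrame as it stands at this point.
df = pd.DataFrame({
    "grammer": ["Python", "C++", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
    "fraction": [1.0, 2.0, 5.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# 1. Drop by label: df.index[-1] is the label of the last row.
dropped_by_label = df.drop(df.index[-1])

# 2. Positional slice that stops before the last row.
dropped_by_slice = df.iloc[:-1]

# 3. head() with a negative n returns everything except the last |n| rows.
dropped_by_head = df.head(-1)

print(dropped_by_label)
print(len(df))  # the original DataFrame is unchanged
```

To make the deletion stick, the result would have to be assigned back, for example `df = df.iloc[:-1]`.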
Append a row of data: ['ASP', 8.28]

```python
s = pd.Series({'grammer': 'ASP', 'fraction': 8.28})
```

```python
# Turn the Series into a one-row DataFrame, append it, and renumber the index.
df = pd.concat([df, s.to_frame().T], ignore_index=True)
df
```

|   | grammer | fraction |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 2 | Java | 5.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 7 | Python | 10.0 |
| 8 | ASP | 8.28 |

Sort the data by the values in the "fraction" column

```python
df.sort_values("fraction")
```

|   | grammer | fraction |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C++ | 2.0 |
| 3 | Go | 4.0 |
| 2 | Java | 5.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 7.0 |
| 8 | ASP | 8.28 |
| 7 | Python | 10.0 |

Compute the length of each string in the grammer column

```python
# len() fails on NaN, so replace the missing value with a placeholder first.
df['grammer'] = df['grammer'].fillna('Lisp')
```

```python
df['len_str'] = df['grammer'].apply(lambda x: len(x))
df
```

|   | grammer | fraction | len_str |
|---|---|---|---|
| 0 | Python | 1.0 | 6 |
| 1 | C++ | 2.0 | 3 |
| 2 | Java | 5.0 | 4 |
| 3 | Go | 4.0 | 2 |
| 4 | Lisp | 5.0 | 4 |
| 5 | SQL | 6.0 | 3 |
| 6 | PHP | 7.0 | 3 |
| 7 | Python | 10.0 | 6 |
| 8 | ASP | 8.28 | 3 |

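
Filling the missing value with 'Lisp' is needed because `len()` cannot be applied to NaN. As an alternative sketch (not the approach used above), the vectorized `Series.str.len()` leaves missing entries missing instead of raising; the small DataFrame below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value left in place.
df = pd.DataFrame({"grammer": ["Python", "C++", np.nan, "ASP"]})

# str.len() computes the length per element and propagates missing values.
lengths = df["grammer"].str.len()
print(lengths)
```

Because the result contains a missing value, the lengths are not stored as plain integers; fill or drop the gaps first if an integer column is required.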