Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 | 31 |
Tags
- 만화 영화
- 서귀포
- 완결
- geopandas
- 제주도
- 빅데이터 분석기사 필기
- 커피
- 빅데이터 분석기사
- 진심
- 가족
- 아주 작은 습관의 힘
- 완결 웹툰
- 습관
- 네이버 완결 웹툰
- python
- 애니메이션
- 이범선
- 액션
- 넷플릭스
- 산책
- 이기적 출판사
- 웹툰
- 제임스 클리어
- 사랑
- 로맨스
- pandas
- 네이버 웹툰
- 네이버
- 영화
- QGIS
Archives
- Today
- Total
JuJuKwakKwak
시험 대비 (21.12.27) 본문
시험 대비¶
101 pandas tips for beginners titanic¶
Part 1 - Exploring Data¶
In [ ]:
pwd
In [2]:
import numpy as np
import pandas as pd
# 시각화 관련 라이브러리
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import seaborn as sns
import missingno
In [3]:
test = pd.read_csv('resources/titanic/data_titanic/test.csv')
train = pd.read_csv('resources/titanic/data_titanic/train.csv')
In [4]:
print(train.shape)
train.head(3)
(891, 12)
Out[4]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
In [5]:
print(test.shape)
test.head(3)
(418, 11)
Out[5]:
PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S |
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
In [6]:
# To read lines from 101 to 110
train[101:111]
Out[6]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
101 | 102 | 0 | 3 | Petroff, Mr. Pastcho ("Pentcho") | male | NaN | 0 | 0 | 349215 | 7.8958 | NaN | S |
102 | 103 | 0 | 1 | White, Mr. Richard Frasar | male | 21.0 | 0 | 1 | 35281 | 77.2875 | D26 | S |
103 | 104 | 0 | 3 | Johansson, Mr. Gustaf Joel | male | 33.0 | 0 | 0 | 7540 | 8.6542 | NaN | S |
104 | 105 | 0 | 3 | Gustafsson, Mr. Anders Vilhelm | male | 37.0 | 2 | 0 | 3101276 | 7.9250 | NaN | S |
105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S |
106 | 107 | 1 | 3 | Salkjelsvik, Miss. Anna Kristine | female | 21.0 | 0 | 0 | 343120 | 7.6500 | NaN | S |
107 | 108 | 1 | 3 | Moss, Mr. Albert Johan | male | NaN | 0 | 0 | 312991 | 7.7750 | NaN | S |
108 | 109 | 0 | 3 | Rekic, Mr. Tido | male | 38.0 | 0 | 0 | 349249 | 7.8958 | NaN | S |
109 | 110 | 1 | 3 | Moran, Miss. Bertha | female | NaN | 1 | 0 | 371110 | 24.1500 | NaN | Q |
110 | 111 | 0 | 1 | Porter, Mr. Walter Chamberlain | male | 47.0 | 0 | 0 | 110465 | 52.0000 | C110 | S |
In [7]:
train.describe()
Out[7]:
PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare | |
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
In [8]:
train.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 891 entries, 0 to 890 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PassengerId 891 non-null int64 1 Survived 891 non-null int64 2 Pclass 891 non-null int64 3 Name 891 non-null object 4 Sex 891 non-null object 5 Age 714 non-null float64 6 SibSp 891 non-null int64 7 Parch 891 non-null int64 8 Ticket 891 non-null object 9 Fare 891 non-null float64 10 Cabin 204 non-null object 11 Embarked 889 non-null object dtypes: float64(2), int64(5), object(5) memory usage: 83.7+ KB
In [9]:
train.dtypes
Out[9]:
PassengerId int64 Survived int64 Pclass int64 Name object Sex object Age float64 SibSp int64 Parch int64 Ticket object Fare float64 Cabin object Embarked object dtype: object
In [10]:
len(train)
Out[10]:
891
In [11]:
train.columns
Out[11]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'], dtype='object')
Part 2 - Indexing¶
In [12]:
# To see a certain column head
train.columns[3]
Out[12]:
'Name'
In [13]:
# To see multiple column heads
train.columns[3:5]
Out[13]:
Index(['Name', 'Sex'], dtype='object')
In [14]:
# To see the values of a certain column
train['Name'].head()
Out[14]:
0 Braund, Mr. Owen Harris 1 Cumings, Mrs. John Bradley (Florence Briggs Th... 2 Heikkinen, Miss. Laina 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 4 Allen, Mr. William Henry Name: Name, dtype: object
In [15]:
# isnull().any() / isna().any()
# To check if there are any null values (empty or NaN (Not a Number))
train.isnull().any()
Out[15]:
PassengerId False Survived False Pclass False Name False Sex False Age True SibSp False Parch False Ticket False Fare False Cabin True Embarked True dtype: bool
In [16]:
# isnull().sum() / isna().sum()
# To check the sum counts of null values
train.isnull().sum()
Out[16]:
PassengerId 0 Survived 0 Pclass 0 Name 0 Sex 0 Age 177 SibSp 0 Parch 0 Ticket 0 Fare 0 Cabin 687 Embarked 2 dtype: int64
In [17]:
# Locating - loc - To read a certain row
train.loc[3]
Out[17]:
PassengerId 4 Survived 1 Pclass 1 Name Futrelle, Mrs. Jacques Heath (Lily May Peel) Sex female Age 35.0 SibSp 1 Parch 0 Ticket 113803 Fare 53.1 Cabin C123 Embarked S Name: 3, dtype: object
In [18]:
train.loc[ train['Pclass']==1 ].head(3)
Out[18]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
In [19]:
train.loc[train['Pclass']==1].reset_index(drop=True).head(3)
Out[19]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
1 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
2 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
In [20]:
train.loc[train['Age'] < 1]
# train.loc[train.Age.values < 1]
# train.loc[train.Age < 1]
# 다 똑같다
Out[20]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
78 | 79 | 1 | 2 | Caldwell, Master. Alden Gates | male | 0.83 | 0 | 2 | 248738 | 29.0000 | NaN | S |
305 | 306 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C |
644 | 645 | 1 | 3 | Baclini, Miss. Eugenie | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C |
755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.5000 | NaN | S |
803 | 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C |
831 | 832 | 1 | 2 | Richards, Master. George Sibley | male | 0.83 | 1 | 1 | 29106 | 18.7500 | NaN | S |
In [21]:
train.loc[9, 'Name']
Out[21]:
'Nasser, Mrs. Nicholas (Adele Achem)'
In [22]:
train.loc[9, ['Name', 'Age']]
Out[22]:
Name Nasser, Mrs. Nicholas (Adele Achem) Age 14.0 Name: 9, dtype: object
In [23]:
# iloc - integer loc
train.iloc[11:16]
# train.loc[11:16]
# 똑같다
Out[23]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S |
12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NaN | S |
14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | NaN | S |
15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S |
In [24]:
# If there are too many columns, you can reduce the number of columns it displays
pd.set_option("display.max_columns", 12)
train.head(3)
Out[24]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
Part 3 - Sorting¶
In [25]:
# Sorting columns with descending order
train.sort_values('Fare', ascending=False).head(3)
Out[25]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
737 | 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C |
679 | 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36.0 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C |
In [26]:
# Sorting columns with ascending order
train.sort_values('Fare', ascending=True).head(3)
Out[26]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
271 | 272 | 1 | 3 | Tornquist, Mr. William Henry | male | 25.0 | 0 | 0 | LINE | 0.0 | NaN | S |
597 | 598 | 0 | 3 | Johnson, Mr. Alfred | male | 49.0 | 0 | 0 | LINE | 0.0 | NaN | S |
302 | 303 | 0 | 3 | Johnson, Mr. William Cahoone Jr | male | 19.0 | 0 | 0 | LINE | 0.0 | NaN | S |
In [27]:
# Sorting columns - Looking at certain rows
train.sort_values('Fare', ascending=True)[101:105]
Out[27]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
421 | 422 | 0 | 3 | Charters, Mr. David | male | 21.0 | 0 | 0 | A/5. 13032 | 7.7333 | NaN | Q |
156 | 157 | 1 | 3 | Gilnagh, Miss. Katherine "Katie" | female | 16.0 | 0 | 0 | 35851 | 7.7333 | NaN | Q |
778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN | 0 | 0 | 36865 | 7.7375 | NaN | Q |
727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q |
In [28]:
# Sorting columns - multi columns in ascending order
train.sort_values(['Fare', 'Survived', 'Pclass'], ascending=[True, False, False]).head(3)
Out[28]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
271 | 272 | 1 | 3 | Tornquist, Mr. William Henry | male | 25.0 | 0 | 0 | LINE | 0.0 | NaN | S |
179 | 180 | 0 | 3 | Leonard, Mr. Lionel | male | 36.0 | 0 | 0 | LINE | 0.0 | NaN | S |
302 | 303 | 0 | 3 | Johnson, Mr. William Cahoone Jr | male | 19.0 | 0 | 0 | LINE | 0.0 | NaN | S |
In [29]:
# Sorting a certain column with a certain condition
train[train['Fare']>80].head(3)
Out[29]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
31 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C |
34 | 35 | 0 | 1 | Meyer, Mr. Edgar Joseph | male | 28.0 | 1 | 0 | PC 17604 | 82.1708 | NaN | C |
In [30]:
# Sort columns by alphabetical order
train.reindex(sorted(train.columns), axis=1).head(3)
# train.sort_index(axis=1)
# 똑같다
Out[30]:
Age | Cabin | Embarked | Fare | Name | Parch | PassengerId | Pclass | Sex | SibSp | Survived | Ticket | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 22.0 | NaN | S | 7.2500 | Braund, Mr. Owen Harris | 0 | 1 | 3 | male | 1 | 0 | A/5 21171 |
1 | 38.0 | C85 | C | 71.2833 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 2 | 1 | female | 1 | 1 | PC 17599 |
2 | 26.0 | NaN | S | 7.9250 | Heikkinen, Miss. Laina | 0 | 3 | 3 | female | 0 | 1 | STON/O2. 3101282 |
In [31]:
# Multi column filtering
train[(train.Age >= 30) & (train.Sex == 'female')].head()
train[(train['Age'] >= 30) & (train['Sex'] == 'female')]
# 똑같다
Out[31]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S |
15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S |
18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 | 1 | 0 | 345763 | 18.0000 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S |
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S |
871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.0 | 1 | 1 | 11751 | 52.5542 | D35 | S |
879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C |
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
114 rows × 12 columns
In [32]:
# isin() - filtering by conditions
train[train.Embarked.isin(['C'])].head(3)
# train[train['Embarked'].isin(['C'])]
# 똑같다
Out[32]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
In [33]:
train[train.Cabin.isin(['C50', 'C85'])]
# train[train['Cabin'].isin(['C50', 'C85'])]
# 똑같다
Out[33]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C |
In [34]:
train[train.Ticket.isin(['113572']) & train.Parch.isin([0])]
Out[34]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
61 | 62 | 1 | 1 | Icard, Miss. Amelie | female | 38.0 | 0 | 0 | 113572 | 80.0 | B28 | NaN |
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0 | B28 | NaN |
In [35]:
len(train[train.Fare == 80])
Out[35]:
2
In [36]:
# Sorting with conditions and if there are any that matches
(train.Fare > 80).any()
Out[36]:
True
In [37]:
# Sorting with conditions and get the percentage
(train.Fare > 80).mean()
Out[37]:
0.08305274971941638
Part 4 Creating¶
In [38]:
# Making a new empty DataFrame
sample = pd.DataFrame()
sample
Out[38]:
In [39]:
# Copying and creating a new DataFrame
temp = train.copy()
temp.head(3)
Out[39]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
In [40]:
# Creating a new column
pd.set_option("display.max_columns", 13)
temp['New'] = 0
temp.head(3)
Out[40]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | New | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 0 |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 0 |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 0 |
In [41]:
# 특정 위치에 새로운 컬럼 추가하기
pd.set_option('display.max_columns', 14)
temp.insert(0, 'New2', 0)
temp.head(3)
Out[41]:
New2 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | New | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 0 |
1 | 0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 0 |
2 | 0 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 0 |
In [42]:
# Creating a new column by copying an existing column
temp['New'] = temp['Age']
temp.head(3)
Out[42]:
New2 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | New | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 22.0 |
1 | 0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 38.0 |
2 | 0 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 26.0 |
In [43]:
# Creating a new Dataframe with a certain column
y_train = train['Survived']
y_train.head(3)
Out[43]:
0 0 1 1 2 1 Name: Survived, dtype: int64
In [44]:
# Creating a new Dataframe with certain row
sample1 = train[0:5]
sample1
Out[44]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
In [45]:
# Creating a new Dataframe with certain columns
sample2 = train[['Pclass', 'Name', 'Sex']]
sample2.head(3)
Out[45]:
Pclass | Name | Sex | |
---|---|---|---|
0 | 3 | Braund, Mr. Owen Harris | male |
1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female |
2 | 3 | Heikkinen, Miss. Laina | female |
In [46]:
# Creating a new Dataframe with certain columns and certain rows
sample3 = train[['Pclass', 'Name', 'Sex']][1:4]
sample3
Out[46]:
Pclass | Name | Sex | |
---|---|---|---|
1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female |
2 | 3 | Heikkinen, Miss. Laina | female |
3 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female |
In [47]:
# Creating a new Dataframe with exisiting rows that matches a certain condition
sample4 = train.loc[train.Survived == 1]
# train.loc[train.Survived.values == 1]
# train.loc[train['Survived'] == 1]
sample4.head(3)
Out[47]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
In [48]:
# Creating a new Dataframe with exisiting rows that matches multiple conditions
sample5 = train.loc[(train.Survived == 1) & (train.Pclass == 3) & (train.Sex == 'male')]
sample5.head(3)
Out[48]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
36 | 37 | 1 | 3 | Mamee, Mr. Hanna | male | NaN | 0 | 0 | 2677 | 7.2292 | NaN | C |
65 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C |
74 | 75 | 1 | 3 | Bing, Mr. Lee | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
In [49]:
# Concat - Merging and creating a new data frame
data = pd.concat((train, test), sort=True)
print(data.shape)
data.head(3)
(1309, 12)
Out[49]:
Age | Cabin | Embarked | Fare | Name | Parch | PassengerId | Pclass | Sex | SibSp | Survived | Ticket | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 22.0 | NaN | S | 7.2500 | Braund, Mr. Owen Harris | 0 | 1 | 3 | male | 1 | 0.0 | A/5 21171 |
1 | 38.0 | C85 | C | 71.2833 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 2 | 1 | female | 1 | 1.0 | PC 17599 |
2 | 26.0 | NaN | S | 7.9250 | Heikkinen, Miss. Laina | 0 | 3 | 3 | female | 0 | 1.0 | STON/O2. 3101282 |
In [50]:
data.tail(3)
Out[50]:
Age | Cabin | Embarked | Fare | Name | Parch | PassengerId | Pclass | Sex | SibSp | Survived | Ticket | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
415 | 38.5 | NaN | S | 7.2500 | Saether, Mr. Simon Sivertsen | 0 | 1307 | 3 | male | 0 | NaN | SOTON/O.Q. 3101262 |
416 | NaN | NaN | S | 8.0500 | Ware, Mr. Frederick | 0 | 1308 | 3 | male | 0 | NaN | 359309 |
417 | NaN | NaN | C | 22.3583 | Peter, Master. Michael J | 1 | 1309 | 3 | male | 1 | NaN | 2668 |
'Data Science > 국비지원' 카테고리의 다른 글
Day11 - Code (0) | 2021.12.28 |
---|---|
Day11 (0) | 2021.12.28 |
Day10 - Code (0) | 2021.12.24 |
Day10 (0) | 2021.12.24 |
Day9 - Code (1) | 2021.12.23 |