当前位置: 首页 > article >正文

【21天学习AI底层概念】day11 (kaggle新手入门教程)Your First Machine Learning Model

网址:https://www.kaggle.com/code/dansbecker/your-first-machine-learning-model

代码

一、选择训练集数据

1.检查数据

import pandas as pd

melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 
melbourne_data.columns

Index([‘Suburb’, ‘Address’, ‘Rooms’, ‘Type’, ‘Price’, ‘Method’, ‘SellerG’,
‘Date’, ‘Distance’, ‘Postcode’, ‘Bedroom2’, ‘Bathroom’, ‘Car’,
‘Landsize’, ‘BuildingArea’, ‘YearBuilt’, ‘CouncilArea’, ‘Lattitude’,
‘Longtitude’, ‘Regionname’, ‘Propertycount’],
dtype=‘object’)

2.丢弃不可用数据

# The Melbourne data has some missing values (some houses for which some variables weren't recorded.)
# We'll learn to handle missing values in a later tutorial.  
# Your Iowa data doesn't have missing values in the columns you use. 
# So we will take the simplest option for now, and drop houses from our data. 
# Don't worry about this much for now, though the code is:

# dropna drops missing values (think of na as "not available")
melbourne_data = melbourne_data.dropna(axis=0)

3.选择prediction target

y = melbourne_data.Price

4.选择"Features"

melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]

5.检查X,X可以理解成一个由rooms列、bathroom列…等列组成的数据集

X.describe()

输出一个表格,复制不过来格式…关于X的整体描述,平均值、最大最小值、标准差等
Rooms Bathroom Landsize Lattitude Longtitude
count 6196.000000 6196.000000 6196.000000 6196.000000 6196.000000
mean 2.931407 1.576340 471.006940 -37.807904 144.990201
std 0.971079 0.711362 897.449881 0.075850 0.099165
min 1.000000 1.000000 0.000000 -38.164920 144.542370
25% 2.000000 1.000000 152.000000 -37.855438 144.926198
50% 3.000000 1.000000 373.000000 -37.802250 144.995800
75% 4.000000 2.000000 628.000000 -37.758200 145.052700
max 8.000000 8.000000 37000.000000 -37.457090 145.526350

X.head()

输出一个表格,复制不过来格式…输出X的前几行数据
Rooms Bathroom Landsize Lattitude Longtitude
1 2 1.0 156.0 -37.8079 144.9934
2 3 2.0 134.0 -37.8093 144.9944
4 4 1.0 120.0 -37.8072 144.9941
6 3 2.0 245.0 -37.8024 144.9993
7 2 1.0 256.0 -37.8060 144.9954

二、训练模型

The steps to building and using a model are:

  • Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.
  • Fit: Capture patterns from provided data. This is the heart of modeling.
  • Predict: Just what it sounds like
  • Evaluate: Determine how accurate the model’s predictions are.
from sklearn.tree import DecisionTreeRegressor

# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)

# Fit model
melbourne_model.fit(X, y)
print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))

输出预测数据,由于输入的X是训练数据集里的前几条,所以给出的y预测值也是和数据集里的sale_price完全一模一样
Making predictions for the following 5 houses:
Rooms Bathroom Landsize Lattitude Longtitude
1 2 1.0 156.0 -37.8079 144.9934
2 3 2.0 134.0 -37.8093 144.9944
4 4 1.0 120.0 -37.8072 144.9941
6 3 2.0 245.0 -37.8024 144.9993
7 2 1.0 256.0 -37.8060 144.9954
The predictions are
[1035000. 1465000. 1600000. 1876000. 1636000.]


http://www.kler.cn/a/502780.html

相关文章:

  • ffmpeg 编译遇到的坑
  • linux自动分区后devmappercentos-home删除后合并到其它分区上
  • 不同音频振幅dBFS计算方法
  • ubuntu20下编译linux1.0 (part1)
  • 典型的 package.json 文件中的
  • Blazor中Syncfusion Word组件使用方法
  • qt设置qwidget背景色无效
  • arcgis中用python脚本批量给多个要素类的相同字段赋值
  • HTTP 入门:认识网络通信基础
  • 【WPS】【WORDWORD】【JavaScript】实现微软WORD自动更正的效果
  • Blazor开发复杂信息管理系统的优势
  • 【Linux】编辑器之神vim使用教程
  • 电力场景红外测温图像均压环下的避雷器识别分割数据集labelme格式2436张1类别
  • 8Hive SQL底层执行原理
  • 如何提高自动化测试覆盖率和效率
  • .NET framework、Core和Standard都是什么?
  • Linux IPC:管道与FIFO汇总整理
  • C#,图论与图算法,输出无向图“欧拉路径”的弗勒里(Fleury Algorithm)算法和源程序
  • css盒子水平垂直居中
  • 下载的stable diffudion 模型如何转换到diffusers可用的格式
  • SQLynx 数据库管理平台 3.6.0 全新发布:全面支持华为数据库和ClickHouse,代码提示更智能!
  • 软考信安21~网络设备安全
  • Android Room 构建问题:There are multiple good constructors
  • 备战春招—高频芯片设计面试题
  • DuckDB:星号(*)表达式完整指南
  • HIVE技术