1. Grid Search
2. Random Search
3. Bayesian Optimization
4. Genetic Algorithm
5. Gradient-Based Optimization
Leaving aside manual tuning, dataset size, and available compute, this post briefly explains each method and the library code to call, with a summary table below for easy comparison. Choose whichever method fits your needs and data scale: if compute is limited, random search or Bayesian optimization is recommended; if you need a global optimum, consider genetic algorithms or Bayesian optimization. I mostly use these when training random forest, LSTM, and XGBoost models, evaluating with mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²); a quick sketch of the metrics comes first, then the comparison table:
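A minimal sketch of computing the four metrics with scikit-learn (y_true and y_pred are placeholder arrays standing in for validation labels and predictions):

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 2.5, 4.0, 5.1])   # ground-truth values (placeholder)
y_pred = np.array([2.8, 2.7, 4.2, 4.9])   # model predictions (placeholder)

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # RMSE is the square root of MSE
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, rmse, mae, r2)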
| Method | Pros | Cons | Best suited for |
| --- | --- | --- | --- |
| Grid search | Simple and intuitive | High computational cost | Small parameter spaces |
| Random search | Lower computational cost | May miss the global optimum | Larger parameter spaces |
| Bayesian optimization | Computationally efficient, handles complex problems | More complex to implement | Large parameter spaces, limited compute |
| Genetic algorithm | Suited to complex, nonlinear problems | Costly and complex to implement | Complex problems, global optimization |
| Gradient-based optimization | Computationally efficient | Only differentiable hyperparameters | Continuous hyperparameters |

I. Grid Search
1. Principle
Grid search exhaustively enumerates a predefined set of hyperparameter combinations, training and validating the model on each one and keeping the combination that performs best.
2. Library:
scikit-learn. The snippet below adapts my earlier random-forest regression code to GridSearchCV (the randomized variant of the same search appears in section II); the comments explain each parameter, and every range is easy to adjust.
# Imports
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Number of trees; adjust the range as needed (kept coarse so the grid stays tractable)
n_estimators = [int(x) for x in np.linspace(start=800, stop=1200, num=5)]
# Features considered per split; alternatives: [10, 20, 30] or ['log2', 'sqrt'] ('auto' is deprecated)
max_features = [1.0]
# Maximum tree depth; adjust the range as needed
max_depth = [int(x) for x in np.linspace(10, 60, num=5)]
max_depth.append(None)  # None lets trees grow until the leaves are pure
# Minimum samples required to split an internal node; e.g. [2, 5, 10]
min_samples_split = [2, 5, 8]
# Minimum samples required at a leaf node; e.g. [1, 2, 4]
min_samples_leaf = [1, 2, 4]
# Whether to draw bootstrap samples; could also try [True, False]
bootstrap = [True]

param_grid = {'n_estimators': n_estimators,
              'max_features': max_features,
              'max_depth': max_depth,
              'min_samples_split': min_samples_split,
              'min_samples_leaf': min_samples_leaf,
              'bootstrap': bootstrap}

# Exhaustively evaluate every combination in the grid
rf = RandomForestRegressor()
rf_grid = GridSearchCV(estimator=rf, param_grid=param_grid,
                       scoring='neg_mean_absolute_error',  # negative MAE as the score
                       cv=3,        # 3-fold cross-validation
                       verbose=2,   # print progress
                       n_jobs=-1)   # -1 uses all CPU cores (1 for a single core)

# Run the search (train_datas / train_labels come from the earlier regression post)
rf_grid.fit(train_datas, train_labels)
# Best parameter combination found
rf_grid.best_params_
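After the search finishes, a minimal follow-up sketch (reusing the same training arrays) refits a fresh model on the winning combination. Note that GridSearchCV with the default refit=True already exposes rf_grid.best_estimator_; the explicit refit just makes the step visible:

best_rf = RandomForestRegressor(**rf_grid.best_params_)  # unpack the best grid point
best_rf.fit(train_datas, train_labels)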
II. Random Search
1. Principle
Random search draws hyperparameter combinations at random from predefined distributions, then trains and validates on each sample. Compared with grid search, it can usually find a good combination in far fewer iterations.
2. Library: scikit-learn
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

# Distributions or lists to sample from
param_dist = {
    'n_estimators': randint(10, 100),   # uniform integers in [10, 100)
    'max_depth': [None, 10, 20],        # a plain list is sampled uniformly
    'min_samples_split': randint(2, 11)
}

# Try 10 random combinations with 5-fold cross-validation
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(),
                                   param_distributions=param_dist,
                                   n_iter=10, cv=5)
random_search.fit(X_train, y_train)   # X_train / y_train: your training data
best_params = random_search.best_params_
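param_distributions also accepts continuous distributions, which is where random search shines. A small illustrative sketch (the estimator and ranges are my own example, assuming scipy >= 1.4 for loguniform):

from scipy.stats import loguniform, randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# learning_rate sampled log-uniformly across three orders of magnitude
param_dist = {'learning_rate': loguniform(1e-3, 1e0),
              'n_estimators': randint(50, 300)}
search = RandomizedSearchCV(GradientBoostingClassifier(), param_dist,
                            n_iter=20, cv=3)
search.fit(X_train, y_train)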
III. Bayesian Optimization
1. Principle
Bayesian optimization builds a probabilistic surrogate model (usually a Gaussian process) to predict how hyperparameter settings will perform, and at each step picks the setting most likely to improve performance for the next evaluation.
2. Library: bayes_opt (the bayesian-optimization package)
from bayes_opt import BayesianOptimization
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Objective: mean cross-validated score for a given hyperparameter setting.
# bayes_opt works on continuous spaces, so integer parameters are cast inside.
def rf_cv(n_estimators, max_depth, min_samples_split):
    model = RandomForestClassifier(
        n_estimators=int(n_estimators),
        max_depth=int(max_depth),
        min_samples_split=int(min_samples_split),
        random_state=42
    )
    return cross_val_score(model, X_train, y_train, cv=5).mean()

# Continuous bounds for each hyperparameter
pbounds = {
    'n_estimators': (10, 100),
    'max_depth': (1, 20),
    'min_samples_split': (2, 10)
}

optimizer = BayesianOptimization(
    f=rf_cv,
    pbounds=pbounds,
    random_state=42,
)
# 5 random warm-up points, then 10 model-guided evaluations
optimizer.maximize(init_points=5, n_iter=10)
best_params = optimizer.max['params']  # optimizer.max also holds the best score under 'target'
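Because the bounds are continuous, the best values come back as floats. A minimal refit sketch (assuming the same X_train / y_train) casts them back to integers:

p = optimizer.max['params']
final_model = RandomForestClassifier(
    n_estimators=int(p['n_estimators']),
    max_depth=int(p['max_depth']),
    min_samples_split=int(p['min_samples_split']),
    random_state=42
).fit(X_train, y_train)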
IV. Genetic Algorithm
1. Principle
Genetic algorithms mimic natural selection and inheritance: a population of candidate hyperparameter sets is improved generation by generation through selection, crossover, and mutation.
2. Library: DEAP
from deap import base, creator, tools, algorithms
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import random

# Maximize a single fitness value (mean CV score)
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
# One generator per gene so initial values respect each parameter's range:
# genes are [n_estimators, max_depth, min_samples_split]
toolbox.register("attr_n_estimators", random.randint, 10, 100)
toolbox.register("attr_max_depth", random.randint, 1, 20)
toolbox.register("attr_min_samples_split", random.randint, 2, 10)
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_n_estimators, toolbox.attr_max_depth,
                  toolbox.attr_min_samples_split), n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# Fitness: mean 5-fold CV score (DEAP expects a tuple, hence the trailing comma)
def evalRF(individual):
    model = RandomForestClassifier(
        n_estimators=individual[0],
        max_depth=individual[1],
        min_samples_split=individual[2],
        random_state=42
    )
    return cross_val_score(model, X_train, y_train, cv=5).mean(),

toolbox.register("evaluate", evalRF)
toolbox.register("mate", tools.cxTwoPoint)
# Mutation bounds match each gene's range
toolbox.register("mutate", tools.mutUniformInt, low=[10, 1, 2], up=[100, 20, 10], indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

population = toolbox.population(n=50)
# 10 generations with 50% crossover and 20% mutation probability
algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=10, verbose=True)

best_individual = tools.selBest(population, k=1)[0]
best_params = {'n_estimators': best_individual[0],
               'max_depth': best_individual[1],
               'min_samples_split': best_individual[2]}
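One caveat: selBest only looks at the final population, so the best individual ever evaluated can be lost to mutation. A small sketch using DEAP's HallOfFame, which tracks it across generations:

hof = tools.HallOfFame(1)   # keeps the single best individual ever seen
algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=10,
                    halloffame=hof, verbose=True)
best_individual = hof[0]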
V. Gradient-Based Optimization
1. Principle
Gradient-based methods update hyperparameters by computing the gradient of the objective (e.g. validation loss) with respect to them, so they apply only to continuous, differentiable hyperparameters; a toy sketch of the idea follows.
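An illustrative sketch of the idea, not from the original post: gradient descent on a single continuous hyperparameter (ridge regression's regularisation strength) using a finite-difference estimate of the gradient. X_train / y_train are assumed here to hold a regression task:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Validation loss as a function of log(alpha); log space keeps alpha positive
def val_loss(log_alpha):
    model = Ridge(alpha=np.exp(log_alpha))
    return -cross_val_score(model, X_train, y_train,
                            scoring='neg_mean_squared_error', cv=3).mean()

log_alpha, lr, eps = 0.0, 0.5, 1e-2
for _ in range(20):
    # Central finite difference approximates d(loss)/d(log_alpha)
    grad = (val_loss(log_alpha + eps) - val_loss(log_alpha - eps)) / (2 * eps)
    log_alpha -= lr * grad   # step downhill in hyperparameter space
best_alpha = np.exp(log_alpha)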
2. Library: Optuna (strictly speaking, Optuna's default TPE sampler is a Bayesian-style method rather than true gradient descent, but it handles continuous and integer hyperparameters conveniently and is a common practical stand-in)
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Objective returns mean CV score for the trial's suggested hyperparameters
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 100)
    max_depth = trial.suggest_int('max_depth', 1, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        random_state=42
    )
    return cross_val_score(model, X_train, y_train, cv=5).mean()

study = optuna.create_study(direction='maximize')  # maximize the CV score
study.optimize(objective, n_trials=50)             # evaluate 50 trials
best_params = study.best_trial.params
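Since all three parameters were suggested as integers, the best trial's dictionary can be unpacked straight into a final model (a sketch, assuming the same training data):

final_model = RandomForestClassifier(**study.best_params, random_state=42)
final_model.fit(X_train, y_train)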