Example: XGBoost power loss

This example shows how to load the trained XGBoost models and use them to predict cell power loss from EL latent vectors.

[1]:

import os
from pathlib import Path

project_root = Path.cwd().parents[1]
os.chdir(project_root)   # now cwd is .../pvcracks

from pvcracks.powerloss.powerloss_functions import load_xgb_models, predict_power_and_voc
import numpy as np
import pandas as pd
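
The `Path.cwd().parents[1]` line in the cell above only works when the notebook is started from a directory exactly two levels below the repository root. A minimal sketch of a more tolerant lookup follows. The helper name `find_project_root` is hypothetical (it is not part of pvcracks), and it assumes the repository root is the first parent directory that contains a `pvcracks` package folder.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pvcracks") -> Path:
    """Return the first directory at or above `start` that contains `marker`.

    Hypothetical helper, not part of pvcracks. `marker` is assumed to be
    the name of the package directory that sits in the repository root.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(
        f"no directory at or above {start} contains a '{marker}' folder"
    )
```

With this in place, `os.chdir(find_project_root(Path.cwd()))` would replace the hard-coded `parents[1]` lookup and raise a clear error when the notebook is run from outside the repository.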

Calculate delta Pmpp from fitted IV cell curves

[2]:

Cell9Master = pd.read_csv('docs/data/ELdata_module_209_VAE_analysis.csv', index_col=0)

[3]:

# grab each Module’s Init‐stage Pmp
init = (
    Cell9Master[Cell9Master['Deg']=='Init']
    .set_index('Module')['Pmp']
    # init is now a Series: index=Module, value=Pmp_at_Init
)

# Calculate the difference relative to the Init stage, in %
init_pmp = Cell9Master['Module'].map(init)
Cell9Master['deltaPmp'] = 100 * (Cell9Master['Pmp'] - init_pmp) / init_pmp
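
The quantity computed above is deltaPmp = 100 * (Pmp - Pmp_Init) / Pmp_Init, where Pmp_Init is the Pmp of the same Module at the Init stage. The sketch below repeats the calculation on a small made-up frame so the result can be checked by hand. The module names and Pmp values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Toy data (illustrative values): two modules, each with an Init row
# and one degraded row.
toy = pd.DataFrame({
    "Module": ["A", "A", "B", "B"],
    "Deg": ["Init", "Deg1", "Init", "Deg1"],
    "Pmp": [4.0, 3.0, 2.0, 1.9],
})

# Series indexed by Module holding the Init-stage Pmp.
baseline = toy[toy["Deg"] == "Init"].set_index("Module")["Pmp"]

# Look up each row's baseline, then compute the relative change in %.
ref = toy["Module"].map(baseline)
toy["deltaPmp"] = 100 * (toy["Pmp"] - ref) / ref

print(toy)
```

Init rows come out as 0 % by construction, and a negative value means the cell delivers less power than at Init. The `map` lookup relies on each Module having exactly one Init row. With duplicate Init rows, `map` would raise an error because the baseline index would no longer be unique.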

[4]:

# 1) list the (Module, Deg) pairs of the cells we know have cracks
cracked = [
    ('209_A2','Deg1'),
    ('209_A1','Deg1'),
    ('209_A3','Deg2'),
    ('209_A1','Deg2'),
    ('209_C3','Deg2'),
    ('209_B2','Deg2'),
    ('209_B3','Deg2'),
]

# 2) build a boolean mask which is True only for those pairs
mask = Cell9Master[['Module','Deg']].apply(tuple,axis=1).isin(cracked)

# 3) slice to keep only the cracked cells
Cell9Degs = Cell9Master[mask].copy()
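
The row-wise `apply(tuple, axis=1)` used above is easy to read but calls Python once per row. As a sketch of an alternative, the same mask can be built from a `pandas.MultiIndex`, which compares the (Module, Deg) pairs without a row-wise `apply`. The frame below is toy data with made-up names and values.

```python
import pandas as pd

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "Module": ["M1", "M1", "M2", "M2"],
    "Deg": ["Deg1", "Deg2", "Deg1", "Deg2"],
    "Pmp": [1.0, 2.0, 3.0, 4.0],
})
wanted = [("M1", "Deg2"), ("M2", "Deg1")]

# Approach used in the notebook: one tuple per row, then isin.
mask_apply = toy[["Module", "Deg"]].apply(tuple, axis=1).isin(wanted)

# Alternative: build a MultiIndex from the two columns and test membership.
mask_index = pd.MultiIndex.from_frame(toy[["Module", "Deg"]]).isin(wanted)

selected = toy[mask_index].copy()
print(selected)
```

Both masks select the same rows. For a handful of cells the difference is negligible, so the choice here is mainly one of taste.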

[5]:

Cell9Degs

[5]:

	ELPath	Module	Deg	Rs	Rsh	I	Is	N	Pmp	Vmp	Imp	lat_vec	klabel	deltaPmp
11	/docs/data/EL/209_A2/Deg1/209_A2_0005_2021_04_...	209_A2	Deg1	0.008996	1295.750000	8.228613	9.005018e-08	1.317542	3.534448	0.465455	7.593540	[ 5.0129116e-02 1.5392177e+00 1.2010708e+00 ...	4	-0.254354
12	/docs/data/EL/209_A1/Deg1/209_A1_0005_2021_04_...	209_A1	Deg1	0.010386	1.000000	8.184377	1.006196e-05	1.778587	3.051399	0.446061	6.840773	[ 0.1527828 0.6933546 0.61783403 -1.311733...	1	-13.414670
18	/docs/data/EL/209_A3/Deg2/209_A3_0005_2021_04_...	209_A3	Deg2	0.008634	8.986411	8.233849	6.987570e-06	1.730266	3.349300	0.458990	7.297111	[ 1.6261781e+00 -1.3577162e-01 3.6160046e-01 ...	4	-6.030860
21	/docs/data/EL/209_A1/Deg2/209_A1_0005_2021_04_...	209_A1	Deg2	0.012567	0.835178	7.588188	3.762260e-05	1.997049	2.678277	0.433131	6.183523	[-1.2965436 0.14127842 0.39876893 -0.320378...	1	-24.002233
23	/docs/data/EL/209_C3/Deg2/209_C3_0005_2021_04_...	209_C3	Deg2	0.009595	88.786965	8.160221	1.795159e-06	1.573030	3.347141	0.452525	7.396585	[-4.2912847e-01 2.1230426e+00 8.3105731e-01 ...	3	-3.268041
25	/docs/data/EL/209_B2/Deg2/209_B2_0005_2021_04_...	209_B2	Deg2	0.009456	529.959186	8.231207	3.558481e-06	1.644547	3.346776	0.452525	7.395777	[-0.2521562 0.70483327 1.0419804 -1.359632...	1	-4.311519
26	/docs/data/EL/209_B3/Deg2/209_B3_0005_2021_04_...	209_B3	Deg2	0.010031	97.047204	8.309195	1.610765e-06	1.567835	3.399534	0.452525	7.512363	[-1.8619239 1.6475563 1.0302961 -1.493011...	1	-4.495282

Load latent vectors:

The EL variational autoencoder (VAE) example “Rapid EL processing” shows how to obtain these.

[6]:

# Convert the latent vectors from strings to a NumPy array with one row per cell
def parse_whitespace_vec(s):
    # strip off the brackets, then split on any whitespace (incl newlines),
    # then convert each token to float
    nums = s.strip('[]').split()
    return [float(x) for x in nums]

# 1) parse each string into a list of floats
parsed = Cell9Degs['lat_vec'].apply(parse_whitespace_vec)

# 2) stack into a numpy array of dtype object
lat_vectors = np.array(parsed.tolist(), dtype=object)
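
To illustrate what the parsing step does, the sketch below runs the same `parse_whitespace_vec` logic on two made-up strings, one of which contains a line break as produced when NumPy arrays are written to CSV as text. It also checks that all vectors have the same length and stacks them into a plain float array. Whether `predict_power_and_voc` needs a particular dtype is not stated here, so the float conversion is an assumption offered for inspection rather than a required change.

```python
import numpy as np


def parse_whitespace_vec(s):
    # Strip the brackets, split on any whitespace (including newlines),
    # and convert each token to float.
    return [float(x) for x in s.strip("[]").split()]


# Made-up strings in the same layout as the lat_vec column.
raw = [
    "[ 1.0e-01  2.5\n -3.0 ]",
    "[0.5 -1.5 2.0]",
]
parsed = [parse_whitespace_vec(s) for s in raw]

# A ragged list would silently become an object array, so check first.
lengths = {len(v) for v in parsed}
if len(lengths) != 1:
    raise ValueError(f"latent vectors have differing lengths: {sorted(lengths)}")

X = np.asarray(parsed, dtype=float)
print(X.shape, X.dtype)
```

A 2-D float array makes shape errors visible early, since `X.shape` directly reports the number of cells and the latent dimension.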

Load XGBoost models:

[7]:

pmpp_model, voc_model = load_xgb_models(
    pmpp_model_path="pvcracks/powerloss/xgb_model_pmpp_diff_percent_3CH.pkl",
    voc_model_path="pvcracks/powerloss/xgb_model_Voc_diff_percent_3CH.pkl",
)

Predict delta Pmpp in %

[8]:

df_predictions = predict_power_and_voc(lat_vectors, pmpp_model, voc_model)

[9]:

# Add the power loss predictions to the cell info dataframe
Cell9Degs.loc[:, 'power_loss_%'] = df_predictions['power_loss_%'].values
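
The `.values` in the assignment above matters: `Cell9Degs` keeps its original row labels (11, 12, 18, ...), so assigning a Series with a different index would align on labels and fill the column with NaN. The sketch below shows the effect on toy data. It assumes, as the notebook does, that the predictions come back in the same row order as the input vectors and with a default 0..n-1 index.

```python
import pandas as pd

# Toy frames: `cells` has non-default row labels, `preds` has 0..n-1.
cells = pd.DataFrame({"Module": ["a", "b", "c"]}, index=[11, 12, 18])
preds = pd.DataFrame({"power_loss_%": [-1.0, -2.0, -3.0]})

# Assigning the Series directly aligns on index labels: nothing matches.
misaligned = cells.copy()
misaligned["power_loss_%"] = preds["power_loss_%"]

# Assigning the raw array matches rows by position instead.
by_position = cells.copy()
by_position["power_loss_%"] = preds["power_loss_%"].to_numpy()

# Equivalent: relabel the predictions with the target index first.
by_index = cells.copy()
by_index["power_loss_%"] = preds["power_loss_%"].set_axis(cells.index)

print(misaligned, by_position, by_index, sep="\n\n")
```

Positional assignment is only safe while the row order of the latent vectors and of the dataframe stay in sync, so any sorting or filtering should happen before the vectors are extracted.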

[10]:

Cell9Degs[['Module', 'Deg', 'deltaPmp', 'power_loss_%']]

[10]:

	Module	Deg	deltaPmp	power_loss_%
11	209_A2	Deg1	-0.254354	-13.171658
12	209_A1	Deg1	-13.414670	-12.675356
18	209_A3	Deg2	-6.030860	-15.003642
21	209_A1	Deg2	-24.002233	-10.363716
23	209_C3	Deg2	-3.268041	-7.291249
25	209_B2	Deg2	-4.311519	-18.673622
26	209_B3	Deg2	-4.495282	-15.299884

[11]:

# Compare IV fit results vs. XGBoost predictions
import matplotlib.pyplot as plt

x = Cell9Degs['deltaPmp']
y = Cell9Degs['power_loss_%']

# compute correlation
r = x.corr(y)

plt.figure(figsize=(6,4))
plt.scatter(x, y, alpha=0.7)
mn = min(x.min(), y.min())
mx = max(x.max(), y.max())
plt.plot([mn,mx], [mn,mx], 'r--', label='y = x')
plt.text(x.min()-0.9, 0, f"Pearson r = {r:.3f}", fontsize=10, verticalalignment='top',
        bbox=dict(boxstyle='round', facecolor='white', alpha=0.7))
plt.xlabel('Cell delta Pmp from IV curves (%)')
plt.ylabel('Power loss from latent vectors and XGBoost (%)')
plt.title('IV curve delta Pmpp vs. XGBoost power loss %')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

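[Figure: scatter plot of cell delta Pmp from IV curves (%) against power loss from latent vectors and XGBoost (%), with a dashed y = x reference line and the Pearson r annotation.]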

The current results show a low Pearson correlation coefficient (< 0.3). This is due to the limited number of EL/IV pairs available to train the XGBoost model: 77 pairs. The powerloss subpackage contains a Jupyter notebook that walks through how the XGBoost model is trained. The training could be repeated with more data when it becomes available.
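
The agreement between the two columns can also be checked without the plot. The sketch below copies the seven value pairs from the output table of cell [10] and computes the Pearson coefficient with NumPy, together with the mean absolute error as a second, scale-aware measure. The variable names are illustrative. With only seven points both numbers are very noisy, so they should be read as a rough indication rather than a benchmark.

```python
import numpy as np

# Values copied from the comparison table above (cell [10]).
delta_pmp = np.array([
    -0.254354, -13.414670, -6.030860, -24.002233,
    -3.268041, -4.311519, -4.495282,
])
predicted = np.array([
    -13.171658, -12.675356, -15.003642, -10.363716,
    -7.291249, -18.673622, -15.299884,
])

# Pearson correlation coefficient: off-diagonal entry of the 2x2 matrix.
r = np.corrcoef(delta_pmp, predicted)[0, 1]

# Mean absolute error in percentage points.
mae = np.mean(np.abs(predicted - delta_pmp))

print(f"Pearson r = {r:.3f}, MAE = {mae:.2f} percentage points")
```

Pearson r only measures linear association, so it is worth looking at its sign as well as its magnitude, and at the MAE, before drawing conclusions about the model.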