First Commit

This commit is contained in:
karllzy 2022-05-11 11:00:55 +08:00
commit 17e8992073
14 changed files with 8194 additions and 0 deletions

219
.gitignore vendored Normal file

@ -0,0 +1,219 @@
preprocess/dataset/*
checkpoints/*
.idea
### JetBrains template
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf
# Generated files
.idea/**/contentModel.xml
# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml
# Gradle
.idea/**/gradle.xml
.idea/**/libraries
# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr
# CMake
cmake-build-*/
# Mongo Explorer plugin
.idea/**/mongoSettings.xml
# File-based project format
*.iws
# IntelliJ
out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Cursive Clojure plugin
.idea/replstate.xml
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
# Editor-based Rest Client
.idea/httpRequests
# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
!/checkpoints/
!/preprocess/dataset/

21
README.md Normal file

@ -0,0 +1,21 @@
# SCNet: A deep learning network framework for analyzing near-infrared spectroscopy using short-cut
## Pre-processing
Since the method we propose is a regression model, the wheat kernel classification dataset is not used in this work.
The other three datasets (corn, marzipan, soil) were preprocessed manually with Matlab and saved in subdirectories of the `./preprocess` dir. The original files of these three datasets are stored in `./preprocess/dataset/`.
The mango dataset is not distributed in Matlab .mat format, so we convert and save it with `process.py`.
Meanwhile, we drop the unused bands and only keep the data between 684 and 990 nm.
> The data set used in this study comprises a total of 11,691 NIR spectra (684–990 nm in 3 nm sampling with a total 103 variables) and DM measurements performed on 4675 mango fruit across 4 harvest seasons 2015, 2016, 2017 and 2018 [24].
The detailed preprocessing procedure can be found in [./preprocess.ipynb](./preprocess.ipynb).
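For illustration, a minimal sketch of the mango conversion step (band selection and export to `.mat`), mirroring the code in `preprocess.ipynb`:

```python
import pandas as pd
from scipy.io import savemat

# Load the raw mango NIR data and keep only the 684-990 nm bands plus the DM target.
dataset = pd.read_csv('preprocess/dataset/mango/NAnderson2020MendeleyMangoNIRData.csv')
y = dataset.DM
x = dataset.loc[:, '684':'990']

# Save to .mat so the Matlab preprocessing scripts can work on it.
savemat('preprocess/dataset/mango/mango_origin.mat', {'x': x.values, 'y': y.values})
```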
## Network Training
To show that our network can prevent the degradation problem, we run an experiment that records the training loss curves of the four models. The detailed information can be found in [model_training.ipynb](./model_training.ipynb).
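As a rough sketch of how the four models can be trained with the classes in `models.py` (the epoch and batch-size values below are illustrative assumptions, not settings taken from this commit):

```python
import numpy as np
from scipy.io import loadmat
from sklearn.model_selection import train_test_split
from models import Plain5, Plain11, ShortCut5, ShortCut11

# Load the pre-split mango data and add the channel axis the models expect: (samples, 1, 102).
data = loadmat('./preprocess/dataset/mango/mango_dm_split.mat')
x_train, y_train = data['x_train'], data['y_train']
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.3,
                                                  random_state=12, shuffle=True)
x_train, x_val = x_train[:, np.newaxis, :], x_val[:, np.newaxis, :]

# Train each model; checkpoints are written to ./checkpoints/ by the fit() callbacks.
for net_cls in (Plain5, Plain11, ShortCut5, ShortCut11):
    net = net_cls(model_path=None, input_shape=(1, 102))
    net.fit(x_train, y_train, x_val, y_val, epoch=1000, batch_size=128)
```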
## Network evaluation
After training the models on the training set, we evaluate them on the test dataset that was set aside beforehand. The evaluation is done in [model_evaluating.ipynb](model_evaluating.ipynb).
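Note that the MSE reported there is computed on the normalized DM values; the small sketch below (assuming the `max_y`/`min_y` scalars saved by `preprocess.ipynb` into `mango_dm_split.mat`) shows how the error could be expressed in the original DM units:

```python
import numpy as np
from scipy.io import loadmat
from sklearn.metrics import mean_squared_error

data = loadmat('./preprocess/dataset/mango/mango_dm_split.mat')
max_y, min_y = data['max_y'].item(), data['min_y'].item()

def rmse_in_dm_units(y_true_norm, y_pred_norm):
    # Undo the min-max scaling applied in preprocess_mango.m before reporting RMSE.
    rmse_norm = np.sqrt(mean_squared_error(y_true_norm, y_pred_norm))
    return rmse_norm * (max_y - min_y)
```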

155
model_evaluating.ipynb Normal file

@ -0,0 +1,155 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Experiment 2: Model Evaluating"
]
},
{
"cell_type": "code",
"execution_count": 29,
"outputs": [],
"source": [
"import numpy as np\n",
"from keras.models import load_model\n",
"from matplotlib import ticker\n",
"from scipy.io import loadmat\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import mean_squared_error\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "markdown",
"source": [
"In this experiment, we load model weights from the experiment1 and evaluate them on test dataset."
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": 30,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape of data:\n",
"x_train: (5728, 1, 102), y_train: (5728, 1),\n",
"x_val: (2455, 1, 102), y_val: (2455, 1)\n",
"x_test: (3508, 1, 102), y_test: (3508, 1)\n"
]
}
],
"source": [
"data = loadmat('./preprocess/dataset/mango/mango_dm_split.mat')\n",
"x_train, y_train, x_test, y_test = data['x_train'], data['y_train'], data['x_test'], data['y_test']\n",
"x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.3, random_state=12, shuffle=True)\n",
"x_train, x_val, x_test = x_train[:, np.newaxis, :], x_val[:, np.newaxis, :], x_test[:, np.newaxis, :]\n",
"print(f\"shape of data:\\n\"\n",
" f\"x_train: {x_train.shape}, y_train: {y_train.shape},\\n\"\n",
" f\"x_val: {x_val.shape}, y_val: {y_val.shape}\\n\"\n",
" f\"x_test: {x_test.shape}, y_test: {y_test.shape}\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "code",
"source": [
"## Build model and load weights\n",
"plain_5, plain_11 = load_model('./checkpoints/plain5.hdf5'), load_model('./checkpoints/plain11.hdf5')\n",
"shortcut5, shortcut11 = load_model('./checkpoints/shortcut5.hdf5'), load_model('./checkpoints/shortcut11.hdf5')\n",
"models = {'plain 5': plain_5, 'plain 11': plain_11, 'shortcut 5': shortcut5, 'shortcut11': shortcut11}\n",
"results = {model_name: model.predict(x_test).reshape((-1, )) for model_name, model in models.items()}\n",
"for model_name, model_result in results.items():\n",
" print(model_name, \" : \", mean_squared_error(y_test, model_result)*100, \"%\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"execution_count": 31,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"plain 5 : 0.2707851525589865 %\n",
"plain 11 : 0.26240810192725905 %\n",
"shortcut 5 : 0.28330442301217196 %\n",
"shortcut11 : 0.25743312483685266 %\n"
]
}
]
},
{
"cell_type": "code",
"execution_count": 31,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

7124
model_training.ipynb Normal file

File diff suppressed because one or more lines are too long

264
models.py Normal file

@ -0,0 +1,264 @@
import keras.callbacks
import keras.layers as KL
from keras import Model
from keras.optimizers import adam_v2
class Plain5(object):
def __init__(self, model_path=None, input_shape=None):
self.model = None
self.input_shape = input_shape
if model_path is not None:
# TODO: loading from the file
pass
else:
self.model = self.build_model()
def build_model(self):
input_layer = KL.Input(self.input_shape, name='input')
x = KL.Conv1D(8, 3, padding='same', name='Conv1')(input_layer)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Dense(20, activation='relu', name='dense')(x)
x = KL.Dense(1, activation='sigmoid', name='output')(x)
model = Model(input_layer, x)
return model
def fit(self, x, y, x_val, y_val, epoch, batch_size):
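        # Linear learning-rate scaling: base lr of 0.01 at a reference batch size of 256.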
self.model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.01 * (batch_size / 256)))
checkpoint = keras.callbacks.ModelCheckpoint(filepath='checkpoints/plain5.hdf5', monitor='val_loss',
mode="min", save_best_only=True)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=1000, verbose=0, mode='auto')
lr_decay = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=25, min_delta=1e-6)
callbacks = [checkpoint, early_stop, lr_decay]
history = self.model.fit(x, y, validation_data=(x_val, y_val), epochs=epoch, verbose=1,
callbacks=callbacks, batch_size=batch_size)
return history
class Residual5(object):
def __init__(self, model_path=None, input_shape=None):
self.model = None
self.input_shape = input_shape
if model_path is not None:
# TODO: loading from the file
pass
else:
self.model = self.build_model()
def build_model(self):
input_layer = KL.Input(self.input_shape, name='input')
fx = KL.Conv1D(8, 3, padding='same', name='Conv1')(input_layer)
fx = KL.BatchNormalization()(fx)
x = KL.Activation('relu')(fx)
fx = KL.Conv1D(8, 3, padding='same', name='Conv2')(x)
fx = KL.BatchNormalization()(fx)
fx = KL.Activation('relu')(fx)
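        # Residual connection: add the block output back to its input.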
x = fx + x
fx = KL.Conv1D(8, 3, padding='same', name='Conv3')(x)
fx = KL.BatchNormalization()(fx)
fx = KL.Activation('relu')(fx)
x = fx + x
x = KL.Dense(20, activation='relu', name='dense')(x)
x = KL.Dense(1, activation='sigmoid', name='output')(x)
model = Model(input_layer, x)
return model
def fit(self, x, y, x_val, y_val, epoch, batch_size):
self.model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.01 * (batch_size / 256)))
checkpoint = keras.callbacks.ModelCheckpoint(filepath='checkpoints/res5.hdf5', monitor='val_loss',
mode="min", save_best_only=True)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=1000, verbose=0, mode='auto')
lr_decay = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=25, min_delta=1e-6)
callbacks = [checkpoint, early_stop, lr_decay]
history = self.model.fit(x, y, validation_data=(x_val, y_val), epochs=epoch, verbose=1,
callbacks=callbacks, batch_size=batch_size)
return history
class ShortCut5(object):
def __init__(self, model_path=None, input_shape=None):
self.model = None
self.input_shape = input_shape
if model_path is not None:
# TODO: loading from the file
pass
else:
self.model = self.build_model()
def build_model(self):
input_layer = KL.Input(self.input_shape, name='input')
x_raw = KL.Conv1D(8, 3, padding='same', name='Conv1')(input_layer)
fx1 = KL.BatchNormalization()(x_raw)
fx1 = KL.Activation('relu')(fx1)
fx2 = KL.Conv1D(8, 3, padding='same', name='Conv2')(fx1)
fx2 = KL.BatchNormalization()(fx2)
fx2 = KL.Activation('relu')(fx2)
fx3 = KL.Conv1D(8, 3, padding='same', name='Conv3')(fx2)
fx3 = KL.BatchNormalization()(fx3)
fx3 = KL.Activation('relu')(fx3)
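        # Short-cut: concatenate the raw features with all intermediate feature maps
        # along the channel axis (instead of adding them as in a residual block).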
x = KL.Concatenate(axis=2)([x_raw, fx1, fx2, fx3])
x = KL.Dense(20, activation='relu', name='dense')(x)
x = KL.Dense(1, activation='sigmoid', name='output')(x)
model = Model(input_layer, x)
return model
def fit(self, x, y, x_val, y_val, epoch, batch_size):
self.model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.01 * (batch_size / 256)))
checkpoint = keras.callbacks.ModelCheckpoint(filepath='checkpoints/shortcut5.hdf5', monitor='val_loss',
mode="min", save_best_only=True)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0,
patience=1000, verbose=0, mode='auto')
lr_decay = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=25, min_delta=1e-6)
callbacks = [checkpoint, early_stop, lr_decay]
history = self.model.fit(x, y, validation_data=(x_val, y_val), epochs=epoch, verbose=1,
callbacks=callbacks, batch_size=batch_size)
return history
class ShortCut11(object):
def __init__(self, model_path=None, input_shape=None):
self.model = None
self.input_shape = input_shape
if model_path is not None:
# TODO: loading from the file
pass
else:
self.model = self.build_model()
def build_model(self):
input_layer = KL.Input(self.input_shape, name='input')
x_raw = KL.Conv1D(8, 3, padding='same', name='Conv1_1')(input_layer)
x = KL.BatchNormalization()(x_raw)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv1_2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv1_3')(x)
x = KL.BatchNormalization()(x)
fx1 = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2_1')(fx1)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2_2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2_3')(x)
x = KL.BatchNormalization()(x)
fx2 = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3_1')(fx2)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3_2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3_3')(x)
x = KL.BatchNormalization()(x)
fx3 = KL.Activation('relu')(x)
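        # Short-cut connections: concatenate the raw features with the outputs of all
        # three convolutional stages along the channel axis.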
x = KL.Concatenate(axis=2)([x_raw, fx1, fx2, fx3])
x = KL.Dense(200, activation='relu', name='dense1')(x)
x = KL.Dense(1, activation='sigmoid', name='output')(x)
model = Model(input_layer, x)
return model
def fit(self, x, y, x_val, y_val, epoch, batch_size):
self.model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.01 * (batch_size / 256)))
checkpoint = keras.callbacks.ModelCheckpoint(filepath='checkpoints/shortcut11.hdf5', monitor='val_loss',
mode="min", save_best_only=True)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=1e-6,
patience=200, verbose=0, mode='auto')
lr_decay = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
patience=25, min_delta=1e-6)
callbacks = [checkpoint, early_stop, lr_decay]
history = self.model.fit(x, y, validation_data=(x_val, y_val), epochs=epoch, verbose=1,
callbacks=callbacks, batch_size=batch_size)
return history
class Plain11(object):
def __init__(self, model_path=None, input_shape=None):
self.model = None
self.input_shape = input_shape
if model_path is not None:
# TODO: loading from the file
pass
else:
self.model = self.build_model()
def build_model(self):
input_layer = KL.Input(self.input_shape, name='input')
x = KL.Conv1D(8, 3, padding='same', name='Conv1_1')(input_layer)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv1_2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv1_3')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2_1')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2_2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv2_3')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3_1')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3_2')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Conv1D(8, 3, padding='same', name='Conv3_3')(x)
x = KL.BatchNormalization()(x)
x = KL.Activation('relu')(x)
x = KL.Dense(200, activation='relu', name='dense1')(x)
x = KL.Dense(1, activation='sigmoid', name='output')(x)
model = Model(input_layer, x)
return model
def fit(self, x, y, x_val, y_val, epoch, batch_size):
self.model.compile(loss='mse', optimizer=adam_v2.Adam(learning_rate=0.01 * (batch_size / 256)))
checkpoint = keras.callbacks.ModelCheckpoint(filepath='checkpoints/plain11.hdf5', monitor='val_loss',
mode="min", save_best_only=True)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=1e-6,
patience=200, verbose=0, mode='auto')
lr_decay = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
patience=25, min_delta=1e-6)
callbacks = [checkpoint, early_stop, lr_decay]
history = self.model.fit(x, y, validation_data=(x_val, y_val), epochs=epoch, verbose=1,
callbacks=callbacks, batch_size=batch_size)
return history
if __name__ == '__main__':
# plain5 = Plain5(model_path=None, input_shape=(1, 102))
# plain11 = Plain11(model_path=None, input_shape=(1, 102))
residual5 = Residual5(model_path=None, input_shape=(1, 102))
short5 = ShortCut5(model_path=None, input_shape=(1, 102))

127
preprocess.ipynb Normal file

@ -0,0 +1,127 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd2c8c55",
"metadata": {},
"source": [
"# Preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "716880ac",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from scipy.io import savemat, loadmat\n",
"import os"
]
},
{
"cell_type": "markdown",
"id": "4d7dc4a0",
"metadata": {},
"source": [
"## Step 1: \n",
"Convert the dataset to mat format for Matlab."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "711356a2",
"metadata": {},
"outputs": [],
"source": [
"dataset = pd.read_csv('preprocess/dataset/mango/NAnderson2020MendeleyMangoNIRData.csv')\n",
"y = dataset.DM\n",
"x = dataset.loc[:, '684': '990']\n",
"savemat('preprocess/dataset/mango/mango_origin.mat', {'x': x.values, 'y': y.values})"
]
},
{
"cell_type": "markdown",
"id": "3e41e8e6",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "ea5e54fd",
"metadata": {},
"source": [
"## Step3:\n",
"Data split with train test split."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6eac026e",
"metadata": {},
"outputs": [],
"source": [
"data = loadmat('preprocess/dataset/mango/mango_preprocessed.mat')\n",
"x, y = data['x'], data['y']\n",
"x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=24)\n",
"if not os.path.exists('mango'):\n",
" os.makedirs('mango')\n",
"savemat('preprocess/dataset/mango/mango_dm_split.mat',{'x_train':x_train, 'y_train':y_train, 'x_test':x_test, 'y_test':y_test,\n",
" 'max_y': data['max_y'], 'min_y': data['min_y'],\n",
" 'min_x':data['min_x'], 'max_x':data['max_x']})"
]
},
{
"cell_type": "markdown",
"id": "b2977dae",
"metadata": {},
"source": [
"## Step 4:\n",
"Show data with pictures\n",
"use `draw_pics_origin` to draw original spectra\n",
"![img](./preprocess/pics/raw.png)"
]
},
{
"cell_type": "markdown",
"source": [
"use `draw_pics_preprocessed.m` to draw proprecessed spectra\n",
"![img](./preprocess/pics/preprocessed.png)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
}
],
"metadata": {
"interpreter": {
"hash": "7f619fc91ee8bdab81d49e7c14228037474662e3f2d607687ae505108922fa06"
},
"kernelspec": {
"display_name": "Python 3.9.7 64-bit ('base': conda)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

45
preprocess/draw_pics_origin.m Executable file

@ -0,0 +1,45 @@
set(gca,'LooseInset',get(gca,'TightInset'))
f = figure;
f.Position(3:4) = [1331 331];
%%% draw the pic of corn spectra
load('dataset/corn.mat');
x = m5spec.data;
wave_length = m5spec.axisscale{2, 1};
subplot(1, 4, 1)
plot(wave_length, x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Absorbance');
clear
%%% draw the pic of Marzipan spectra
load('dataset/marzipan.mat');
x = NIRS1;
wave_length = NIRS1_axis;
subplot(1, 4, 2)
plot(wave_length, x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Absorbance');
clear
%%% draw the pic of soil spectra
load('dataset/soil.mat');
x = soil.data;
wave_length = soil.axisscale{2, 1};
subplot(1, 4, 3)
plot(wave_length, x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Absorbance');
clear
% draw the pic of Mango spectra
load('dataset/mango/mango_origin.mat');
wave_length = 684: 3: 990;
subplot(1, 4, 4)
plot(wave_length, x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Signal intensity');
clear

48
preprocess/draw_pics_preprocessed.m Executable file

@ -0,0 +1,48 @@
set(gca,'LooseInset',get(gca,'TightInset'))
f = figure;
f.Position(3:4) = [1331 331];
%%% draw the pic of corn spectra
load('dataset/corn.mat');
x = m5spec.data;
wave_length = m5spec.axisscale{2, 1};
preprocess;
subplot(1, 4, 1)
plot(wave_length(1, 1:end-1), x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Absorbance');
clear
%%% draw the pic of Marzipan spectra
load('dataset/marzipan.mat');
x = NIRS1;
wave_length = NIRS1_axis;
preprocess;
subplot(1, 4, 2)
plot(wave_length(1, 1:end-1), x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Absorbance');
clear
%%% draw the pic of soil spectra
load('dataset/soil.mat');
x = soil.data;
wave_length = soil.axisscale{2, 1};
preprocess;
subplot(1, 4, 3)
plot(wave_length(1, 1:end-1), x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Absorbance');
clear
% draw the pic of Mango spectra
load('dataset/mango/mango_preprocessed.mat');
wave_length = 687: 3: 990;
subplot(1, 4, 4)
plot(wave_length, x');
xlim([wave_length(1) wave_length(end)]);
xlabel('Wavelength(nm)');
ylabel('Signal intensity');
clear

BIN
preprocess/pics/preprocessed.png Normal file
Binary file not shown (image, 89 KiB).

BIN
preprocess/pics/raw.png Normal file
Binary file not shown (image, 175 KiB).

8
preprocess/preprocess.m Executable file

@ -0,0 +1,8 @@
%% x preprocessing
x = x';
x = sgolayfilt(x, 2, 17);            % Savitzky-Golay smoothing (2nd order, window 17)
x = diff(x);                         % first derivative along the wavelength axis
max_x = max(max(x));
min_x = min(min(x));
x = (x - min_x) / (max_x - min_x);   % min-max scaling to [0, 1]
x = x';

15
preprocess/preprocess_mango.m Executable file

@ -0,0 +1,15 @@
%% x preprocessing
clear;
load('dataset/mango/mango_origin.mat')
x = x';
x = sgolayfilt(x, 2, 17);            % Savitzky-Golay smoothing (2nd order, window 17)
x = diff(x);                         % first derivative along the wavelength axis
max_x = max(max(x));
min_x = min(min(x));
x = (x - min_x) / (max_x - min_x);   % min-max scaling to [0, 1]
x = x';
y = y';
min_y = min(min(y));
max_y = max(max(y));
y = (y - min_y) / (max_y - min_y);   % scale DM values to [0, 1]
save('dataset/mango/mango_preprocessed.mat')

15
preprocess/train_test_split.m Executable file

@ -0,0 +1,15 @@
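% Randomly split the workspace variables x and y into training and test sets (test_rate = 0.3).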
data=[x,y];
test_rate = 0.3;
data_num = size(x, 1);
train_num = round((1-test_rate) * data_num);
idx=randperm(data_num);
train_idx=idx(1:train_num);
test_idx=idx(train_num+1:data_num);
data_train=data(train_idx,:);
x_train=data_train(:,1:size(x, 2));
y_train=data_train(:,size(x, 2)+1);
test_data=data(test_idx,:);
x_test=test_data(:,1:size(x, 2));
y_test=test_data(:,size(x, 2)+1);
clear data_num train_num idx train_idx test_idx test_data x y;
clear data data_train test_rate;

153
utils.py Executable file

@ -0,0 +1,153 @@
from scipy.io import loadmat
import numpy as np
from sklearn.model_selection import train_test_split
import os
import shutil
def load_data(data_path='./pine_water_cc.mat', validation_rate=0.25):
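    # Supports three dataset layouts: the pine water dataset (default path), the leaf
    # dataset (which reuses the test set as the validation set), and the generic split
    # .mat format (x_train/y_train/x_test/y_test with max_y/min_y), e.g. mango_dm_split.mat.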
if data_path == './pine_water_cc.mat':
data = loadmat(data_path)
y_train, y_test = data['value_train'], data['value_test']
print('Value train shape: ', y_train.shape, 'Value test shape', y_test.shape)
y_max_value, y_min_value = data['value_max'], data['value_min']
x_train, x_test = data['DL_train'], data['DL_test']
elif data_path == './N_100_leaf_cc.mat':
data = loadmat(data_path)
y_train, y_test = data['y_train'], data['y_test']
x_train, x_test = data['x_train'], data['x_test']
y_max_value, y_min_value = data['max_y'], data['min_y']
x_train = np.expand_dims(x_train, axis=1)
x_test = np.expand_dims(x_test, axis=1)
x_validation, y_validation = x_test, y_test
return x_train, x_test, x_validation, y_train, y_test, y_validation, y_max_value, y_min_value
else:
data = loadmat(data_path)
y_train, y_test = data['y_train'], data['y_test']
x_train, x_test = data['x_train'], data['x_test']
y_max_value, y_min_value = data['max_y'], data['min_y']
x_train = np.expand_dims(x_train, axis=1)
x_test = np.expand_dims(x_test, axis=1)
print('SG17 DATA train shape: ', x_train.shape, 'SG17 DATA test shape', x_test.shape)
print('Mini value: %s, Max value %s.' % (y_min_value, y_max_value))
x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train, test_size=validation_rate,
random_state=8)
return x_train, x_test, x_validation, y_train, y_test, y_validation, y_max_value, y_min_value
def mkdir_if_not_exist(dir_name, is_delete=False):
"""
创建文件夹
:param dir_name: 文件夹
:param is_delete: 是否删除
:return: 是否成功
"""
try:
if is_delete:
if os.path.exists(dir_name):
shutil.rmtree(dir_name)
                print('[Info] Directory "%s" already exists, removing it.' % dir_name)
if not os.path.exists(dir_name):
os.makedirs(dir_name)
            print('[Info] Directory "%s" does not exist, creating it.' % dir_name)
return True
except Exception as e:
print('[Exception] %s' % e)
return False
class Config:
def __init__(self):
        # Data-related parameters
self.validation_rate = 0.2
        # Training parameters
self.train_epoch = 20000
self.batch_size = 20
        # Flags controlling which models to train
self.train_cnn = True
self.train_ms_cnn = True
self.train_ms_sc_cnn = True
        # Flags controlling which models to evaluate
self.evaluate_cnn = True
self.evaluate_ms_cnn = True
self.evaluate_ms_sc_cnn = True
        # Lists of saved models to evaluate
self.evaluate_cnn_name_list = []
self.evaluate_ms_cnn_name_list = []
self.evaluate_ms_sc_cnn_name_list = []
        # Directories for saving trained models and figures
self.img_dir = './pictures0331'
self.checkpoint_dir = './check_points0331'
        # Dataset selection
self.data_set = './dataset_preprocess/corn/corn_mositure.mat'
def show_yourself(self, to_text_file=None):
line_width = 36
content = '\n'
# create line
line_text = 'Data Parameters'
line = '='*((line_width-len(line_text))//2) + line_text + '='*((line_width-len(line_text))//2)
        line = line.ljust(line_width, '=')
content += line + '\n'
content += 'Validation Rate: ' + str(self.validation_rate) + '\n'
# create line
line_text = 'Training Parameters'
line = '=' * ((line_width - len(line_text)) // 2) + line_text + '=' * ((line_width - len(line_text)) // 2)
        line = line.ljust(line_width, '=')
content += line + '\n'
content += 'Train CNN: ' + str(self.train_cnn) + '\n'
content += 'Train Ms CNN: ' + str(self.train_ms_cnn) + '\n'
content += 'Train Ms Sc CNN: ' + str(self.train_ms_sc_cnn) + '\n'
# create line
line_text = 'Evaluate Parameters'
line = '=' * ((line_width - len(line_text)) // 2) + line_text + '=' * ((line_width - len(line_text)) // 2)
        line = line.ljust(line_width, '=')
content += line + '\n'
content += 'Train Epoch: ' + str(self.train_epoch) + '\n'
content += 'Train Batch Size: ' + str(self.batch_size) + '\n'
content += 'Evaluate CNN: ' + str(self.evaluate_cnn) + '\n'
if len(self.evaluate_cnn_name_list) >=1:
content += 'Saved CNNs to Evaluate:\n'
for models in self.evaluate_cnn_name_list:
content += models + '\n'
content += 'Evaluate Ms CNN: ' + str(self.evaluate_ms_cnn) + '\n'
if len(self.evaluate_ms_cnn_name_list) >= 1:
content += 'Saved Ms CNNs to Evaluate:\n'
for models in self.evaluate_ms_cnn_name_list:
content += models + '\n'
        content += 'Evaluate Ms Sc CNN: ' + str(self.evaluate_ms_sc_cnn) + '\n'
if len(self.evaluate_ms_sc_cnn_name_list) >= 1:
content += 'Saved Ms Sc CNNs to Evaluate:\n'
for models in self.evaluate_ms_sc_cnn_name_list:
content += models + '\n'
# create line
line_text = 'Saving Dir'
line = '=' * ((line_width - len(line_text)) // 2) + line_text + '=' * ((line_width - len(line_text)) // 2)
        line = line.ljust(line_width, '=')
content += line + '\n'
content += 'Image Dir: ' + str(self.img_dir) + '\n'
        content += 'Check Point Dir: ' + str(self.checkpoint_dir) + '\n'
print(content)
if to_text_file:
with open(to_text_file, 'w') as f:
f.write(content)
return content
if __name__ == '__main__':
config = Config()
config.show_yourself(to_text_file='name.txt')
x_train, x_test, x_validation, y_train, y_test, y_validation, y_max_value, y_min_value = \
load_data(data_path='./yaowan_calibrate.mat', validation_rate=0.25)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape, x_validation.shape, y_validation.shape,
y_max_value, y_min_value)