In this thesis, we explore recent advancements in neural networks (NNs) for cell classification in spreadsheets. Cell classification is necessary for the automatic comprehension of spreadsheet contents. It contributes to our final goal: extracting and transforming spreadsheet data with minimal user intervention.
Long Short-Term Memory (LSTM) networks have been particularly successful at classifying items organized in long sequences. Considering this, we hypothesize that LSTMs can also be applied successfully to cell classification. It is straightforward to sort and organize cells into sequences using their row and column numbers. In our case, we are primarily interested in capturing the context of a cell (i.e., features of neighboring cells). Intuitively, the classifier can make better decisions if it additionally considers the surroundings of a cell. Thus, for each to-be-classified cell, we create sequences that hold its neighbors and the cell itself (as the last element).
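To make the sequence construction concrete, the following is a minimal sketch of one possible way to build such a sequence. It is an illustration under stated assumptions rather than the exact procedure used in this thesis: the function name `neighbor_sequence`, the square window controlled by `radius`, the row-major ordering of neighbors, and the `pad` placeholder for positions outside the sheet are all hypothetical choices. Raw cell values stand in for the cell features.

```python
from typing import Any, List, Sequence


def neighbor_sequence(
    grid: Sequence[Sequence[Any]],
    row: int,
    col: int,
    radius: int = 1,   # assumption: square window around the target cell
    pad: Any = None,   # assumption: placeholder for positions outside the sheet
) -> List[Any]:
    """Return the neighbors of (row, col) in row-major order,
    followed by the target cell itself as the last element."""
    n_rows = len(grid)
    sequence: List[Any] = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue  # the target cell is appended at the end
            inside = 0 <= r < n_rows and 0 <= c < len(grid[r])
            sequence.append(grid[r][c] if inside else pad)
    sequence.append(grid[row][col])
    return sequence


# Toy sheet: one header row and two data rows.
sheet = [
    ["Year", "Sales"],
    ["2019", "10"],
    ["2020", "12"],
]
example = neighbor_sequence(sheet, row=1, col=1)
```

For the cell at row 1, column 1 (value "10"), `example` holds the eight window positions in row-major order, with `None` where the window extends past the sheet, and ends with "10". In an actual pipeline, each element would presumably be a feature vector rather than a raw value before being passed to the LSTM.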