Projection Overview
Projection components are instances of a program called HiSee, a high-dimensional visualizer. As a neural net operates, its state travels along a path in its state space, which typically has many dimensions. HiSee can take the collection of states visited by a neural net and project them down into two dimensions so that many of their geometric and topological properties are preserved. This gives users of Simbrain a way to visualize the behavior of the network component. More information about HiSee is available at the HiSee website: http://hisee.sourceforge.net.
This gauge represents the states (patterns of activity across the nodes of the network) that have occurred in a network with 40 neurons. Each green dot corresponds to one of those states. The red dot represents the current state. For the most part, points that are close to each other in the gauge correspond to states that are close to each other in the network's 40-dimensional state space.
When Simbrain is running, users can add one or more gauge windows, each of which projects some subset of the network's state variables to two dimensions. This allows users to independently study different aspects of the network component. For example, one gauge might represent the activity at the input nodes of a network, another might represent activity at the hidden nodes, another might represent the aggregate activity of the entire network, and another might represent the changing values of the network's weights as it learns.
What the Dots Represent
Each "dot" in the gauge represents a state of the network. When a gauge is first opened, there are no dots, since no state information has been sent to the gauge yet. With each update of the network, state information is sent to the gauge, where it is displayed as a red dot. The set of dots that appear in the gauge window represents the states that have been visited by the associated network since the gauge was added. This history of the network's activity lets the user understand and analyze the network's behavior. As the network is updated, the user can watch new dots appear as new states arise.
The gauge below represents the states that have occurred in a network with 20 neurons. Each dot corresponds to one of those states: the red dot is the current state, and the blue dots are all previous states (colors can be changed in the Preferences Dialog). Points that are close to each other in the gauge correspond to patterns of activity that are similar. One can hover over a data point to reveal a tooltip that displays the high-dimensional position it corresponds to.
When a gauge is created by pressing a network's Gauge button, it represents, by default, the activity of all the neurons in the network, though it can be set to represent any subset of the neurons of a neural network. Gauges can also be used to represent weight strengths.
Pan and Zoom
The gauge display is currently 2-dimensional. It is based on the Piccolo zoomable user interface (ZUI), which allows users to pan and zoom graphical data. When autoscale in the preference menu is turned off, you can pan the visible data by left-dragging (dragging the mouse while holding the left mouse button down), and you can zoom in or out on data by right-dragging (dragging the mouse while holding the right mouse button down).
Projection Methods
There are numerous ways to project high dimensional data to lower dimensions. The projection menu in the gauge allows one to switch projections and thereby compare the way neural network data look under different projections.
Coordinate Projection
Coordinate projection is perhaps the simplest possible projection technique. If one has a list of datapoints with 40 components each, coordinate projection to two dimensions simply ignores all but two of these components, which are then used to display the data in two-space.
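As a rough illustration of the idea, the following Python sketch keeps two chosen components of each datapoint and discards the rest. The function name `coordinate_projection` and the sample data are hypothetical and are not taken from HiSee or Simbrain.

```python
def coordinate_projection(points, dim_x=0, dim_y=1):
    """Project each high-dimensional point to 2D by keeping two components.

    dim_x and dim_y are the (zero-based) indices of the components shown
    on the horizontal and vertical axes. All other components are ignored.
    """
    return [(p[dim_x], p[dim_y]) for p in points]


# Three states of a hypothetical 4-neuron network.
states = [
    [0.1, 0.9, 0.3, 0.5],
    [0.2, 0.8, 0.4, 0.5],
    [0.7, 0.1, 0.6, 0.5],
]

# Display component 0 horizontally and component 2 vertically.
projected = coordinate_projection(states, dim_x=0, dim_y=2)
print(projected)
```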
Principal Component Analysis (PCA)
PCA builds on coordinate projection by making use of the "principal axes" of the dataset. The principal axes of an object are the directions in space about which the object is most balanced or evenly spaced. PCA selects the two principal axes along which the dataset is the most spread out and projects the data onto these two axes.
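The sketch below shows one way such a projection could be computed in plain Python. It assumes the two principal axes are found by power iteration with deflation on the covariance matrix; this page does not describe how HiSee actually computes them, so the function names and the numerical method are illustrative assumptions only.

```python
import math


def dominant_axis(matrix, iterations=500):
    """Power iteration: approximate the unit eigenvector and eigenvalue
    belonging to the largest eigenvalue of a symmetric matrix.
    Assumes the start vector is not orthogonal to that eigenvector."""
    d = len(matrix)
    v = [1.0 / (k + 1) for k in range(d)]  # arbitrary, non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return v, eigenvalue


def pca_project(points, iterations=500):
    """Project points onto the two axes along which they are most spread out."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    axes = []
    for _ in range(2):
        axis, eigenvalue = dominant_axis(cov, iterations)
        axes.append(axis)
        # Deflation: remove the axis just found so the next pass finds the
        # direction with the next-largest spread.
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j] for j in range(d)]
               for i in range(d)]
    return [(sum(a * b for a, b in zip(r, axes[0])),
             sum(a * b for a, b in zip(r, axes[1]))) for r in centered]


# Sample data: most spread along the first component, less along the second,
# none along the third.
data = [[-2.0, 0.0, 5.0], [2.0, 0.0, 5.0], [0.0, -1.0, 5.0], [0.0, 1.0, 5.0]]
projected = pca_project(data)
print(projected)
```

Unlike coordinate projection, the axes found this way need not coincide with any single component of the data; in the sample above they happen to, because the data were constructed that way.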
Sammon Map
The Sammon map is an iterative technique for making interpoint distances in the low-dimensional projection as close as possible to the interpoint distances in the high-dimensional object. Two points close together in the high-dimensional space should appear close together in the projection, while two points far apart in the high-dimensional space should appear far apart in the projection. By minimizing an error function between the high- and low-dimensional sets of interpoint distances, the Sammon map does its best to preserve these distances in the projection. This iterative procedure can be watched in the gauge by loading a dataset and pressing the "play" button on the interface.
Note: Before Sammon mapping is used, it is useful to Randomize the data points, as overlapping points cause the algorithm to blow up. One would typically run the Sammon mapping process after the data points have first been projected by PCA or Coordinate Projection.
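To make the iteration concrete, here is a minimal Python sketch of the Sammon error function and a single update step. It uses plain gradient descent on the Sammon stress, which is a simplification chosen for brevity and is not claimed to be the update rule HiSee uses. The names `sammon_stress`, `sammon_step`, `step_size`, and `eps` are hypothetical.

```python
import math


def sammon_stress(high, low):
    """Sammon error: mismatch between high-D and low-D interpoint distances,
    with each pair weighted by the inverse of its high-D distance."""
    total, error = 0.0, 0.0
    for i in range(len(high)):
        for j in range(i + 1, len(high)):
            d_high = math.dist(high[i], high[j])
            if d_high == 0.0:
                continue  # repeated high-D points carry no distance information
            total += d_high
            error += (d_high - math.dist(low[i], low[j])) ** 2 / d_high
    return error / total if total else 0.0


def sammon_step(high, low, step_size=0.3, eps=1e-9):
    """One gradient-descent iteration; returns the new low-D positions."""
    n = len(high)
    c = sum(math.dist(high[i], high[j])
            for i in range(n) for j in range(i + 1, n))
    if c == 0.0:
        return list(low)
    new_low = []
    for i in range(n):
        gx, gy = 0.0, 0.0
        for j in range(n):
            if i == j:
                continue
            d_high = math.dist(high[i], high[j])
            if d_high == 0.0:
                continue
            # Coincident low-D points would divide by zero here, which is
            # why overlapping points must be separated before iterating.
            d_low = max(math.dist(low[i], low[j]), eps)
            factor = (d_high - d_low) / (d_high * d_low)
            gx += factor * (low[i][0] - low[j][0])
            gy += factor * (low[i][1] - low[j][1])
        # Points that are too close in the projection are pushed apart;
        # points that are too far apart are pulled together.
        new_low.append((low[i][0] + step_size * 2.0 / c * gx,
                        low[i][1] + step_size * 2.0 / c * gy))
    return new_low


# Corners of a unit square embedded in 3D, and a distorted initial layout.
high = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
low = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.3)]

initial = sammon_stress(high, low)
for _ in range(300):
    low = sammon_step(high, low)
final = sammon_stress(high, low)
print(initial, final)
```

Each call to `sammon_step` corresponds loosely to one press of a single-iteration button, and the loop corresponds to letting the algorithm run.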
Gauge Toolbar
Projection Selector | This drop-down box selects the projection method to use on the data: PCA, Sammon, or Coordinate. |
Play | Pressing this button causes the program to iterate the algorithm indefinitely, until the Stop button is pressed. |
Stop | Pressing this button causes the program to stop iterating the algorithm. |
Iterate once | Pressing this button causes the projection algorithm to iterate once. |
Clear | This button clears both the high-dimensional dataset and the low-dimensional dataset from the program. |
Randomize | This button randomizes the points in the low-dimensional set. Useful for bumping the Sammon map out of local minima, and for exploring different possible projections of a given dataset under the Sammon map. |
Projector Preferences: Coordinate
First and second dimension: control which dimensions of the high-dimensional data are projected to the horizontal and vertical axes of the display.
Automatically use most variant dimensions: If this is selected then the program selects the two most variant axes of the high dimensional dataset for coordinate projection.
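One plausible reading of "most variant" is "largest variance across the dataset". Under that assumption, the selection could be sketched as follows; the function name `most_variant_dimensions` is hypothetical.

```python
from statistics import pvariance


def most_variant_dimensions(points, count=2):
    """Return the indices of the `count` components with the largest variance.

    Assumption: "most variant" means largest (population) variance.
    """
    n_dims = len(points[0])
    variances = [pvariance([p[d] for p in points]) for d in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda d: variances[d], reverse=True)
    return ranked[:count]


states = [
    [0.0, 5.0, 1.0, 0.5],
    [0.1, 1.0, 2.0, 0.5],
    [0.2, 9.0, 3.0, 0.5],
]

# Components 1 and 2 vary the most, so they become the display axes.
dims = most_variant_dimensions(states)
print(dims)
```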
Projector Preferences: Sammon
Step size: how far the points are moved at each iteration of the Sammon map. The bigger the step size, the faster the projection algorithm will run, but if the step size is too large the projected image will explode. One can generally experiment with different step sizes to find a suitable one. If iteration is progressing very slowly, one can try something large, like 100, 300, or even 1000. If the dataset "explodes" (in which case everything in the display may contract to a point), press the randomize button to start over. Tip: A step size of a little less than 1 is good for objects with about a dozen points, while a step size in the hundreds is good for objects with hundreds of points.
Projector Preferences: PCA
There are no preferences for PCA.
General Preferences
Tolerance: When datasets are initially loaded, or when data are added to an existing dataset, repeated points should be ignored. Even if a new point is not exactly the same as some other point in the set, it may be "close enough" to be considered the same point. This field allows one to set a tolerance level for deciding whether two points are the same. If "2" is specified in this field, for example, then any new point within a radius of 2 of some existing point will not be added to the dataset. Note: Repeated points are allowed in the low-dimensional dataset; this field only applies to the high-dimensional data.
Perturbation Factor: the distance the program will move coincident low-dimensional points before running the Sammon mapping algorithm. It must do this because overlapping low-dimensional points will cause the Sammon map to divide by zero (this is observed on-screen as the disappearance of all data or the contraction of data to a small point).
Add new datapoint using: allows users to choose how new points will be added to the dataset. These methods are described in the Adding Points section.
Default Projector: the default projector to be used by each subsequent gauge that is opened.
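The Tolerance and Perturbation Factor settings above can be illustrated with a short Python sketch. Both functions are hypothetical: the comparison used at exactly the tolerance radius, and the use of a random offset of at most the perturbation amount in each coordinate, are assumptions rather than documented behavior.

```python
import math
import random


def add_if_new(dataset, point, tolerance):
    """Append `point` to the high-D dataset unless it lies within
    `tolerance` of a point that is already there."""
    for existing in dataset:
        if math.dist(existing, point) < tolerance:
            return False  # treated as a repeat of an existing point
    dataset.append(point)
    return True


def perturb_coincident(low_points, amount, rng):
    """Nudge low-D points that exactly coincide with an earlier point,
    so that no two points share the same position."""
    seen = set()
    result = []
    for x, y in low_points:
        while (x, y) in seen:
            x += rng.uniform(-amount, amount)
            y += rng.uniform(-amount, amount)
        seen.add((x, y))
        result.append((x, y))
    return result


high = [[0.0, 0.0, 0.0]]
# About 1.41 away from the existing point: inside the radius of 2, so skipped.
added_near = add_if_new(high, [1.0, 1.0, 0.0], tolerance=2.0)
# 3 away from the existing point: outside the radius, so added.
added_far = add_if_new(high, [3.0, 0.0, 0.0], tolerance=2.0)

low = perturb_coincident([(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)],
                         amount=0.1, rng=random.Random(0))
print(added_near, added_far, low)
```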
Keyboard
H: Dump the high-dimensional dataset to the terminal window.
L: Dump the low-dimensional dataset to the terminal window.
Menus
File
Open: opens gauge files with an XML encoding.
Save/Save As: saves gauges using the XML encoding. The XML file stores gauged data as well as preferences and information about which state variables in which network the gauge represents.
Import: imports gauge data from CSV (comma-separated values) files.
Preferences
Projection Preferences: Preferences relating to the currently selected projection method.
Graphics / GUI Preferences: Preferences relating to the appearances of the dots.
General Preferences: General preferences relating to gauged data.
Autoscale: Autoscale is used for automatically re-sizing the viewing window to fit all data points. Unchecking "Autoscale" may cause some data points to not be shown and, therefore, the full dataset may not be seen. However, specific regions of data can be zoomed in on using the mouse.
Help
Help: Opens this help file.
Adding Points
In some cases it is useful to be able to add new points to an existing dataset without running the projection method on the whole dataset again. Methods exist for quickly adding new data points based on data that have already been projected. These methods work best when a certain amount of data has already been collected and projected using, for example, PCA or the Sammon map. Note that these methods will rarely be applied in most uses of Simbrain.
Nearest Neighbor Subspace Method
(1) Takes each new point and determines the three points in the current data set that are closest to it.
(2) Finds the projection of the new point into the two-dimensional subspace that contains the three nearest neighbors in the high-dimensional space.
(3) Uses the three nearest neighbors and their corresponding points in the low dimensional dataset to find an affine map that approximates the full projection method (whichever one is currently being used).
(4) Applies the affine map to the new datapoint.
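The four steps above can be sketched in Python as follows. This is one interpretation of the description, not HiSee's code: it assumes an orthogonal projection onto the plane through the three neighbors, and the function name `nn_subspace_add` is hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nn_subspace_add(high, low, new_point):
    """Place a new high-D point in the 2D display using its three nearest
    neighbors and their existing low-D images."""
    # (1) Find the three nearest neighbors in the high-D dataset.
    idx = sorted(range(len(high)),
                 key=lambda i: math.dist(high[i], new_point))[:3]
    p0, p1, p2 = (high[i] for i in idx)
    q0, q1, q2 = (low[i] for i in idx)

    # (2) Orthogonally project the new point onto the plane through
    #     p0, p1, p2 by solving the 2x2 normal equations for (a, b) in
    #     p0 + a*(p1 - p0) + b*(p2 - p0).
    u, v, w = _sub(p1, p0), _sub(p2, p0), _sub(new_point, p0)
    uu, uv, vv = _dot(u, u), _dot(u, v), _dot(v, v)
    det = uu * vv - uv * uv
    if abs(det) < 1e-12:
        raise ValueError("nearest neighbors are collinear; plane is undefined")
    a = (_dot(w, u) * vv - _dot(w, v) * uv) / det
    b = (_dot(w, v) * uu - _dot(w, u) * uv) / det

    # (3) and (4) The affine map sending p0, p1, p2 to q0, q1, q2 preserves
    #     the coefficients (a, b), so apply them to the low-D images.
    return (q0[0] + a * (q1[0] - q0[0]) + b * (q2[0] - q0[0]),
            q0[1] + a * (q1[1] - q0[1]) + b * (q2[1] - q0[1]))


high = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
low = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
placed = nn_subspace_add(high, low, [0.4, 0.25, 0.3])
print(placed)
```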
Triangulate Method
The Triangulate method takes each new point and determines which two points in the current dataset are closest to it. Then, if possible, it places the projected image of the new point so that its distances from the projected images of its two nearest neighbors are the same as they were in the high-dimensional space. When it is not possible to project the point such that its distances to its two nearest neighbors are preserved, the projected image of the new point is placed on a line connecting the projected images of its two nearest neighbors. In this case the position of the projected image of the new point on this line is determined by the relative sizes of the distances between the new point and its two nearest neighbors in the current dataset.
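A minimal Python sketch of this idea follows. It treats the placement as a circle-intersection problem in the plane. Two details are assumptions not fixed by the description above: which of the two intersection points is chosen, and the use of the ratio d1 / (d1 + d2) to position the point on the connecting line in the fallback case. The function name `triangulate_add` is hypothetical.

```python
import math


def triangulate_add(high, low, new_point):
    """Place a new high-D point in 2D so that its distances to the images
    of its two nearest neighbors match the high-D distances when possible."""
    i1, i2 = sorted(range(len(high)),
                    key=lambda i: math.dist(high[i], new_point))[:2]
    d1 = math.dist(high[i1], new_point)
    d2 = math.dist(high[i2], new_point)
    (x1, y1), (x2, y2) = low[i1], low[i2]
    sep = math.hypot(x2 - x1, y2 - y1)  # distance between the two images

    # The circles of radius d1 and d2 around the two images intersect
    # only if the triangle inequality holds.
    if sep > 0 and abs(d1 - d2) <= sep <= d1 + d2:
        along = (d1 * d1 - d2 * d2 + sep * sep) / (2 * sep)
        height = math.sqrt(max(d1 * d1 - along * along, 0.0))
        ux, uy = (x2 - x1) / sep, (y2 - y1) / sep
        # Assumption: always take the intersection on the left of the
        # direction from the first image to the second.
        return (x1 + along * ux - height * uy, y1 + along * uy + height * ux)

    # Fallback: put the point on the connecting line, closer to the
    # neighbor it is closer to in the high-D space.
    total = d1 + d2
    t = d1 / total if total > 0 else 0.5
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


high = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
new_point = [2.0, 0.0, 2.0]

# Images 4 apart: both high-D distances can be preserved.
exact = triangulate_add(high, [(0.0, 0.0), (4.0, 0.0)], new_point)
# Images 10 apart: the circles do not meet, so the fallback is used.
fallback = triangulate_add(high, [(0.0, 0.0), (10.0, 0.0)], new_point)
print(exact, fallback)
```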
Refresh
In Refresh mode, data points are not added using any special algorithm. Rather, when new data points arrive, the current projection algorithm is re-run on the entire updated dataset (if the current projection algorithm is an iterative algorithm like the Sammon map, then coordinate projection is used by default). PCA tends to be useful in refresh mode, for it is relatively fast but also takes the entire dataset into consideration. For better results using coordinate projection, the "Automatically use most variant dimensions" preference can be selected.