Virtual directory

Virtual directory

In computing, the term virtual directory has a couple of meanings. It may simply designate (for example in IIS) a folder which appears in a path but which is not actually a subfolder of the preceding folder in the path. However, this article will discuss the term in the context of directory services and identity management. A virtual directory or virtual directory server (VDS) in this context is a software layer that delivers a single access point for identity management applications and service platforms. A virtual directory operates as a high-performance, lightweight abstraction layer that resides between client applications and disparate types of identity-data repositories, such as proprietary and standard directories, databases, web services, and applications. A virtual directory receives queries and directs them to the appropriate data sources by abstracting and virtualizing data. The virtual directory integrates identity data from multiple heterogeneous data stores and presents it as though it were coming from one source. This ability to reach into disparate repositories makes virtual directory technology ideal for consolidating data stored in a distributed environment. As of 2011, virtual directory servers most commonly use the LDAP protocol, but more sophisticated virtual directories can also support SQL as well as DSML and SPML. Industry experts have heralded the importance of the virtual directory in modernizing the identity infrastructure. According to Dave Kearns of Network World, "Virtualization is hot and a virtual directory is the building block, or foundation, you should be looking at for your next identity management project." In addition, Gartner analyst, Bob Blakley said that virtual directories are playing an increasingly vital role. In his report, “The Emerging Architecture of Identity Management,” Blakley wrote: “In the first phase, production of identities will be separated from consumption of identities through the introduction of a virtual directory interface.” == Capabilities == Virtual directories can have some or all of the following capabilities: Aggregate identity data across sources to create a single point of access. Create high-availability for authoritative data stores. Act as identity firewall by preventing denial-of-service attacks on the primary data stores through an additional virtual layer. Support a common searchable namespace for centralized authentication. Present a unified virtual view of user information stored across multiple systems. Delegate authentication to backend sources through source-specific security means. Virtualize data sources to support migration from legacy data stores without modifying the applications that rely on them. Enrich identities with attributes pulled from multiple data stores, based on a link between user entries. Some advanced identity virtualization platforms can also: Enable application-specific, customized views of identity data without violating internal or external regulations governing identity data. Reveal contextual relationships between objects through hierarchical directory structures. Develop advanced correlation across diverse sources using correlation rules. Build a global user identity by correlating unique user accounts across various data stores, and enrich identities with attributes pulled from multiple data stores, based on a link between user entries. Enable constant data refresh for real-time updates through a persistent cache. == Advantages == Virtual directories: Enable faster deployment because users do not need to add and sync additional application-specific data sources Leverage existing identity infrastructure and security investments to deploy new services Deliver high availability of data sources Provide application-specific views of identity data which can help avoid the need to develop a master enterprise schema Allow a single view of identity data without violating internal or external regulations governing identity data Act as identity firewalls by preventing denial-of-service attacks on the primary data-stores and providing further security on access to sensitive data Can reflect changes made to authoritative sources in real-time Leverages existing update processes of authoritative sources, so no separate (sometimes manual) process to update a central directory is needed Present a unified virtual view of user information from multiple systems so that it appears to reside in a single system Can secure all backend storage locations with a single security policy == Disadvantages == An original disadvantage is public perception of "push & pull technologies" which is the general classification of "virtual directories" depending on the nature of their deployment. Virtual directories were initially designed and later deployed with "push technologies" in mind, which also contravened with privacy laws of the United States. This is no longer the case. There are, however, other disadvantages in the current technologies. The classical virtual directory based on proxy cannot modify underlying data structures or create new views based on the relationships of data from across multiple systems. So if an application requires a different structure, such as a flattened list of identities, or a deeper hierarchy for delegated administration, a virtual directory is limited. Many virtual directories cannot correlate same-users across multiple diverse sources in the case of duplicate users Virtual directories without advanced caching technologies cannot scale to heterogeneous, high-volume environments. == Sample terminology == Unify metadata: Extract schemas from the local data source, map them to a common format, and link the same identities from different data silos based on a unique identifier. Namespace joining: Create a single large directory by bringing multiple directories together at the namespace level. For instance, if one directory has the namespace "ou=internal,dc=domain,dc=com" and a second directory has the namespace "ou=external,dc=domain,dc=com," then creating a virtual directory with both namespaces is an example of namespace joining. Identity joining: Enrich identities with attributes pulled from multiple data stores, based on a link between user entries. For instance if the user joeuser exists in a directory as "cn=joeuser,ou=users" and in a database with a username of "joeuser" then the "joeuser" identity can be constructed from both the directory and the database. Data remapping: The translation of data inside of the virtual directory. For instance, mapping “uid” to “samaccountname,” so a client application that only supports a standard LDAP-compliant data source is able to search an Active Directory namespace, as well. Query routing: Route requests based on certain criteria, such as “write operations going to a master, while read operations are forwarded to replicas.” Identity routing: Virtual directories may support the routing of requests based on certain criteria (such as write operations going to a master while read operations being forwarded to replicas). Authoritative source: A "virtualized" data repository, such as a directory or database, that the virtual directory can trust for user data. Server groups: Group one or more servers containing the same data and functionality. A typical implementation is the multi-master, multi-replica environment in which replicas process "read" requests and are in one server group, while masters process "write" requests and are in another, so that servers are grouped by their response to external stimuli, even though all share the same data. == Use cases == The following are sample use cases of virtual directories: Integrating multiple directory namespaces to create a central enterprise directory. Supporting infrastructure integrations after mergers and acquisitions. Centralizing identity storage across the infrastructure, making identity information available to applications through various protocols (including LDAP, JDBC, and web services). Creating a single access point for web access management (WAM) tools. Enabling web single sign-on (SSO) across varied sources or domains. Supporting role-based, fine-grained authorization policies Enabling authentication across different security domains using each domain’s specific credential checking method. Improving secure access to information both inside and outside of the firewall.

Distributed Common Ground System

The Distributed Common Ground System (DCGS) is a system which produces military intelligence for multiple branches of the American military. == DCGS Programs == DCGS-N - DCGS for the United States Navy DCGS-A - DCGS for the United States Army AF DCGS - DCGS for the United States Air Force DCGS-MC - DCGS for the United States Marine Corps DCGS-SOF - DCGS for the United States Special Operations Forces IS&A Support Center - DCGS-A Help Desk for the United States Army - https://dcgsahelp.max.gov/ - Max.gov sunset 15 December 2023 == Description == While in U.S. Air Force use, the system produces intelligence collected by the U-2 Dragonlady, RQ-4 Global Hawk, MQ-9 Reaper and MQ-1 Predator. The previous system of similar use was the Deployable Ground Station (DGS), which was first deployed in July 1994. Subsequent version of DGS were developed from 1995 through 2009. Although officially designated a "weapons system", it consists of computer hardware and software connected together in a computer network, devoted to processing and dissemination of information such as images. The 480th Intelligence, Surveillance and Reconnaissance Wing of the Air Combat Command operates and maintains the USAF system. A plan envisioned in 1998 was to develop interoperable systems for the Army and Navy, in addition to the Air Force. By 2006, version 10.6 was deployed by the Air Force, and a version known as DCGS-A was developed for the Army. After a 2010 report by General Michael T. Flynn, the program was intended to use cloud computing and be as easy to use as an iPad, which soldiers over a few years were commonly using. By April 2011, project manager Colonel Charles Wells announced version 3 of the Army system (code named "Griffin") was being deployed in the US war in Afghanistan. In January 2012, the United States Army Communications-Electronics Research, Development and Engineering Center hosted a meeting based on the DCGS-A early experience. It brought together technology providers in the hope of developing more integrated systems using cloud computing with open architectures, compared to previously specialized custom-built systems. A major contractor was Lockheed Martin, with computers supplied by Silicon Graphics International out of its Chippewa Falls, Wisconsin office. Software known as the Analyst's Notebook, originally developed by i2 Limited, was included in DCGS-A. IBM acquired i2 in 2011. Some US Army personnel reported using a Palantir Technologies product to improve their ability to predict locations of improvised explosive devices. An April 2012 report recommending further study after initial success. Palantir software was rated easy to use, but did not have the flexibility and wide number of data sources of DCGS-A. In July 2012, Congressman Duncan D. Hunter (from California, the state where Palantir is based) complained of US DoD obstacles to its wider use. Although a limited test in August 2011 by the Test and Evaluation Command had recommended deployment, operation problems of DCGS-A included the baseline system was "not operationally effective" with reboots on average about every 8 hours. A set of improvements was identified in November 2012. The press reported some of the shortcomings uncovered by General Genaro Dellarocco in the tests. The ambitious goal of integrating 473 data sources for 75 million reports proved to be challenging, after spending an estimated $2.3 billion on the Army system alone. In May 2013 Politico reported that Palantir lobbyists and some anonymous returning veterans continued to advocate the use of its software, despite its interoperability limits. In particular, members of special forces and US Marines were not required to use the official Army system. Similar stories appeared in other publications, with Army representatives (such as Major General Mary A. Legere) citing the limitations of various systems. Congressman Hunter was a member of the House Armed Services Committee which required a review of the program, after two other members of congress sent an open letter to Secretary of Defense Leon Panetta. The Senate Defense Appropriations Subcommittee included testimony from Army Chief of Staff General Ray Odierno. The 130th Engineer Brigade (United States) has found the system to be "unstable, slow, not friendly and a major hindrance to operations". The equivalent system for the United States Navy was planned for initial deployment by 2015, and within a shipboard network called Consolidated Afloat Networks and Enterprise Services (CANES) by 2016. Some early testing was announced in 2009 aboard the aircraft carrier USS Harry Truman. A portion of the software, a distributed data framework for the DCGS integration backbone (DIB) version 4, was submitted to an open-source software repository of the Codice Foundation on GitHub. The framework was new for DIB version 4, replacing the legacy DIB portal with an Ozone Widget Framework interface. It was written in the Java programming language. == DCGS-A == Distributed Common Ground System-Army (DCGS-A) is the United States Army's primary system to post data, process information, and disseminate Intelligence, Surveillance and Reconnaissance (ISR) information about the threat, weather, and terrain to echelons. DCGS-A provides commanders the ability to task battle-space sensors and receive intelligence information from multiple sources. === Promotion === An August 17, 2011, UPI article quoted i2 Chief Executive Officer Robert Griffin who commented on DCGS-A's best-of-breed approach to development. The article detailed the Army contracting with i2 for Analyst's Notebook software. "With its open architecture, Analyst's Notebook supports the Army's strategy to employ and integrate best-of-breed solutions from across the industry to meet the dynamic needs users face in the field on a daily basis." A February 1, 2012, article in the Army web page quoted Mark Kitz, DCGS-A technical director. DCGS-A "uses the latest in cloud technology to rapidly gather, collaborate and share intelligence data from multiple sources to deliver a common operating picture. DCGS-A is able to rapidly adapt to changing operational environments by leveraging an iterative development model and open architecture allowing for collaboration with multiple government, industry and academic partners." A July 2012 article in SIGNAL Magazine, monthly publication of the Armed Forces Communications and Electronics Association, promoted DCGS-A as taking advantage of technological environments with which young soldiers are familiar. The article quoted the DCGS-A program manager, Col. Charles Wells on the systems benefits. The article also included Lockheed Martin's DCGS-A program manager. The Milwaukee Journal Sentinel published an article May 4, 2012, about Wisconsin-located companies helping DCGS-A with cloud computing technology. The article promoted the speed when cloud computing processes intelligence and cost savings by analyzing data in the field. === The U.S. Army's 2011 Posture Statement === The U.S. Army released its 2011 Army Posture Statement March 2. It included a statement on DCGS-A: “The Distributed Common Ground System-Army (DCGS-A) is the Army's premier intelligence, surveillance, and reconnaissance (ISR) enterprise for the tasking of sensors, analysis and processing of data, exploitation of data, and dissemination of intelligence (TPED) across all echelons. It is the Army component of the larger Defense Intelligence Information Enterprise (DI2E) and interoperable with other Service DCGS programs. Under the DI2E framework, USD (I) hopes to provide COCOM Joint Intelligence Operations Centers (JIOCs) capabilities interoperable with DCGS-A through a Cloud/widget approach. DCGS-A connects tactical, operational, and theater-level commanders to hundreds of intelligence and intelligence-related data sources at all classification levels and allows them to focus efforts of the entire ISR community on their information requirements. === Comparisons === Some Ground Commanders who describe DCGS-A as "unwieldy and unreliable, hard to learn and difficult to use," supporting alternative software from Palantir Technologies. Palantir software supports small unit situational awareness, but is not sufficiently funded to support the broader role that DCGS-A fulfills. == Operators == 480th Intelligence, Surveillance and Reconnaissance Wing 9th Intelligence Squadron 13th Intelligence Squadron 548th Intelligence, Surveillance and Reconnaissance Group 548 Operational Support Squadron 48th Intelligence Squadron 101st Intelligence Squadron 113th Air Support Operations Squadron 127th Command and Control Squadron 161st Intelligence Squadron

Soft independent modelling of class analogies

Soft independent modelling by class analogy (SIMCA) is a statistical method for supervised classification of data. The method requires a training data set consisting of samples (or objects) with a set of attributes and their class membership. The term soft refers to the fact the classifier can identify samples as belonging to multiple classes and not necessarily producing a classification of samples into non-overlapping classes. == Method == In order to build the classification models, the samples belonging to each class need to be analysed using principal component analysis (PCA); only the significant components are retained. For a given class, the resulting model then describes either a line (for one Principal Component or PC), plane (for two PCs) or hyper-plane (for more than two PCs). For each modelled class, the mean orthogonal distance of training data samples from the line, plane, or hyper-plane (calculated as the residual standard deviation) is used to determine a critical distance for classification. This critical distance is based on the F-distribution and is usually calculated using 95% or 99% confidence intervals. New observations are projected into each PC model and the residual distances calculated. An observation is assigned to the model class when its residual distance from the model is below the statistical limit for the class. The observation may be found to belong to multiple classes and a measure of goodness of the model can be found from the number of cases where the observations are classified into multiple classes. The classification efficiency is usually indicated by Receiver operating characteristics. In the original SIMCA method, the ends of the hyper-plane of each class are closed off by setting statistical control limits along the retained principal components axes (i.e., score value between plus and minus 0.5 times score standard deviation). More recent adaptations of the SIMCA method close off the hyper-plane by construction of ellipsoids (e.g. Hotelling's T2 or Mahalanobis distance). With such modified SIMCA methods, classification of an object requires both that its orthogonal distance from the model and its projection within the model (i.e. score value within the region defined by the ellipsoid) are not significant. == Application == SIMCA as a method of classification has gained widespread use especially in applied statistical fields such as chemometrics and spectroscopic data analysis.

Loss function

In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its opposite (in specific domains, variously called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized. The loss function could include terms from several levels of the hierarchy. In statistics, typically a loss function is used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data. The concept, as old as Laplace, was reintroduced in statistics by Abraham Wald in the middle of the 20th century. In the context of economics, for example, this is usually economic cost or regret. In classification, it is the penalty for an incorrect classification of an example. In actuarial science, it is used in an insurance context to model benefits paid over premiums, particularly since the works of Harald Cramér in the 1920s. In optimal control, the loss is the penalty for failing to achieve a desired value. In financial risk management, the function is mapped to a monetary loss. == Examples == === Regret === Leonard J. Savage argued that using non-Bayesian methods such as minimax, the loss function should be based on the idea of regret, i.e., the loss associated with a decision should be the difference between the consequences of the best decision that could have been made under circumstances will be known and the decision that was in fact taken before they were known. === Quadratic loss function === The use of a quadratic loss function is common, for example when using least squares techniques. It is often more mathematically tractable than other loss functions because of the properties of variances, as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target. If the target is t {\displaystyle t} , then a quadratic loss function is λ ( x ) = C ( t − x ) 2 {\displaystyle \lambda (x)=C(t-x)^{2}\;} for some constant C {\displaystyle C} ; the value of the constant makes no difference to a decision, and can be ignored by setting it equal to 1. This is also known as the squared error loss (SEL). Many common statistics, including t-tests, regression models, design of experiments, and much else, use least squares methods applied using linear regression theory, which is based on the quadratic loss function. The quadratic loss function is also used in linear-quadratic optimal control problems. In these problems, even in the absence of uncertainty, it may not be possible to achieve the desired values of all target variables. Often loss is expressed as a quadratic form in the deviations of the variables of interest from their desired values; this approach is tractable because it results in linear first-order conditions. In the context of stochastic control, the expected value of the quadratic form is used. The quadratic loss assigns more importance to outliers than to the true data due to its square nature, so alternatives like the Huber, log-cosh and SMAE losses are used when the data has many large outliers. === 0-1 loss function === In statistics and decision theory, a frequently used loss function is the 0-1 loss function L ( y ^ , y ) = { 0 if y = y ^ 1 if y ≠ y ^ {\displaystyle L({\hat {y}},y)={\begin{cases}0&{\text{if }}y={\hat {y}}\\1&{\text{if }}y\neq {\hat {y}}\end{cases}}} In information theory, this loss function is known as Hamming distortion. == Constructing loss and objective functions == In many applications, objective functions, including loss functions as a particular case, are determined by the problem formulation. In other situations, the decision maker’s preference must be elicited and represented by a scalar-valued function (called also utility function) in a form suitable for optimization — the problem that Ragnar Frisch has highlighted in his Nobel Prize lecture. The existing methods for constructing objective functions are collected in the proceedings of two dedicated conferences. In particular, Andranik Tangian showed that the most usable objective functions — quadratic and additive — are determined by a few indifference points. He used this property in the models for constructing these objective functions from either ordinal or cardinal data that were elicited through computer-assisted interviews with decision makers. Among other things, he constructed objective functions to optimally distribute budgets for 16 Westfalian universities and the European subsidies for equalizing unemployment rates among 271 German regions. == Expected loss == In some contexts, the value of the loss function itself is a random quantity because it depends on the outcome of a random variable X {\displaystyle X} . === Statistics === Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function; however, this quantity is defined differently under the two paradigms. ==== Frequentist expected loss ==== We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the probability distribution, P θ {\displaystyle P_{\theta }} , of the observed data, X {\displaystyle X} . This is also referred to as the risk function of the decision rule δ {\displaystyle \delta } and the parameter θ {\displaystyle \theta } . Here the decision rule depends on the outcome of X {\displaystyle X} . The risk function is given by: R ( θ , δ ) = E θ ⁡ L ( θ , δ ( X ) ) = ∫ X L ( θ , δ ( x ) ) d P θ ( x ) . {\displaystyle R(\theta ,\delta )=\operatorname {E} _{\theta }L{\big (}\theta ,\delta (X){\big )}=\int _{X}L{\big (}\theta ,\delta (x){\big )}\,\mathrm {d} P_{\theta }(x).} Here, θ {\displaystyle \theta } is a fixed but possibly unknown state of nature, X {\displaystyle X} is a vector of observations stochastically drawn from a population, E θ {\displaystyle \operatorname {E} _{\theta }} is the expectation over all population values of X {\displaystyle X} , d P θ {\displaystyle \mathrm {d} P_{\theta }} is a probability measure over the event space of X {\displaystyle X} (parametrized by θ {\displaystyle \theta } ) and the integral is evaluated over the entire support of X {\displaystyle X} . ==== Bayes Risk ==== In a Bayesian approach, the expectation is calculated using the prior distribution π ∗ {\displaystyle \pi ^{}} of the parameter θ {\displaystyle \theta } : ρ ( π ∗ , a ) = ∫ Θ ∫ X L ( θ , a ( x ) ) d P ( x | θ ) d π ∗ ( θ ) = ∫ X ∫ Θ L ( θ , a ( x ) ) d π ∗ ( θ | x ) d M ( x ) {\displaystyle \rho (\pi ^{},a)=\int _{\Theta }\int _{\mathbf {X}}L(\theta ,a({\mathbf {x}}))\,\mathrm {d} P({\mathbf {x}}\vert \theta )\,\mathrm {d} \pi ^{}(\theta )=\int _{\mathbf {X}}\int _{\Theta }L(\theta ,a({\mathbf {x}}))\,\mathrm {d} \pi ^{}(\theta \vert {\mathbf {x}})\,\mathrm {d} M({\mathbf {x}})} where M ( x ) {\displaystyle M(\mathbf {x} )} is known as the predictive likelihood wherein θ {\displaystyle \theta } has been "integrated out," π ∗ ( θ | x ) {\displaystyle \pi ^{}(\theta |\mathbf {x} )} is the posterior distribution, and the order of integration has been changed. One then should choose the action a ∗ {\displaystyle a^{}} which minimises this expected loss, which is referred to as Bayes Risk. In the latter equation, the integrand inside d x {\displaystyle \mathrm {d} x} is known as the Posterior Risk, and minimising it with respect to decision a {\displaystyle a} also minimizes the overall Bayes Risk. This optimal decision, a ∗ {\displaystyle a^{}} is known as the Bayes (decision) Rule - it minimises the average loss over all possible states of nature θ {\displaystyle \theta } , over all possible (probability-weighted) data outcomes. One advantage of the Bayesian approach is to that one need only choose the optimal action under the actual observed data to obtain a uniformly optimal one, whereas choosing the actual frequentist optimal decision rule as a function of all possible observations, is a much more difficult problem. Of equal importance though, the Bayes Rule reflects consideration of loss outcomes under different states of nature, θ {\displaystyle \theta } . ==== Examples in statistics ==== For a scalar parameter θ {\displaystyle \theta } , a decision function whose output θ ^ {\displaystyle {\hat {\theta }}} is an estimate of θ {\displaystyle \theta } , and a quadratic loss function (squared error loss) L ( θ , θ ^ ) = ( θ − θ ^ ) 2 , {\displaystyle L(\theta ,{\hat {\theta }})=(\theta -{\hat {\theta }})^{2},} the risk function becomes the mean squared error of the estimate, R ( θ , θ ^ ) = E θ ⁡ [ ( θ − θ ^ ) 2 ] . {\displaystyle R(\theta ,{\hat {\thet

Gaussian process emulator

In statistics, Gaussian process emulator is one name for a general type of statistical model that has been used in contexts where the problem is to make maximum use of the outputs of a complicated (often non-random) computer-based simulation model. Each run of the simulation model is computationally expensive and each run is based on many different controlling inputs. The variation of the outputs of the simulation model is expected to vary reasonably smoothly with the inputs, but in an unknown way. The overall analysis involves two models: the simulation model, or "simulator", and the statistical model, or "emulator", which notionally emulates the unknown outputs from the simulator. The Gaussian process emulator model treats the problem from the viewpoint of Bayesian statistics. In this approach, even though the output of the simulation model is fixed for any given set of inputs, the actual outputs are unknown unless the computer model is run and hence can be made the subject of a Bayesian analysis. The main element of the Gaussian process emulator model is that it models the outputs as a Gaussian process on a space that is defined by the model inputs. The model includes a description of the correlation or covariance of the outputs, which enables the model to encompass the idea that differences in the output will be small if there are only small differences in the inputs.

Discrete skeleton evolution

Discrete Skeleton Evolution (DSE) describes an iterative approach to reducing a morphological or topological skeleton. It is a form of pruning in that it removes noisy or redundant branches (spurs) generated by the skeletonization process, while preserving information-rich "trunk" segments. The value assigned to individual branches varies from algorithm to algorithm, with the general goal being to convey the features of interest of the original contour with a few carefully chosen lines. Usually, clarity for human vision (aka. the ability to "read" some features of the original shape from the skeleton) is valued as well. DSE algorithms are distinguished by complex, recursive decision-making processes with high computational requirements. Pruning methods such as by structuring element (SE) convolution and the Hough transform are general purpose algorithms which quickly pass through an image and eliminate all branches shorter than a given threshold. DSE methods are most applicable when detail retention and contour reconstruction are valued. == Methodology == === Pre-processing === Input images will typical contain more data than is necessary to generate an initial skeleton, and thus must be reduced in some way. Reducing the resolution, converting to grayscale, and then binary by masking or thresholding are common first steps. Noise removal may occur before and/or after converting an image to binary. Morphological operations such as closing, opening, and smoothing of the binary image may also be part of pre-processing. Ideally, the binarized contour should be as noise-free as possible before the skeleton is generated. === Skeletonization === DSE techniques may be applied to an existing skeleton or incorporated as part of the skeleton growing algorithm. Suitable skeletons may be obtained using a variety of methods: Thinning algorithms, such as the Grassfire transform Voronoi diagram Medial Axis Transform or Symmetry Axis Transform Distance Mapping === Significance Measures === DSE and related methods remove entire spurious branches while leaving the main trunk intact. The intended result is typically optimized for visual clarity and retention of information, such that the original contour can be reconstructed from the fully pruned skeleton. The value of various properties must be weighted by the application, and improving the efficiency is an ongoing topic of research in computer vision and image processing. Some significance measures include: Discrete Bisector Function Contour length Bending Potential Ratio Discrete Curve Evolution === Iteration === Each branch is evaluated during a pass through the skeletonized image according to the specific algorithm being used. Low value branches are removed and the process is repeated until a desired threshold of simplicity is reached. === Reconstruction === If all points on the output skeleton are the center points of maximal disks of the image and the radius information is retained, a contour image can be reconstructed == Applications == === Handwriting and text parsing === Variability in hand-written text is an ongoing challenge, simplification makes it somewhat easier for computer vision algorithms to make judgements about intended characters. === Soft body classification (animals) === The maximal disks centered on the skeleton imply roughly spherical masses, the features of the extracted skeleton are relatively unchanged even as the soft body deforms or self-occludes. Skeleton information is one facet of determining whether two animals are the "same" some way, though it must usually be paired with another technique to effectively identify a target. === Medical uses === Investigation of organs, tissue damage and deformation caused by disease.

Boosting (machine learning)

In machine learning (ML), boosting is an ensemble learning method that combines a set of less accurate models (called "weak learners") to create a single, highly accurate model (a "strong learner"). Unlike other ensemble methods that build models in parallel (such as bagging), boosting algorithms build models sequentially. Each new model in the sequence is trained to correct the errors made by its predecessors. This iterative process allows the overall model to improve its accuracy, particularly by reducing bias. Boosting is a popular and effective technique used in supervised learning for both classification and regression tasks. The theoretical foundation for boosting came from a question posed by Kearns and Valiant (1988, 1989): "Can a set of weak learners create a single strong learner?" A weak learner is defined as a classifier that performs only slightly better than random guessing, whereas a strong learner is a classifier that is highly correlated with the true classification. Robert Schapire's affirmative answer to this question in a 1990 paper led to the development of practical boosting algorithms. The first such algorithm was developed by Schapire, with Freund and Schapire later developing AdaBoost, which remains a foundational example of boosting. == Algorithms == While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are weighted in a way that is related to the weak learners' accuracy. After a weak learner is added, the data weights are readjusted, known as "re-weighting". Misclassified input data gain a higher weight and examples that are classified correctly lose weight. Thus, future weak learners focus more on the examples that previous weak learners misclassified. There are many boosting algorithms. The original ones, proposed by Robert Schapire (a recursive majority gate formulation), and Yoav Freund (boost by majority), were not adaptive and could not take full advantage of the weak learners. Schapire and Freund then developed AdaBoost, an adaptive boosting algorithm that won the prestigious Gödel Prize. Only algorithms that are provable boosting algorithms in the probably approximately correct learning formulation can accurately be called boosting algorithms. Other algorithms that are similar in spirit to boosting algorithms are sometimes called "leveraging algorithms", although they are also sometimes incorrectly called boosting algorithms. The main variation between many boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and the most significant historically as it was the first algorithm that could adapt to the weak learners. It is often the basis of introductory coverage of boosting in university machine learning courses. There are many more recent algorithms such as LPBoost, TotalBoost, BrownBoost, xgboost, MadaBoost, LogitBoost, CatBoost and others. Many boosting algorithms fit into the AnyBoost framework, which shows that boosting performs gradient descent in a function space using a convex cost function. == Object categorization in computer vision == Given images containing various known objects in the world, a classifier can be learned from them to automatically classify the objects in future images. Simple classifiers built based on some image feature of the object tend to be weak in categorization performance. Using boosting methods for object categorization is a way to unify the weak classifiers in a special way to boost the overall ability of categorization. === Problem of object categorization === Object categorization is a typical task of computer vision that involves determining whether or not an image contains some specific category of object. The idea is closely related with recognition, identification, and detection. Appearance based object categorization typically contains feature extraction, learning a classifier, and applying the classifier to new examples. There are many ways to represent a category of objects, e.g. from shape analysis, bag of words models, or local descriptors such as SIFT, etc. Examples of supervised classifiers are Naive Bayes classifiers, support vector machines, mixtures of Gaussians, and neural networks. However, research has shown that object categories and their locations in images can be discovered in an unsupervised manner as well. === Status quo for object categorization === The recognition of object categories in images is a challenging problem in computer vision, especially when the number of categories is large. This is due to high intra class variability and the need for generalization across variations of objects within the same category. Objects within one category may look quite different. Even the same object may appear unalike under different viewpoint, scale, and illumination. Background clutter and partial occlusion add difficulties to recognition as well. Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few, e.g. human faces, cars, simple objects, etc. Research has been very active on dealing with more categories and enabling incremental additions of new categories, and although the general problem remains unsolved, several multi-category objects detectors (for up to hundreds or thousands of categories) have been developed. One means is by feature sharing and boosting. === Boosting for binary categorization === AdaBoost can be used for face detection as an example of binary categorization. The two categories are faces versus background. The general algorithm is as follows: Form a large set of simple features Initialize weights for training images For T rounds Normalize the weights For available features from the set, train a classifier using a single feature and evaluate the training error Choose the classifier with the lowest error Update the weights of the training images: increase if classified wrongly by this classifier, decrease if correctly Form the final strong classifier as the linear combination of the T classifiers (coefficient larger if training error is small) After boosting, a classifier constructed from 200 features could yield a 95% detection rate under a 10 − 5 {\displaystyle 10^{-5}} false positive rate. Another application of boosting for binary categorization is a system that detects pedestrians using patterns of motion and appearance. This work is the first to combine both motion information and appearance information as features to detect a walking person. It takes a similar approach to the Viola-Jones object detection framework. === Boosting for multi-class categorization === Compared with binary categorization, multi-class categorization looks for common features that can be shared across the categories at the same time. They turn to be more generic edge like features. During learning, the detectors for each category can be trained jointly. Compared with training separately, it generalizes better, needs less training data, and requires fewer features to achieve the same performance. The main flow of the algorithm is similar to the binary case. What is different is that a measure of the joint training error shall be defined in advance. During each iteration the algorithm chooses a classifier of a single feature (features that can be shared by more categories shall be encouraged). This can be done via converting multi-class classification into a binary one (a set of categories versus the rest), or by introducing a penalty error from the categories that do not have the feature of the classifier. In the paper "Sharing visual features for multiclass and multiview object detection", A. Torralba et al. used GentleBoost for boosting and showed that when training data is limited, learning via sharing features does a much better job than no sharing, given same boosting rounds. Also, for a given performance level, the total number of features required (and therefore the run time cost of the classifier) for the feature sharing detectors, is observed to scale approximately logarithmically with the number of class, i.e., slower than linear growth in the non-sharing case. Similar results are shown in the paper "Incremental learning of object detectors using a visual shape alphabet", yet the authors used AdaBoost for boosting. == Convex vs. non-convex boosting algorithms == Boosting algorithms can be based on convex or non-convex optimization algorithms. Convex algorithms, such as AdaBoost and LogitBoost, can be "defeated" by random noise such that they can't learn basic and learnable combinations of weak hypotheses. This limitation was pointed out by Long & Servedio in 2008. However, by 2009, multiple authors demonstrated that boosting algorithms based on non-convex optimization, such as BrownBoost, can learn from nois