org.electrocodeogram.cpc.core.api.provider.similarity
Interface ISimilarityProvider

All Superinterfaces:
IProvider

public interface ISimilarityProvider
extends IProvider

A similarity provider can be used to determine the percentage of similarity between two given IClone instances.
The ISimilarityProvider interface is implemented by all similarity provider implementations.

An implementation will typically offer its own extension API to allow addition, modification or removal of the strategies used to determine the similarity value.

Author:
vw

Field Summary
static java.lang.String LANGUAGE_C
          For future extensions.
static java.lang.String LANGUAGE_C_PLUS_PLUS
          For future extensions.
static java.lang.String LANGUAGE_JAVA
          Possible value for the language parameters of this interface.
static java.lang.String LANGUAGE_JAVASCRIPT
          For future extensions.
static java.lang.String LANGUAGE_OTHER
          Possible value for the language parameters of this interface.
static java.lang.String LANGUAGE_PERL
          For future extensions.
static java.lang.String LANGUAGE_PHP
          For future extensions.
static java.lang.String LANGUAGE_PYTHON
          For future extensions.
static java.lang.String LANGUAGE_RUBY
          For future extensions.
static java.lang.String LANGUAGE_TEXT
          Possible value for the language parameters of this interface.
 
Method Summary
 int calculateSimilarity(java.lang.String language, IClone clone1, IClone clone2, boolean transientCheck)
          Takes two clones and calculates the similarity of the two clones to each other.
 int calculateSimilarity(java.lang.String language, java.lang.String content1, java.lang.String content2)
          Simple interface for similarity calculation between two strings.
 
Methods inherited from interface org.electrocodeogram.cpc.core.api.provider.IProvider
getProviderName, toString
 

Field Detail

LANGUAGE_JAVA

static final java.lang.String LANGUAGE_JAVA
Possible value for the language parameters of this interface.
Indicates to the similarity provider that the given clone contents are potentially valid Java source fragments.
This is only a hint. The source fragments may have invalid syntax or may not be Java sources at all.
The similarity provider will fall back to LANGUAGE_TEXT if it cannot parse the given sources.

See Also:
Constant Field Values
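
The fallback described above can be sketched as follows. This is only an illustration under stated assumptions, not the CPC implementation: LanguageStrategy, FallbackDispatcher and register are hypothetical names, and the language keys are whatever strings the caller chooses, because the actual constant values are not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical strategy type: an empty result means "could not parse the input".
interface LanguageStrategy {
    OptionalInt compare(String content1, String content2);
}

// Hypothetical dispatcher: tries the strategy registered for the language hint
// and falls back to a plain-text strategy when none exists or parsing fails.
final class FallbackDispatcher {
    private final Map<String, LanguageStrategy> strategies = new HashMap<>();
    private final LanguageStrategy textStrategy;

    FallbackDispatcher(LanguageStrategy textStrategy) {
        this.textStrategy = textStrategy;
    }

    // Adds or replaces the strategy used for one language hint.
    void register(String language, LanguageStrategy strategy) {
        strategies.put(language, strategy);
    }

    int calculateSimilarity(String language, String content1, String content2) {
        LanguageStrategy strategy = strategies.get(language);
        if (strategy != null) {
            OptionalInt result = strategy.compare(content1, content2);
            if (result.isPresent()) {
                return result.getAsInt();
            }
        }
        // The language is only a hint, so treat the input as plain text.
        return textStrategy.compare(content1, content2).orElse(0);
    }
}
```

A registry of this kind is also one possible shape for the extension API mentioned in the interface description, since strategies can be added or replaced per language.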

LANGUAGE_OTHER

static final java.lang.String LANGUAGE_OTHER
Possible value for the language parameters of this interface.
Indicates to the similarity provider that the given clone contents are potentially source fragments in an unknown language.
The similarity provider may try to normalise whitespace in such cases.

See Also:
Constant Field Values
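
One plausible form of whitespace normalisation is shown below. The interface does not specify what a provider actually does, so the class name WhitespaceNormalizer and the chosen rule (trim, then collapse runs of whitespace to one space) are assumptions.

```java
// Hypothetical helper, not part of the CPC API.
final class WhitespaceNormalizer {
    // Removes leading and trailing whitespace and collapses every inner run
    // of spaces, tabs and line breaks into a single space.
    static String normalize(String content) {
        return content.trim().replaceAll("\\s+", " ");
    }
}
```

With this rule, two fragments that differ only in indentation or line breaks normalise to the same string before they are compared.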

LANGUAGE_TEXT

static final java.lang.String LANGUAGE_TEXT
Possible value for the language parameters of this interface.
Indicates to the similarity provider that the given clone contents are not sources in any particular programming language and that they should be handled as plain text.

See Also:
Constant Field Values

LANGUAGE_C_PLUS_PLUS

static final java.lang.String LANGUAGE_C_PLUS_PLUS
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

LANGUAGE_C

static final java.lang.String LANGUAGE_C
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

LANGUAGE_PERL

static final java.lang.String LANGUAGE_PERL
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

LANGUAGE_PHP

static final java.lang.String LANGUAGE_PHP
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

LANGUAGE_PYTHON

static final java.lang.String LANGUAGE_PYTHON
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

LANGUAGE_RUBY

static final java.lang.String LANGUAGE_RUBY
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

LANGUAGE_JAVASCRIPT

static final java.lang.String LANGUAGE_JAVASCRIPT
For future extensions.

See Also:
LANGUAGE_JAVA, Constant Field Values

Method Detail

calculateSimilarity

int calculateSimilarity(java.lang.String language,
                        IClone clone1,
                        IClone clone2,
                        boolean transientCheck)
Takes two clones and calculates the similarity of the two clones to each other.
The similarity is returned as a percent value.

Similarity is based on the contents of the given clones. The clone UUIDs are not taken into account. It is therefore possible to calculate the similarity between two instances of the same clone.

If transientCheck is false, a similarity provider may internally acquire a store provider to obtain additional data for the clones in question (i.e. the detailed CloneDiffs).

NOTE: A similarity of 100 may only be returned if the two code fragments are guaranteed to be semantically equal. Clients of this API can therefore distinguish two classes of matches: exactly 100 and less than 100.

Parameters:
language - indication of the potential programming language of the given source fragments, never null.
clone1 - the first clone to compare, never null.
clone2 - the second clone to compare, never null.
transientCheck - true if the given clones might not be in sync with the store provider. In this case an implementation of this interface must not query the store provider for any additional information about the clones.
Returns:
similarity between the two clones, range: 0-100, 0 = no similarity, 100 = clones are semantically equal.
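
The transientCheck contract can be illustrated with a small sketch. All types here (CloneLike, StoreLike, CountingStore, GuardedProvider) are hypothetical stand-ins for IClone, the store provider and a provider implementation. The scoring is a placeholder that returns 100 for identical content and 0 otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IClone; only the content matters here.
interface CloneLike {
    String content();
}

// Hypothetical stand-in for a store provider that can return extra clone data.
interface StoreLike {
    List<String> diffsFor(CloneLike clone);
}

// Counts lookups so that the contract can be observed from outside.
final class CountingStore implements StoreLike {
    int queries = 0;

    @Override
    public List<String> diffsFor(CloneLike clone) {
        queries++;
        return new ArrayList<>();
    }
}

final class GuardedProvider {
    private final StoreLike store;

    GuardedProvider(StoreLike store) {
        this.store = store;
    }

    int calculateSimilarity(String language, CloneLike clone1, CloneLike clone2,
                            boolean transientCheck) {
        if (!transientCheck) {
            // Only in this branch may extra data be fetched from the store.
            store.diffsFor(clone1);
            store.diffsFor(clone2);
        }
        // Placeholder scoring: identical content is trivially semantically equal.
        return clone1.content().equals(clone2.content()) ? 100 : 0;
    }
}
```

A call with transientCheck set to true leaves the store untouched, while a call with false performs one lookup per clone in this sketch.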

calculateSimilarity

int calculateSimilarity(java.lang.String language,
                        java.lang.String content1,
                        java.lang.String content2)
Simple interface for similarity calculation between two strings.

Parameters:
language - indication of the potential programming language of the given source fragments, never null.
content1 - content of the first clone, never null.
content2 - content of the second clone, never null.
Returns:
similarity between the two clones, range: 0-100, 0 = no similarity, 100 = clones are semantically equal.
See Also:
calculateSimilarity(String, IClone, IClone, boolean)
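
A minimal sketch of the string variant follows. It is not the strategy used by any CPC provider: StringSimilarity is a hypothetical stand-in for this overload, the language hint is ignored, and the score is a Levenshtein edit-distance ratio. To respect the rule that 100 is reserved for semantically equal fragments, the sketch returns 100 only for identical strings and caps every other result at 99.

```java
// Hypothetical stand-in for the String overload of ISimilarityProvider.
interface StringSimilarity {
    int calculateSimilarity(String language, String content1, String content2);
}

// Toy strategy: edit-distance ratio, capped at 99 unless both strings are identical.
final class EditDistanceSimilarity implements StringSimilarity {
    @Override
    public int calculateSimilarity(String language, String content1, String content2) {
        if (language == null || content1 == null || content2 == null) {
            throw new IllegalArgumentException("arguments must not be null");
        }
        if (content1.equals(content2)) {
            return 100; // identical text is trivially semantically equal
        }
        // The strings differ, so at least one is non-empty and maxLen >= 1.
        int maxLen = Math.max(content1.length(), content2.length());
        int distance = levenshtein(content1, content2);
        int percent = (int) Math.round(100.0 * (maxLen - distance) / maxLen);
        return Math.min(99, Math.max(0, percent));
    }

    // Classic two-row dynamic programming edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, comparing "abcd" with "abce" gives an edit distance of 1 over a length of 4, so this sketch returns 75.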