Package org.spdx.licenseTemplate
Class LicenseTextHelper
java.lang.Object
    org.spdx.licenseTemplate.LicenseTextHelper
public class LicenseTextHelper extends Object
Static helper class for comparing license text.
Author:
Gary O'Neall
Field Summary
static Map<String,String> NORMALIZE_TOKENS
protected static Set<String> PUNCTUATION
protected static Set<String> SKIPPABLE_TOKENS
static Pattern TOKEN_SPLIT_PATTERN
protected static String TOKEN_SPLIT_REGEX
Method Summary
static boolean canSkip(String token) - Returns true if the token can be ignored per the SPDX license matching rules.
static String getTokenAt(String[] tokens, int tokenIndex) - Fetches the string at the given index, checking for range.
static boolean isLicenseTextEquivalent(String licenseTextA, String licenseTextB) - Returns true if two sets of license text are considered a match per the SPDX License Matching Guidelines documented at spdx.org (currently the license matching guidelines). Two features are not yet implemented: bullets/numbering are not considered, and comments with no whitespace between them and the text are not skipped.
static String normalizeText(String s) - Normalizes quotes and no-break spaces.
static String removeLineSeparators(String s)
static String replaceMultWord(String s) - Replaces each multi-word phrase with a single token, using a dash to separate the words.
static String replaceSpaceComma(String s) - Replaces different forms of space with a normalized space and different forms of comma with a normalized comma.
static String[] tokenizeLicenseText(String licenseText, Map<Integer,LineColumn> tokenToLocation) - Tokenizes the license text, normalizes quotes, lowercases it, and converts multi-word phrases into single tokens for better equivalence comparisons.
static boolean tokensEquivalent(String tokenA, String tokenB) - Returns true if the two tokens can be considered equivalent per the SPDX license matching rules.
Field Detail
TOKEN_SPLIT_REGEX
protected static final String TOKEN_SPLIT_REGEX
See Also:
Constant Field Values
TOKEN_SPLIT_PATTERN
public static final Pattern TOKEN_SPLIT_PATTERN
Method Detail
isLicenseTextEquivalent
public static boolean isLicenseTextEquivalent(String licenseTextA, String licenseTextB)
Returns true if two sets of license text are considered a match per the SPDX License Matching Guidelines documented at spdx.org (currently the license matching guidelines). Two features are not yet implemented: bullets/numbering are not considered, and comments with no whitespace between them and the text are not skipped.
Parameters:
licenseTextA - text to compare
licenseTextB - text to compare
Returns:
true if the license text is equivalent
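One way to picture this kind of matching is a token-by-token walk that skips ignorable tokens on both sides. The sketch below is a simplified illustration under stated assumptions, not the library's implementation: it splits on whitespace only, lowercases both inputs, and uses a hypothetical SKIPPABLE set in place of the real SKIPPABLE_TOKENS. The class name TextEquivalenceSketch is likewise hypothetical.

```java
import java.util.Locale;
import java.util.Set;

class TextEquivalenceSketch {
    // Hypothetical stand-in for SKIPPABLE_TOKENS; the real contents may differ.
    static final Set<String> SKIPPABLE = Set.of("*", "//", "#");

    static boolean isEquivalent(String a, String b) {
        // Assumption: lowercase both inputs and split on runs of whitespace.
        String[] ta = a.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] tb = b.trim().toLowerCase(Locale.ROOT).split("\\s+");
        int i = 0;
        int j = 0;
        while (true) {
            // Advance past empty tokens and tokens the (assumed) rules allow us to ignore.
            while (i < ta.length && (ta[i].isEmpty() || SKIPPABLE.contains(ta[i]))) i++;
            while (j < tb.length && (tb[j].isEmpty() || SKIPPABLE.contains(tb[j]))) j++;
            if (i >= ta.length || j >= tb.length) {
                // Equivalent only if both token streams are exhausted together.
                return i >= ta.length && j >= tb.length;
            }
            if (!ta[i].equals(tb[j])) {
                return false;
            }
            i++;
            j++;
        }
    }
}
```

For example, isEquivalent("// Permission is granted", "permission is granted") returns true because the leading comment marker is skipped and case is ignored.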
tokenizeLicenseText
public static String[] tokenizeLicenseText(String licenseText, Map<Integer,LineColumn> tokenToLocation)
Tokenizes the license text, normalizes quotes, lowercases it, and converts multi-word phrases into single tokens for better equivalence comparisons.
Parameters:
licenseText - text to tokenize
tokenToLocation - map that receives the location of each token, keyed by token index
Returns:
array of tokens from licenseText
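The location-tracking side of tokenization can be sketched as follows. This is an assumption-laden illustration rather than the library's code: tokens are taken to be runs of non-whitespace, only lowercasing is applied (no quote normalization or multi-word handling), and Loc is a hypothetical stand-in for LineColumn, whose fields are not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerSketch {
    // Hypothetical stand-in for the library's LineColumn type.
    static final class Loc {
        final int line;
        final int column;

        Loc(int line, int column) {
            this.line = line;
            this.column = column;
        }
    }

    // Assumption: a token is any run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    static String[] tokenize(String text, Map<Integer, Loc> tokenToLocation) {
        List<String> tokens = new ArrayList<>();
        // Split on any line break sequence, keeping trailing empty lines.
        String[] lines = text.split("\\R", -1);
        for (int lineNo = 0; lineNo < lines.length; lineNo++) {
            Matcher m = TOKEN.matcher(lines[lineNo]);
            while (m.find()) {
                // Key = index of the token about to be added; line is 1-based, column 0-based.
                tokenToLocation.put(tokens.size(), new Loc(lineNo + 1, m.start()));
                tokens.add(m.group().toLowerCase(Locale.ROOT));
            }
        }
        return tokens.toArray(new String[0]);
    }
}
```

The caller supplies an empty map and reads positions back after the call, which is a convenient pattern for reporting where two texts first diverge.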
getTokenAt
public static String getTokenAt(String[] tokens, int tokenIndex)
Fetches the string at the given index, checking for range. Returns null if the index is out of range.
Parameters:
tokens - array of tokens
tokenIndex - index of the token to retrieve
Returns:
the token at the index, or null if the token does not exist
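The documented contract, returning null instead of throwing for an out-of-range index, amounts to a simple bounds check. A minimal sketch with a hypothetical class name; the null-array guard is an extra assumption not stated above.

```java
class TokenAccessSketch {
    // Returns the token at tokenIndex, or null when the index is outside the array.
    // Treating a null array as "no token" is an assumption of this sketch.
    static String getTokenAt(String[] tokens, int tokenIndex) {
        if (tokens == null || tokenIndex < 0 || tokenIndex >= tokens.length) {
            return null;
        }
        return tokens[tokenIndex];
    }
}
```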
canSkip
public static boolean canSkip(String token)
Returns true if the token can be ignored per the SPDX license matching rules.
Parameters:
token - token to check
Returns:
true if the token can be ignored per the SPDX license matching rules
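A skip check of this kind is typically a set lookup. The sketch assumes the ignorable tokens are comment markers plus blank tokens; the SKIPPABLE_TOKENS contents below are hypothetical, since the real set is not listed on this page, and SkipSketch is an illustrative class name.

```java
import java.util.Set;

class SkipSketch {
    // Hypothetical skippable tokens (comment markers); the library's set may differ.
    static final Set<String> SKIPPABLE_TOKENS = Set.of("//", "/*", "*/", "*", "#");

    static boolean canSkip(String token) {
        if (token == null) {
            return false;
        }
        String t = token.trim();
        // Assumption: blank tokens carry no meaning, so they are skippable too.
        return t.isEmpty() || SKIPPABLE_TOKENS.contains(t);
    }
}
```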
tokensEquivalent
public static boolean tokensEquivalent(String tokenA, String tokenB)
Returns true if the two tokens can be considered equivalent per the SPDX license matching rules.
Parameters:
tokenA - token to compare
tokenB - token to compare
Returns:
true if tokenA is equivalent to tokenB
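Token equivalence can be modelled as mapping each token to a canonical form and comparing the results. In this sketch the NORMALIZE map is a hypothetical stand-in for NORMALIZE_TOKENS, seeded with two illustrative entries (the spelling variant "licence" and the ampersand), and the case-insensitive comparison is an assumption; TokenEquivalenceSketch is an illustrative name.

```java
import java.util.Locale;
import java.util.Map;

class TokenEquivalenceSketch {
    // Hypothetical variant -> canonical-form map in the spirit of NORMALIZE_TOKENS.
    static final Map<String, String> NORMALIZE = Map.of(
            "licence", "license",
            "&", "and");

    private static String canonical(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return NORMALIZE.getOrDefault(lower, lower);
    }

    static boolean tokensEquivalent(String tokenA, String tokenB) {
        if (tokenA == null || tokenB == null) {
            // At least one is null: equivalent only if both are (assumption).
            return tokenA == null && tokenB == null;
        }
        return canonical(tokenA).equals(canonical(tokenB));
    }
}
```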
replaceSpaceComma
public static String replaceSpaceComma(String s)
Replaces different forms of space with a normalized space and different forms of comma with a normalized comma.
Parameters:
s - input string
Returns:
the input string with all Unicode space variants replaced by " " and all Unicode comma variants replaced by ","
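In Java the space side of this can lean on the Unicode Zs (space separator) category, which covers U+00A0 no-break space and U+3000 ideographic space among others. The comma variants chosen here (fullwidth, Arabic, and ideographic commas) are an assumption, as the method's actual list is not given on this page; SpaceCommaSketch is an illustrative name.

```java
import java.util.regex.Pattern;

class SpaceCommaSketch {
    // Every character in Unicode category Zs (space separators).
    private static final Pattern SPACES = Pattern.compile("\\p{Zs}");
    // Assumed comma variants: fullwidth (U+FF0C), Arabic (U+060C), ideographic (U+3001).
    private static final Pattern COMMAS = Pattern.compile("[\\uFF0C\\u060C\\u3001]");

    static String replaceSpaceComma(String s) {
        String spaced = SPACES.matcher(s).replaceAll(" ");
        return COMMAS.matcher(spaced).replaceAll(",");
    }
}
```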
replaceMultWord
public static String replaceMultWord(String s)
Replaces each multi-word phrase with a single token, using a dash to separate the words.
Parameters:
s - input string
Returns:
the input string with each multi-word phrase replaced by a single dash-separated token
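A sketch of multi-word replacement, assuming a hypothetical phrase list (the library's own list is not shown here) and an illustrative class name. Each phrase is matched case-insensitively with any whitespace between its words and replaced by the dash-joined lowercase form.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MultiWordSketch {
    // Hypothetical multi-word phrases; the library's own list is not shown on this page.
    private static final List<String> MULTI_WORDS = List.of("copyright holder", "sub license");

    static String replaceMultWord(String s) {
        String result = s;
        for (String phrase : MULTI_WORDS) {
            // Quote each word literally and allow any run of whitespace between words.
            String regex = Arrays.stream(phrase.split(" "))
                    .map(Pattern::quote)
                    .collect(Collectors.joining("\\s+"));
            Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            // Replacement is the lowercase, dash-joined phrase; original casing is not kept.
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(phrase.replace(' ', '-')));
        }
        return result;
    }
}
```

For instance, "The Copyright  Holder may" becomes "The copyright-holder may", so the phrase survives whitespace tokenization as one token.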
normalizeText
public static String normalizeText(String s)
Normalizes quotes and no-break spaces.
Parameters:
s - String to normalize
Returns:
String normalized for comparison
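Quote normalization can be done with a few character-class substitutions. This minimal sketch assumes the variants of interest are the curly double quotes (U+201C, U+201D), the curly single quotes (U+2018, U+2019), and the no-break space (U+00A0); the real method may handle more, and NormalizeSketch is an illustrative name.

```java
import java.util.regex.Pattern;

class NormalizeSketch {
    // Assumed variants: left/right double quotation marks (U+201C, U+201D).
    private static final Pattern DOUBLE_QUOTES = Pattern.compile("[\\u201C\\u201D]");
    // Assumed variants: left/right single quotation marks (U+2018, U+2019).
    private static final Pattern SINGLE_QUOTES = Pattern.compile("[\\u2018\\u2019]");
    // No-break space (U+00A0).
    private static final Pattern NO_BREAK_SPACE = Pattern.compile("\\u00A0");

    static String normalizeText(String s) {
        String result = DOUBLE_QUOTES.matcher(s).replaceAll("\"");
        result = SINGLE_QUOTES.matcher(result).replaceAll("'");
        return NO_BREAK_SPACE.matcher(result).replaceAll(" ");
    }
}
```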