Package org.spdx.licenseTemplate
Class LicenseTextHelper
- java.lang.Object
-
- org.spdx.licenseTemplate.LicenseTextHelper
-
public class LicenseTextHelper extends Object
Static helper class for comparing license text.
- Author:
- Gary O'Neall
-
Field Summary
Fields (Modifier and Type, Field):
- static Map<String,String> NORMALIZE_TOKENS
- protected static Set<String> PUNCTUATION
- protected static Set<String> SKIPPABLE_TOKENS
- static Pattern TOKEN_SPLIT_PATTERN
- protected static String TOKEN_SPLIT_REGEX
-
Method Summary
Methods (Modifier and Type, Method, Description):
- static boolean canSkip(String token): Returns true if the token can be ignored per the rules.
- static String getTokenAt(String[] tokens, int tokenIndex): Fetches the string at the index, checking for range.
- static boolean isLicenseTextEquivalent(String licenseTextA, String licenseTextB): Returns true if the two license texts are considered a match per the SPDX License matching guidelines documented at spdx.org (currently license matching guidelines). There are 2 unimplemented features: bullets/numbering are not considered, and comments with no whitespace between text are not skipped.
- static String normalizeText(String s): Normalizes quotes and no-break spaces.
- static String removeLineSeparators(String s)
- static String replaceMultWord(String s): Replaces all multi-words with a single token, using a dash as the separator.
- static String replaceSpaceComma(String s): Replaces different forms of space with a normalized space and different forms of commas with a normalized comma.
- static String[] tokenizeLicenseText(String licenseText, Map<Integer,LineColumn> tokenToLocation): Tokenizes the license text, normalizes quotes, lowercases, and converts multi-words for better equivalence comparisons.
- static boolean tokensEquivalent(String tokenA, String tokenB): Returns true if the two tokens can be considered equivalent per the SPDX license matching rules.
-
Field Detail
-
TOKEN_SPLIT_REGEX
protected static final String TOKEN_SPLIT_REGEX
- See Also:
- Constant Field Values
-
TOKEN_SPLIT_PATTERN
public static final Pattern TOKEN_SPLIT_PATTERN
-
Method Detail
-
isLicenseTextEquivalent
public static boolean isLicenseTextEquivalent(String licenseTextA, String licenseTextB)
Returns true if the two license texts are considered a match per the SPDX License matching guidelines documented at spdx.org (currently license matching guidelines). There are 2 unimplemented features: bullets/numbering are not considered, and comments with no whitespace between text are not skipped.
- Parameters:
- licenseTextA - text to compare
- licenseTextB - text to compare
- Returns:
- true if the license text is equivalent
-
tokenizeLicenseText
public static String[] tokenizeLicenseText(String licenseText, Map<Integer,LineColumn> tokenToLocation)
Tokenizes the license text, normalizes quotes, lowercases, and converts multi-words for better equivalence comparisons.
- Parameters:
- tokenToLocation - location for all of the tokens
- licenseText - text to tokenize
- Returns:
- array of tokens from the licenseText
-
getTokenAt
public static String getTokenAt(String[] tokens, int tokenIndex)
Fetches the string at the index, checking for range. Returns null if the index is out of range.
- Parameters:
- tokens - array of tokens
- tokenIndex - index of token to retrieve
- Returns:
- the token at the index, or null if the token does not exist
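The bounds-checked lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name TokenAtSketch and the method tokenAt are hypothetical, and treating a null array or a negative index as "out of range" is an assumption that the documentation does not confirm.

```java
class TokenAtSketch {

    // Hypothetical stand-in for the documented behavior of getTokenAt:
    // return the token at tokenIndex, or null when no such token exists.
    // Assumption: a null array and a negative index also count as out of range.
    static String tokenAt(String[] tokens, int tokenIndex) {
        if (tokens == null || tokenIndex < 0 || tokenIndex >= tokens.length) {
            return null;
        }
        return tokens[tokenIndex];
    }

    public static void main(String[] args) {
        String[] tokens = {"permission", "is", "hereby", "granted"};
        System.out.println(tokenAt(tokens, 1)); // prints "is"
        System.out.println(tokenAt(tokens, 4)); // prints "null"
    }
}
```

Returning null instead of throwing lets a caller walk past the end of a token array with a simple null check.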
-
canSkip
public static boolean canSkip(String token)
Returns true if the token can be ignored per the rules.
- Parameters:
- token - token to check
- Returns:
- true if the token can be ignored per the rules
-
tokensEquivalent
public static boolean tokensEquivalent(String tokenA, String tokenB)
Returns true if the two tokens can be considered equivalent per the SPDX license matching rules.
- Parameters:
- tokenA - token to compare
- tokenB - token to compare
- Returns:
- true if tokenA is equivalent to tokenB
-
replaceSpaceComma
public static String replaceSpaceComma(String s)
Replaces different forms of space with a normalized space and different forms of commas with a normalized comma.
- Parameters:
- s - input string
- Returns:
- the input string with all UTF-8 spaces replaced by " " and all UTF-8 commas replaced by ","
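To make the kind of normalization described above concrete, here is a minimal sketch. It is not the library's code: the class SpaceCommaSketch is hypothetical, "different forms of space" is approximated by the Unicode space-separator category (Zs), and the set of comma variants is a small illustrative guess. The characters the library actually handles may differ.

```java
import java.util.regex.Pattern;

class SpaceCommaSketch {

    // Assumption: any character in the Unicode "space separator" category
    // (for example the no-break space U+00A0 or the ideographic space U+3000)
    // counts as a form of space.
    private static final Pattern SPACES = Pattern.compile("\\p{Zs}");

    // Assumption: an illustrative set of comma variants (fullwidth, small,
    // ideographic, Arabic). The real list is not given in the documentation.
    private static final Pattern COMMAS = Pattern.compile("[\\uFF0C\\uFE50\\u3001\\u060C]");

    static String replaceSpaceComma(String s) {
        String spaced = SPACES.matcher(s).replaceAll(" ");
        return COMMAS.matcher(spaced).replaceAll(",");
    }

    public static void main(String[] args) {
        // No-break space and fullwidth comma are both normalized.
        System.out.println(replaceSpaceComma("a\u00A0b\uFF0Cc")); // prints "a b,c"
    }
}
```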
-
replaceMultWord
public static String replaceMultWord(String s)
Replaces all multi-words with a single token, using a dash as the separator.
- Parameters:
- s - input string
- Returns:
- the input string with each multi-word replaced by a single token, using a dash as the separator
-
normalizeText
public static String normalizeText(String s)
Normalizes quotes and no-break spaces.
- Parameters:
- s - String to normalize
- Returns:
- String normalized for comparison
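As an illustration of what normalizing quotes and no-break spaces can look like, the sketch below maps a few typographic characters to their ASCII equivalents. It is an assumption-laden example, not the library's implementation: the class NormalizeTextSketch is hypothetical, and the exact characters handled and their replacements are guesses.

```java
class NormalizeTextSketch {

    // Assumption: curly double quotes map to ", curly single quotes map to ',
    // and the no-break space (U+00A0) maps to an ordinary space.
    static String normalize(String s) {
        return s.replace('\u201C', '"')
                .replace('\u201D', '"')
                .replace('\u2018', '\'')
                .replace('\u2019', '\'')
                .replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Curly quotes and a no-break space become plain ASCII.
        System.out.println(normalize("\u201Cas is\u201D\u00A0basis")); // prints "as is" basis (with straight quotes)
    }
}
```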
-