Base Crawler#

pcapkit.vendor.default contains Vendor only, which is the base class for all vendor crawlers.

class pcapkit.vendor.default.Vendor[source]#

Bases: object

Default vendor generator.

Subclass this class and provide the FLAG and LINK attributes, etc., to implement a new vendor generator.

NAME: str#

Name of constant enumeration.

DOCS: str#

Docstring of constant enumeration.

FLAG: str#

Value limit checker.

LINK: str#

Link to registry.

count(data)[source]#

Count field records.

Parameters:

data (list[str]) – CSV data.

Return type:

Counter[str]

Returns:

Field records.
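To illustrate what count() tallies, here is a self-contained sketch; the CSV rows and the Value/Name/Reference column layout are assumptions in the style of an IANA registry, not pcapkit's actual data:

```python
import csv
from collections import Counter

# Hypothetical rows in IANA registry style: "Value,Name,Reference"
data = [
    'Value,Name,Reference',
    '0,Reserved,[RFC1234]',
    '1,Echo,[RFC1234]',
    '2,Echo,[RFC5678]',
]

reader = csv.reader(data)
next(reader)  # skip the header row
counter = Counter(name for _value, name, _ref in reader)
# 'Echo' appears twice, so it will later need disambiguation (see rename())
```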

process(data)[source]#

Process CSV data.

Parameters:

data (list[str]) – CSV data.

Return type:

tuple[list[str], list[str]]

Returns:

Enumeration fields and missing fields.
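The split between enumeration fields and missing fields can be sketched as follows; the CSV layout and the emitted line formats are illustrative assumptions, not pcapkit's actual templates:

```python
import csv

def process(data):
    """Split CSV rows into enumeration lines and missing-value lines (sketch)."""
    enum, miss = [], []
    reader = csv.reader(data)
    next(reader)  # skip the header row
    for value, name, _ref in reader:
        if value.isdigit():  # a single code point becomes a class attribute
            enum.append(f'{name} = {value}')
        else:  # a range such as '144-252' goes to the missing-value handler
            miss.append(f'# {value}: {name}')
    return enum, miss
```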

context(data)[source]#

Generate constant context.

Parameters:

data (list[str]) – CSV data.

Return type:

str

Returns:

Constant context.

static wrap_comment(text)[source]#

Wrap long text into shorter lines of comments.

Parameters:

text (str) – Source text.

Return type:

str

Returns:

Wrapped comments.
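A stand-in for such wrapping can be built on textwrap; the line width and the ``#:`` comment prefix are assumptions, not pcapkit's actual output format:

```python
import textwrap

def wrap_comment(text, width=76):
    # Collapse whitespace, wrap to fixed width, and prefix each line
    # as a comment (illustrative sketch only)
    return '\n'.join(
        f'#: {line}'
        for line in textwrap.wrap(' '.join(text.split()), width=width)
    )
```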

safe_name(name)[source]#

Convert an enumeration name into an enum.Enum-friendly format.

Parameters:

name (str) – Original enumeration name.

Return type:

str

Returns:

Converted enumeration name.
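A minimal sketch of such a conversion, assuming "enum.Enum friendly" means a valid Python identifier (the exact sanitisation rules pcapkit applies may differ):

```python
import re
import keyword

def safe_name(name):
    # Replace characters that are invalid in a Python identifier
    sanitized = re.sub(r'\W', '_', name.strip())
    # Identifiers cannot start with a digit
    if sanitized and sanitized[0].isdigit():
        sanitized = f'_{sanitized}'
    # Avoid shadowing Python keywords
    if keyword.iskeyword(sanitized):
        sanitized = f'{sanitized}_'
    return sanitized
```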

rename(name, code, *, original=None)[source]#

Rename duplicated fields.

Parameters:
  • name (str) – Field name.

  • code (str) – Field code.

  • original (Optional[str]) – Original field name (extracted from CSV records).

Return type:

str

Returns:

Revised field name.

Example

If name has multiple occurrences in the source registry, the field name will be sanitised as ${name}_${code}.

Otherwise, the plain name will be returned.
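The rule above can be sketched as follows; the Counter contents are hypothetical stand-ins for what count() would produce from the source registry:

```python
from collections import Counter

# Hypothetical field tallies, as produced by count()
record = Counter({'Echo': 2, 'Reserved': 1})

def rename(name, code):
    # A name with multiple occurrences is suffixed with its registry code
    if record[name] > 1:
        return f'{name}_{code}'
    return name
```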

request(text=None)[source]#

Fetch CSV file.

Parameters:

text (Optional[str]) – Context from LINK.

Return type:

list[str]

Returns:

CSV data.

_request()[source]#

Fetch CSV data from LINK.

This is the low-level call of request().

If LINK is None, it will directly call the upper-level method request() with no arguments.

The method will first try to GET the content of LINK. Should any exception be raised, it will retry with the proxy settings from get_proxies().

Note

Since some LINK links are from Wikipedia, etc., they might not be available in certain areas, e.g. the amazing PRC :)

Should the proxied request fail as well, it will prompt for user intervention, i.e. it will use webbrowser.open() to open the page in a browser for you, so that you can manually load the page and save its HTML source at the location it provides.

Return type:

list[str]

Returns:

CSV data.

Warns:

VendorRequestWarning – If the connection failed, with or without proxies.

See also

request()
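The direct-then-proxied flow described above can be sketched with the standard library; the function name fetch and the timeout are assumptions (pcapkit itself relies on requests), and the user-intervention fallback is omitted:

```python
import urllib.request

def fetch(link, proxies=None):
    """GET ``link`` directly; on failure, retry through ``proxies`` (sketch)."""
    try:
        with urllib.request.urlopen(link, timeout=10) as resp:
            return resp.read().decode()
    except Exception:
        if not proxies:  # no fallback available, re-raise for the caller
            raise
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler(proxies))
        with opener.open(link, timeout=10) as resp:
            return resp.read().decode()
```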

Internal Definitions#

class pcapkit.vendor.default.VendorMeta(name, bases, namespace, /, **kwargs)[source]#

Bases: ABCMeta

Meta class to add dynamic support to Vendor.

This meta class is used to generate necessary attributes for the Vendor class. It can be useful to reduce unnecessary registry calls and simplify the customisation process.

pcapkit.vendor.default.LINE(NAME, DOCS, FLAG, ENUM, MISS, MODL)#

Default constant template of enumeration registry from IANA CSV.

Parameters:
  • NAME (str) – name of the constant enumeration class

  • DOCS (str) – docstring for the constant enumeration class

  • FLAG (str) – threshold value validator (range of valid values)

  • ENUM (str) – enumeration data (class attributes)

  • MISS (str) – missing value handler (default value)

  • MODL (str) – module name of the constant enumeration class

Return type:

str

pcapkit.vendor.default.get_proxies()[source]#

Get proxy for blocked sites.

The function will read the PCAPKIT_HTTP_PROXY and PCAPKIT_HTTPS_PROXY environment variables, if any, for the proxy settings of requests.

Return type:

dict[str, str]

Returns:

Proxy settings for requests.
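A minimal sketch of the environment-variable lookup; the exact keys of the returned mapping are assumptions based on the requests proxies convention:

```python
import os

def get_proxies():
    # Read proxy endpoints from the PCAPKIT_HTTP_PROXY and
    # PCAPKIT_HTTPS_PROXY environment variables, if set
    proxies = {}
    http = os.environ.get('PCAPKIT_HTTP_PROXY')
    https = os.environ.get('PCAPKIT_HTTPS_PROXY')
    if http:
        proxies['http'] = http
    if https:
        proxies['https'] = https
    return proxies
```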