This page lists all components that have been made available as part of the Penelope platform. For more general information on Penelope components and examples of how to use them, see the Getting Started page. To contribute a component, see the Contributing page.

spaCy Natural Language Processing Tools

Contributed by EHAI – Vrije Universiteit Brussel.

This component provides access to a wide variety of natural language processing (NLP) tools:

  • Tokenization
  • Lemmatization
  • Noun chunking
  • Part-of-speech tagging
  • Named entity recognition
  • Dependency parsing
  • Word embeddings
  • Sentencization

With each API call, these tools can analyse either a single sentence or an array of text documents (sending an array in one call is faster than sending the documents one by one). The tools are available in 7 languages (en, de, es, pt, it, nl, fr). The NLP tools rely on the spaCy Python library.
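The difference between a single-sentence call and a batch call can be sketched as follows. This is a minimal sketch of a client-side request builder, not the component's documented schema: the field names `sentence`, `texts`, `language`, and `tool` are hypothetical placeholders, and the real names are defined in the component's OpenAPI specification.

```python
import json

# Languages listed for this component.
SUPPORTED_LANGUAGES = {"en", "de", "es", "pt", "it", "nl", "fr"}


def build_nlp_request(texts, language="en", tool="tokenize"):
    """Build a JSON body for one API call (hypothetical field names).

    A plain string is sent as a single sentence; any other iterable is
    sent as a batch of documents so that one call covers all of them.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if isinstance(texts, str):
        payload = {"sentence": texts}      # single-sentence call
    else:
        payload = {"texts": list(texts)}   # batch call
    payload.update({"language": language, "tool": tool})
    return json.dumps(payload)
```

For example, `build_nlp_request(["Doc one.", "Doc two."], language="fr", tool="ner")` packs both documents into a single request body instead of two.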

The OpenAPI specification is available at or can be downloaded in JSON format here.

Multidimensional Outlier Explorer

Contributed by the Complex Networks team at LIP6 (UPMC / CNRS).

This component supports the exploration of multidimensional datasets and the detection of statistical outliers within them. It is therefore mainly a data exploration tool: it offers a first glance at the data and helps formulate research hypotheses to be tested later.

The component takes as input a list of numeric observations, each described along several categorical dimensions. For example, with Twitter data, an observation can be the number of tweets (numeric observation) published by a given user (first dimension) about a given topic (second dimension) on a given date (third dimension). The input data then takes the form of a list of quadruplets (user, topic, date, number of tweets) in JSON format. Statistical outliers are identified by first selecting the dimensions of interest, that is, by subsetting or aggregating the input dimensions. If needed, observations can also be normalised by the marginal values along the selected dimensions: each observed value is then compared to an expected value obtained by uniformly redistributing the selected marginal values. Different statistical tests can then be chosen to measure the deviation between the observed and expected values. Finally, the component returns a list of positive outliers in JSON format, that is, observations that are significantly higher than expected.
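A minimal sketch of this kind of outlier search, under several assumptions. The record field names (`user`, `topic`, `date`, `tweets`) are hypothetical. Unselected dimensions are aggregated by summing. The expected value is taken as the independence estimate, where the grand total is redistributed according to the marginal totals of the selected dimensions. This is one reading of the normalisation described above. A one-sided Poisson tail probability stands in for whichever statistical tests the component actually offers.

```python
import math
from collections import defaultdict


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); adequate for modest counts only."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def positive_outliers(records, dims=("user", "topic"), value="tweets", alpha=0.01):
    # 1. Aggregate observations over every dimension that is not selected.
    table = defaultdict(float)
    for r in records:
        table[tuple(r[d] for d in dims)] += r[value]
    total = sum(table.values())
    if total == 0:
        return []
    # 2. Marginal totals along each selected dimension.
    margins = [defaultdict(float) for _ in dims]
    for key, v in table.items():
        for i, k in enumerate(key):
            margins[i][k] += v
    # 3. Compare each cell with its expected value under independence.
    outliers = []
    for key, observed in table.items():
        expected = total
        for i, k in enumerate(key):
            expected *= margins[i][k] / total
        p = poisson_upper_tail(int(observed), expected)
        if observed > expected and p < alpha:
            outliers.append({**dict(zip(dims, key)), "observed": observed,
                             "expected": expected, "p_value": p})
    return outliers
```

With `dims=("user", "topic")`, the date dimension is aggregated away. Adding `"date"` to `dims` keeps it in the analysis instead.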

The OpenAPI specification is available at

Semantic Frame Extractor

Contributed by EHAI – Vrije Universiteit Brussel.

Frame semantics is commonly used as a methodology for representing the meaning of linguistic utterances. While semantic frames have successfully been formalised on a large scale, automatically extracting them from raw text remains a major challenge. This Penelope component addresses this challenge using precision language processing techniques. Concretely, the component takes a sentence (or a list of texts) and a frame of interest (e.g. ‘Causation’) as input, and returns all instances of this frame, together with their frame elements, that occur in the sentence (or list of texts). The language processing in the semantic frame extractor has been developed within the Fluid Construction Grammar (FCG) framework.
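The input/output contract described above, where text plus a frame name goes in and frame instances with their frame elements come out, can be sketched on the client side. This is a sketch under stated assumptions: the JSON keys `utterance`, `texts`, `frame`, `frameSet`, `frameName`, and `frameElements` are hypothetical, and so is the `FrameInstance` class. The component's OpenAPI specification defines the actual shapes.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FrameInstance:
    """One occurrence of the requested frame (hypothetical structure)."""
    frame: str
    utterance: str
    elements: dict = field(default_factory=dict)  # frame element -> text span


def build_request(texts, frame="Causation"):
    """A single sentence or a list of texts, plus the frame of interest."""
    key = "utterance" if isinstance(texts, str) else "texts"
    return json.dumps({key: texts, "frame": frame})


def parse_response(body):
    """Turn a (hypothetical) JSON response into FrameInstance objects."""
    data = json.loads(body)
    return [
        FrameInstance(frame=d["frameName"], utterance=d["utterance"],
                      elements=dict(d.get("frameElements", {})))
        for d in data.get("frameSet", [])
    ]
```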

The OpenAPI specification is available at

Language Innovation Tracker

Contributed by EHAI – Vrije Universiteit Brussel.

This component extracts neologisms from texts. This is particularly helpful for gaining insight into discussions on platforms such as 4chan, where language is highly innovative and new words are often used to convey strongly non-neutral meanings and to distinguish between the in-group and the out-group. The component also offers visualization functionalities, including word clouds and tracking of the popularity of neologisms over time.
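To make the idea of neologism tracking concrete, here is a deliberately naive baseline. Any word that is absent from a reference lexicon and occurs at least `min_count` times is treated as a candidate neologism, and its frequency is counted per time period, which is the kind of data a popularity-over-time plot needs. This dictionary-lookup approach, the tokenizer regex, and the `(period, text)` input format are all assumptions for illustration. They are not the method used by the Language Innovation Tracker.

```python
import re
from collections import Counter, defaultdict

# Lowercase word tokens, allowing internal apostrophes and hyphens.
TOKEN = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def track_neologisms(posts, lexicon, min_count=2):
    """posts: iterable of (period, text) pairs.

    Returns {candidate_word: {period: count}} for out-of-lexicon words
    seen at least min_count times in total.
    """
    per_period = defaultdict(Counter)
    totals = Counter()
    for period, text in posts:
        for tok in TOKEN.findall(text.lower()):
            if tok not in lexicon:
                per_period[tok][period] += 1
                totals[tok] += 1
    return {w: dict(per_period[w]) for w, c in totals.items() if c >= min_count}
```

A real system would need a much larger lexicon, along with handling of spelling variants, usernames, and URLs. The `min_count` threshold merely filters out one-off typos.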

The OpenAPI specification is available at

Stream Graph Analyser

Coming in June 2019. Contributed by the Complex Networks team at LIP6 (UPMC / CNRS).

This component supports the modelling of interaction networks and dynamic graphs using the Stream Graph formalism, that is, the representation of graphs whose nodes and links may appear and disappear over time. The service provides various metrics and analytical tools, ranging from simple degree distributions (over time) to more sophisticated computations such as the detection of temporal cliques, communities, and time-preserving paths. The component is therefore designed for analysing the dynamics of social interactions, such as communication and information exchange on social media.

The formal specification of such temporal graphs follows the definition given by Matthieu Latapy et al. in [1]. There, a Stream Graph consists of four elements: a) the set of nodes belonging to the graph, b) the set of time intervals representing the graph’s lifespan, c) a Node Stream, that is, the set of nodes and time intervals representing the presence of nodes, and d) a Link Stream, that is, the set of node pairs and time intervals representing the presence of links.
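In the notation of [1], as transcribed here (see [1] for the authoritative definitions), these four elements form a quadruple. V is the node set (a), T is the lifespan (b), W is the Node Stream (c), and E is the Link Stream (d). The last line gives the density defined in [1]: the fraction of the time during which two nodes are both present that they are also linked.

```latex
\begin{aligned}
S &= (T, V, W, E), \quad T \subseteq \mathbb{R}, \quad W \subseteq T \times V, \quad E \subseteq T \times (V \otimes V),\\
V \otimes V &= \{\, uv : u, v \in V,\ u \neq v \,\} \quad \text{(unordered node pairs)},\\
(t, uv) \in E &\implies (t, u) \in W \ \text{and} \ (t, v) \in W,\\
T_u &= \{\, t : (t, u) \in W \,\}, \qquad T_{uv} = \{\, t : (t, uv) \in E \,\},\\
\delta(S) &= \frac{\sum_{uv \in V \otimes V} |T_{uv}|}{\sum_{uv \in V \otimes V} |T_u \cap T_v|}.
\end{aligned}
```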

With this component, users can first upload temporal interaction data as Stream Graphs in JSON files, and then process the data with various analytical methods: subsetting functions, density measures, clique and path computation, and so on. Custom analysis pipelines can also be built by chaining methods. Visualization tools are available as well, producing SVG images.

The OpenAPI specification is available at