Some issues related with Unified State Examination in Informatics in Russian Federation: November 2022

Monday, November 7, 2022

Ripser.py 0.6.4 Representative Cocycles (Matplotlib 3.6.2)

See https://ripser.scikit-tda.org/en/latest/notebooks/Representative%20Cocycles.html

Version of risperCohomology.py for Matplotlib 3.6.2.

The fixed code lines (highlighted in blue in the original post) are the ones in drawLineColored.

(.env) boris@UbuntuLTS:~/RIPSER$ cat risperCohomology642.py

import numpy as np
import matplotlib.pyplot as plt
from ripser import ripser
from persim import plot_diagrams
import tadasets

def drawLineColored(X, C):
    for i in range(X.shape[0]-1):
        lines = plt.plot(X[i:i+2, 0], X[i:i+2, 1], c=C[i, :])
        # Fix for Matplotlib 3.6.2: set the width via setp (this also repaints the segment black)
        plt.setp(lines, color='black', linewidth=2.0)

def plotCocycle2D(D, X, cocycle, thresh):
    """
    Given a 2D point cloud X, display a cocycle projected
    onto edges under a given threshold "thresh"
    """
    #Plot all edges under the threshold
    N = X.shape[0]
    t = np.linspace(0, 1, 10)
    c = plt.get_cmap('Greys')
    C = c(np.array(np.round(np.linspace(0, 255, len(t))), dtype=np.int32))
    C = C[:, 0:3]
    for i in range(N):
        for j in range(N):
            if D[i, j] <= thresh:
                Y = np.zeros((len(t), 2))
                Y[:, 0] = X[i, 0] + t*(X[j, 0] - X[i, 0])
                Y[:, 1] = X[i, 1] + t*(X[j, 1] - X[i, 1])
                drawLineColored(Y, C)
    #Plot cocycle projected to edges under the chosen threshold
    for k in range(cocycle.shape[0]):
        [i, j, val] = cocycle[k, :]
        if D[i, j] <= thresh:
            [i, j] = [min(i, j), max(i, j)]
            a = 0.5*(X[i, :] + X[j, :])
            plt.text(a[0], a[1], '%g'%val, color='b')
    #Plot vertex labels
    for i in range(N):
        plt.text(X[i, 0], X[i, 1], '%i'%i, color='r')
    plt.axis('equal')

np.random.seed(9)
x = tadasets.dsphere(n=12, d=1, noise=0.1)
plt.scatter(x[:, 0], x[:, 1])
plt.axis('equal')
plt.show()

result = ripser(x, coeff=17, do_cocycles=True)
diagrams = result['dgms']
cocycles = result['cocycles']
D = result['dperm2all']

# Pick the 1-dimensional class with the largest persistence (death - birth)
dgm1 = diagrams[1]
idx = np.argmax(dgm1[:, 1] - dgm1[:, 0])
plot_diagrams(diagrams, show = False)
plt.scatter(dgm1[idx, 0], dgm1[idx, 1], 20, 'k', 'x')
plt.title("Max 1D birth = %.3g, death = %.3g"%(dgm1[idx, 0], dgm1[idx, 1]))
plt.show()

cocycle = cocycles[1][idx]
thresh = dgm1[idx, 1] #Project cocycle onto edges less than or equal to death time
plotCocycle2D(D, x, cocycle, thresh)
plt.title("1-Form Thresh=%g"%thresh)
plt.show()

thresh = dgm1[idx, 1]-0.00001 #Project cocycle onto edges slightly shorter than the death time
plotCocycle2D(D, x, cocycle, thresh)
plt.title("1-Form Thresh=%g"%thresh)
plt.show()

thresh = dgm1[idx, 0] #Project cocycle onto edges that have lengths less than or equal to the birth time
plotCocycle2D(D, x, cocycle, thresh)
plt.title("1-Form Thresh=%g"%thresh)
plt.show()
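The listing above sets the line width through plt.setp, which also repaints every segment black, so the gray gradient computed in plotCocycle2D is no longer visible. A minimal alternative sketch, assuming the gradient should be kept, is to pass the lowercase linewidth keyword straight to plt.plot. The name draw_line_colored, the width default and the Agg backend are illustrative choices, not part of the original script.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def draw_line_colored(X, C, width=3.0):
    """Draw consecutive segments of the polyline X, colouring segment i with C[i]."""
    lines = []
    for i in range(X.shape[0] - 1):
        # Lowercase 'linewidth' is the Line2D property name.
        lines.extend(plt.plot(X[i:i + 2, 0], X[i:i + 2, 1], c=C[i, :], linewidth=width))
    return lines


t = np.linspace(0, 1, 10)
# Same gradient construction as in plotCocycle2D: ten grays from light to dark.
C = plt.get_cmap("Greys")(np.round(np.linspace(0, 255, len(t))).astype(np.int32))[:, 0:3]
Y = np.column_stack([t, 2 * t])  # a straight edge from (0, 0) to (1, 2)
segments = draw_line_colored(Y, C)
```

Whether black edges or gradient edges read better is a matter of taste; the gradient has the advantage of showing edge orientation, which matters when reading cocycle values.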

REFERENCES

1. https://informatics-ege.blogspot.com/2022/11/ripser-demonstration.html

Sunday, November 6, 2022

Ripser demonstration

Ripser.py is a lean persistent homology package for Python. Building on the very fast C++ Ripser package as its core computational engine, Ripser.py provides an intuitive interface for computing persistence cohomology of sparse and dense data sets, visualizing persistence diagrams, computing lower-star filtrations on images, and computing representative cochains.

============================

In mathematics, specifically in homology theory and algebraic topology, cohomology is a general term for a sequence of abelian groups, usually associated with a topological space and often defined from a cochain complex. Cohomology can be viewed as a method of assigning richer algebraic invariants to a space than homology does. Some versions of cohomology arise by dualizing the construction of homology. In other words, cochains are functions on the group of chains in homology theory.

From its beginnings in topology, this idea became a dominant method in the mathematics of the second half of the twentieth century. From the initial idea of homology as a method of constructing algebraic invariants of topological spaces, the range of applications of homology and cohomology theories has spread throughout geometry and algebra. The terminology tends to hide the fact that cohomology, a contravariant theory, is more natural than homology in many applications. At a basic level, this has to do with functions and pullbacks in geometric situations: given spaces X and Y and some function F on Y, for any mapping f : X → Y, composition with f gives a function F ∘ f on X. The most important cohomology theories have a product, the cup product, which gives them a ring structure. Because of this feature, cohomology is usually a stronger invariant than homology.
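The pullback F ∘ f can be made concrete with a few lines of Python. This is only a toy sketch: the maps f and g and the function F are hypothetical, chosen to show that pulling back reverses the direction of composition, which is what "contravariant" means here.

```python
def pullback(f):
    """Return the map F -> F∘f that turns functions on Y into functions on X."""
    def pull(F):
        return lambda x: F(f(x))
    return pull


# Hypothetical toy maps: X and Y are integers, Z is strings.
f = lambda x: x + 1        # f : X -> Y
g = lambda y: str(y * 2)   # g : Y -> Z
F = len                    # a function on Z

# Pulling F back along g∘f is the same as pulling back along g first, then along f.
lhs = pullback(lambda x: g(f(x)))(F)
rhs = pullback(f)(pullback(g)(F))
```

Both lhs and rhs are functions on X, and they agree on every input, even though f is applied to the data first and its pullback is applied to the function last.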

===================================

Ripser.py is an evolution of the original C++ Ripser project. We have put extensive work into making the package available to Python developers on all major platforms. If you have trouble installing it, please let us know by opening an issue on GitHub.

You can find the source code on GitHub at Scikit-TDA/Ripser.py. For the original C++ library, see Ripser/ripser.

(.env) boris@UbuntuLTS:~/RIPSER$ cat risperPlot3.py

from ripser import ripser

from persim import plot_diagrams

import matplotlib.pyplot as plt

import numpy as np

from sklearn import datasets

data = datasets.make_circles(n_samples=100)[0] + 5 * datasets.make_circles(n_samples=100)[0]

dgms = ripser(data)['dgms']

plot_diagrams(dgms, show=True)

"""

plot_diagrams(dgms, plot_only=[0], ax=plt.subplot(121))

plot_diagrams(dgms, plot_only=[1], ax=plt.subplot(122))

plot_diagrams(dgms, show=True)

"""

dgms = ripser(data, thresh=0.2)['dgms']

plot_diagrams(dgms, show=True)

dgms = ripser(data, thresh=1)['dgms']

plot_diagrams(dgms, show=True)

dgms = ripser(data, thresh=999)['dgms']

plot_diagrams(dgms, show=True)

(.env) boris@UbuntuLTS:~/RIPSER$ python3 risperPlot3.py
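The effect of the thresh argument is easiest to see in dimension 0. The sketch below is not Ripser's algorithm, only a small union-find illustration under the assumption that thresh is the largest edge length admitted to the filtration: classes that would die at a longer edge are reported as never dying. The function name h0_persistence and the four sample points are made up for the example.

```python
import math
from itertools import combinations


def h0_persistence(points, thresh=math.inf):
    """Return (birth, death) pairs of 0-dimensional classes of a Rips filtration.

    Edges longer than `thresh` are never added, so classes still alive at
    `thresh` are reported with death = inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        if length > thresh:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # merging two components kills one class
            parent[ri] = rj
            pairs.append((0.0, length))
    # Every component that was never merged away survives forever.
    survivors = len({find(i) for i in range(n)})
    pairs.extend((0.0, math.inf) for _ in range(survivors))
    return pairs


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
full = h0_persistence(pts)             # deaths at 1, 2, 4 plus one infinite class
cut = h0_persistence(pts, thresh=2.5)  # the merge at length 4 is never seen
```

With thresh=2.5 the point at x = 7 is never connected, so the diagram contains two infinite classes instead of one. The same kind of truncation is what changes the plots between thresh=0.2, thresh=1 and thresh=999 in the script above.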

Ripser is based on cohomology and returns a representative cocycle for each cohomology class generator returned by the persistent cohomology algorithm.

Recall that cohomology is dual to homology, and the coboundary operator is the adjoint of the boundary operator; that is, the coboundary operator $\delta_d$ takes $d$-forms to $(d+1)$-forms. For example, the coboundary operator takes 0-forms (scalar functions on vertices) to 1-forms over a vector space (functions on oriented edges), and it takes 1-forms to 2-forms (functions on oriented triangles). A $d$-dimensional cocycle is a $d$-form whose coboundary is zero. As with homology, applying the coboundary operator twice gives zero: $\delta_d \circ \delta_{d-1} = 0$, so the image of $\delta_{d-1}$ lies in the kernel of $\delta_d$, and we can take the quotient $\ker \delta_d / \operatorname{im} \delta_{d-1}$ to obtain the $d$-th cohomology group. A particular equivalence class of $d$-forms in this group, equivalent modulo $\operatorname{im} \delta_{d-1}$, is called a cohomology class. The persistent cohomology algorithm computes a set of cohomology class generators that generate the group and whose births and deaths are shown in the persistence diagram (NOTE: a cohomological birth is actually a homological death, and vice versa, as we will see in the example below, but we still use the homology birth/death convention when plotting diagrams). The persistence algorithm returns a representative cocycle for each generator class, which can be extracted from ripser and which we now explore with a simple example.
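To make the coboundary operator concrete, here is a small sketch on a hypothetical complex with four vertices, four edges and one filled triangle, with coefficients in Z/17 to match coeff=17 in the script. The complex, the function names d0 and d1 and the sample forms are illustrative assumptions; Ripser works with a different internal representation.

```python
P = 17  # prime field Z/17, matching coeff=17 in the ripser call

vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # oriented from low to high vertex
triangles = [(0, 1, 2)]


def d0(f):
    """Coboundary of a 0-form f (vertex -> value): (d0 f)(a, b) = f(b) - f(a)."""
    return {(a, b): (f[b] - f[a]) % P for a, b in edges}


def d1(w):
    """Coboundary of a 1-form w (edge -> value):
    (d1 w)(a, b, c) = w(b, c) - w(a, c) + w(a, b)."""
    return {(a, b, c): (w[(b, c)] - w[(a, c)] + w[(a, b)]) % P for a, b, c in triangles}


f = {0: 3, 1: 5, 2: 11, 3: 16}
exact = d0(f)   # a 1-form that is the coboundary of a 0-form
dd = d1(exact)  # applying the coboundary twice

# A 1-form supported on a single edge of the filled triangle is not a cocycle.
w = {(0, 1): 1, (0, 2): 0, (1, 2): 0, (2, 3): 0}
dw = d1(w)
```

Here dd vanishes on the triangle, as it must for any choice of f, while dw takes the value 1 there, so w is not a cocycle as long as the triangle is part of the complex. This is the mechanism behind a class dying when a triangle enters the filtration.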

(.env) boris@UbuntuLTS:~/RIPSER$ pip install --force-reinstall matplotlib==3.4.2

============================================

UPDATE as of 7/11/2022

See "Ripser.py 0.6.4 Representative Cocycles (Matplotlib 3.6.2)" for a version of risperCohomology.py fixed to run with Matplotlib 3.6.2.

============================================

(.env) boris@UbuntuLTS:~/RIPSER$ cat risperCohomology.py

import numpy as np
import matplotlib.pyplot as plt
from ripser import ripser
from persim import plot_diagrams
import tadasets

def drawLineColored(X, C):
    for i in range(X.shape[0]-1):
        # The camel-case lineWidth keyword is what required the Matplotlib 3.4.2 pin above
        plt.plot(X[i:i+2, 0], X[i:i+2, 1], c=C[i, :], lineWidth = 3)

def plotCocycle2D(D, X, cocycle, thresh):
    """
    Given a 2D point cloud X, display a cocycle projected
    onto edges under a given threshold "thresh"
    """
    #Plot all edges under the threshold
    N = X.shape[0]
    t = np.linspace(0, 1, 10)
    c = plt.get_cmap('Greys')
    C = c(np.array(np.round(np.linspace(0, 255, len(t))), dtype=np.int32))
    C = C[:, 0:3]
    for i in range(N):
        for j in range(N):
            if D[i, j] <= thresh:
                Y = np.zeros((len(t), 2))
                Y[:, 0] = X[i, 0] + t*(X[j, 0] - X[i, 0])
                Y[:, 1] = X[i, 1] + t*(X[j, 1] - X[i, 1])
                drawLineColored(Y, C)
    #Plot cocycle projected to edges under the chosen threshold
    for k in range(cocycle.shape[0]):
        [i, j, val] = cocycle[k, :]
        if D[i, j] <= thresh:
            [i, j] = [min(i, j), max(i, j)]
            a = 0.5*(X[i, :] + X[j, :])
            plt.text(a[0], a[1], '%g'%val, color='b')
    #Plot vertex labels
    for i in range(N):
        plt.text(X[i, 0], X[i, 1], '%i'%i, color='r')
    plt.axis('equal')

np.random.seed(9)
x = tadasets.dsphere(n=12, d=1, noise=0.1)
plt.scatter(x[:, 0], x[:, 1])
plt.axis('equal')
plt.show()

result = ripser(x, coeff=17, do_cocycles=True)
diagrams = result['dgms']
cocycles = result['cocycles']
D = result['dperm2all']

# Pick the 1-dimensional class with the largest persistence (death - birth)
dgm1 = diagrams[1]
idx = np.argmax(dgm1[:, 1] - dgm1[:, 0])
plot_diagrams(diagrams, show = False)
plt.scatter(dgm1[idx, 0], dgm1[idx, 1], 20, 'k', 'x')
plt.title("Max 1D birth = %.3g, death = %.3g"%(dgm1[idx, 0], dgm1[idx, 1]))
plt.show()

cocycle = cocycles[1][idx]
thresh = dgm1[idx, 1] #Project cocycle onto edges less than or equal to death time
plotCocycle2D(D, x, cocycle, thresh)
plt.title("1-Form Thresh=%g"%thresh)
plt.show()

thresh = dgm1[idx, 1]-0.00001 #Project cocycle onto edges slightly shorter than the death time
plotCocycle2D(D, x, cocycle, thresh)
plt.title("1-Form Thresh=%g"%thresh)
plt.show()

thresh = dgm1[idx, 0] #Project cocycle onto edges that have lengths less than or equal to the birth time
plotCocycle2D(D, x, cocycle, thresh)
plt.title("1-Form Thresh=%g"%thresh)
plt.show()
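The three plots differ only in which cocycle entries survive the threshold test inside plotCocycle2D. That step can be pulled out into a small pure function. The sketch below uses plain lists instead of NumPy arrays; the function name project_cocycle, the 3-point distance matrix and the cocycle values are all hypothetical.

```python
def project_cocycle(D, cocycle, thresh):
    """Keep the cocycle entries (i, j, value) whose edge length D[i][j] is <= thresh."""
    kept = []
    for i, j, val in cocycle:
        if D[i][j] <= thresh:
            kept.append((min(i, j), max(i, j), val))
    return kept


# Hypothetical 3-point example: a distance matrix and a cocycle with two nonzero edges.
D = [[0.0, 1.0, 2.5],
     [1.0, 0.0, 1.5],
     [2.5, 1.5, 0.0]]
cocycle = [(1, 0, 1), (2, 0, 16)]

at_death = project_cocycle(D, cocycle, thresh=2.5)
just_below = project_cocycle(D, cocycle, thresh=2.5 - 1e-5)
```

In this toy case the edge of length 2.5 is present at the threshold and missing just below it, which mirrors the difference between the first two plots produced by the script.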

REFERENCES

1. https://ripser.scikit-tda.org/en/latest/notebooks/Basic%20Usage.html

2. https://ripser.scikit-tda.org/en/latest/notebooks/Representative%20Cocycles.html

Saturday, November 5, 2022

KeplerMapper & NLP examples

(.env) boris@UbuntuLTS:~/TDANALYSIS$ cat Newsgroups20.py

import kmapper as km
from kmapper import Cover, jupyter
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import Isomap
from sklearn.preprocessing import MinMaxScaler

newsgroups = fetch_20newsgroups(subset='train')
X, y, target_names = np.array(newsgroups.data), np.array(newsgroups.target), np.array(newsgroups.target_names)
print("SAMPLE",X[0])
print("SHAPE",X.shape)
print("TARGET",target_names[y[0]])

# Projection pipeline: char n-gram TF-IDF -> truncated SVD -> Isomap to 2D
mapper = km.KeplerMapper(verbose=2)
projected_X = mapper.fit_transform(X,
    projection=[TfidfVectorizer(analyzer="char",
                                ngram_range=(1,6),
                                max_df=0.83,
                                min_df=0.05),
                TruncatedSVD(n_components=100,
                             random_state=1729),
                Isomap(n_components=2,
                       n_jobs=-1)],
    scaler=[None, None, MinMaxScaler()])
print("SHAPE",projected_X.shape)

from sklearn import cluster
graph = mapper.map(projected_X,
                   X=None,
                   clusterer=cluster.AgglomerativeClustering(n_clusters=3,
                                                             linkage="complete",
                                                             affinity="cosine"),  # scikit-learn 1.2+ renames this parameter to metric
                   cover=Cover(perc_overlap=0.33))

# Word-level TF-IDF features, used only to make the node tooltips interpretable
vec = TfidfVectorizer(analyzer="word",
                      strip_accents="unicode",
                      stop_words="english",
                      ngram_range=(1,3),
                      max_df=0.97,
                      min_df=0.02)
interpretable_inverse_X = vec.fit_transform(X).toarray()
# get_feature_names() is deprecated (see the FutureWarning in the output below);
# newer scikit-learn releases expect get_feature_names_out() instead
interpretable_inverse_X_names = vec.get_feature_names()
print("SHAPE", interpretable_inverse_X.shape)
print("FEATURE NAMES SAMPLE", interpretable_inverse_X_names[:400])

_ = mapper.visualize(graph,
                     X=interpretable_inverse_X,
                     X_names=interpretable_inverse_X_names,
                     path_html="output/newsgroups20.html",
                     lens=projected_X,
                     lens_names=["ISOMAP1", "ISOMAP2"],
                     title="Newsgroups20: Latent Semantic Char-gram Analysis with Isometric Embedding",
                     custom_tooltips=np.array([target_names[ys] for ys in y]),
                     color_values=y,
                     color_function_name='target')

(.env) boris@UbuntuLTS:~/SCIKIT-TDA$ python3 Newsgroups20.py

SAMPLE From: lerxst@wam.umd.edu (where's my thing)

Subject: WHAT car is this!?

Nntp-Posting-Host: rac3.wam.umd.edu

Organization: University of Maryland, College Park

Lines: 15

I was wondering if anyone out there could enlighten me on this car I saw

the other day. It was a 2-door sports car, looked to be from the late 60s/

early 70s. It was called a Bricklin. The doors were really small. In addition,

the front bumper was separate from the rest of the body. This is

all I know. If anyone can tellme a model name, engine specs, years

of production, where this car is made, history, or whatever info you

have on this funky looking car, please e-mail.

Thanks,

- IL

---- brought to you by your neighborhood Lerxst ----

SHAPE (11314,)

TARGET rec.autos

KeplerMapper(verbose=2)

..Composing projection pipeline of length 3:

Projections: TfidfVectorizer(analyzer='char', max_df=0.83, min_df=0.05, ngram_range=(1, 6))

TruncatedSVD(n_components=100, random_state=1729)

Isomap(n_jobs=-1)

Distance matrices: False

False

Scalers: None

None

MinMaxScaler()

..Projecting on data shaped (11314,)

..Projecting data using:

TfidfVectorizer(analyzer='char', max_df=0.83, min_df=0.05, ngram_range=(1, 6))

..Created projection shaped (11314, 13967)

..Projecting on data shaped (11314, 13967)

..Projecting data using:

TruncatedSVD(n_components=100, random_state=1729)

..Projecting on data shaped (11314, 100)

..Projecting data using:

Isomap(n_jobs=-1)

/home/boris/SCIKIT-TDA/.env/lib/python3.10/site-packages/sklearn/manifold/_isomap.py:348: UserWarning: The number of connected components of the neighbors graph is 2 > 1. Completing the graph to fit Isomap might be slow. Increase the number of neighbors to avoid this issue.

self._fit_transform(X)

/home/boris/SCIKIT-TDA/.env/lib/python3.10/site-packages/scipy/sparse/_index.py:103: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

self._set_intXint(row, col, x.flat[0])

..Scaling with: MinMaxScaler()

SHAPE (11314, 2)

Mapping on data shaped (11314, 2) using lens shaped (11314, 2)

Minimal points in hypercube before clustering: 3

Creating 100 hypercubes.

> Found 3 clusters in hypercube 0.

> Found 3 clusters in hypercube 1.

> Found 3 clusters in hypercube 2.

> Found 3 clusters in hypercube 3.

Cube_4 is empty.

> Found 3 clusters in hypercube 5.

> Found 3 clusters in hypercube 6.

> Found 3 clusters in hypercube 7.

> Found 3 clusters in hypercube 8.

> Found 3 clusters in hypercube 9.

> Found 3 clusters in hypercube 10.

> Found 3 clusters in hypercube 11.

> Found 3 clusters in hypercube 12.

> Found 3 clusters in hypercube 13.

> Found 3 clusters in hypercube 14.

> Found 3 clusters in hypercube 15.

> Found 3 clusters in hypercube 16.

> Found 3 clusters in hypercube 17.

> Found 3 clusters in hypercube 18.

> Found 3 clusters in hypercube 19.

> Found 3 clusters in hypercube 20.

> Found 3 clusters in hypercube 21.

> Found 3 clusters in hypercube 22.

> Found 3 clusters in hypercube 23.

> Found 3 clusters in hypercube 24.

> Found 3 clusters in hypercube 25.

> Found 3 clusters in hypercube 26.

> Found 3 clusters in hypercube 27.

> Found 3 clusters in hypercube 28.

> Found 3 clusters in hypercube 29.

> Found 3 clusters in hypercube 30.

> Found 3 clusters in hypercube 31.

> Found 3 clusters in hypercube 32.

> Found 3 clusters in hypercube 33.

> Found 3 clusters in hypercube 34.

> Found 3 clusters in hypercube 35.

> Found 3 clusters in hypercube 36.

> Found 3 clusters in hypercube 37.

> Found 3 clusters in hypercube 38.

> Found 3 clusters in hypercube 39.

> Found 3 clusters in hypercube 40.

> Found 3 clusters in hypercube 41.

> Found 3 clusters in hypercube 42.

> Found 3 clusters in hypercube 43.

> Found 3 clusters in hypercube 44.

> Found 3 clusters in hypercube 45.

> Found 3 clusters in hypercube 46.

> Found 3 clusters in hypercube 47.

> Found 3 clusters in hypercube 48.

> Found 3 clusters in hypercube 49.

> Found 3 clusters in hypercube 50.

> Found 3 clusters in hypercube 51.

> Found 3 clusters in hypercube 52.

> Found 3 clusters in hypercube 53.

> Found 3 clusters in hypercube 54.

> Found 3 clusters in hypercube 55.

> Found 3 clusters in hypercube 56.

> Found 3 clusters in hypercube 57.

> Found 3 clusters in hypercube 58.

> Found 3 clusters in hypercube 59.

> Found 3 clusters in hypercube 60.

> Found 3 clusters in hypercube 61.

> Found 3 clusters in hypercube 62.

> Found 3 clusters in hypercube 63.

> Found 3 clusters in hypercube 64.

> Found 3 clusters in hypercube 65.

> Found 3 clusters in hypercube 66.

> Found 3 clusters in hypercube 67.

> Found 3 clusters in hypercube 68.

> Found 3 clusters in hypercube 69.

> Found 3 clusters in hypercube 70.

> Found 3 clusters in hypercube 71.

> Found 3 clusters in hypercube 72.

> Found 3 clusters in hypercube 73.

> Found 3 clusters in hypercube 74.

> Found 3 clusters in hypercube 75.

> Found 3 clusters in hypercube 76.

Cube_77 is empty.

Cube_78 is empty.

> Found 3 clusters in hypercube 79.

> Found 3 clusters in hypercube 80.

> Found 3 clusters in hypercube 81.

> Found 3 clusters in hypercube 82.

> Found 3 clusters in hypercube 83.

Created 618 edges and 243 nodes in 0:00:01.169844.

/home/boris/SCIKIT-TDA/.env/lib/python3.10/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.

warnings.warn(msg, category=FutureWarning)

SHAPE (11314, 947)

FEATURE NAMES SAMPLE ['00', '000', '10', '100', '11', '12', '13', '14', '15', '16', '17', '18', '19', '1992', '1993', '1993apr15', '20', '200', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '408', '41', '42', '43', '44', '45', '49', '50', '500', '60', '70', '80', '90', '92', '93', 'able', 'ac', 'ac uk', 'accept', 'access', 'according', 'acs', 'act', 'action', 'actually', 'add', 'address', 'advance', 'advice', 'ago', 'agree', 'air', 'al', 'allow', 'allowed', 'america', 'american', 'andrew', 'answer', 'anti', 'anybody', 'apparently', 'appears', 'apple', 'application', 'apply', 'appreciate', 'appreciated', 'apr', 'apr 1993', 'apr 93', 'april', 'area', 'aren', 'argument', 'article', 'article 1993apr15', 'ask', 'asked', 'asking', 'assume', 'att', 'att com', 'au', 'available', 'average', 'avoid', 'away', 'bad', 'base', 'baseball', 'based', 'basic', 'basically', 'basis', 'bbs', 'believe', 'best', 'better', 'bible', 'big', 'bike', 'bit', 'bitnet', 'black', 'blue', 'board', 'bob', 'body', 'book', 'books', 'bought', 'box', 'break', 'brian', 'bring', 'brought', 'btw', 'build', 'building', 'built', 'bus', 'business', 'buy', 'ca', 'ca lines', 'california', 'called', 'came', 'canada', 'car', 'card', 'cards', 'care', 'carry', 'cars', 'case', 'cases', 'cause', 'cc', 'center', 'certain', 'certainly', 'chance', 'change', 'changed', 'cheap', 'check', 'chicago', 'children', 'chip', 'choice', 'chris', 'christ', 'christian', 'christians', 'church', 'city', 'claim', 'claims', 'class', 'clear', 'clearly', 'cleveland', 'clinton', 'clipper', 'close', 'cmu', 'cmu edu', 'code', 'college', 'color', 'colorado', 'com', 'com organization', 'com writes', 'come', 'comes', 'coming', 'comment', 'comments', 'common', 'communications', 'comp', 'company', 'complete', 'completely', 'computer', 'computer science', 'computing', 'condition', 'consider', 'considered', 'contact', 'continue', 'control', 'copy', 'corp', 'corporation', 
'correct', 'cost', 'couldn', 'country', 'couple', 'course', 'court', 'cover', 'create', 'created', 'crime', 'cs', 'cso', 'cso uiuc', 'cso uiuc edu', 'cup', 'current', 'currently', 'cut', 'cwru', 'cwru edu', 'data', 'date', 'dave', 'david', 'day', 'days', 'dead', 'deal', 'death', 'decided', 'defense', 'deleted', 'department', 'dept', 'design', 'designed', 'details', 'development', 'device', 'did', 'didn', 'die', 'difference', 'different', 'difficult', 'directly', 'disclaimer', 'discussion', 'disk', 'display', 'distribution', 'distribution na', 'distribution na lines', 'distribution usa', 'distribution usa lines', 'distribution world', 'distribution world nntp', 'distribution world organization', 'division', 'dod', 'does', 'does know', 'doesn', 'doing', 'don', 'don know', 'don think', 'don want', 'dos', 'doubt', 'dr', 'drive', 'driver', 'drivers', 'early', 'earth', 'easily', 'east', 'easy', 'ed', 'edu', 'edu article', 'edu au', 'edu david', 'edu organization', 'edu organization university', 'edu reply', 'edu subject', 'edu writes', 'effect', 'email', 'encryption', 'end', 'engineering', 'entire', 'error', 'especially', 'evidence', 'exactly', 'example', 'excellent', 'exist', 'exists', 'expect', 'experience', 'explain', 'expressed', 'extra', 'face', 'fact', 'faith', 'family', 'fan', 'faq', 'far', 'fast', 'faster', 'fax', 'federal', 'feel', 'figure', 'file', 'files', 'final', 'finally', 'fine', 'folks', 'follow', 'following', 'force', 'forget', 'form', 'frank', 'free', 'friend', 'ftp', 'future', 'game', 'games', 'gave', 'general', 'generally', 'germany', 'gets', 'getting', 'given', 'gives', 'giving', 'gmt', 'god', 'goes', 'going', 'gone', 'good', 'got', 'gov', 'government', 'graphics', 'great', 'greatly', 'ground', 'group', 'groups', 'guess', 'gun', 'guns', 'guy', 'half', 'hand', 'happen', 'happened', 'happens', 'happy', 'hard', 'hardware', 'haven', 'having', 'head', 'hear', 'heard', 'heart', 'hell']

Wrote visualization to: output/newsgroups20.html
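The first projection step turns each post into character n-grams of length 1 to 6 and keeps only n-grams whose document frequency lies between min_df and max_df. The sketch below illustrates just that vocabulary selection in pure Python on a made-up four-word corpus. It is a simplification: scikit-learn's TfidfVectorizer also normalizes the text and computes tf-idf weights, neither of which is reproduced here, and the helper names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=6):
    """All character n-grams of `text` with n_min <= n <= n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def vocabulary(docs, n_min, n_max, min_df, max_df):
    """N-grams whose document frequency (as a fraction) lies in [min_df, max_df]."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n_min, n_max)))  # count each document once
    n_docs = len(docs)
    return sorted(g for g, c in df.items() if min_df <= c / n_docs <= max_df)


docs = ["car", "cat", "bat", "art"]  # hypothetical toy corpus
vocab = vocabulary(docs, n_min=1, n_max=2, min_df=0.5, max_df=0.8)
```

In the toy corpus the letter "a" occurs in every document and is dropped by max_df, while "b" occurs only once and is dropped by min_df. The same kind of filtering is what brings the real pipeline down to the 13967 features reported in the log.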

Another sample

(.env) boris@UbuntuLTS:~/TDANALYSIS$ cat plot_cat.py

"""

3D Cat Data

============

This example generates a Mapper built from a point-cloud sampled from a 3D model of a cat.

`Visualization of the cat mapper <../../_static/cat.html>`_

"""

import numpy as np

import sklearn

import kmapper as km

data = np.genfromtxt("./cat-reference.csv", delimiter=",")

mapper = km.KeplerMapper(verbose=2)

lens = mapper.fit_transform(data)

graph = mapper.map(

lens,

data,

clusterer=sklearn.cluster.DBSCAN(eps=0.1, min_samples=5),

cover=km.Cover(n_cubes=15, perc_overlap=0.2),

)

mapper.visualize(graph, path_html="output/cat.html")

km.draw_matplotlib(graph)

import matplotlib.pyplot as plt

plt.show()

(.env) boris@UbuntuLTS:~/TDANALYSIS$ python3 plot_cat.py

KeplerMapper(verbose=2)

..Composing projection pipeline of length 1:

Projections: sum

Distance matrices: False

Scalers: MinMaxScaler()

..Projecting on data shaped (7207, 3)

..Projecting data using: sum

..Scaling with: MinMaxScaler()

Mapping on data shaped (7207, 3) using lens shaped (7207, 1)

Minimal points in hypercube before clustering: 5

Creating 15 hypercubes.

> Found 2 clusters in hypercube 0.

> Found 2 clusters in hypercube 1.

> Found 2 clusters in hypercube 2.

> Found 1 clusters in hypercube 3.

> Found 2 clusters in hypercube 4.

> Found 2 clusters in hypercube 5.

> Found 1 clusters in hypercube 6.

> Found 1 clusters in hypercube 7.

> Found 1 clusters in hypercube 8.

> Found 1 clusters in hypercube 9.

> Found 1 clusters in hypercube 10.

> Found 1 clusters in hypercube 11.

> Found 1 clusters in hypercube 12.

> Found 1 clusters in hypercube 13.

> Found 1 clusters in hypercube 14.

Created 19 edges and 20 nodes in 0:00:00.078696.

Wrote visualization to: output/cat.html
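The log above follows the usual Mapper recipe: cover the range of a 1-D lens with overlapping intervals ("hypercubes"), cluster the points that fall into each interval, and connect clusters that share points. The sketch below walks through those steps in pure Python. The interval formula and the gap-based clusterer are assumptions chosen for brevity, standing in for km.Cover and DBSCAN; they are not KeplerMapper's actual implementation.

```python
from itertools import combinations


def cover_1d(lo, hi, n_cubes, overlap):
    """n_cubes equal intervals on [lo, hi]; neighbours share `overlap` of their length."""
    length = (hi - lo) / (n_cubes - (n_cubes - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + k * step, lo + k * step + length) for k in range(n_cubes)]


def cluster_by_gap(indices, values, gap):
    """Stand-in clusterer: split sorted 1-D values wherever neighbours differ by > gap."""
    order = sorted(indices, key=lambda i: values[i])
    clusters, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > gap:
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    return clusters


def mapper_1d(data, lens, n_cubes, overlap, gap):
    """Nodes are clusters inside each cover element; edges join clusters sharing a point."""
    nodes = []
    for a, b in cover_1d(min(lens), max(lens), n_cubes, overlap):
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        if members:
            nodes.extend(set(c) for c in cluster_by_gap(members, data, gap))
    edges = [(p, q) for p, q in combinations(range(len(nodes)), 2) if nodes[p] & nodes[q]]
    return nodes, edges


# Hypothetical 1-D data with a hole between 0.4 and 0.6; the lens is the data itself.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
nodes, edges = mapper_1d(data, lens=data, n_cubes=3, overlap=0.5, gap=0.15)
```

The middle interval splits into two clusters at the hole, so the resulting graph has four nodes in two connected components. That is the same kind of summary the 20 nodes and 19 edges give for the cat point cloud.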

To be able to reproduce any of the examples at https://kepler-mapper.scikit-tda.org/en/latest/examples.html, reinstall numpy 1.21.5:

(.env) boris@UbuntuLTS:~/SCIKIT-TDA$ pip install --force-reinstall numpy==1.21.5
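Since these posts depend on specific package versions (Matplotlib 3.4.2 or 3.6.2, numpy 1.21.5), it can help to record what is actually installed in the virtual environment before running an example. A minimal sketch using only the standard library follows; the helper name installed_versions and the list of package names are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version string, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["numpy", "matplotlib", "ripser", "kmapper", "scikit-learn"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output next to a script makes it easier to tell later whether a failure comes from the code or from a changed dependency.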

(.env) boris@UbuntuLTS:~/SCIKIT-TDA$ python3 plot_horse.py

KeplerMapper(verbose=2)

..Composing projection pipeline of length 1:

Projections: sum

Distance matrices: False

Scalers: MinMaxScaler()

..Projecting on data shaped (8431, 3)

..Projecting data using: sum

..Scaling with: MinMaxScaler()

Mapping on data shaped (8431, 3) using lens shaped (8431, 1)

Minimal points in hypercube before clustering: 5

Creating 30 hypercubes.

> Found 1 clusters in hypercube 0.

> Found 2 clusters in hypercube 1.

> Found 3 clusters in hypercube 2.

> Found 3 clusters in hypercube 3.

> Found 3 clusters in hypercube 4.

> Found 3 clusters in hypercube 5.

> Found 3 clusters in hypercube 6.

> Found 1 clusters in hypercube 7.

> Found 2 clusters in hypercube 8.

> Found 2 clusters in hypercube 9.

> Found 2 clusters in hypercube 10.

> Found 3 clusters in hypercube 11.

> Found 2 clusters in hypercube 12.

> Found 2 clusters in hypercube 13.

> Found 1 clusters in hypercube 14.

> Found 1 clusters in hypercube 15.

> Found 1 clusters in hypercube 16.

> Found 1 clusters in hypercube 17.

> Found 1 clusters in hypercube 18.

> Found 1 clusters in hypercube 19.

> Found 1 clusters in hypercube 20.

> Found 1 clusters in hypercube 21.

> Found 1 clusters in hypercube 22.

> Found 1 clusters in hypercube 23.

> Found 1 clusters in hypercube 24.

> Found 1 clusters in hypercube 25.

> Found 1 clusters in hypercube 26.

> Found 1 clusters in hypercube 27.

> Found 1 clusters in hypercube 28.

> Found 1 clusters in hypercube 29.

Created 48 edges and 48 nodes in 0:00:00.072752.

Wrote visualization to: output/horse.html

REFERENCES

1. https://kepler-mapper.scikit-tda.org/en/latest/notebooks/KeplerMapper-Newsgroup20-Pipeline.html

2. https://pypi.org/project/scikit-tda/

3. https://kepler-mapper.scikit-tda.org/en/latest/generated/gallery/plot_cat.html