Python Project Practice
2024-11-19
Extracting Character Relationships from "Train to Busan" with Python
 

Network Mining Based on Co-occurrence

Main Points

  1. Building a network with the conventional co-occurrence method
  1. Extracting structured data from an unstructured data set

Introduction

Generating a network from co-occurrence was proposed several decades ago, yet it still appears in most papers on network discovery. You can use it to extract a structured network (a graph) from unstructured data such as a paragraph of text, online text, or even video.

Prerequisites

Workflow

1. Entity Identification (determine the set of nodes)
First, determine the set of entities in the given data set. In some cases, such as generating a network for a movie as in the example above, only a few main entities appear, so we can look up their names on the web or list them ourselves. Otherwise, entities can be identified with methods such as:
Regression methods (binary classification)
SVM (on the characteristics of the nodes)
Deep learning algorithms (conventional neural networks)
2. Relationship Identification
This project generates the relationships between nodes with the conventional co-occurrence method mentioned above. The code builds an edge between two nodes if they occur in the same paragraph. If an edge between the two nodes already exists, its weight is increased. Once the data set is large enough, the main line of the data set emerges.
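
The edge-building rule in step 2 can be sketched as follows. This is a minimal illustration, not the project's script: the function name build_cooccurrence, the sample sentences, and the use of plain substring matching to detect names are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(paragraphs, names):
    """Return {(a, b): weight} for names that appear in the same paragraph."""
    weights = defaultdict(int)
    for paragraph in paragraphs:
        # Entities present in this paragraph, found by plain substring matching.
        present = sorted({name for name in names if name in paragraph})
        # Each unordered pair in the paragraph has its edge weight increased by 1.
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up sample sentences for illustration only.
sample_paragraphs = [
    "Seok-woo boards the train with Su-an.",
    "Su-an meets Sang-hwa while Seok-woo is on the phone.",
    "Sang-hwa protects Seong-kyeong.",
]
sample_names = ["Seok-woo", "Su-an", "Sang-hwa", "Seong-kyeong"]

edges = build_cooccurrence(sample_paragraphs, sample_names)
```

Sorting the names before pairing them keeps each edge under a single key, so (A, B) and (B, A) are never counted separately. For Chinese text, substring matching would probably need to be replaced by a word segmenter.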
 

Scripts

 
Note:
The co-occurrence method is only suitable for data sets with an obvious central structure; edges with low weight are usually redundant. Two methods can be applied to reduce redundancy:
  1. The first is filtering, for example dropping edges whose weight is too low.
  1. The second is segmenting the network.
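
The filtering approach can be sketched as a simple weight threshold. The function name filter_edges, the threshold value, and the sample weights are assumptions for illustration; a suitable cut-off depends on the data set.

```python
def filter_edges(weights, min_weight=2):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in weights.items() if w >= min_weight}


# Hypothetical edge weights for illustration.
sample_weights = {("A", "B"): 5, ("A", "C"): 1, ("B", "C"): 3}
strong_edges = filter_edges(sample_weights, min_weight=3)
```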

Batch Renaming Files

Main Points

  1. **
  1. **
  1. **

Introduction

 

Prerequisites

Workflow

1. **
2. **
3. **
 

Code

 

Reference

map()
string.replace()
zip()
delimiter.join(list_you_want_to_join)
"str1"+"str2"
 
re.findall(pattern, text)
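
The renaming script itself is not shown here, so the following is only a sketch of how the calls listed above might combine when planning new filenames. The helper names plan_renames and zero_pad_number and the sample filenames are hypothetical, and nothing is renamed on disk.

```python
import re


def plan_renames(filenames, old, new):
    """Pair each filename with a proposed new name (no files are touched)."""
    # str.replace() swaps the unwanted fragment; map() applies it to every name.
    replaced = map(lambda name: name.replace(old, new), filenames)
    # zip() pairs each original name with its replacement.
    return list(zip(filenames, replaced))


def zero_pad_number(filename, width=3):
    """Zero-pad the first run of digits found by re.findall()."""
    digits = re.findall(r"\d+", filename)
    if not digits:
        return filename
    return filename.replace(digits[0], digits[0].zfill(width), 1)


names = ["draft 1.txt", "draft 2.txt"]
plan = plan_renames(names, " ", "_")
padded = zero_pad_number("draft 1.txt")
# delimiter.join(list) glues parts together; "+" concatenates two strings.
joined = "_".join(["report", "2024"]) + ".txt"
```

To apply such a plan, each (old, new) pair could be passed to os.rename(), ideally after printing the plan and checking it by eye.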

Working Log

Main Points

  1. **
  1. **
  1. **

Introduction

 

Prerequisites

 

Workflow

1. **
2. **
3. **

Code

 
 

Python Toolbox

docx third-party library

Formatted Data

 

Batch Fetching Download Links

Python's built-in argparse library lets you run a Python program as a command-line tool by parsing the arguments passed to it.
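
A minimal sketch of an argparse-based command line is shown below. The argument names (url, --output, --limit) and their defaults are hypothetical and only illustrate the pattern; they are not taken from an actual download-link script.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical link-collecting tool."""
    parser = argparse.ArgumentParser(
        description="Collect download links from a page (hypothetical example)."
    )
    parser.add_argument("url", help="page to scan")
    parser.add_argument("-o", "--output", default="links.txt",
                        help="file to write the links to")
    parser.add_argument("--limit", type=int, default=10,
                        help="maximum number of links to collect")
    return parser


# parse_args() normally reads sys.argv; passing a list keeps the example reproducible.
args = build_parser().parse_args(["https://example.com", "--limit", "5"])
```

Run from a shell, the same parser would pick up its values from the command line, and the -h flag would print the generated help text.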