Data Analysis

Pandas, Numpy

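Working with text data: creating strings, measuring length, changing case, stripping extra whitespace, formatting output, extracting substrings, converting to datetime, splitting on a pattern, checking whether and where a pattern occurs, replacing by pattern, expressing patterns as regular expressions, and applying string functions to arrays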
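The string operations listed above can be sketched with the pandas `.str` accessor. This is only an illustrative sketch: the sample values and variable names are invented here and do not come from the linked material.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
names = pd.Series(["  Alice Smith ", "bob JONES", "Carol  Lee"])

lengths = names.str.len()                    # measure length
cleaned = names.str.strip().str.title()      # trim outer spaces, normalize case
first3 = cleaned.str[:3]                     # extract part of the text
parts = cleaned.str.split(expand=True)       # split on whitespace into columns
has_lee = cleaned.str.contains("Lee")        # does the pattern occur?
pos = cleaned.str.find("Smith")              # where does it occur (-1 if absent)?
collapsed = cleaned.str.replace(r"\s+", " ", regex=True)  # regex-based replace
labels = cleaned.map("Name: {}".format)      # formatted output

# Convert text to datetime values.
dates = pd.to_datetime(pd.Series(["2013-01-01", "2013-02-15"]))
```

Each `.str` method applies element-wise to the whole Series, which is what "applying string functions to arrays" amounts to in pandas.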

Retail Procurement   America Land Use

Matplotlib

Pandas

Code Example   Pandas Cheat Sheet Quick Dive Complete Tutorial to Learn Data Science from Scratch Nick Eubank Useful Snippets Pandas Snippets Example Tutorials Scrape Weather Data with Pandas Pandas Dataframe as a Process Tracker (postgres example) Udemy

Keeping leading zeros in ID numbers. If an ID column uses the int dtype, it cannot hold null values; a workaround is to use the float dtype.
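A minimal sketch of both points, assuming the IDs arrive as CSV text. The column names and values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text, invented for illustration.
csv_text = "id,qty\n007,3\n012,\n"

# Reading the id column as str keeps the leading zeros.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

# qty has a missing value, so pandas stores it as float64 rather than int64.
qty_dtype = df["qty"].dtype

# If ids were already parsed as integers, zfill can restore a fixed width.
restored = pd.Series([7, 12]).astype(str).str.zfill(3)
```

Newer pandas versions also offer a nullable `Int64` extension dtype, which may be an alternative to the float workaround.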

Pandas Basics Broadcasting Reshaping Pivot Table

Specifying Data Types for Excel Columns   Split a Column into Two
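Splitting one column into two might look like the following sketch. The column names and values are hypothetical, and the linked article may use a different approach.

```python
import pandas as pd

# Hypothetical column, invented for illustration.
df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})

# n=1 splits at the first space only; expand=True returns a DataFrame.
df[["first", "last"]] = df["full_name"].str.split(" ", n=1, expand=True)
```

For the Excel case, `pd.read_excel` accepts a `dtype` mapping of column name to type, in the same way as `pd.read_csv`.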

10 Minutes to Pandas

import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

String Processing Methods

Balance Tasks between Pandas and PostgreSQL

Practical Medium Data Analytics with Python (10 Things I Hate About Pandas) Geospatial Analysis 37:30 Filtering on Geodesic Features

GeoPandas + Leaflet: working alongside existing tools ArcGIS QGIS PostGIS D3 https://medium.com/@Elijah_Meeks/d3-is-not-a-data-visualization-library-67ba549e8520

Bokeh vs Dash Data Cleaning

Geographic Statistical Data with Google Maps

Numpy

Introduction to Data Analytics with Pandas

Normally distributed random numbers:

s = np.random.normal(100, 5, 100)  # mean, standard deviation, count

b = s.astype(int)  # convert to integers

b = b.clip(0, 100)  # values above 100 become 100; values below 0 become 0
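The three steps above can be run end to end as in the sketch below. It swaps `np.random.normal` for a seeded `np.random.default_rng` generator so the result is reproducible; the seed and variable names are arbitrary choices made here.

```python
import numpy as np

# Seeded generator; the seed value is an arbitrary choice for reproducibility.
rng = np.random.default_rng(0)

scores = rng.normal(loc=100, scale=5, size=100)  # mean, standard deviation, count
as_int = scores.astype(int)                      # truncates toward zero
clipped = as_int.clip(0, 100)                    # limit values to the range [0, 100]
```

Because the mean sits at the upper clip bound, roughly half of the values end up capped at 100.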

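n = 10  # example length (an assumption); at most 26 for the letter index below
series = pd.Series(np.random.rand(n))  # uniform floats in [0, 1)
series = pd.Series(np.random.randint(1, 5, n))  # integers from 1 to 4
series = pd.Series(np.random.randint(1, 5, n), dtype=np.float64)  # same values, stored as floats
series = pd.Series(np.random.randint(1, 5, n), dtype=np.float64, index=[n*x for x in range(n)])  # custom integer index
series = pd.Series(np.random.randint(1, 100, n), dtype=np.float64, index=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')[:n])  # letter index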

Circular binary structure for SciPy morphological operations

import numpy as np

def circular_structure(radius):
    size = radius*2+1
    i,j = np.mgrid[0:size, 0:size]
    # Integer division keeps the in-place subtraction valid on integer arrays.
    i -= size // 2
    j -= size // 2
    return np.sqrt(i**2+j**2) <= radius
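As a quick check of what such a structure looks like, the sketch below uses a compact variant written for this example (centering the grid directly and comparing squared distances) rather than the function exactly as given.

```python
import numpy as np

def circular_structure(radius):
    """Return a boolean (2*radius+1)-sided square containing a filled disc."""
    # Index grids already centered on zero, from -radius to +radius inclusive.
    i, j = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    # Comparing squared distances avoids the square root.
    return i**2 + j**2 <= radius**2

disc = circular_structure(1)
# disc is a 3x3 plus-shaped mask: True at the center and its four neighbours.
```

The resulting boolean array can be passed as the `structure` argument of SciPy morphology functions such as `scipy.ndimage.binary_dilation`.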

Eliminate Double Loop

In [1]: import numpy as np
In [2]: a = np.zeros((1, 2, 3))
In [3]: a.shape
Out[3]: (1, 2, 3)
In [4]: a_sum = a.sum(axis=-1)
In [5]: a_sum.shape
Out[5]: (1, 2)

The last axis has been collapsed (summed over).

In [6]: a_sum0 = a.sum(axis=0)
In [7]: a_sum0.shape
Out[7]: (2, 3)
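To make the connection to loop elimination explicit, here is a small sketch comparing an explicit double loop with the equivalent `sum(axis=-1)` call. The array contents are arbitrary example values.

```python
import numpy as np

a = np.arange(6).reshape(1, 2, 3)  # example values 0..5

# Explicit double loop: sum over the last axis for each (i, j) pair.
looped = np.zeros(a.shape[:2])
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j, :].sum()

# Vectorized equivalent: one call, no Python-level loops.
vectorized = a.sum(axis=-1)
```

Both give the same values; the vectorized form runs the loop in compiled code, which usually matters once the arrays get large.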

SciPy

A bare import scipy does not give you the io submodule; you need the from .... import form to get it. The mechanism behind this is interesting.

Numba

Better Performance

Dask

PyTubes

Analysing 1.4 billion rows with Python

PySpark

Multi-Class Text Classification

NLP

NLP Fun https://medium.com/@Thomwolf/100-times-faster-natural-language-processing-in-python-ee32033bdced