Data Analysis

Pandas, Numpy

https://medium.com/@Thomwolf/100-times-faster-natural-language-processing-in-python-ee32033bdced
“Unsupervised Learning with Python” https://towardsdatascience.com/unsupervised-learning-with-python-173c51dc7f03
https://medium.com/@ehiagheaigg/introduction-to-matplotlib-data-visualization-in-python-d9143287ae39
Pandas Cheat Sheet
Quick Dive
Complete Tutorial to Learn Data Science from Scratch
Nick Eubank
Useful Snippets
Pandas Snippets
Example Tutorials
Scrape Weather Data with Pandas
Pandas Dataframe as a Process Tracker (postgres example)
Udemy

Preserving the 0 digits in ID numbers. If an ID column uses an int dtype, it cannot contain null values; the workaround is to use a float dtype.
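The note above can be illustrated with a small sketch. The CSV text and column names here are made up for the example. It assumes the IDs arrive as text such as "007": reading the column as strings keeps the zeros, while the default parse turns it into floats once a value is missing. The nullable "Int64" dtype shown at the end is an alternative available in newer pandas releases, not something the original note mentions.

```python
import io

import pandas as pd

# Hypothetical data: an ID column with leading zeros and one missing value.
csv_text = "id,name\n007,alice\n,bob\n042,carol\n"

# Default parsing: IDs become numbers, and the missing value forces float64.
as_default = pd.read_csv(io.StringIO(csv_text))

# Reading the column as text keeps the zeros exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

# Nullable integer dtype: integers plus a missing-value marker (zeros are still lost).
as_nullable = as_default["id"].astype("Int64")

print(as_default["id"].dtype)
print(as_text["id"].tolist())
print(as_nullable.dtype)
```

If the zeros matter (postal codes, account numbers), treating the ID as text is usually the safer choice, since an ID is a label rather than a quantity.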

Pandas Basics Broadcasting Reshaping Pivot Table

Specifying the Data Type of Excel Columns
Split a Column into Two
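A minimal sketch of splitting one column into two, using `Series.str.split` with `expand=True`. The frame and the column names (`full_name`, `first`, `last`) are hypothetical and only serve the illustration; the linked article may use a different approach.

```python
import pandas as pd

# Hypothetical data: one text column holding two pieces of information.
df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})

# Split on the first space only (n=1) and expand the parts into two new columns.
df[["first", "last"]] = df["full_name"].str.split(" ", n=1, expand=True)

print(df)
```

Limiting the split with `n=1` keeps everything after the first space together, which matters when the second part can itself contain spaces.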

10 Minutes to Pandas

import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

String Processing Methods

Balance Tasks between Pandas and PostgreSQL

Practical Medium Data Analytics with Python (10 Things I Hate About Pandas) Geospatial Analysis 37:30 Filtering on Geodesic Features

GeoPandas + Leaflet: working with existing tools (ArcGIS, QGIS, PostGIS, D3) https://medium.com/@Elijah_Meeks/d3-is-not-a-data-visualization-library-67ba549e8520

Bokeh vs Dash

Numpy

Introduction to Data Analytics with Pandas

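Normally distributed random numbers

s = np.random.normal(100, 5, 100)  # mean, standard deviation, count

b = s.astype(int)  # convert to integers (the fractional part is dropped)

b = b.clip(0, 100)  # values above 100 become 100, values below 0 become 0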
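The three steps above can be put together as a runnable sketch. It swaps in NumPy's `default_rng` generator with a fixed seed so the run is repeatable; the seed and the variable names are choices made for this example, not part of the original note.

```python
import numpy as np

# Seeded generator so repeated runs give the same numbers (seed value is arbitrary).
rng = np.random.default_rng(seed=0)

# 100 draws from a normal distribution with mean 100 and standard deviation 5.
scores = rng.normal(loc=100, scale=5, size=100)

# astype(int) truncates toward zero rather than rounding.
as_int = scores.astype(int)

# Limit every value to the range 0..100.
clipped = as_int.clip(0, 100)

print(clipped.min(), clipped.max())
```

Because the mean sits at the upper bound, roughly half of the draws are expected to be clipped to 100. Use `np.rint(scores).astype(int)` instead if rounding is wanted rather than truncation.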

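import numpy as np
import pandas as pd
n = 5  # number of elements; must be <= 26 for the letter index in the last line
series = pd.Series(np.random.rand(n))  # uniform floats in [0, 1)
series = pd.Series(np.random.randint(1, 5, n))  # integers 1..4 (upper bound is exclusive)
series = pd.Series(np.random.randint(1, 5, n), dtype=np.float64)  # same values stored as floats
series = pd.Series(np.random.randint(1, 5, n), dtype=np.float64, index=[n*x for x in range(n)])  # custom integer index
series = pd.Series(np.random.randint(1, 100, n), dtype=np.float64, index=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')[:n])  # letter index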

Circular binary structure for SciPy morphological operations

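import numpy as np

def circular_structure(radius):
    # Boolean disk of the given radius, usable as a structuring element.
    size = radius*2+1
    i,j = np.mgrid[0:size, 0:size]
    # Center the grid; integer division keeps the in-place update valid on int arrays.
    i -= size//2
    j -= size//2
    return np.sqrt(i**2+j**2) <= radius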
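A short usage sketch, assuming the disk is meant to be passed as the `structure` argument of `scipy.ndimage.binary_dilation`. The helper `disk` below is a compact stand-in written for this example so the block runs on its own; the 7x7 test image is also made up.

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean (2*radius+1) square with True inside a circle of the given radius."""
    i, j = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return i**2 + j**2 <= radius**2


# Hypothetical test image: a single foreground pixel in the middle.
image = np.zeros((7, 7), dtype=bool)
image[3, 3] = True

# Dilating a single pixel stamps the structuring element around it.
dilated = ndimage.binary_dilation(image, structure=disk(2))

print(dilated.astype(int))
```

Comparing squared distances avoids the square root and keeps the arithmetic in integers.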

SciPy

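A bare import scipy does not give you the io submodule; you need from ... import to get it. The mechanism behind this is interesting: importing a package does not import its submodules unless the package's __init__.py does so. (Newer SciPy releases may load submodules lazily, so the behavior can differ by version.)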
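A minimal sketch of the explicit import, which works regardless of SciPy version. The round trip through `savemat` and `loadmat` is only there to show the submodule in use; the variable name `a` and the in-memory buffer are choices made for this example.

```python
import sys
from io import BytesIO

import numpy as np
from scipy import io as sio  # explicit submodule import

# After the explicit import, the submodule is registered in sys.modules.
print("scipy.io" in sys.modules)

# Round trip a small array through the MATLAB file format, in memory.
buffer = BytesIO()
sio.savemat(buffer, {"a": np.arange(3)})
buffer.seek(0)
loaded = sio.loadmat(buffer)

# loadmat returns at least 2-D arrays, so flatten before comparing.
values = loaded["a"].flatten().tolist()
print(values)
```

Aliasing the submodule as `sio` avoids shadowing the standard library's `io` module.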

Numba

Better Performance

Dask

PyTubes

Analysing 1.4 billion rows with Python