Hadoop in Taiwan 2012: Current Speaker Submissions

Speaker submissions so far: four developer sessions, three application case studies, and two administration topics in total.

PS. Interestingly, everyone titled their talk in English this year. How international!

Developer / 開發者
楊詠成 (Gibson Yang) / Yahoo! Taiwan
Oozie Introduction & Experience Sharing
Chia-Hung Lin
Bulk Synchronous Parallel
Hadoop MapReduce [1] is a popular open source framework inspired by the map and reduce functions of functional programming, saving developers a great deal of work by taking care of many complicated underlying tasks. However, not every task fits the MapReduce model; graph-related computation (e.g., social network analysis) is one such example. Google therefore built its in-house system Pregel [2] for large-scale graph processing on Bulk Synchronous Parallel [3], a bridging model well suited to iterative algorithms.

Outline:
1. What is Bulk Synchronous Parallel?
2. Apache Hama
3. Comparison between Hadoop MapReduce and Apache Hama

[1] http://hadoop.apache.org/mapreduce/
[2] http://dl.acm.org/citation.cfm?id=1582723
[3] http://dl.acm.org/citation.cfm?id=79173.79181
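
To make the model concrete, here is a minimal sketch of a BSP superstep as exposed by Apache Hama (assuming the generic BSP API of the Hama 0.4/0.5 line; the peer-0-as-master aggregation is purely illustrative and not taken from the talk):

    import java.io.IOException;

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.bsp.BSP;
    import org.apache.hama.bsp.BSPPeer;
    import org.apache.hama.bsp.sync.SyncException;

    // One BSP "superstep": local compute, message exchange, barrier sync.
    public class SumBSP extends
        BSP<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> {

      @Override
      public void bsp(
          BSPPeer<NullWritable, NullWritable, Text, DoubleWritable, DoubleWritable> peer)
          throws IOException, SyncException, InterruptedException {
        // Superstep 1: every peer computes a local value and messages a master peer.
        String master = peer.getPeerName(0);         // designate peer 0 as master
        peer.send(master, new DoubleWritable(1.0));  // illustrative local result
        peer.sync();                                 // barrier ends the superstep

        // Superstep 2: only the master drains the messages delivered at the barrier.
        if (peer.getPeerName().equals(master)) {
          double sum = 0;
          DoubleWritable msg;
          while ((msg = peer.getCurrentMessage()) != null) {
            sum += msg.get();
          }
          peer.write(new Text("sum"), new DoubleWritable(sum));
        }
      }
    }

Unlike a MapReduce job, the peers stay alive across supersteps, which is what makes iterative graph algorithms cheap in this model.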
Laurence Liew / Revolution Analytics, Asia Pacific, General Manager
Big Data Analytics - Trends and Best Practices
Using case studies ranging from consumer behavior analytics to text mining and sentiment analysis, this session introduces big data analytics and the field of data science. It presents an overview of data science and the data scientist's toolkit, along with a discussion of using R with Hadoop.
Application / 應用案例
Speaker / Organization | Talk Title | Abstract
辜文元 / GIS Center, Feng Chia University
Sharing Application Cases of Hadoop in Geographic Information Systems
With the rapid progress of remote sensing in recent years, the resolution of a single image has risen sharply and each file demands far more storage; dynamic photography is also used ever more widely for environmental observation and recording, with data routinely growing by gigabytes or terabytes, so the need to store and manage remote sensing data keeps increasing. Such massive volumes frequently leave traditional servers short of storage; a server can take more disks, but vertical scaling has hard limits, so meeting the ever-growing demand for image storage is an important problem.

This work proposes Hadoop as a solution to the storage of massive remote sensing imagery. Exploiting the distributed storage of Hadoop's built-in HDFS, image files are spread across different cloud nodes; because the files are distributed, client access efficiency improves substantially as the number of clients or the volume of client requests grows.
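
The idea can be sketched with the HDFS client API (a hypothetical cluster address, block size, and file path; not code from the talk):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative only: ingest one large remote sensing image into HDFS so its
    // blocks are spread and replicated across the cluster's datanodes.
    public class ImageIngest {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000"); // hypothetical namenode
        conf.set("dfs.block.size", "67108864");              // 64 MB blocks
        conf.set("dfs.replication", "3");                    // three copies of each block

        FileSystem fs = FileSystem.get(conf);
        // HDFS splits the file into blocks and distributes them, so many clients
        // can later read different blocks from different datanodes in parallel.
        fs.copyFromLocalFile(new Path("/data/scene_0001.tif"),
                             new Path("/remote-sensing/scene_0001.tif"));
        fs.close();
      }
    }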
Chun-Han Chen / OgilvyOne
Mohohan: An on-line video transcoding service via Hadoop
Hadoop, the well-known cloud computing file system and development framework, is designed mainly for managing massive textual data: counting, sorting, indexing, pattern finding, and so on. Multimedia-oriented services built on Hadoop, by contrast, are rare. Mohohan is an on-line transcoding system for video, implemented with Amazon Web Services (AWS) EC2, S3, and Elastic MapReduce (EMR), plus Hadoop and ffmpeg. Its goal is to reduce overall execution time by transcoding in parallel on a Hadoop cluster. The concept is simple: 1) split the video into several chunks of frames, 2) transcode the chunks in parallel on multiple nodes (i.e., task trackers) of the Hadoop cluster, and 3) merge the transcoded results into the output. For a like-for-like SaaS comparison, a test report from CloudHarmony, an impartial third-party organization, was chosen. The experiments show that Mohohan performs considerably better than the other on-line video transcoding services covered in that report, such as Encoding, Zencoder, Sorenson, and Panda.
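
Step 2 of that pipeline can be pictured as a Hadoop mapper that shells out to ffmpeg, one chunk per task. This is a hypothetical sketch rather than Mohohan's actual code; the chunk naming, paths, and codec flags are assumptions:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Each map task receives the name of one pre-split video chunk and
    // transcodes it by invoking ffmpeg as an external process.
    public class TranscodeMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable offset, Text chunkName, Context context)
          throws IOException, InterruptedException {
        String in  = "/chunks/" + chunkName;               // e.g. part-0007.mpg
        String out = "/transcoded/" + chunkName + ".mp4";

        // One ffmpeg process per chunk; chunks run in parallel across task trackers.
        Process p = new ProcessBuilder(
            "ffmpeg", "-i", in, "-vcodec", "libx264", "-acodec", "aac", out)
            .redirectErrorStream(true)
            .start();
        if (p.waitFor() != 0) {
          throw new IOException("ffmpeg failed on chunk " + chunkName);
        }
        context.write(chunkName, new Text(out)); // a single reducer merges in order
      }
    }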
Vincent Chen / TCloud, Business Development Director
Applications in Precision Marketing: Hadoop in Mobile Browsing Behavior Analysis
Built on the Hadoop platform, this application uses MapReduce and related techniques to integrate user data from a range of mobile devices. Semantic analysis and data mining techniques such as word segmentation and classification are used to derive a complete profile for each user; the analysis results are then turned into marketing capability, ultimately enabling intelligent matching between people and content, people and products, and people and people.
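
As a rough sketch of how such profiles might be aggregated with MapReduce (a hypothetical reducer; the upstream mappers, field names, and categories are assumptions, not details from the talk):

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Upstream mappers are assumed to emit (userId, interest category) pairs
    // extracted from mobile browsing logs by segmentation/classification;
    // this reducer folds them into a simple count-based profile per user.
    public class ProfileReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text userId, Iterable<Text> categories, Context ctx)
          throws IOException, InterruptedException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Text c : categories) {
          String cat = c.toString();
          Integer n = counts.get(cat);
          counts.put(cat, n == null ? 1 : n + 1);
        }
        ctx.write(userId, new Text(counts.toString())); // e.g. {news=12, shopping=5}
      }
    }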
Administrator / 維運者
Jason Shih / Etu, SYSTEX Corp.
Hadoop Security Overview - From Security Infrastructure Deployment to High-Level Services
The growing adoption of the open-source Hadoop framework for fast data processing and analytics over huge data volumes has raised enterprise-wide security concerns: organizations want fine-grained control of sensitive information and isolation between different levels and groups of access on shared storage and computing facilities. Before the security work in Hadoop 0.20, the framework offered Unix-like file permissions and a simple cluster-wide authentication mechanism, but no access control over job queues, job submission, or other operations. With Hadoop's new security features and their integration with Kerberos, strong authentication and authorization can now enforce rigorous access control over data and resources, as well as isolation between running tasks. This presentation covers the deployment details of Hadoop security in a cluster environment and the implementation of high-level services on a Kerberized infrastructure. It also introduces the Etu Appliance, which provides fast deployment, system automation, and a built-in cross-realm trust mechanism that interoperates with an existing Active Directory domain or an external LDAP realm, reducing both integration and operational overhead for administrators.
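
On the client side, the Kerberos integration boils down to logging in from a keytab before touching HDFS or submitting jobs. A minimal sketch, with a hypothetical principal and keytab path, assuming a cluster already configured for Kerberos:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    // Minimal sketch of a Kerberized Hadoop client. The principal and keytab
    // path are hypothetical; the cluster side must already run with
    // hadoop.security.authentication=kerberos.
    public class SecureClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate against the KDC with a service keytab instead of a password.
        UserGroupInformation.loginUserFromKeytab(
            "etl-user@EXAMPLE.COM", "/etc/security/keytabs/etl-user.keytab");

        System.out.println("Logged in as: "
            + UserGroupInformation.getCurrentUser().getUserName());
      }
    }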
Kenneth Ho
Hadoop Hardware and Network Best Practices