Distributed Machine Learning with Apache Spark

By: edX

Overview

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

Taught by

Ameet Talwalkar and Jon Bates

  • edX
  • Free
  • English
  • Certificate available
  • Fixed dates
  • Intermediate